Back-of-Envelope Estimation — System Design Space

An architecture sketched before you estimate the order of magnitude often ends up elegant but irrelevant to the actual scale of the problem.

This chapter shows how quick sizing exposes what saturates first: traffic, memory, disks, network, or the latency budget. That saves time by filtering options through a rough capacity model before the design discussion gets too detailed.

In system design interviews, this is one of the clearest signs of maturity: state sensible assumptions, identify the primary constraint, and only then propose an architecture that truly fits the scale you uncovered.

Practical value of this chapter

Estimate in Minutes

Translate a product scenario into numbers quickly: RPS/QPS, traffic volume, storage growth, and first-order infrastructure limits.

Primary Bottleneck

Use memory, storage, network, and latency-budget estimates to identify which layer saturates first and where headroom is required.

Decision Framing

Tie the numbers back to pattern choices such as caching, replication, sharding, asynchronous flows, and service targets.

Interview Delivery

Present a clear interview structure: assumptions, formulas, the primary risk, and the next evolution step.

Core reference

Designing Data-Intensive Applications, 2nd Edition

A strong anchor for reasoning about throughput, storage growth, and the trade-offs between response time and system cost.


Back-of-Envelope Estimation is not about guessing the perfect number. It is about establishing the order of magnitude, the expected throughput, and the first hard limit within the opening minutes of a design conversation.

Before choosing technology, it helps to lock the core user journey, the SLA/SLO you are trying to protect, and the latency budget you can spend across the system. That makes the first architectural constraint visible early instead of after the diagram is already drawn.

Quick constants and units

Time

1 day = 86,400 s
1 month ≈ 30 days
Peak load is often 2-5x the average

Traffic

RPS = requests/day / 86,400
RPS_peak = RPS_avg * peak factor
Read and write paths should be sized separately

Bytes

1 KB = 10^3 B
1 MB = 10^6 B
1 GB = 10^9 B

Bandwidth

bytes/sec = RPS * response_size
bits/sec = bytes/sec * 8
Add protocol overhead on top

Memory

RAM = hot working set
Add connection and session overhead
Keep 20-30% headroom

Storage

storage/day = writes/day * average_row_size
Include indexes and metadata
Multiply by replication factor

A 5-step estimation template

Five steps are usually enough to move from product language to numbers. The important part is to separate sustained peaks from short bursts and to isolate the path that will actually define cost and operational difficulty.

Step 1

Lock the assumptions

Define the audience size, the core user journey, the read/write split, request and response size, and the service-level target you are trying to protect.

Step 2

Translate product metrics into load

Estimate average and peak RPS/QPS, separate sustained peaks from short bursts, and isolate the most expensive write path.

Step 3

Estimate resources per layer

Size memory for hot data and cache, long-term storage growth, and the network traffic that moves between services and out of the system.

Step 4

Allocate the latency budget

Split the p95/p99 latency budget across the edge layer, application service, cache, database, and external dependencies.

Step 5

Find the first hard limit

Decide what breaks first under 5-10x growth: CPU, memory, network, disk IOPS, or operational complexity.
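Step 5 can be sketched as a comparison of projected demand against per-resource capacity. This is a minimal sketch: every demand and capacity figure below is a hypothetical placeholder, and the helper name first_hard_limit is invented for illustration. It also assumes demand scales linearly with traffic, which real systems rarely do exactly.

```python
# Hypothetical current demand and provisioned capacity per resource.
# All figures are illustrative placeholders, not measurements.
current_demand = {
    "cpu_cores": 40,
    "memory_gb": 96,
    "network_mbps": 300,
    "disk_iops": 12_000,
}
capacity = {
    "cpu_cores": 256,
    "memory_gb": 1024,
    "network_mbps": 10_000,
    "disk_iops": 60_000,
}


def first_hard_limit(demand, capacity, growth):
    """Return (resource, utilization) for the most saturated resource after growth.

    Assumes demand scales linearly with traffic, which is a simplification.
    """
    utilization = {name: demand[name] * growth / capacity[name] for name in demand}
    worst = max(utilization, key=utilization.get)
    return worst, utilization[worst]


for growth in (5, 10):
    resource, util = first_hard_limit(current_demand, capacity, growth)
    print(f"{growth}x growth: {resource} reaches {util:.0%} of capacity")
```

With these placeholder inputs, disk IOPS saturates before CPU, memory, or network. Operational complexity does not fit into a ratio like this and still has to be judged separately.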

From traffic to RPS/QPS

Average and peak load

requests/day = DAU * requests_per_user/day
RPS_avg = requests/day / 86,400
RPS_peak = RPS_avg * peak_factor

Read/write split

RPS_read = RPS_total * read_ratio
RPS_write = RPS_total * write_ratio
The write path usually drives storage and durability costs
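The load formulas above fit into one small function. This is a sketch under stated assumptions: the input values (2M DAU, 50 requests per user per day, a 3x peak, 90% reads) are hypothetical, and the write ratio is taken to be 1 - read_ratio.

```python
SECONDS_PER_DAY = 86_400


def estimate_rps(dau, requests_per_user_per_day, peak_factor, read_ratio):
    """Return average, peak, and peak read/write RPS from product-level inputs."""
    requests_per_day = dau * requests_per_user_per_day
    rps_avg = requests_per_day / SECONDS_PER_DAY
    rps_peak = rps_avg * peak_factor
    return {
        "rps_avg": rps_avg,
        "rps_peak": rps_peak,
        "rps_read_peak": rps_peak * read_ratio,
        # Assumption: every request is either a read or a write.
        "rps_write_peak": rps_peak * (1 - read_ratio),
    }


# Hypothetical inputs: 2M DAU, 50 requests per user per day, 3x peak, 90% reads.
load = estimate_rps(
    dau=2_000_000,
    requests_per_user_per_day=50,
    peak_factor=3,
    read_ratio=0.9,
)
for name, value in load.items():
    print(f"{name}: {value:,.0f}")
```

Keeping the read and write figures as separate outputs makes it harder to collapse them into one number later in the discussion.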

Averages are rarely enough. You also need to know what happens at p99 and whether real latency still fits the user promise in the slow, uncomfortable part of the distribution.
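A short illustration of why the mean hides the tail. The latency sample below is synthetic and purely hypothetical (98% fast requests, 2% slow ones); it only shows how far the mean and p99 can drift apart.

```python
import statistics

# Hypothetical latency sample in milliseconds: 980 fast requests and 20 slow ones.
latencies_ms = [20.0] * 980 + [900.0] * 20

mean_ms = statistics.mean(latencies_ms)
# quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
p99_ms = statistics.quantiles(latencies_ms, n=100)[98]

print(f"mean = {mean_ms:.1f} ms, p99 = {p99_ms:.1f} ms")
```

In this sample the mean stays in the tens of milliseconds while p99 sits in the slow group, so a budget checked against the mean alone would look healthy.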

RAM and long-term storage

For memory, the key question is the hot working set needed for fast reads. For storage, the important part is daily growth, index overhead, metadata, and replication. Skip any of those and your estimate will look better than the system really is.

RAM

Start from hot data and cache, then add connection and session overhead plus enough headroom for real production variance.
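One way to turn that description into a number is sketched below. The function name estimate_ram_gb, the overhead ratio, and the input values are all hypothetical. Headroom is interpreted here as the fraction of provisioned RAM that should stay free, which is an assumption about how the 20-30% guideline is applied.

```python
def estimate_ram_gb(hot_objects, object_size_kb, overhead_ratio, headroom_ratio):
    """Size RAM for a hot working set, using decimal units (1 GB = 10^6 KB).

    overhead_ratio: extra memory for connections, sessions, and cache metadata,
                    as a fraction of the hot data (an assumption to calibrate).
    headroom_ratio: fraction of provisioned RAM that should stay free.
    """
    hot_data_gb = hot_objects * object_size_kb / 1_000_000
    needed_gb = hot_data_gb * (1 + overhead_ratio)
    # Provision so that needed_gb fills only (1 - headroom_ratio) of the total.
    return needed_gb / (1 - headroom_ratio)


# Hypothetical inputs: 5M hot objects of 4 KB, 20% overhead, 25% headroom.
ram_gb = estimate_ram_gb(5_000_000, 4, overhead_ratio=0.20, headroom_ratio=0.25)
print(f"provision about {ram_gb:.0f} GB of RAM")
```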

Storage horizon

storage/day = writes/day * average_row_size
total/day = storage/day * replication_factor * (1 + index_overhead)
storage/N months = total/day * 30 * N
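The three storage formulas above chain into a single projection. This is a sketch with hypothetical inputs (10M writes/day, 1 KB rows, RF=3, 40% index overhead, a 12-month horizon); the function name storage_growth_gb is invented for illustration.

```python
def storage_growth_gb(writes_per_day, avg_row_size_kb, replication_factor,
                      index_overhead, months):
    """Project stored volume in decimal GB over a retention horizon."""
    raw_per_day_gb = writes_per_day * avg_row_size_kb / 1_000_000  # KB -> GB
    total_per_day_gb = raw_per_day_gb * replication_factor * (1 + index_overhead)
    return total_per_day_gb * 30 * months  # 1 month is approximated as 30 days


# Hypothetical inputs: 10M writes/day, 1 KB rows, RF=3, 40% index overhead, 12 months.
total_gb = storage_growth_gb(10_000_000, 1.0, 3, 0.40, 12)
print(f"about {total_gb / 1000:.1f} TB after 12 months")
```

The sketch ignores compression, compaction, and deletes, all of which can move the real figure in either direction.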

Network bandwidth and outbound traffic

Measure ingress and egress separately. For each service hop, start with bytes/sec = RPS * response_size, then add headers, TLS, retries, fan-out, and retransmits. That is often enough to reveal whether you need compression, batching, or a simpler contract between services.
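The per-hop bandwidth estimate can be sketched as follows. Folding headers, TLS, retries, and retransmits into a single overhead_ratio is a simplifying assumption, and the example hop (2,000 RPS, 5 KB responses, fan-out of 3, 15% overhead) is hypothetical.

```python
def egress_mbps(rps, response_size_kb, overhead_ratio=0.0, fan_out=1):
    """Return outbound bandwidth in megabits per second (decimal units).

    overhead_ratio bundles headers, TLS, retries, and retransmits into one
    fraction; this is a simplifying assumption, not a measured value.
    """
    bytes_per_sec = rps * fan_out * response_size_kb * 1_000
    bytes_per_sec *= 1 + overhead_ratio
    return bytes_per_sec * 8 / 1_000_000


# Hypothetical hop: 2,000 RPS, 5 KB responses, fan-out of 3, 15% overhead.
hop_mbps = egress_mbps(2_000, 5, overhead_ratio=0.15, fan_out=3)
print(f"{hop_mbps:.0f} Mbps")
```

Running the same function with request sizes instead of response sizes gives the ingress side, which keeps the two directions separate as the text recommends.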

How to allocate a latency budget

Layer | p95 budget | Notes
Edge / LB | 10 ms | TLS and routing
Application service | 25 ms | business logic
Cache | 8 ms | hot read
Primary database | 35 ms | critical query
External dependency | 20 ms | payments or risk service

The budgets above sum to 98 ms. If that total is already close to the target SLO, tail latency becomes the real risk. That is usually where a design starts to look good on average but fragile in practice.
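A quick check of the allocation, using the per-layer budgets from the table above. The 120 ms end-to-end target is a hypothetical value chosen only to show the slack calculation. Adding per-layer p95 values assumes the layers are called sequentially and is a rough approximation, not an exact end-to-end p95.

```python
# p95 budgets per layer, in milliseconds, taken from the table above.
budget_ms = {
    "Edge / LB": 10,
    "Application service": 25,
    "Cache": 8,
    "Primary database": 35,
    "External dependency": 20,
}

# Hypothetical end-to-end target; replace with the SLO you actually protect.
TARGET_P95_MS = 120

total_ms = sum(budget_ms.values())
slack_ms = TARGET_P95_MS - total_ms
print(f"allocated {total_ms} ms of {TARGET_P95_MS} ms, slack {slack_ms} ms")
```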

Four mini-cases with quick takeaways

In fast estimation, the number is only half of the story. The other half is the type of difficulty: fan-out, cache invalidation, durability, compaction, backpressure, or retry storms.

Case 1: URL shortener

100M redirects/day
response ≈ 1.2 KB
peak factor = 3x

Estimate: RPS_avg ≈ 1,160 and RPS_peak ≈ 3,500. Peak outbound traffic is about 4.2 MB/s (~34 Mbps) before TLS and header overhead.

Main difficulty: The hard part is usually not raw storage volume, but cache hit ratio and read-path latency.
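The Case 1 arithmetic, written out. All inputs come from the case itself; the variable names are illustrative. Unrounded arithmetic lands slightly below the rounded figures quoted in the estimate.

```python
SECONDS_PER_DAY = 86_400

redirects_per_day = 100_000_000
response_kb = 1.2
peak_factor = 3

rps_avg = redirects_per_day / SECONDS_PER_DAY
rps_peak = rps_avg * peak_factor
egress_mb_per_sec = rps_peak * response_kb / 1_000  # KB/s -> MB/s
egress_mbps = egress_mb_per_sec * 8                 # MB/s -> Mbps

print(f"avg {rps_avg:,.0f} RPS, peak {rps_peak:,.0f} RPS")
print(f"peak egress {egress_mb_per_sec:.1f} MB/s ({egress_mbps:.0f} Mbps)")
```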

Case 2: Notification delivery

5M events/day
average fan-out = 8 recipients
message size ≈ 0.8 KB

Estimate: Delivery operations/day = 40M, average load ≈ 463 ops/sec. With a 6x peak, the system must handle roughly 2.8K ops/sec across delivery channels.

Main difficulty: The real risk is keeping delivery queues under control, avoiding retry cascades, and preserving priority for important channels.
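The Case 2 arithmetic, written out. The event count, fan-out, message size, and 6x peak come from the case; the payload bandwidth line is an extra derived from those same inputs and excludes retries and protocol overhead.

```python
SECONDS_PER_DAY = 86_400

events_per_day = 5_000_000
fan_out = 8
message_kb = 0.8
peak_factor = 6

deliveries_per_day = events_per_day * fan_out
ops_avg = deliveries_per_day / SECONDS_PER_DAY
ops_peak = ops_avg * peak_factor
# Payload only; retries and protocol overhead would come on top.
payload_mb_per_sec = ops_peak * message_kb / 1_000

print(f"{deliveries_per_day:,} deliveries/day")
print(f"avg {ops_avg:,.0f} ops/sec, peak {ops_peak:,.0f} ops/sec")
print(f"peak payload {payload_mb_per_sec:.1f} MB/s")
```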

Case 3: Timeline cache

10M DAU
20 timeline reads per user per day
working set = 15% of the audience

Estimate: Reads/day = 200M, so average RPS is ≈ 2,315. The hot cohort is about 1.5M users. At 8 KB per cached object, that is roughly 12 GB before overhead.

Main difficulty: The key challenge is maintaining a strong hit ratio without turning cache invalidation into a request storm.
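The Case 3 arithmetic, written out. Inputs come from the case, including the 8 KB cached object size; one cached object per hot user is assumed, as the estimate implies.

```python
SECONDS_PER_DAY = 86_400

dau = 10_000_000
reads_per_user_per_day = 20
working_set_ratio = 0.15
cached_object_kb = 8

reads_per_day = dau * reads_per_user_per_day
rps_avg = reads_per_day / SECONDS_PER_DAY
hot_users = dau * working_set_ratio
# Assumes one cached timeline object per hot user; decimal units (1 GB = 10^6 KB).
cache_gb = hot_users * cached_object_kb / 1_000_000

print(f"{reads_per_day:,} reads/day, avg {rps_avg:,.0f} RPS")
print(f"{hot_users:,.0f} hot users, about {cache_gb:.0f} GB of cache before overhead")
```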

Case 4: Orders service with a write-heavy peak

30M orders/day
row size = 2.5 KB
replication factor = 3

Estimate: Raw writes are about 75 GB/day. With RF=3, that becomes 225 GB/day before indexes and metadata, which can easily add another 30-60%.

Main difficulty: The expensive part here is the write path and long-term storage growth, not only current QPS.
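The Case 4 arithmetic, written out. Inputs come from the case; the 30-60% index and metadata range is applied on top of the replicated volume, which is one reasonable reading of the estimate.

```python
orders_per_day = 30_000_000
row_kb = 2.5
replication_factor = 3
index_overhead_range = (0.30, 0.60)

raw_gb_per_day = orders_per_day * row_kb / 1_000_000  # KB -> GB, decimal units
replicated_gb_per_day = raw_gb_per_day * replication_factor
# Apply the low and high ends of the index/metadata overhead range.
low_gb, high_gb = (replicated_gb_per_day * (1 + r) for r in index_overhead_range)

print(f"raw {raw_gb_per_day:.0f} GB/day, replicated {replicated_gb_per_day:.0f} GB/day")
print(f"with indexes and metadata: {low_gb:.1f} to {high_gb:.1f} GB/day")
```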

How to spot the real difficulty quickly

CPU bound

High CPU utilization with moderate I/O usually points to heavy business logic, serialization, or encryption work.

Memory bound

Object churn, GC pauses, cache misses, and rising tail latency often point to a memory limit rather than a pure compute problem.

Storage bound

Queue depth growth, read/write IOPS saturation, and background maintenance dominating service time all point to storage pressure.

Network bound

Inter-service traffic consuming a growing share of the latency budget, with every extra hop costing more than it first looked, points to a network limit.

Common back-of-envelope anti-patterns

Using averages alone and ignoring real peaks or short bursts.

Collapsing read and write paths into one number and hiding the real bottleneck.

Sizing storage without indexes, replication, or a retention horizon.

Drawing the architecture first and only then checking latency and capacity budgets.

Leaving no operational headroom, as if production always runs under ideal conditions.

A good estimation pass ends not just with numbers, but with a clear answer: what will limit the system first, and what architectural move that limit forces next.

Related chapters

Design principles for scalable systems - provides the quality baseline and helps connect estimates to architectural limits.
Load balancing - shows how peak RPS translates into practical traffic distribution and overload protection.
Caching strategies - extends the discussion of hot working sets, hit ratio, and the real cost of a cache miss.
Replication and sharding - explains how storage-growth estimates lead to replication, sharding, and rebalancing decisions.
Event-Driven Architecture - adds queue and pipeline sizing once the system moves toward asynchronous flows.
URL Shortener - shows how quick sizing can directly shape architecture choices.
Notification System - gives a practical example of fan-out and its impact on infrastructure size.
Rate Limiter - complements this chapter with burst control and critical-path protection.