Architecture sketched before you estimate the order of magnitude often ends up elegant but irrelevant to the actual scale of the problem.
This chapter shows how quick sizing exposes what saturates first: traffic, memory, disks, network, or the latency budget. That saves time by filtering options through a rough capacity model before the design discussion gets too detailed.
In system design interviews, this is one of the clearest signs of maturity: state sensible assumptions, identify the primary constraint, and only then propose an architecture that truly fits the scale you uncovered.
Practical value of this chapter
Estimate in Minutes
Translate a product scenario into numbers quickly: RPS/QPS, traffic volume, storage growth, and first-order infrastructure limits.
Primary Bottleneck
Use memory, storage, network, and latency-budget estimates to identify which layer saturates first and where headroom is required.
Decision Framing
Tie the numbers back to pattern choices such as caching, replication, sharding, asynchronous flows, and service targets.
Interview Delivery
Present a clear interview structure: assumptions, formulas, the primary risk, and the next evolution step.
Core reference
Designing Data-Intensive Applications, 2nd Edition
A strong anchor for reasoning about throughput, storage growth, and the trade-offs between response time and system cost.
Back-of-envelope estimation is not about guessing the perfect number. It is about establishing the order of magnitude, the expected throughput, and the first hard limit within the first few minutes of a design conversation.
Before choosing technology, it helps to lock the core user journey, the SLA/SLO you are trying to protect, and the latency budget you can spend across the system. That makes the first architectural constraint visible early instead of after the diagram is already drawn.
Quick constants and units
Time
- 1 day = 86,400 s
- 1 month ≈ 30 days
- Peak load is often 2-5x the average
Traffic
- RPS = requests/day / 86,400
- RPS_peak = RPS_avg * peak factor
- Read and write paths should be sized separately
Bytes
- 1 KB = 10^3 B
- 1 MB = 10^6 B
- 1 GB = 10^9 B
Bandwidth
- bytes/sec = RPS * response_size
- bits/sec = bytes/sec * 8
- Add protocol overhead on top
Memory
- RAM = hot working set
- Add connection and session overhead
- Keep 20-30% headroom
Storage
- storage/day = writes/day * average_row_size
- Include indexes and metadata
- Multiply by replication factor
A 5-step estimation template
Five steps are usually enough to move from product language to numbers. The important part is to separate sustained peaks from short bursts and to isolate the path that will actually define cost and operational difficulty.
Step 1
Lock the assumptions
Define the audience size, the core user journey, the read/write split, request and response size, and the service-level target you are trying to protect.
Step 2
Translate product metrics into load
Estimate average and peak RPS/QPS, separate sustained peaks from short bursts, and isolate the most expensive write path.
Step 3
Estimate resources per layer
Size memory for hot data and cache, long-term storage growth, and the network traffic that moves between services and out of the system.
Step 4
Allocate the latency budget
Split p95/p99 across the edge layer, application service, cache, database, and external dependencies.
Step 5
Find the first hard limit
Decide what breaks first under 5-10x growth: CPU, memory, network, disk IOPS, or operational complexity.
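Step 5 can be approximated with a small utilization check. The sketch below is illustrative only: the resource names, usage figures, and capacities are hypothetical, and it assumes every resource scales linearly with traffic, which real systems rarely do. Operational complexity is not modeled because it has no simple numeric capacity.

```python
def first_limit(current_usage, capacity, growth):
    """Project usage under a growth factor and return the most utilized resource.

    Assumes usage grows linearly with traffic (a simplification).
    """
    utilization = {
        name: current_usage[name] * growth / capacity[name]
        for name in capacity
    }
    bottleneck = max(utilization, key=utilization.get)
    return bottleneck, utilization


# Hypothetical numbers for a single service tier.
usage = {"cpu_cores": 8, "memory_gb": 40, "network_mbps": 300, "disk_iops": 4_000}
capacity = {"cpu_cores": 64, "memory_gb": 256, "network_mbps": 10_000, "disk_iops": 20_000}

bottleneck, utilization = first_limit(usage, capacity, growth=5)
print(bottleneck, utilization)
```

With these made-up inputs, disk IOPS reaches full utilization at 5x growth while the other resources still have room, so the storage layer is the one to redesign first.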
From traffic to RPS/QPS
Average and peak load
requests/day = DAU * requests_per_user/day
RPS_avg = requests/day / 86,400
RPS_peak = RPS_avg * peak_factor
Read/write split
RPS_read = RPS_total * read_ratio
RPS_write = RPS_total * write_ratio
The write path usually drives storage and durability costs.
Averages are rarely enough. You also need to know what happens at p99 and whether real latency still fits the user promise in the slow, uncomfortable part of the distribution.
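The traffic formulas in this section can be sketched in a few lines of Python. The function name and the sample inputs (1M DAU, 50 requests per user, 90% reads) are assumptions chosen for illustration, and the sketch assumes read and write ratios sum to 1.

```python
SECONDS_PER_DAY = 86_400


def traffic_estimate(dau, requests_per_user_per_day, peak_factor, read_ratio):
    """Return average RPS, peak RPS, and the read/write split at peak."""
    requests_per_day = dau * requests_per_user_per_day
    rps_avg = requests_per_day / SECONDS_PER_DAY
    rps_peak = rps_avg * peak_factor
    return {
        "rps_avg": rps_avg,
        "rps_peak": rps_peak,
        "rps_read_peak": rps_peak * read_ratio,
        # Assumption: every request is either a read or a write.
        "rps_write_peak": rps_peak * (1 - read_ratio),
    }


est = traffic_estimate(
    dau=1_000_000, requests_per_user_per_day=50, peak_factor=3, read_ratio=0.9
)
for name, value in est.items():
    print(f"{name}: {value:,.0f}")
```

Keeping the read and write numbers separate from the start makes it harder to hide a small but expensive write path behind a large read total.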
RAM and long-term storage
For memory, the key question is the size of the hot working set needed for fast reads. For storage, what matters is daily growth, index overhead, metadata, and replication. Skip any of those and the estimate will understate what the system really needs.
RAM
Start from hot data and cache, then add connection and session overhead plus enough headroom for real production variance.
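As a rough sketch, the RAM estimate is the hot data plus connection overhead, scaled up by headroom. The per-connection cost, the connection count, and the choice to apply headroom as a simple multiplier are all assumptions for illustration; units are decimal (1 GB = 10^6 KB).

```python
def ram_estimate_gb(hot_objects, object_size_kb, connections,
                    per_connection_kb, headroom=0.25):
    """Estimate RAM in GB: hot working set + connection overhead + headroom."""
    hot_data_gb = hot_objects * object_size_kb / 1e6      # KB -> GB
    connection_gb = connections * per_connection_kb / 1e6  # KB -> GB
    # Assumption: headroom is applied as a multiplier on the total.
    return (hot_data_gb + connection_gb) * (1 + headroom)


# Hypothetical inputs: 1.5M hot objects of 8 KB, 50K connections at 64 KB each.
ram_gb = ram_estimate_gb(
    hot_objects=1_500_000, object_size_kb=8,
    connections=50_000, per_connection_kb=64,
)
print(f"RAM needed: {ram_gb:.1f} GB")
```

In this example the hot data alone is 12 GB, but connections and 25% headroom push the requirement to about 19 GB.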
Storage horizon
storage/day = writes/day * average_row_size
total/day = storage/day * replication_factor * (1 + index_overhead)
storage/N months = total/day * 30 * N
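The storage-horizon formulas above translate directly into code. The sample inputs (10M writes/day, 1 KB rows, RF=3, 50% index overhead, 12 months) are illustrative assumptions, and units are decimal.

```python
def storage_estimate_gb(writes_per_day, avg_row_kb, replication_factor,
                        index_overhead, months):
    """Return (raw GB/day, total GB/day, total GB over the horizon)."""
    raw_gb_per_day = writes_per_day * avg_row_kb / 1e6  # KB -> GB
    total_gb_per_day = raw_gb_per_day * replication_factor * (1 + index_overhead)
    horizon_gb = total_gb_per_day * 30 * months  # 1 month ~ 30 days
    return raw_gb_per_day, total_gb_per_day, horizon_gb


raw, total, horizon = storage_estimate_gb(
    writes_per_day=10_000_000, avg_row_kb=1,
    replication_factor=3, index_overhead=0.5, months=12,
)
print(f"raw: {raw:.0f} GB/day, total: {total:.0f} GB/day, "
      f"12 months: {horizon / 1000:.1f} TB")
```

Here 10 GB/day of raw writes becomes 45 GB/day on disk and roughly 16 TB over a year, which is the number that actually drives disk and sharding decisions.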
Network bandwidth and outbound traffic
Measure ingress and egress separately. For each service hop, start with bytes/sec = RPS * response_size, then add headers, TLS, retries, fan-out, and retransmits. That is often enough to reveal whether you need compression, batching, or a simpler contract between services.
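A minimal sketch of the per-hop bandwidth estimate follows. The single `overhead` multiplier is an assumption that lumps headers, TLS, retries, and retransmits into one factor; the request and response sizes are hypothetical.

```python
def bandwidth_mbps(rps, payload_kb, overhead=0.0):
    """Convert request rate and payload size into megabits per second."""
    bytes_per_sec = rps * payload_kb * 1_000 * (1 + overhead)
    return bytes_per_sec * 8 / 1e6  # bytes/s -> bits/s -> Mbps


# Hypothetical hop: 2,000 RPS, 0.5 KB requests in, 5 KB responses out,
# with an assumed 20% overhead for headers, TLS, and retries.
ingress = bandwidth_mbps(rps=2_000, payload_kb=0.5, overhead=0.2)
egress = bandwidth_mbps(rps=2_000, payload_kb=5, overhead=0.2)
print(f"ingress: {ingress:.1f} Mbps, egress: {egress:.1f} Mbps")
```

Fan-out is not covered by the multiplier: if one request triggers several downstream calls, multiply `rps` by the fan-out before calling the function for that hop.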
How to allocate a latency budget
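Give each hop on the request path (edge layer, application service, cache, database, external dependencies) a share of the end-to-end p95/p99 target, then add the shares up. If the total is already close to the target SLO, tail latency becomes the real risk. That is usually where a design starts to look good on average but fragile in practice.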
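A budget check of this kind can be sketched as a simple sum. The hop names, the per-hop values, and the 200 ms target are hypothetical, and percentiles do not add exactly across hops, so treat the sum as a rough first pass rather than a guarantee.

```python
def budget_report(slo_ms, hop_budgets_ms):
    """Sum per-hop latency budgets and report the slack left under the SLO."""
    total = sum(hop_budgets_ms.values())
    slack = slo_ms - total
    return {"total_ms": total, "slack_ms": slack, "slack_ratio": slack / slo_ms}


# Hypothetical p99 budgets per hop, in milliseconds.
hops = {"edge": 20, "app": 40, "cache": 5, "database": 60, "external": 50}
report = budget_report(slo_ms=200, hop_budgets_ms=hops)
print(report)
```

In this example only 25 ms (12.5%) of the budget remains, so a single slow dependency or one retry is enough to break the target.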
Four mini-cases with quick takeaways
In fast estimation, the number is only half of the story. The other half is the type of difficulty: fan-out, cache invalidation, durability, compaction, backpressure, or retry storms.
Case 1: URL shortener
- 100M redirects/day
- response ≈ 1.2 KB
- peak factor = 3x
Estimate: RPS_avg ≈ 1,160 and RPS_peak ≈ 3,500. Peak outbound traffic is about 4.2 MB/s (~34 Mbps) before TLS and header overhead.
Main difficulty: The hard part is usually not raw storage volume, but cache hit ratio and read-path latency.
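The Case 1 arithmetic can be reproduced as follows. The text appears to round RPS_peak to 3,500 before multiplying, so the unrounded results land slightly lower (about 4.17 MB/s and 33 Mbps). Variable names are illustrative.

```python
redirects_per_day = 100_000_000
response_kb = 1.2
peak_factor = 3

rps_avg = redirects_per_day / 86_400
rps_peak = rps_avg * peak_factor
egress_mb_per_s = rps_peak * response_kb / 1_000  # KB/s -> MB/s
egress_mbps = egress_mb_per_s * 8                 # MB/s -> Mbps

print(f"RPS avg: {rps_avg:,.0f}, RPS peak: {rps_peak:,.0f}")
print(f"peak egress: {egress_mb_per_s:.2f} MB/s ({egress_mbps:.1f} Mbps)")
```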
Case 2: Notification delivery
- 5M events/day
- average fan-out = 8 recipients
- message size ≈ 0.8 KB
Estimate: Delivery operations/day = 40M, average load ≈ 463 ops/sec. With a 6x peak, the system must handle roughly 2.8K ops/sec across delivery channels.
Main difficulty: The real risk is keeping delivery queues under control, avoiding retry cascades, and preserving priority for important channels.
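The Case 2 numbers follow from fan-out multiplication. The sketch also derives peak payload throughput from the stated message size, which the estimate above does not mention; that extra figure excludes protocol overhead and retries.

```python
events_per_day = 5_000_000
fan_out = 8
message_kb = 0.8
peak_factor = 6

deliveries_per_day = events_per_day * fan_out
ops_avg = deliveries_per_day / 86_400
ops_peak = ops_avg * peak_factor
peak_mb_per_s = ops_peak * message_kb / 1_000  # payload only, KB/s -> MB/s

print(f"deliveries/day: {deliveries_per_day:,}")
print(f"avg: {ops_avg:,.0f} ops/s, peak: {ops_peak:,.0f} ops/s")
print(f"peak payload: {peak_mb_per_s:.1f} MB/s")
```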
Case 3: Timeline cache
- 10M DAU
- 20 timeline reads per user per day
- working set = 15% of the audience
Estimate: Reads/day = 200M, so average RPS is ≈ 2,315. The hot cohort is about 1.5M users. At 8 KB per cached object, that is roughly 12 GB before overhead.
Main difficulty: The key challenge is maintaining a strong hit ratio without turning cache invalidation into a request storm.
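The Case 3 figures can be checked with the sketch below. It assumes one cached timeline object per hot user and decimal units, which matches how the 12 GB figure is reached.

```python
dau = 10_000_000
reads_per_user_per_day = 20
working_set_ratio = 0.15
cached_object_kb = 8

reads_per_day = dau * reads_per_user_per_day
rps_avg = reads_per_day / 86_400
hot_users = dau * working_set_ratio
# Assumption: one cached timeline object per hot user.
cache_gb = hot_users * cached_object_kb / 1e6  # KB -> GB

print(f"reads/day: {reads_per_day:,}, avg RPS: {rps_avg:,.0f}")
print(f"hot users: {hot_users:,.0f}, cache size: {cache_gb:.0f} GB before overhead")
```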
Case 4: Orders service with a write-heavy peak
- 30M orders/day
- row size = 2.5 KB
- replication factor = 3
Estimate: Raw writes are about 75 GB/day. With RF=3, that becomes 225 GB/day before indexes and metadata, which can easily add another 30-60%.
Main difficulty: The expensive part here is the write path and long-term storage growth, not only current QPS.
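The Case 4 growth numbers can be reproduced as below, including the range implied by 30-60% index and metadata overhead. Units are decimal, and the variable names are illustrative.

```python
orders_per_day = 30_000_000
row_kb = 2.5
replication_factor = 3

raw_gb_per_day = orders_per_day * row_kb / 1e6  # KB -> GB
replicated_gb_per_day = raw_gb_per_day * replication_factor

# Apply the 30-60% index and metadata overhead range from the text.
low_gb_per_day = replicated_gb_per_day * 1.3
high_gb_per_day = replicated_gb_per_day * 1.6

print(f"raw: {raw_gb_per_day:.0f} GB/day, replicated: {replicated_gb_per_day:.0f} GB/day")
print(f"with indexes: {low_gb_per_day:.1f} to {high_gb_per_day:.1f} GB/day")
```

At the upper end this is about 360 GB/day, or roughly 10.8 TB per 30-day month, which is why retention policy matters as much as QPS for this service.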
How to spot the real difficulty quickly
CPU bound
High CPU utilization with moderate I/O usually points to heavy business logic, serialization, or encryption work.
Memory bound
Object churn, GC pauses, cache misses, and rising tail latency often point to a memory limit rather than a pure compute problem.
Storage bound
Queue depth growth, read/write IOPS saturation, and background maintenance dominating service time all point to storage pressure.
Network bound
Inter-service traffic starts consuming the latency budget, and every extra hop becomes more expensive than it first looked.
Common back-of-envelope anti-patterns
Using averages alone and ignoring real peaks or short bursts.
Collapsing read and write paths into one number and hiding the real bottleneck.
Sizing storage without indexes, replication, and retention horizon.
Drawing the architecture first and only then checking latency and capacity budgets.
Leaving no operational headroom, as if production always runs under ideal conditions.
A good estimation pass ends not just with numbers, but with a clear answer: what will limit the system first, and what architectural move that limit forces next.
Related chapters
- Design principles for scalable systems - provides the quality baseline and helps connect estimates to architectural limits.
- Load balancing - shows how peak RPS translates into practical traffic distribution and overload protection.
- Caching strategies - extends the discussion of hot working sets, hit ratio, and the real cost of a cache miss.
- Replication and sharding - explains how storage-growth estimates lead to replication, sharding, and rebalancing decisions.
- Event-Driven Architecture - adds queue and pipeline sizing once the system moves toward asynchronous flows.
- URL Shortener - shows how quick sizing can directly shape architecture choices.
- Notification System - gives a practical example of fan-out and its impact on infrastructure size.
- Rate Limiter - complements this chapter with burst control and critical-path protection.
