Architecture invented before estimating the order of magnitude is often elegant but irrelevant to the actual scale of the problem.
The chapter shows how back-of-the-envelope estimates help you identify what saturates first - traffic, memory, disk, network, or latency budget - and save time by filtering options through a rough capacity model early.
For system design interviews, this is one of the strongest signs of maturity: state sensible assumptions, identify the primary constraint, and only then propose an architecture that actually fits the scale you uncovered.
Practical value of this chapter
Estimate in minutes
Translate a product scenario into numbers fast: RPS/QPS, payload size, peak factor, and first-order infrastructure limits.
Primary bottleneck
Use RAM/storage/network/latency budgets to identify which layer saturates first and where architectural headroom is required.
Decision framing
Map estimates to pattern choices: caching, replication, sharding, asynchronous flows, and SLA/SLO constraints.
Interview execution
Present a confident structure in interviews: assumptions, formulas, risk framing, and the next evolution step.
Core reference
Designing Data-Intensive Applications
A strong anchor for throughput, storage, and latency trade-off thinking.
Back-of-envelope estimation is your first-5-minute tool for system design. It does not produce exact capacity numbers, but it quickly gives order-of-magnitude clarity, exposes hard limits, and highlights the highest-risk area.
The goal is to understand what fails first before choosing technologies: RPS, RAM working set, disk growth, network throughput, or critical-path latency budget.
Quick constants and units
Time
- 1 day = 86,400 s
- 1 month ≈ 30 days
- Peak is often 2-5x average
Traffic
- RPS = requests/day / 86,400
- RPS_peak = RPS_avg * peak factor
- Separate read and write paths
Bytes
- 1 KB = 10^3 B (rough)
- 1 MB = 10^6 B
- 1 GB = 10^9 B
Bandwidth
- Bytes/sec = RPS * payload
- bits/sec = bytes/sec * 8
- Include protocol overhead
Memory
- RAM = hot working set
- Add connection/session overhead
- Keep 20-30% headroom
Storage
- Storage/day = writes/day * row size
- Account for index + metadata
- Multiply by replication factor
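The constants above translate directly into a few helper functions. A minimal Python sketch (function names and default values are illustrative, not part of the chapter):

```python
SECONDS_PER_DAY = 86_400
KB, MB, GB = 10**3, 10**6, 10**9  # rough decimal units, as above

def avg_rps(requests_per_day: float) -> float:
    """RPS = requests/day / 86,400."""
    return requests_per_day / SECONDS_PER_DAY

def peak_rps(requests_per_day: float, peak_factor: float) -> float:
    """Peak is often 2-5x average."""
    return avg_rps(requests_per_day) * peak_factor

def egress_bits_per_sec(rps: float, payload_bytes: float) -> float:
    """bits/sec = RPS * payload * 8, before protocol overhead."""
    return rps * payload_bytes * 8

def storage_per_day(writes_per_day: float, row_bytes: float,
                    replication: int = 1) -> float:
    """Storage/day = writes/day * row size * replication factor."""
    return writes_per_day * row_bytes * replication
```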
5-step estimation template
Step 1
Lock your assumptions
User volume, key user journeys, read/write ratio, request/response size, and target SLA/SLO.
Step 2
Translate business metrics to load
Calculate average and peak RPS/QPS, include burst factor, and isolate the critical write path.
Step 3
Estimate resources per layer
RAM for hot data and cache, persistent storage growth horizon, and network throughput intra-service and outbound.
Step 4
Decompose latency budget
Split p95/p99 budget across edge, API, cache, DB, and external dependencies. Name the most expensive segment early.
Step 5
Find the first bottleneck
Determine what fails first under 5-10x growth: CPU, memory, network, storage IOPS, coordination latency, or operational complexity.
Traffic -> RPS/QPS formulas
Average and peak load
requests/day = DAU * requests_per_user/day
RPS_avg = requests/day / 86,400
RPS_peak = RPS_avg * peak_factor
Read/write split
RPS_read = RPS_total * read_ratio
RPS_write = RPS_total * write_ratio
The write path usually drives storage and durability cost.
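The read/write split is worth keeping as separate numbers from the start. A small sketch (the 95:5 ratio below is a hypothetical example, not from the chapter):

```python
def split_rps(rps_total: float, read_ratio: float) -> tuple[float, float]:
    """Split total RPS into read and write paths (read_ratio in [0, 1])."""
    reads = rps_total * read_ratio
    return reads, rps_total - reads

# Hypothetical service: 10,000 total RPS at a 95:5 read/write ratio.
reads, writes = split_rps(10_000, 0.95)   # reads ≈ 9,500, writes ≈ 500
```

Even at 5%, the 500 writes/sec are what size the durable storage path.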
RAM and persistent memory
RAM (working set)
Start from hot data needed to meet your latency target. Add cache metadata, connection/session overhead, and at least 20-30% headroom.
Persistent storage horizon
storage/day = writes/day * average_row_size
total/day = storage/day * replication_factor * (1 + index_overhead)
storage/N months = total/day * 30 * N
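The three storage formulas compose into one function. A sketch, with the worked example below using assumed inputs (40% index overhead, 12-month horizon):

```python
def storage_horizon_bytes(writes_per_day: float, row_bytes: float,
                          replication: int, index_overhead: float,
                          months: int) -> float:
    """total = writes/day * row size * RF * (1 + index_overhead) * 30 * N."""
    per_day = writes_per_day * row_bytes
    total_per_day = per_day * replication * (1 + index_overhead)
    return total_per_day * 30 * months

# Hypothetical: 30M writes/day, 2.5 KB rows, RF=3, +40% indexes, 1 year.
year = storage_horizon_bytes(30_000_000, 2.5 * 10**3, 3, 0.4, 12)
# ≈ 1.13 * 10**14 bytes, i.e. roughly 113 TB of raw capacity
```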
Network throughput and egress
For each service hop, calculate bytes/sec = RPS * payload. Convert to bits/sec and include protocol overhead (headers, TLS, retries, fan-out). This quickly shows where compression, batching, or API contract redesign is needed.
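The per-hop calculation can be captured as a one-liner; the 1.2 overhead factor below is an assumed placeholder for headers, TLS framing, and retries, not a number from the chapter:

```python
def link_bits_per_sec(rps: float, payload_bytes: float,
                      overhead_factor: float = 1.2) -> float:
    """bits/sec for one hop, scaled by an assumed protocol-overhead factor."""
    return rps * payload_bytes * 8 * overhead_factor
```

Run this for every hop (client-edge, service-service, service-DB); the largest value is where compression or batching pays off first.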
Latency budget decomposition
If the sum of the per-segment budgets is already close to the target SLO, tail latency and retry policy will likely be your primary risk, even if averages still look acceptable.
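A budget decomposition can be as simple as a dictionary. The segment values below are hypothetical, chosen only to illustrate the mechanics:

```python
# Hypothetical split of a 200 ms p99 budget across the critical path.
p99_budget_ms = 200
segments = {
    "edge/TLS": 20,
    "API service": 40,
    "cache lookup": 10,
    "database": 80,
    "external dependency": 30,
}

allocated = sum(segments.values())                 # 180 ms committed
headroom = p99_budget_ms - allocated               # 20 ms for tails/retries
most_expensive = max(segments, key=segments.get)   # name the risk early
```

With only 20 ms of headroom, a single retry against the database blows the budget, which is exactly the kind of conclusion this step should surface.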
4 mini-cases with quick conclusions
Case 1: URL Shortener
- 100M redirects/day
- payload = 1.2KB response
- peak factor = 3x
Estimate: RPS_avg ≈ 1,160, RPS_peak ≈ 3,500. Egress_peak ≈ 3,500 * 1.2KB ≈ 4.2MB/s (~34Mbps) before TLS/header overhead.
Main complexity: Usually not storage, but read amplification, cache hit ratio, and p99 lookup-path latency.
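Recomputing Case 1 from first principles (rounded the same way as in the text):

```python
redirects_per_day = 100_000_000
payload_bytes = 1.2 * 10**3       # 1.2 KB response
peak_factor = 3

rps_avg = redirects_per_day / 86_400       # ≈ 1,157 -> "≈ 1,160"
rps_peak = rps_avg * peak_factor           # ≈ 3,472 -> "≈ 3,500"
egress_peak = rps_peak * payload_bytes     # ≈ 4.2 MB/s
egress_mbps = egress_peak * 8 / 10**6      # ≈ 33 Mbps, before TLS/headers
```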
Case 2: Notification fan-out
- 5M events/day
- avg fan-out = 8
- delivery payload = 0.8KB
Estimate: Delivery ops/day = 40M, avg ops/sec ≈ 463. With 6x peak we get ~2.8K ops/sec across delivery channels.
Main complexity: Backpressure, retry storms, and channel prioritization when downstream providers degrade.
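The fan-out arithmetic behind Case 2:

```python
events_per_day = 5_000_000
fanout = 8                 # average deliveries per event
peak_factor = 6

deliveries_per_day = events_per_day * fanout    # 40M delivery ops/day
ops_avg = deliveries_per_day / 86_400           # ≈ 463 ops/sec
ops_peak = ops_avg * peak_factor                # ≈ 2,778 (~2.8K) ops/sec
```

Note that fan-out multiplies the delivery side by 8x before any peak factor is applied, which is why the queue and provider limits, not the ingest path, dominate the design.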
Case 3: Timeline cache
- 10M DAU
- 20 timeline reads/user/day
- working set = 15% of users
Estimate: Reads/day = 200M, avg RPS ≈ 2,315. Hot users ≈ 1.5M. At 8KB cached item this is ~12GB hot set baseline (without overhead).
Main complexity: Maintaining hit ratio and controlling invalidation without triggering cache stampede.
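The working-set sizing behind Case 3:

```python
dau = 10_000_000
reads_per_user_per_day = 20
hot_fraction = 0.15              # working set = 15% of users
item_bytes = 8 * 10**3           # 8 KB cached timeline entry

reads_per_day = dau * reads_per_user_per_day    # 200M reads/day
rps_avg = reads_per_day / 86_400                # ≈ 2,315 RPS
hot_users = dau * hot_fraction                  # 1.5M hot users
hot_set_gb = hot_users * item_bytes / 10**9     # ≈ 12 GB before overhead
```

Remember that 12 GB is the payload baseline only; cache metadata, connection overhead, and the 20-30% headroom from the RAM section come on top.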
Case 4: Orders service (write-heavy peak)
- 30M orders/day
- row size = 2.5KB
- replication factor = 3
Estimate: Raw writes/day ≈ 75GB. With RF=3 this is 225GB/day + index/metadata overhead (often +30-60%).
Main complexity: Write-path latency and the long-term storage growth horizon, not just current QPS.
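The storage arithmetic behind Case 4, including the overhead band:

```python
orders_per_day = 30_000_000
row_bytes = 2.5 * 10**3
replication = 3

raw_gb_per_day = orders_per_day * row_bytes / 10**9   # 75 GB/day raw
replicated_gb = raw_gb_per_day * replication          # 225 GB/day with RF=3

# Index/metadata overhead is often +30-60% on top:
low_gb, high_gb = replicated_gb * 1.3, replicated_gb * 1.6  # ≈ 292-360 GB/day
```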
How to quickly identify the core system difficulty
CPU-bound
High utilization with moderate I/O: heavy business logic, serialization, encryption overhead.
Memory-bound
High churn, GC pauses, cache misses, and rising tail latency under tight RAM budget.
Storage-bound
Queue depth growth and read/write IOPS saturation; compaction/flush dominates service time.
Network-bound
Inter-service fan-out consumes latency budget; sensitivity to retransmits and packet loss increases.
Back-of-envelope anti-patterns
Relying only on averages and ignoring peak traffic and burst behavior.
Mixing read and write paths into one number, hiding actual bottlenecks.
Estimating storage without index overhead, replication factor, and retention horizon.
Drawing architecture first, then checking latency budget and capacity constraints.
Skipping operational headroom as if production always runs in ideal conditions.
A good estimation pass ends not just with numbers, but with an explicit answer: what limits the system first and how that limit scales.
Related chapters
- Design principles for scalable systems - provides the quality-attribute baseline and system trade-off framework.
- Load balancing - connects peak-RPS estimates with practical traffic distribution strategy.
- Caching strategies - extends RAM working-set sizing, hit ratio targets, and cache-miss cost analysis.
- Replication and sharding - covers storage growth, hot shards, and rebalancing implications after capacity estimates.
- Event-Driven Architecture - adds throughput and backlog sizing for asynchronous pipelines and queues.
- URL Shortener - an applied case where quick sizing directly drives architecture choices.
- Notification System - demonstrates fan-out estimation and delivery-semantics impact on infrastructure size.
- Rate Limiter - complements this chapter with burst control mechanics and critical-path protection.
