System Design Space

Updated: March 24, 2026 at 5:36 PM

Back-of-Envelope Estimation

A practical framework for quick sizing: traffic, RPS/QPS, RAM, persistent storage, latency budgets, and identifying the first bottleneck.

Architecture invented before estimating the order of magnitude is often elegant but irrelevant to the actual scale of the problem.

The chapter shows how back-of-the-envelope estimates help you identify what saturates first (traffic, memory, disk, network, or latency budget) and save time by filtering options through a rough capacity model early.

For system design interviews, this is one of the strongest signs of maturity: state sensible assumptions, identify the primary constraint, and only then propose an architecture that actually fits the scale you uncovered.

Practical value of this chapter

Estimate in minutes

Translate a product scenario into numbers fast: RPS/QPS, payload size, peak factor, and first-order infrastructure limits.

Primary bottleneck

Use RAM/storage/network/latency budgets to identify which layer saturates first and where architectural headroom is required.

Decision framing

Map estimates to pattern choices: caching, replication, sharding, asynchronous flows, and SLA/SLO constraints.

Interview execution

Present a confident structure in interviews: assumptions, formulas, risk framing, and the next evolution step.

Core reference

Designing Data-Intensive Applications

A strong anchor for throughput, storage, and latency trade-off thinking.


Back-of-envelope estimation is your first-5-minute tool for system design. It does not produce exact capacity numbers, but it quickly gives order-of-magnitude clarity, exposes hard limits, and highlights the highest-risk area.

The goal is to understand what fails first before choosing technologies: RPS, RAM working set, disk growth, network throughput, or critical-path latency budget.

Quick constants and units

Time

  • 1 day = 86,400 s
  • 1 month ≈ 30 days
  • Peak is often 2-5x average

Traffic

  • RPS = requests/day / 86,400
  • RPS_peak = RPS_avg * peak factor
  • Separate read and write paths

Bytes

  • 1 KB = 10^3 B (rough)
  • 1 MB = 10^6 B
  • 1 GB = 10^9 B

Bandwidth

  • Bytes/sec = RPS * payload
  • bits/sec = bytes/sec * 8
  • Include protocol overhead

Memory

  • RAM = hot working set
  • Add connection/session overhead
  • Keep 20-30% headroom

Storage

  • Storage/day = writes/day * row size
  • Account for index + metadata
  • Multiply by replication factor

5-step estimation template

Step 1

Lock your assumptions

User volume, key user journeys, read/write ratio, request/response size, and target SLA/SLO.

Step 2

Translate business metrics to load

Calculate average and peak RPS/QPS, include burst factor, and isolate the critical write path.

Step 3

Estimate resources per layer

RAM for hot data and cache, persistent storage growth horizon, and network throughput intra-service and outbound.

Step 4

Decompose latency budget

Split p95/p99 budget across edge, API, cache, DB, and external dependencies. Name the most expensive segment early.

Step 5

Find the first bottleneck

Determine what fails first under 5-10x growth: CPU, memory, network, storage IOPS, coordination latency, or operational complexity.

Traffic -> RPS/QPS formulas

Average and peak load

requests/day = DAU * requests_per_user/day
RPS_avg = requests/day / 86,400
RPS_peak = RPS_avg * peak_factor

Read/write split

RPS_read = RPS_total * read_ratio
RPS_write = RPS_total * write_ratio
write path usually drives storage and durability cost
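The traffic formulas above can be sketched as a small helper; the inputs in the example (DAU, requests per user, peak factor, read ratio) are illustrative assumptions, not numbers from a real system:

```python
# Back-of-envelope traffic math: average/peak RPS and the read/write split.
SECONDS_PER_DAY = 86_400

def traffic_estimate(dau, requests_per_user, peak_factor, read_ratio):
    """Return (rps_avg, rps_peak, rps_read, rps_write) for a daily load."""
    requests_per_day = dau * requests_per_user
    rps_avg = requests_per_day / SECONDS_PER_DAY
    rps_peak = rps_avg * peak_factor
    return rps_avg, rps_peak, rps_avg * read_ratio, rps_avg * (1 - read_ratio)

# Example: 10M DAU, 10 requests/user/day, 3x peak, 90% reads
rps_avg, rps_peak, rps_read, rps_write = traffic_estimate(10_000_000, 10, 3, 0.9)
```

Keeping the read and write paths as separate outputs makes it harder to accidentally size the whole system off a blended number.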

RAM and persistent memory

RAM (working set)

Start from hot data needed to meet your latency target. Add cache metadata, connection/session overhead, and at least 20-30% headroom.
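A minimal RAM sketch following that recipe; the item counts, sizes, and per-connection overhead below are assumed example values:

```python
# RAM sizing: hot working set plus per-connection overhead, inflated by headroom.
def ram_estimate_gb(hot_items, item_kb, connections, conn_kb, headroom=0.3):
    """Hot data + connection buffers, with a 20-30% safety margin on top."""
    hot_gb = hot_items * item_kb / 1e6    # KB -> GB (decimal units, as above)
    conn_gb = connections * conn_kb / 1e6
    return (hot_gb + conn_gb) * (1 + headroom)

# 1.5M hot items at 8 KB, 50k connections at 32 KB each, 30% headroom
ram_gb = ram_estimate_gb(1_500_000, 8, 50_000, 32, 0.3)
```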

Persistent storage horizon

storage/day = writes/day * average_row_size
total/day = storage/day * replication_factor * (1 + index_overhead)
storage/N months = total/day * 30 * N
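The storage-horizon formulas can be wired together directly; write volume, row size, replication factor, and index overhead in the example are assumptions:

```python
# Storage growth horizon, following the formulas above.
def storage_horizon_gb(writes_per_day, row_kb, replication=3,
                       index_overhead=0.3, months=12):
    per_day_gb = writes_per_day * row_kb / 1e6          # KB -> GB
    total_per_day_gb = per_day_gb * replication * (1 + index_overhead)
    return total_per_day_gb * 30 * months               # 30-day months

# 30M writes/day at 2.5 KB rows, RF=3, +30% index, 12-month horizon
gb_year = storage_horizon_gb(30_000_000, 2.5, 3, 0.3, 12)  # ~105 TB
```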

Network throughput and egress

For each service hop, calculate bytes/sec = RPS * payload. Convert to bits/sec and include protocol overhead (headers, TLS, retries, fan-out). This quickly shows where compression, batching, or API contract redesign is needed.
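A per-hop bandwidth sketch; the 20% overhead multiplier is a rough assumption standing in for headers, TLS, and retries:

```python
# Per-hop bandwidth: bytes/sec = RPS * payload, then bits/sec with overhead.
def bandwidth_mbps(rps, payload_kb, overhead=1.2):
    bytes_per_sec = rps * payload_kb * 1_000     # KB -> bytes
    return bytes_per_sec * 8 * overhead / 1e6    # bits/sec -> Mbps

# 3,500 RPS of 1.2 KB responses with 20% protocol overhead
mbps = bandwidth_mbps(3_500, 1.2, 1.2)
```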

Latency budget decomposition

Layer                  p95 budget   Notes
Edge / LB              10 ms        TLS + routing
API service            25 ms        business logic
Cache                  8 ms         hot read
DB primary             35 ms        critical query
External dependency    20 ms        payment/risk/etc

If the total budget is already close to the target SLO, tail latency and retry policy will likely be your primary risk, even if averages still look acceptable.
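Summing the layer budgets and comparing against the target makes that risk explicit; the budget values mirror the table, while the 120 ms SLO is an assumed target:

```python
# p95 budget per layer, summed and compared against an assumed SLO.
P95_BUDGET_MS = {
    "edge_lb": 10,
    "api": 25,
    "cache": 8,
    "db_primary": 35,
    "external": 20,
}

total_ms = sum(P95_BUDGET_MS.values())   # 98 ms across the critical path
slo_ms = 120
headroom_ms = slo_ms - total_ms          # what's left for retries and tail
```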

4 mini-cases with quick conclusions

Case 1: URL Shortener

  • 100M redirects/day
  • payload = 1.2KB response
  • peak factor = 3x

Estimate: RPS_avg ≈ 1,160, RPS_peak ≈ 3,500. Egress_peak ≈ 3,500 * 1.2KB ≈ 4.2MB/s (~34Mbps) before TLS/header overhead.

Main complexity: usually not storage, but read amplification, cache hit ratio, and p99 lookup-path latency.
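The Case 1 numbers can be reproduced in a few lines (all inputs are the case's stated assumptions):

```python
# Case 1: redirects/day -> avg/peak RPS -> peak egress before TLS/header overhead.
redirects_per_day = 100_000_000
payload_kb = 1.2
peak_factor = 3

rps_avg = redirects_per_day / 86_400          # ~1,157
rps_peak = rps_avg * peak_factor              # ~3,472
egress_mb_s = rps_peak * payload_kb / 1_000   # ~4.2 MB/s
egress_mbps = egress_mb_s * 8                 # ~33 Mbps
```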

Case 2: Notification fan-out

  • 5M events/day
  • avg fan-out = 8
  • delivery payload = 0.8KB

Estimate: Delivery ops/day = 40M, avg ops/sec ≈ 463. With 6x peak we get ~2.8K ops/sec across delivery channels.

Main complexity: backpressure, retry storms, and channel prioritization when downstream providers degrade.
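The fan-out arithmetic for Case 2, using the case's stated inputs:

```python
# Case 2: events/day * fan-out -> delivery ops/sec, average and peak.
events_per_day = 5_000_000
fanout = 8
peak_factor = 6

ops_per_day = events_per_day * fanout   # 40M deliveries/day
ops_avg = ops_per_day / 86_400          # ~463 ops/sec
ops_peak = ops_avg * peak_factor        # ~2.8K ops/sec across channels
```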

Case 3: Timeline cache

  • 10M DAU
  • 20 timeline reads/user/day
  • working set = 15% of users

Estimate: Reads/day = 200M, avg RPS ≈ 2,315. Hot users ≈ 1.5M. At 8KB cached item this is ~12GB hot set baseline (without overhead).

Main complexity: maintaining hit ratio and controlling invalidation without triggering a cache stampede.
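The Case 3 cache-sizing math, again from the case's stated assumptions:

```python
# Case 3: hot user share * cached item size -> baseline hot set.
dau = 10_000_000
reads_per_user = 20
hot_share = 0.15
item_kb = 8

reads_avg_rps = dau * reads_per_user / 86_400   # ~2,315 RPS
hot_users = dau * hot_share                     # 1.5M hot users
hot_set_gb = hot_users * item_kb / 1e6          # ~12 GB before overhead
```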

Case 4: Orders service (write-heavy peak)

  • 30M orders/day
  • row size = 2.5KB
  • replication factor = 3

Estimate: Raw writes/day ≈ 75GB. With RF=3 this is 225GB/day + index/metadata overhead (often +30-60%).

Main complexity: write-path latency and the long-term storage growth horizon, not just current QPS.
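The Case 4 storage math; the 45% index overhead is an assumed midpoint of the 30-60% range quoted above:

```python
# Case 4: raw daily writes, then replicated and inflated by index/metadata.
orders_per_day = 30_000_000
row_kb = 2.5
replication = 3
index_overhead = 0.45   # assumed midpoint of the 30-60% range

raw_gb_day = orders_per_day * row_kb / 1e6                    # 75 GB/day
replicated_gb_day = raw_gb_day * replication                  # 225 GB/day
with_index_gb_day = replicated_gb_day * (1 + index_overhead)  # ~326 GB/day
```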

How to quickly identify the core system difficulty

CPU-bound

High utilization with moderate I/O: heavy business logic, serialization, encryption overhead.

Memory-bound

High churn, GC pauses, cache misses, and rising tail latency under tight RAM budget.

Storage-bound

Queue depth growth and read/write IOPS saturation; compaction/flush dominates service time.

Network-bound

Inter-service fan-out consumes latency budget; sensitivity to retransmits and packet loss increases.

Back-of-envelope anti-patterns

Relying only on averages and ignoring peak traffic and burst behavior.

Mixing read and write paths into one number, hiding actual bottlenecks.

Estimating storage without index overhead, replication factor, and retention horizon.

Drawing architecture first, then checking latency budget and capacity constraints.

Skipping operational headroom as if production always runs in ideal conditions.

A good estimation pass ends not just with numbers, but with an explicit answer: what limits the system first and how that limit scales.
