System Design Space

Updated: February 21, 2026 at 10:34 PM

Design principles for scalable systems


Scale Cube, CAP theorem, sharding, replication, caching, CQRS, persistence patterns.

Theory [RU]

Designing Data-Intensive Applications

The distributed systems bible by Martin Kleppmann

Read chapter [RU]

Scalability is the ability of a system to handle increasing load by adding resources. In this chapter, we review the fundamental patterns and principles behind designing high-load distributed systems.

Scale Cube: Three Scaling Axes

The Scale Cube model describes three dimensions of application scaling:

X-axis Scaling

Cloning

Run N identical copies of the application behind a load balancer. Each copy handles about 1/N of the load.

Simple to implement
Does not solve domain complexity

Y-axis Scaling

Decomposition

Split the app into microservices. Each service owns a specific function or domain.

Independent scaling
Coordination complexity

Z-axis Scaling

Sharding

Each server handles only a subset of data. Route by customer_id, region, or a similar key.

Failure isolation
Rebalancing complexity

Deep Dive [RU]

CAP Theorem

Formal definition, proof, and practical examples.

Read chapter [RU]

CAP Theorem

In a distributed system, you cannot guarantee all three properties at once. Under a network partition, you must choose between consistency and availability.

C (Consistency): all nodes see the same data at the same moment

A (Availability): every request receives a response (success or error)

P (Partition Tolerance): the system keeps working even when communication between nodes is broken

In practice: CP systems (HBase, MongoDB) sacrifice availability for consistency. AP systems (Cassandra, DynamoDB) sacrifice strict consistency for availability.

Deep Dive [RU]

PACELC Theorem

An extension of CAP for normal system operation.

Read chapter [RU]

PACELC Theorem

CAP extended: even without partitions (normal mode), you still choose between latency and consistency.

Partition -> Availability vs Consistency, Else -> Latency vs Consistency

PA/EL systems

Under a partition they choose availability; in normal mode they choose low latency

Examples: Cassandra, DynamoDB (both with tunable consistency)

PC/EC systems

Under a partition and in normal mode they choose strong consistency

Examples: HBase, CockroachDB, traditional RDBMS with synchronous replication

Practical meaning: PACELC helps explain trade-offs in normal operation. Synchronous replication improves consistency but increases latency. Asynchronous replication lowers latency but can return stale data.

Stateless Architecture

Stateless compute is the simplest scalability multiplier. When instances are replaceable, everything becomes easier:

Autoscaling works

You can add/remove instances safely

Safer deploys

Rolling updates without state loss

Fast recovery

A node failure is non-critical

Important: stateless does not mean "no in-memory optimizations". It means correctness does not depend on in-memory state. Sessions, workflow state, and critical data must live outside the process (Redis, a database).
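The principle can be sketched in a few lines (a plain dict stands in for Redis; all names are illustrative): correctness lives in the shared store, so any instance can serve any request and instances stay replaceable.

```python
# Session state lives in an external store, not in the web process.
SESSION_STORE = {}  # stands in for Redis/DB; shared by all instances

class WebInstance:
    """A stateless request handler: no correctness-critical local state."""
    def handle(self, session_id: str, key: str, value=None):
        session = SESSION_STORE.setdefault(session_id, {})
        if value is not None:
            session[key] = value  # writes go to the shared store
        return session.get(key)

# Two different instances see the same session.
a, b = WebInstance(), WebInstance()
a.handle("s1", "cart", ["book"])
print(b.handle("s1", "cart"))  # instance b reads what instance a wrote
```

Because no instance holds unique state, the load balancer can route each request anywhere and autoscaling can kill or add instances freely.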

Related chapter

Replication and Sharding

Detailed topology, rebalancing, and consistency trade-offs.

Read chapter

Database Scaling Patterns

Sharding (Partitioning)

Horizontal data split across multiple servers. Improves write scalability.

Range-based sharding: by key ranges
Hash-based sharding: uniform distribution
Directory-based: a lookup service
Warning: cross-shard queries are a hard problem
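A minimal sketch of hash-based routing (shard count and key names are illustrative): a stable hash maps each key to one shard, giving a roughly uniform spread.

```python
import hashlib

def shard_for(key: str, n_shards: int) -> int:
    """Hash-based sharding: a stable hash maps a key to one of n shards."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n_shards

# The same customer_id always routes to the same shard.
assert shard_for("customer-42", 8) == shard_for("customer-42", 8)

# The distribution across shards is roughly uniform.
counts = [0] * 8
for i in range(10_000):
    counts[shard_for(f"customer-{i}", 8)] += 1
```

Note the rebalancing trade-off: changing `n_shards` remaps most keys, which is why production systems often use consistent hashing or a directory service instead of a bare modulo.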

Replication

Copy data across multiple servers. Improves read scalability and fault tolerance.

Leader-Follower: writes to the leader, reads from replicas
Multi-Leader: writes on multiple nodes
Leaderless: quorum reads/writes
Warning: replication lag is a source of inconsistency
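For leaderless replication, the standard quorum condition can be checked in one line: with N replicas, a write quorum W and read quorum R are guaranteed to intersect when R + W > N, so a quorum read sees at least one up-to-date replica.

```python
def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """Leaderless replication: read and write quorums are guaranteed
    to intersect (reads see the latest write) whenever R + W > N."""
    return r + w > n

# Classic Dynamo-style configuration: N=3, W=2, R=2.
assert quorum_overlaps(3, 2, 2)      # reads overlap the latest write
assert not quorum_overlaps(3, 1, 1)  # fast, but may return stale data
```

Tuning W and R trades latency for consistency, which is exactly the PACELC "else" branch in practice.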

Related chapter

Caching Strategies

Cache-Aside, Write-Through, Write-Back, and practical trade-offs.

Read chapter

Caching

Multi-level caching is one of the most effective ways to reduce latency and backend load.

Typical latency per layer:

CDN: ~5ms
Cache (Redis/Memcached): ~1-5ms
Local Cache (RAM): ~0.1ms
Database: ~10-100ms

Cache-Aside (Lazy Loading)

The app reads from the cache; on a miss it loads the value from the DB and writes it into the cache

Write-Through

Write goes to cache first, then cache writes to DB

Write-Behind (Write-Back)

Cache buffers writes and periodically flushes them to DB

Refresh-Ahead

Proactive refresh of hot data before TTL expires
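Cache-Aside is the most common of these strategies; a minimal sketch (dicts stand in for Redis and the database, the TTL value is arbitrary):

```python
import time

DB = {"user:1": {"name": "Ada"}}  # stands in for the database
CACHE = {}                        # stands in for Redis: {key: (value, expires_at)}
TTL = 60.0

def get_user(key: str):
    """Cache-Aside: read the cache first; on a miss, read the DB
    and populate the cache for subsequent reads."""
    entry = CACHE.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                      # cache hit
    value = DB.get(key)                      # cache miss -> go to DB
    CACHE[key] = (value, time.time() + TTL)  # lazy population
    return value

get_user("user:1")        # miss: loads from the DB, fills the cache
assert "user:1" in CACHE  # later reads are served from the cache
```

The trade-off from the summary table applies: until the TTL expires, the cache may serve stale data if the DB row changes underneath it.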

Latency vs Throughput

Two core dimensions of performance. Latency shapes user-perceived speed; throughput defines how much work the system can process.

Latency

Time from request to response. The key metric is p95/p99, not average.

  • Metrics: p50/p95/p99, tail latency
  • Failure modes: queues, synchronous locks, network hops

Throughput

Operations per second (RPS/QPS, events/sec). A key KPI for batch and background workloads.

  • Metrics: RPS, ops/sec, bytes/sec
  • Failure modes: blocking I/O, insufficient parallelism

Latency first

Caching, pre-computation, short critical path, parallelism.

Throughput first

Batching, queues, bulk I/O, async processing, horizontal scaling.

Trade-off

At high utilization, latency grows non-linearly. Control queue depth and use backpressure.

Rule of thumb: for interactive UX, minimize latency; for background pipelines, optimize throughput.
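To make the tail-latency point concrete, here is a nearest-rank percentile sketch (the sample numbers are made up): a workload can average 26ms while its p99 sits at 400ms, which is why p95/p99 matter more than the mean.

```python
def percentile(samples, p):
    """Nearest-rank percentile: the latency at or below which
    p percent of requests completed."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

# 100 requests: most are fast, a couple of tail requests are slow.
latencies_ms = [10] * 90 + [50] * 8 + [400, 900]

print(percentile(latencies_ms, 50))  # p50 -> 10 (typical request)
print(percentile(latencies_ms, 95))  # p95 -> 50
print(percentile(latencies_ms, 99))  # p99 -> 400 (tail latency)
print(sum(latencies_ms) / len(latencies_ms))  # mean -> 26.0, hides the tail
```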

Related chapter

Async Processing and Event-Driven Architecture

How to design queues, events, and delivery semantics.

Read chapter

Asynchronous Processing

Message Queues

Decouple heavy processing from the primary request path.

Producer -> Queue -> Consumer

Kafka: high-throughput, persistent
RabbitMQ: routing, priorities
SQS: managed, serverless
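The producer-queue-consumer decoupling can be sketched in-process (a bounded `queue.Queue` stands in for Kafka/RabbitMQ/SQS; the "heavy" work is a placeholder):

```python
import queue
import threading

jobs = queue.Queue(maxsize=100)  # bounded queue: built-in backpressure
results = []

def consumer():
    """Drains the queue off the request path."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel: shut down
            break
        results.append(job * 2)  # stands in for heavy processing
        jobs.task_done()

worker = threading.Thread(target=consumer)
worker.start()

# The producer (request handler) just enqueues and returns quickly.
for i in range(5):
    jobs.put(i)
jobs.put(None)
worker.join()
print(results)  # [0, 2, 4, 6, 8]
```

A real broker adds what this sketch lacks: durability, redelivery on consumer failure, and fan-out to multiple consumer groups.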

Event-Driven Architecture

Services communicate through events. Loose coupling and independent scaling.

OrderCreated -> InventoryService, PaymentService, NotificationService

Kafka: event streaming, durable log
Pulsar: multi-tenant, geo-replication
NATS: low-latency pub/sub
EventBridge: managed event bus (AWS)
Pub/Sub: managed events (GCP/Azure)

CQRS (Command Query Responsibility Segregation)

Split read and write models to optimize each path independently.

Write Model (Commands)

Optimized for transactions and validation. Normalized data.

Read Model (Queries)

Optimized for fast reads. Denormalized and pre-aggregated data.

Useful when read and write workloads differ significantly in volume
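A minimal CQRS sketch (in-memory stores stand in for the write and read databases; in production the read model is usually updated asynchronously via events rather than inline):

```python
# Write model: normalized source of truth, commands are validated here.
orders = []

# Read model: denormalized, pre-aggregated for fast queries.
totals_by_customer = {}

def place_order(customer_id: str, amount: float):
    """Command: validate, persist, then project into the read model."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    orders.append({"customer": customer_id, "amount": amount})
    totals_by_customer[customer_id] = (
        totals_by_customer.get(customer_id, 0.0) + amount
    )

def total_spent(customer_id: str) -> float:
    """Query: a single key lookup, no aggregation at read time."""
    return totals_by_customer.get(customer_id, 0.0)

place_order("c1", 30.0)
place_order("c1", 12.5)
print(total_spent("c1"))  # 42.5
```

The "consistency complexity" trade-off from the summary table shows up here: if the projection runs asynchronously, queries can briefly lag behind commands.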

Related chapter

Resilience Patterns

Circuit Breaker, Bulkhead, Retry, and graceful degradation practices.

Read chapter

Resilience Patterns

Circuit Breaker

Prevents cascading failures. Once the error threshold is reached, it cuts off calls to the unhealthy service.

Closed -> Open -> Half-Open
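The three-state machine can be sketched as follows (thresholds and timeouts are illustrative): consecutive failures open the circuit, a timeout moves it to half-open, and one success closes it again.

```python
import time

class CircuitBreaker:
    """Closed -> Open after `threshold` consecutive failures;
    Open -> Half-Open after `reset_after` seconds; one success closes it."""
    def __init__(self, threshold=3, reset_after=30.0):
        self.threshold, self.reset_after = threshold, reset_after
        self.failures, self.opened_at = 0, None

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if time.time() - self.opened_at >= self.reset_after:
            return "half-open"   # allow one probe call through
        return "open"

    def call(self, fn):
        if self.state == "open":
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.time()  # trip (or re-trip) the breaker
            raise
        self.failures, self.opened_at = 0, None  # success closes the circuit
        return result

cb = CircuitBreaker(threshold=2, reset_after=30.0)
for _ in range(2):
    try:
        cb.call(lambda: 1 / 0)  # the downstream call keeps failing
    except ZeroDivisionError:
        pass
print(cb.state)  # "open": further calls fail fast, sparing the sick service
```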

Backpressure

The ability to reject excess load before overload causes a failure.

  • Bounded queues
  • Admission control (HTTP 429)
  • Degraded responses

Retries + Idempotency

Retries are unavoidable in distributed systems. Idempotency guarantees repeated requests stay safe.

Idempotency-Key: abc-123-def
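A sketch of key-based idempotency (a dict stands in for a DB table with a unique constraint on the key; account names are illustrative): a retry with the same key replays the stored result instead of re-running the side effect.

```python
# Processed idempotency keys and their stored results.
processed = {}
balance = {"acct-1": 100}

def charge(idempotency_key: str, account: str, amount: int):
    """Applying the same request twice has the effect of applying it once."""
    if idempotency_key in processed:
        return processed[idempotency_key]  # duplicate: replay stored result
    balance[account] -= amount             # the side effect runs only once
    result = {"ok": True, "balance": balance[account]}
    processed[idempotency_key] = result
    return result

charge("abc-123-def", "acct-1", 30)
charge("abc-123-def", "acct-1", 30)  # retry: no double charge
print(balance["acct-1"])  # 70
```

In a real service the check-and-record step must be atomic (e.g. an insert against a unique index), otherwise two concurrent retries can both slip past the lookup.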

Bulkhead

Resource isolation to stop failures from propagating. Separate thread/connection pools for critical functions.

Like watertight compartments in a ship.

Foundations [RU]

HTTP Protocol

L7 load balancing depends on HTTP and routing rules.

Read chapter [RU]

Related chapter

Traffic Load Balancing

Round Robin, Least Connections, and Consistent Hashing in practice.

Read chapter

Load Balancing

Algorithms

Round Robin: sequential
Weighted RR: capacity-aware
Least Connections: route to the least-loaded backend
IP Hash: sticky sessions
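Two of these algorithms sketched side by side (backend names are illustrative): Round Robin cycles blindly, while Least Connections tracks in-flight requests and picks the least-loaded backend.

```python
import itertools

class RoundRobin:
    """Round Robin: cycle through backends in fixed order."""
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)
    def pick(self):
        return next(self._cycle)

class LeastConnections:
    """Least Connections: route to the backend with the fewest
    in-flight requests; callers release on completion."""
    def __init__(self, backends):
        self.active = {b: 0 for b in backends}
    def pick(self):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend
    def release(self, backend):
        self.active[backend] -= 1

rr = RoundRobin(["app-1", "app-2", "app-3"])
print([rr.pick() for _ in range(4)])  # ['app-1', 'app-2', 'app-3', 'app-1']

lc = LeastConnections(["app-1", "app-2"])
lc.pick(); lc.pick()   # one in-flight request on each backend
lc.release("app-2")    # app-2 finishes first and frees up
print(lc.pick())       # 'app-2'
```

Round Robin is a good default for uniform requests; Least Connections wins when request costs vary widely, because slow requests stop piling onto one backend.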

Layers

L4 (Transport): TCP/UDP, fast, simple
L7 (Application): HTTP, URL/header-based routing
DNS-based: geo-routing, failover

Pattern Summary Table

Pattern | Advantage | Trade-off
Horizontal Scaling | Elastic growth | Coordination
Sharding | Write scalability | Operational complexity
Replication | Read scalability | Consistency lag
Caching | Low latency | Stale data
Async processing | High throughput | Delayed results
Circuit Breaker | Fault isolation | State management
CQRS | Optimized paths | Consistency complexity

Key Takeaways

1. Stateless first - design compute components without state to simplify scaling
2. CAP awareness - understand the consistency vs availability trade-offs in your system
3. Cache aggressively - multi-level caching dramatically reduces latency
4. Async by default - decouple heavy processing from the user-facing critical path
5. Design for failure - circuit breakers, retries, and bulkheads protect against cascading failures
6. Idempotency always - every operation should be safe to execute repeatedly



© 2026 Alexander Polomodov