Theory [RU]
Designing Data-Intensive Applications
The distributed systems bible by Martin Kleppmann
Scalability is the ability of a system to handle increasing load by adding resources. In this chapter, we review the fundamental patterns and principles behind designing high-load distributed systems.
Scale Cube: Three Scaling Axes
The Scale Cube model describes three dimensions of application scaling:
X-axis Scaling
Run N identical copies of the application behind a load balancer. Each copy handles about 1/N of the load.
Y-axis Scaling
Split the app into microservices. Each service owns a specific function or domain.
Z-axis Scaling
Each server handles only a subset of data. Route by customer_id, region, and similar keys.
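The Z-axis routing idea can be sketched with a stable hash of the routing key. A minimal illustration, assuming shards are identified by index and `customer_id` is the routing key:

```python
import hashlib

def shard_for(customer_id: str, num_shards: int) -> int:
    """Map a customer to a shard with a stable hash (Z-axis routing)."""
    # md5 is stable across processes and restarts, unlike Python's built-in hash()
    digest = hashlib.md5(customer_id.encode()).hexdigest()
    return int(digest, 16) % num_shards

# All requests for the same customer land on the same shard.
assert shard_for("customer-42", 4) == shard_for("customer-42", 4)
```

Note that a plain modulo scheme reshuffles most keys when `num_shards` changes; consistent hashing (see the load-balancing section) reduces that churn.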
Deep Dive [RU]
CAP Theorem
Formal definition, proof, and practical examples.
CAP Theorem
In a distributed system, you cannot guarantee all three properties at once. Under a network partition, you must choose between consistency and availability.
Consistency
All nodes see the same data at the same moment
Availability
Every request receives a response (success or error)
Partition Tolerance
The system keeps working when communication between nodes is broken
In practice: CP systems (HBase, MongoDB) sacrifice availability for consistency. AP systems (Cassandra, DynamoDB) sacrifice strict consistency for availability.
Deep Dive [RU]
PACELC Theorem
An extension of CAP for normal system operation.
PACELC Theorem
CAP extended: even without partitions (normal mode), you still choose between latency and consistency.
Partition (P): Availability vs Consistency; Else (E): Latency vs Consistency
PA/EL systems
Under partition they choose availability; in normal mode they choose low latency
PC/EC systems
Both under partition and in normal mode, they choose (strict) consistency
Practical meaning: PACELC helps explain trade-offs in normal operation. Synchronous replication improves consistency but increases latency. Asynchronous replication lowers latency but can return stale data.
Stateless Architecture
Stateless compute is the simplest scalability multiplier. When instances are replaceable, everything becomes easier:
Autoscaling works
You can add/remove instances safely
Safer deploys
Rolling updates without state loss
Fast recovery
A node failure is non-critical
Important: stateless does not mean "no in-memory optimizations". It means correctness does not depend on in-memory state. Sessions, workflow state, and critical data must live outside (Redis, DB).
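A minimal sketch of the principle, with a plain dict standing in for an external store such as Redis:

```python
class SessionStore:
    """Stands in for an external store (e.g., Redis). Any instance
    holding a reference to the store can serve any user's request."""
    def __init__(self):
        self._data = {}

    def save(self, session_id, state):
        self._data[session_id] = state

    def load(self, session_id):
        return self._data.get(session_id)

def handle_request(store, session_id):
    """Stateless handler: correctness depends only on the external
    store, never on this instance's local memory."""
    state = store.load(session_id) or {"visits": 0}
    state["visits"] += 1
    store.save(session_id, state)
    return state["visits"]

store = SessionStore()
assert handle_request(store, "s1") == 1
assert handle_request(store, "s1") == 2  # would also work from a different instance
```

Because the handler keeps nothing locally, any replica can be added, removed, or restarted without losing session state.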
Related chapter
Replication and Sharding
Detailed topology, rebalancing, and consistency trade-offs.
Database Scaling Patterns
Sharding (Partitioning)
Horizontal data split across multiple servers. Improves write scalability.
Replication
Copy data across multiple servers. Improves read scalability and fault tolerance.
Related chapter
Caching Strategies
Cache-Aside, Write-Through, Write-Back, and practical trade-offs.
Caching
Multi-level caching is one of the most effective ways to reduce latency and backend load.
CDN: ~5ms
Redis/Memcached: ~1-5ms
Local Cache: ~0.1ms
Database: ~10-100ms
Cache-Aside (Lazy Loading)
The app reads from cache; on miss it goes to DB and then writes into cache
Write-Through
Write goes to cache first, then cache writes to DB
Write-Behind (Write-Back)
Cache buffers writes and periodically flushes them to DB
Refresh-Ahead
Proactive refresh of hot data before TTL expires
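The Cache-Aside flow above can be sketched in a few lines. Dicts stand in for the cache and the database, and the TTL value is an arbitrary assumption:

```python
import time

db = {"user:1": "Alice"}   # stands in for the database
cache = {}                 # stands in for Redis; values are (value, expires_at)
TTL = 60.0                 # arbitrary TTL for the sketch

def get(key):
    """Cache-Aside: read the cache first; on a miss, read the DB
    and populate the cache on the way back."""
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                        # cache hit
    value = db.get(key)                        # cache miss -> DB
    if value is not None:
        cache[key] = (value, time.time() + TTL)
    return value

assert get("user:1") == "Alice"   # first read: miss, loads from DB
assert "user:1" in cache          # subsequent reads are served from cache
```

Write-Through and Write-Back differ only in who performs the DB write and when; the read path stays the same.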
Latency vs Throughput
Two core dimensions of performance. Latency shapes user-perceived speed; throughput defines how much work the system can process.
Latency
Time from request to response. The key metric is p95/p99, not average.
- Metrics: p50/p95/p99, tail latency
- Failure modes: queues, synchronous locks, network hops
Throughput
Operations per second (RPS/QPS, events/sec). A key KPI for batch and background workloads.
- Metrics: RPS, ops/sec, bytes/sec
- Failure modes: blocking I/O, insufficient parallelism
Latency first
Caching, pre-computation, short critical path, parallelism.
Throughput first
Batching, queues, bulk I/O, async processing, horizontal scaling.
Trade-off
At high utilization, latency grows non-linearly. Control queue depth and use backpressure.
Rule of thumb: for interactive UX, minimize latency; for background pipelines, optimize throughput.
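Why p95/p99 rather than the average: a short sketch computing a nearest-rank percentile over synthetic latencies (the sample values are made up for illustration):

```python
def percentile(samples, p):
    """Nearest-rank percentile: the value at or below which
    roughly p% of the samples fall."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))  # 1-based nearest rank
    return ordered[rank - 1]

# 100 fast requests plus 2 slow outliers: the mean hides the tail, p99 shows it.
latencies_ms = [10] * 100 + [500, 900]
mean = sum(latencies_ms) / len(latencies_ms)
assert mean < 25                              # the average looks healthy
assert percentile(latencies_ms, 99) >= 500    # tail latency is exposed
```

This is why SLOs are usually stated in percentiles: a handful of slow requests dominates user experience while barely moving the mean.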
Related chapter
Async Processing and Event-Driven Architecture
How to design queues, events, and delivery semantics.
Asynchronous Processing
Message Queues
Decouple heavy processing from the primary request path.
Event-Driven Architecture
Services communicate through events. Loose coupling and independent scaling.
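A minimal sketch of queue-based decoupling using the standard library's `queue` and a worker thread standing in for a consumer of a real broker (RabbitMQ, Kafka, etc.):

```python
import queue
import threading

jobs = queue.Queue()   # stands in for a message broker
results = []

def worker():
    """Consumer: does the heavy work off the primary request path."""
    while True:
        job = jobs.get()
        if job is None:              # sentinel: shut down
            break
        results.append(job * job)    # "heavy" processing

t = threading.Thread(target=worker)
t.start()

# The request handler only enqueues and returns immediately.
for n in (1, 2, 3):
    jobs.put(n)

jobs.put(None)   # tell the worker to stop
t.join()
```

The caller's latency is just the enqueue cost; processing time, retries, and scaling of consumers become independent concerns.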
CQRS (Command Query Responsibility Segregation)
Split read and write models to optimize each path independently.
Write Model (Commands)
Optimized for transactions and validation. Normalized data.
Read Model (Queries)
Optimized for fast reads. Denormalized and pre-aggregated data.
Useful when read and write workloads differ significantly in volume.
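A toy sketch of the split. Here the read model is updated synchronously for brevity; in practice it is often rebuilt asynchronously from events, which introduces the eventual-consistency trade-off noted in the summary table:

```python
# Write model: the validated source of truth (normalized records).
orders = []
# Read model: a denormalized aggregate kept ready for fast queries.
revenue_by_customer = {}

def place_order(customer, amount):
    """Command: validate, persist, then update the read model."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    orders.append({"customer": customer, "amount": amount})
    revenue_by_customer[customer] = revenue_by_customer.get(customer, 0) + amount

def total_revenue(customer):
    """Query: answered from the pre-aggregated read model, no table scan."""
    return revenue_by_customer.get(customer, 0)

place_order("acme", 100)
place_order("acme", 50)
assert total_revenue("acme") == 150
```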
Related chapter
Resilience Patterns
Circuit Breaker, Bulkhead, Retry, and graceful degradation practices.
Resilience Patterns
Circuit Breaker
Prevents cascading failures. Once the error threshold is reached, it cuts off calls to the unhealthy service.
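A minimal Circuit Breaker sketch. The threshold and reset timeout are arbitrary; production libraries add a proper half-open state, metrics, and per-endpoint configuration:

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures, fails fast while
    open, and lets one probe call through after `reset_after` seconds."""
    def __init__(self, threshold=3, reset_after=30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None          # half-open: allow one probe
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.time()
            raise
        self.failures = 0                  # success closes the circuit
        return result
```

Failing fast protects both sides: the caller stops waiting on timeouts, and the struggling service gets breathing room to recover.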
Backpressure
The ability to reject excess load before overload causes a failure.
- Bounded queues
- Admission control (HTTP 429)
- Degraded responses
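Admission control with a bounded queue can be sketched as follows; the status codes follow the HTTP 429 convention mentioned above, and the queue size is arbitrary:

```python
import queue

requests = queue.Queue(maxsize=2)   # bounded queue: the backpressure point

def admit(request):
    """Admission control: reject excess load at the door instead of
    letting an unbounded queue build up and take the system down."""
    try:
        requests.put_nowait(request)
        return 202                  # accepted for processing
    except queue.Full:
        return 429                  # shed load before overload

assert admit("r1") == 202
assert admit("r2") == 202
assert admit("r3") == 429           # queue full: rejected immediately
```

The bound is the whole point: an unbounded queue merely converts overload into ever-growing latency.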
Retries + Idempotency
Retries are unavoidable in distributed systems. Idempotency guarantees repeated requests stay safe.
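A sketch of the pairing: an idempotency key makes the operation safe to repeat, so a retry wrapper cannot cause a double side effect. The key names and the in-memory store are illustrative; real systems keep processed keys in a durable store:

```python
processed = {}   # idempotency-key -> result; stands in for a durable store

def charge(idempotency_key, amount):
    """Safe to retry: the same key never charges twice."""
    if idempotency_key in processed:
        return processed[idempotency_key]   # duplicate: replay stored result
    result = {"charged": amount}            # the side effect happens once
    processed[idempotency_key] = result
    return result

def with_retries(fn, attempts=3):
    """Naive retry loop; safe only because the operation is idempotent."""
    last = None
    for _ in range(attempts):
        try:
            return fn()
        except IOError as exc:
            last = exc
    raise last

first = charge("key-1", 100)
again = charge("key-1", 100)   # e.g., a client retry after a timeout
assert again is first          # no double charge
```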
Bulkhead
Resource isolation to stop failures from propagating. Separate thread/connection pools for critical functions.
Like watertight compartments in a ship.
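The compartment idea can be sketched with separate thread pools (pool sizes and workload names are arbitrary):

```python
from concurrent.futures import ThreadPoolExecutor

# Separate pools: a flood of slow reporting calls cannot exhaust the
# threads reserved for critical payment calls.
payments_pool = ThreadPoolExecutor(max_workers=4)
reports_pool = ThreadPoolExecutor(max_workers=2)

def pay(order_id):
    return f"paid {order_id}"          # critical path

def build_report(name):
    return f"report {name}"            # heavy, non-critical path

# Each workload is confined to its own compartment.
fut = payments_pool.submit(pay, 1)
assert fut.result() == "paid 1"
```

The same isolation applies to connection pools, semaphores, or whole processes: the resource budget of one function never becomes another's failure mode.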
Foundations [RU]
HTTP Protocol
L7 load balancing depends on HTTP and routing rules.
Related chapter
Traffic Load Balancing
Round Robin, Least Connections, and Consistent Hashing in practice.
Load Balancing
Algorithms
Round Robin, Least Connections, Consistent Hashing (covered in detail in the related chapter).
Layers
L4 (TCP/UDP): fast and simple
L7 (HTTP): URL/header-based routing
DNS/GSLB: geo-routing and failover
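Two of the algorithms from the related chapter, sketched minimally (Round Robin and Least Connections; Consistent Hashing deserves its own treatment):

```python
import itertools

class RoundRobin:
    """Cycle through backends in order; each receives ~1/N of requests."""
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)

class LeastConnections:
    """Send each request to the backend with the fewest active connections."""
    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        self.active[backend] -= 1

rr = RoundRobin(["a", "b", "c"])
assert [rr.pick() for _ in range(4)] == ["a", "b", "c", "a"]
```

Round Robin assumes roughly uniform request cost; Least Connections adapts when some requests are much heavier than others.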
Pattern Summary Table
| Pattern | Advantage | Trade-off |
|---|---|---|
| Horizontal Scaling | Elastic growth | Coordination |
| Sharding | Write scalability | Operational complexity |
| Replication | Read scalability | Consistency lag |
| Caching | Low latency | Stale data |
| Async processing | High throughput | Delayed results |
| Circuit Breaker | Fault isolation | State management |
| CQRS | Optimized paths | Consistency complexity |
Key Takeaways
Stateless first - design compute components without state to simplify scaling
CAP awareness - understand consistency vs availability trade-offs in your system
Cache aggressively - multi-level caching dramatically reduces latency
Async by default - decouple heavy processing from the user critical path
Design for failure - circuit breakers, retries, and bulkheads protect from cascading failures
Idempotency always - every operation should be safe to execute repeatedly
