Replication and sharding are easy to confuse because both sound like 'data scaling,' yet the cost of getting each one wrong is very different.
The chapter separates read scale and availability from write scale, showing when primary-replica is enough, when a multi-primary or leaderless topology becomes necessary, how shard keys should be chosen, and why re-sharding is almost always an expensive project.
In design reviews, this is especially useful because it helps you discuss lag, failover, hot shards, rebalancing cost, and evolving access patterns as parts of one decision rather than as a pile of isolated infrastructure tricks.
Practical value of this chapter
Data boundary
Choose partition strategy from access patterns to reduce hot shards and expensive cross-shard operations.
Replication mode
Select sync vs async replication from RPO/RTO goals and read-your-writes needs on critical paths.
Rebalance without pain
Treat re-sharding as a standard operation: dual writes, backfill, and controlled traffic cutover.
Interview rigor
Explain where the design reaches limits and how you would evolve the scheme for 10x growth.
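The "rebalance without pain" item above can be sketched as a small phase machine. This is a minimal illustration with hypothetical names (`Phase`, `write_targets`, `read_target`), not a prescribed implementation: the point is that dual writes, backfill, and cutover are distinct, reversible phases.

```python
from enum import Enum, auto

class Phase(Enum):
    OLD_ONLY = auto()     # baseline: all traffic on the old sharding scheme
    DUAL_WRITE = auto()   # write old+new, read old; backfill runs in background
    VERIFY = auto()       # write old+new, read new and compare against old
    NEW_ONLY = auto()     # cutover complete; old shards can be retired

def write_targets(phase: Phase) -> list[str]:
    """Which shard sets a write must touch in each migration phase."""
    if phase is Phase.OLD_ONLY:
        return ["old"]
    if phase is Phase.NEW_ONLY:
        return ["new"]
    return ["old", "new"]  # DUAL_WRITE and VERIFY both write twice

def read_target(phase: Phase) -> str:
    """Reads flip to the new scheme only once backfill is verified."""
    return "old" if phase in (Phase.OLD_ONLY, Phase.DUAL_WRITE) else "new"

for p in Phase:
    print(p.name, write_targets(p), read_target(p))
```

Because each phase is explicit, rollback is just moving one phase back, which matches the pause/resume and rollback requirements discussed later in the rebalancing section.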
Context
Principles of Designing Scalable Systems
This chapter is a deep dive into one of the core scaling topics: replication and sharding.
Replication and sharding define data scale, availability, and cost profile. Replication improves resilience and read scaling; sharding increases write throughput and total capacity. Mistakes here usually surface not at launch, but during traffic growth and product complexity expansion.
Replication Models
Primary-Replica
One write leader with multiple read replicas. Works well for read-heavy workloads and straightforward operations.
Replica lag and read/write asymmetry; failover must be automated and tested.
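One common way to live with replica lag is session-sticky routing: after a session writes, its reads are pinned to the primary for a short window so it still sees its own writes. A minimal sketch, with hypothetical names (`LagAwareRouter`, `stickiness_sec`) and a fixed time window standing in for real lag tracking:

```python
import time

class LagAwareRouter:
    """Route reads to a replica unless the session wrote recently,
    in which case pin to the primary (read-your-writes)."""

    def __init__(self, stickiness_sec: float = 5.0):
        self.stickiness_sec = stickiness_sec
        self._last_write: dict[str, float] = {}

    def on_write(self, session_id: str) -> str:
        # Every write goes to the primary; remember when it happened.
        self._last_write[session_id] = time.monotonic()
        return "primary"

    def pick_read_target(self, session_id: str) -> str:
        last = self._last_write.get(session_id)
        if last is not None and time.monotonic() - last < self.stickiness_sec:
            return "primary"   # avoid a stale read right after a write
        return "replica"

router = LagAwareRouter()
router.on_write("s1")
print(router.pick_read_target("s1"))  # primary (just wrote)
print(router.pick_read_target("s2"))  # replica (no recent write)
```

Production systems often replace the fixed window with replication positions (e.g. comparing log offsets), but the routing decision has the same shape.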
Multi-Primary
Writes are accepted in multiple regions/nodes, reducing latency for geo-distributed writes.
Conflict resolution and merge logic are hard; significantly higher platform complexity.
Leaderless
Quorum-based reads/writes with high availability for distributed KV/NoSQL scenarios.
Consistency is harder to reason about; you also take on read repair, hinted handoff, and tombstone behavior.
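The core rule behind leaderless quorums is that a read quorum must overlap every write quorum: with N replicas, write quorum W, and read quorum R, overlap is guaranteed when R + W > N. A one-function sketch (hypothetical name `quorums_overlap`) makes the trade-off concrete:

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True if every read quorum must intersect every write quorum,
    i.e. any read sees at least one replica with the latest write."""
    return r + w > n

# Classic Dynamo-style setting: N=3, W=2, R=2 -> overlap guaranteed.
print(quorums_overlap(3, 2, 2))   # True
# Availability/latency-leaning setting: N=3, W=1, R=1 -> stale reads possible.
print(quorums_overlap(3, 1, 1))   # False
```

Lower W and R buy latency and availability at the cost of potentially stale reads, which is exactly where read repair and hinted handoff come in.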
Replication Visualization
(Interactive replication simulator omitted: requests flow from a request queue through a replication router to a primary with replicas A and B.)
Sharding Models
- Hash-based sharding: even distribution, but range queries are harder.
- Range-based sharding: efficient for range queries, but hot-range risk is high.
- Directory-based sharding: flexible data placement, requires a reliable metadata service.
- Geo/tenant sharding: strong sovereignty and tenant isolation, but cross-tenant workflows are harder.
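The hash-based option above can be sketched in a few lines. This is an illustration with assumed names (`shard_for`, `NUM_SHARDS`), using a stable hash rather than Python's built-in `hash()`, which is randomized per process and unsuitable for routing:

```python
import hashlib

NUM_SHARDS = 3  # assumption for illustration

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard index with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

for user_id in ("user:1001", "user:1002", "user:1003"):
    print(user_id, "->", shard_for(user_id))
```

Note the built-in trade-off: modulo hashing spreads keys evenly, but changing `num_shards` remaps almost every key, which is one reason re-sharding is expensive and why consistent hashing (covered under rebalancing below) exists.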
Sharding Visualization
(Interactive sharding simulator omitted: shard keys flow from a queue through a shard router to shards 0, 1, and 2 under hash sharding.)
Related
Performance Engineering
Hot shards and migration windows directly affect p95/p99 latency.
Shard Rebalancing and Hot-Shard Control
Use consistent hashing and virtual nodes to reduce migration volume when node count changes.
Mitigate hot shards with adaptive partitioning, write buffering, workload-aware key routing, and temporal splits.
Run backfill and migrations through throttled pipelines with checksum verification.
Every migration operation must support pause/resume, rollback plan, and observable progress.
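The consistent-hashing point above can be demonstrated directly: with virtual nodes on a hash ring, adding a node moves only the keys adjacent to its vnodes instead of remapping nearly everything, as modulo hashing would. A self-contained sketch with hypothetical names (`ConsistentHashRing`, `route`):

```python
import bisect
import hashlib

def _h(s: str) -> int:
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Consistent-hash ring with virtual nodes (vnodes)."""

    def __init__(self, nodes, vnodes: int = 100):
        self._ring: list[tuple[int, str]] = []  # sorted (hash, node) pairs
        for node in nodes:
            self.add(node, vnodes)

    def add(self, node: str, vnodes: int = 100) -> None:
        # Each physical node owns many small arcs of the ring.
        for i in range(vnodes):
            bisect.insort(self._ring, (_h(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def route(self, key: str) -> str:
        # A key belongs to the first vnode clockwise from its hash.
        idx = bisect.bisect_right(self._ring, (_h(key), "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["shard-0", "shard-1", "shard-2"])
before = {f"key-{i}": ring.route(f"key-{i}") for i in range(1000)}
ring.add("shard-3")
moved = sum(1 for k, v in before.items() if ring.route(k) != v)
# Expect roughly a quarter of keys to move (the new node's share),
# versus nearly all of them under modulo hashing.
print(f"{moved / len(before):.0%} of keys moved")
```

More vnodes per node smooth out load variance between nodes at the cost of a larger ring and slightly slower membership changes.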
Cloud Native
Cost Optimization & FinOps
Replication topology and rebalancing directly affect cloud cost and unit economics.
Practical Checklist
RPO/RTO are defined per data class and failure scenario.
Operations requiring strong consistency vs eventual consistency are explicitly documented.
There is a shard-key evolution plan for product growth and changing access patterns.
Replication, failover, and rebalancing are validated regularly through game days.
Cross-zone/cross-region replication cost is accounted for in the FinOps model.
Common anti-pattern: picking a shard key for the current feature set without an evolution plan.
References
Related chapters
- Principles of Designing Scalable Systems - The scaling framework where replication and sharding decisions are made.
- CAP Theorem - Core distributed-system constraints under network partitions.
- PACELC Theorem - Latency vs consistency trade-offs even without partitions.
- Database Selection Framework - How database type limits or enables replication and sharding options.
- Multi-region / Global Systems - Replication and sharding under global geography and regulatory constraints.
