A balancing algorithm should be chosen for the workload it protects, not for the popularity of its name.
The chapter compares Round Robin, Least Connections, and Consistent Hashing across request duration, load skew, state affinity, hot keys, and the cost of redistributing traffic when the instance pool changes.
For system design interviews, this is especially useful because it shifts the discussion from 'which algorithm should we use' to 'which traffic shape are we protecting and what breaks if we choose wrong.'
Practical value of this chapter
Algorithm by workload
Match RR/LC/Hash strategies to workload shape, session duration, and affinity requirements.
Cost of choice
Call out trade-off costs: uneven load, hot spots, rebalance complexity, and behavior under node failures.
Observability
Define fairness metrics and early warning signals that indicate algorithm tuning or replacement is needed.
Interview trade-offs
Show how algorithm choice shifts when moving from stateless APIs to stateful or realtime scenarios.
Reference
Envoy Load Balancing
Practical guide to balancing algorithms and real-world usage scenarios.
A balancing algorithm directly affects latency, resilience, and resource efficiency. The same instance pool can behave very differently depending on whether you choose equal rotation, current-load adaptation, or key-based routing.
Fairness
Round Robin provides a clean baseline if nodes are close in capacity.
Load Adaptation
Least Connections reacts better when short and long requests are mixed.
Key Locality
Consistent Hashing reduces cache misses and key remap during pool changes.
Resilience
Combine the algorithm with health checks, slow-start, and draining in every rollout.
Algorithm Selection Playbook
Profile the traffic first
Step 1: Capture request duration, burst behavior, and the share of stateful operations.
Map algorithm to workload
Step 2: Round Robin for simple pools, Least Connections for mixed latency, Consistent Hashing for key locality.
Test degradation paths
Step 3: Simulate shutdown/degraded nodes and evaluate p95/p99, retry storms, and key remap effects.
Enforce operational guardrails
Step 4: Apply health policy, slow-start, draining, and alerts on saturation/hot keys.
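Step 1 can be approximated in a few lines of Python. The exponential durations below are a synthetic stand-in for real access-log data, and the `skew` heuristic is an assumption for illustration, not a standard metric:

```python
import random
import statistics

# Synthetic stand-in for request durations (ms) pulled from access logs.
# A long-tailed exponential mix mimics short and long requests coexisting.
random.seed(42)
durations = [random.expovariate(1 / 50) for _ in range(10_000)]

# statistics.quantiles with n=100 returns 99 cut points: index 49 is p50,
# index 94 is p95, index 98 is p99.
cuts = statistics.quantiles(durations, n=100)
p50, p95, p99 = cuts[49], cuts[94], cuts[98]

# A large tail-to-median ratio hints that Least Connections will
# outperform Round Robin, because slow requests pile up unevenly.
skew = p99 / p50
print(f"p50={p50:.1f}ms p95={p95:.1f}ms p99={p99:.1f}ms skew={skew:.1f}x")
```

If `skew` stays close to 1, a simple rotation is usually enough; as it grows, load-aware strategies start to pay off.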
Core Algorithms and Visualization
This visualization shows how the same request stream is distributed differently by each algorithm.
Round Robin
Requests are distributed in a cycle: S1 -> S2 -> S3 -> S1.
Pros
- Very simple implementation and predictable behavior.
- Works well for stateless backends with homogeneous nodes.
- Low runtime overhead in the balancer.
Limitations
- Does not account for current load or slow nodes.
- Can hurt latency on heterogeneous server pools.
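The rotation above can be expressed as a minimal sketch (plain Python, not tied to any particular balancer; the class and method names are illustrative):

```python
from itertools import cycle

class RoundRobinBalancer:
    """Cycles through servers in a fixed order, ignoring their current load."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._cycle = cycle(self._servers)

    def pick(self):
        # Every call advances the rotation by one; no per-request state
        # is kept, which is what makes Round Robin so cheap.
        return next(self._cycle)

lb = RoundRobinBalancer(["S1", "S2", "S3"])
print([lb.pick() for _ in range(4)])  # S1, S2, S3, then back to S1
```

Note what is missing: there is no hook for node health or load, so in practice the server list must be rebuilt when health checks exclude a node.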
[Interactive simulation: requests flow from a queue through the load balancer (Round Robin) to Server A, Server B, and Server C.]
Comparison and Trade-offs
Pick the algorithm based on workload profile and state/locality constraints, not on popularity.
| Algorithm | State awareness | Failover behavior | Balancing quality | Complexity | Best fit |
|---|---|---|---|---|---|
| Round Robin | No | Simple node exclusion | Medium | Low | Stateless, homogeneous pool |
| Least Connections | Active connections | Works well for burst + long-lived connections | High | Medium | Mixed latency workload |
| Consistent Hashing | Key-aware routing | Partial remap to neighboring nodes | Depends on virtual nodes | High | Stateful/cache-sensitive traffic |
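The "active connections" signal in the table can be sketched as a counter per server; this is an illustrative minimum, assuming the balancer is told when each request starts and finishes:

```python
class LeastConnectionsBalancer:
    """Routes each request to the server with the fewest active connections."""

    def __init__(self, servers):
        # Connection counters start at zero for every server.
        self.active = {s: 0 for s in servers}

    def acquire(self):
        # Pick the least-loaded server (ties resolve to the first listed).
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # The caller must report completion, or counters drift upward.
        self.active[server] -= 1

lb = LeastConnectionsBalancer(["S1", "S2", "S3"])
first, second, third = lb.acquire(), lb.acquire(), lb.acquire()
lb.release(second)          # one request finishes early on S2
next_pick = lb.acquire()    # S2 is now least loaded, so it wins
```

The `release` call is the weak point this table's "Medium" complexity refers to: a dropped completion event silently skews the counters, which is why production balancers pair this with timeouts and health checks.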
Selection in Practice
Quick Rules
- Start with Round Robin for simple stateless APIs.
- Move to Least Connections for long or uneven request duration.
- Use Consistent Hashing when locality and sticky state are important.
- Validate decisions against the peak traffic profile, not only the average load.
Common Mistakes
- Choosing an algorithm without profiling traffic shape (request duration, burst, key skew).
- Using Consistent Hashing without virtual nodes and hot-key monitoring.
- Ignoring health checks and slow-start for new instances.
- Treating active connections as the only load metric without CPU/RPS/p99 latency.
- Mixing sticky and non-sticky routing without an explicit fallback policy.
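The virtual-nodes mistake above is easier to see with a sketch. Assuming an MD5-based ring (an illustrative choice; real balancers typically use faster hashes and their own vnode counts):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # MD5 used purely for even distribution, not security.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Hash ring with virtual nodes; vnodes=100 is an assumed default."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node is placed on the ring vnodes times,
        # which smooths out load and spreads remapped keys on failure.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    def lookup(self, key: str) -> str:
        # A key maps to the first vnode clockwise from its hash.
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["S1", "S2", "S3"])
smaller = ConsistentHashRing(["S1", "S2"])
keys = [f"user-{i}" for i in range(1000)]
moved = sum(ring.lookup(k) != smaller.lookup(k) for k in keys)
# Only keys that lived on the removed node S3 change owner;
# every key on S1 or S2 keeps its placement.
```

This is the "partial remap to neighboring nodes" behavior from the comparison table: with vnodes=1 the same removal would dump S3's entire range onto a single neighbor, which is exactly the hot spot the checklist warns about.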
Mini Checklist Before Production
Related chapters
- Traffic load balancing - provides the L4/L7, health-check, and GSLB architecture context on top of which concrete algorithms are chosen.
- Design principles for scalable systems - explains how traffic distribution decisions connect to latency, throughput, and broader scalability trade-offs.
- Service Discovery - shows how to keep target instance pools up to date, which is required for correct balancing behavior.
- Service Mesh Architecture - extends balancing algorithms to service-to-service traffic with retries and outlier-detection policies.
- Caching strategies - helps reason about key locality and hot-key effects that are critical for consistent hashing.
- Multi-region / Global Systems - adds the regional traffic-distribution layer and failover strategy across data centers.
- API Gateway (routing and balancing) - demonstrates an applied L7 scenario where balancing algorithms work alongside routing policies.
