Updated: March 24, 2026 at 5:36 PM

Balancing algorithms: Round Robin, Least Connections, Consistent Hashing

medium

A practical analysis of popular balancing algorithms, their trade-offs, and recommendations for choosing one for stateless and stateful workloads.

A balancing algorithm should be chosen for the workload it protects, not for the popularity of its name.

The chapter compares Round Robin, Least Connections, and Consistent Hashing through request duration, load skew, state affinity, hot keys, and the cost of redistributing traffic when the instance pool changes.

For system design interviews, this is especially useful because it shifts the discussion from 'which algorithm should we use' to 'which traffic shape are we protecting and what breaks if we choose wrong.'

Practical value of this chapter

Algorithm by workload

Match Round Robin, Least Connections, and hash-based strategies to workload shape, session duration, and affinity requirements.

Cost of choice

Call out trade-off costs: uneven load, hot spots, rebalance complexity, and behavior under node failures.

Observability

Define fairness metrics and early warning signals that indicate algorithm tuning or replacement is needed.

Interview trade-offs

Show how algorithm choice shifts when moving from stateless APIs to stateful or realtime scenarios.

Reference

Envoy Load Balancing

Practical guide to balancing algorithms and real-world usage scenarios.

A balancing algorithm directly affects latency, resilience, and resource efficiency. The same instance pool can behave very differently depending on whether you choose equal rotation, current-load adaptation, or key-based routing.

Fairness

Round Robin provides a clean baseline if nodes are close in capacity.

Load Adaptation

Least Connections reacts better than Round Robin when short and long requests are mixed, because it routes by current active connections rather than by turn.

Key Locality

Consistent Hashing keeps each key on the same node, which reduces cache misses and limits how many keys are remapped when the pool changes.

Resilience

Combine the algorithm with health checks, slow-start, and draining in every rollout.

Algorithm Selection Playbook

1. Profile the traffic first

Capture request duration, burst behavior, and the share of stateful operations.

2. Map algorithm to workload

Use Round Robin for simple pools, Least Connections for mixed latency, and Consistent Hashing for key locality.

3. Test degradation paths

Simulate node shutdown and node degradation, then evaluate p95/p99 latency, retry storms, and key remap effects.

4. Enforce operational guardrails

Apply health policy, slow-start, draining, and alerts on saturation and hot keys.
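
Step 3 asks for p95/p99 under degradation. As a rough illustration, the sketch below computes nearest-rank percentiles over a list of latency samples. The `percentile` helper and the latency values are hypothetical and chosen only to show how a small share of slow requests can be invisible at p95 and obvious at p99; a real setup would read these numbers from its metrics system.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds.
healthy = [20] * 100                  # all backends healthy
degraded = [20] * 95 + [900] * 5      # 5% of requests hit a degraded node

for name, samples in (("healthy", healthy), ("degraded", degraded)):
    print(name, "p95 =", percentile(samples, 95), "p99 =", percentile(samples, 99))
```

In this made-up data set the degraded node does not move p95 at all, which is one reason to track p99 alongside it when testing failure paths.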

Core Algorithms and Visualization

This visualization shows how the same request stream is distributed differently by each algorithm.

Round Robin

Requests are distributed in a cycle: S1 -> S2 -> S3 -> S1.

Pros

  • Very simple implementation and predictable behavior.
  • Works well for stateless backends with homogeneous nodes.
  • Low runtime overhead in the balancer.

Limitations

  • Does not account for current load or slow nodes.
  • Can hurt latency on heterogeneous server pools.

Best fit: Stateless APIs, even traffic, and similar instance capacity.
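
The cycle described above (S1 -> S2 -> S3 -> S1) can be sketched in a few lines. This is a minimal illustration, not a production balancer: the server names and request IDs are placeholders, and it ignores health checks and weights.

```python
from itertools import cycle

servers = ["S1", "S2", "S3"]      # placeholder backend names
next_server = cycle(servers)      # endless rotation: S1, S2, S3, S1, ...

requests = ["REQ-101", "REQ-102", "REQ-103", "REQ-104"]

# Each request takes the next server in the rotation, regardless of load.
assignments = {req: next(next_server) for req in requests}

for req, server in assignments.items():
    print(req, "->", server)
```

Note that the fourth request wraps back to S1 even if S1 is still busy with the first one, which is the limitation listed above.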

[Interactive simulation: a request queue (REQ-101 Web user:42, REQ-102 Mobile user:77, REQ-103 Partner tenant:acme, REQ-104 Web user:42) flows through the load balancer to Server A, Server B, and Server C. Each server shows its handled-request count and active connections. In Round Robin mode, each new request is sent to the next server in circular order.]
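
For comparison with the Round Robin rotation, here is a minimal sketch of Least Connections over the same three servers. The helper names (`pick_least_connections`, `dispatch`, `finish`) are hypothetical, ties are broken by dictionary order, and the connection counts are tracked in memory only for illustration.

```python
def pick_least_connections(active):
    """Return the server with the fewest active connections (first one wins on ties)."""
    return min(active, key=active.get)

def dispatch(active):
    """Route one request and record the new active connection."""
    server = pick_least_connections(active)
    active[server] += 1
    return server

def finish(active, server):
    """Mark one request on the given server as completed."""
    active[server] -= 1

active = {"Server A": 0, "Server B": 0, "Server C": 0}

first = dispatch(active)     # all idle, so the first server is chosen
second = dispatch(active)
third = dispatch(active)
finish(active, second)       # the second request was short and has completed
fourth = dispatch(active)    # goes to the server that just freed up

print([first, second, third, fourth])
```

With Round Robin the fourth request would return to Server A; here it goes to whichever server currently holds the fewest connections.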

Comparison and Trade-offs

Pick the algorithm based on workload profile and state/locality constraints, not on popularity.

| Algorithm | State awareness | Failover behavior | Balancing quality | Complexity | Best fit |
| Round Robin | No | Simple node exclusion | Medium | Low | Stateless, homogeneous pool |
| Least Connections | Active connections | Works well for burst + long-lived connections | High | Medium | Mixed latency workload |
| Consistent Hashing | Key-aware routing | Partial remap to neighboring nodes | Depends on virtual nodes | High | Stateful/cache-sensitive traffic |
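
The "partial remap" and "virtual nodes" entries for Consistent Hashing can be illustrated with a small hash ring. This is a sketch under stated assumptions: the `HashRing` class, the MD5-based `stable_hash`, the `node#i` virtual-node naming, and the count of 100 virtual nodes per server are all illustrative choices, not the behavior of any particular balancer.

```python
import bisect
import hashlib

def stable_hash(value: str) -> int:
    """Deterministic hash (MD5 is used here for spread, not for security)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node owns many points so its share of the ring is smoother.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key):
        # First ring point at or after the key's hash, wrapping around at the end.
        idx = bisect.bisect_left(self._ring, (stable_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]

keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["Server A", "Server B", "Server C"])

before = {k: ring.lookup(k) for k in keys}
ring.remove("Server B")
after = {k: ring.lookup(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after removing Server B")
```

Only keys that were on the removed server change owners; keys on the surviving servers stay where they were, which is what protects cache locality during pool changes.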

Selection in Practice

Quick Rules

  • Start with Round Robin for simple stateless APIs.
  • Move to Least Connections for long or uneven request duration.
  • Use Consistent Hashing when locality and sticky state are important.
  • Validate decisions on peak traffic profile, not only average load.

Common Mistakes

  • Choosing an algorithm without profiling traffic shape (request duration, burst, key skew).
  • Using Consistent Hashing without virtual nodes and hot-key monitoring.
  • Ignoring health checks and slow-start for new instances.
  • Treating active connections as the only load metric and ignoring CPU, RPS, and p99 latency.
  • Mixing sticky and non-sticky routing without an explicit fallback policy.
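
Hot-key monitoring, mentioned in the list above, can start as a simple share-of-traffic check on routing keys. The sketch below is one possible approach; the `hot_keys` helper, the 20% threshold, and the sample keys are assumptions made for illustration.

```python
from collections import Counter

def hot_keys(routing_keys, threshold=0.2):
    """Return {key: share} for keys whose share of traffic exceeds the threshold."""
    counts = Counter(routing_keys)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items() if c / total > threshold}

# Hypothetical sample of routing keys observed in a time window.
sample = ["user:42"] * 6 + ["user:77"] * 2 + ["tenant:acme"] * 2

print(hot_keys(sample))
```

A key flagged this way will concentrate load on one node under key-based routing no matter how many virtual nodes are configured, so it usually needs separate handling.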

Mini Checklist Before Production

1. Active/passive health checks are enabled.
2. Slow-start and graceful connection draining are configured.
3. Metrics include p95/p99, backend saturation, and retry-storm rate.
4. Failover is tested for both hard shutdown and node degradation.
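
To make the slow-start item concrete, here is a sketch of a linear warm-up ramp for a newly added instance. The function name, the 60-second window, and the 10% floor are hypothetical; real balancers use their own ramp formulas and parameters, so treat this only as an illustration of the idea.

```python
def slow_start_weight(seconds_since_join, window_s=60.0, min_factor=0.1):
    """Fraction of normal traffic weight for an instance that joined recently.

    Ramps linearly from min_factor up to 1.0 over window_s seconds.
    """
    if seconds_since_join >= window_s:
        return 1.0
    progress = max(0.0, seconds_since_join) / window_s
    return max(min_factor, progress)

for t in (0, 30, 60, 120):
    print(f"t={t:>3}s weight={slow_start_weight(t)}")
```

The floor keeps a brand-new instance from receiving zero traffic, while the ramp prevents it from taking a full share before caches and connection pools are warm.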
