
Updated: May 7, 2026 at 6:26 PM

Load Balancing Algorithms: Round Robin, Least Connections, Consistent Hashing

medium

Practical guide to Round Robin, Least Connections, and Consistent Hashing: how to choose an algorithm by workload profile, key locality, and degradation behavior.

A balancing algorithm is easy to treat as a small implementation detail, even though it often decides how evenly the system handles long requests, hot keys, and pool changes.

The chapter compares Round Robin, Least Connections, and Consistent Hashing through request duration, uneven load, key locality, and the cost of redistribution during degradation.

In interviews, that helps shift the conversation from naming an algorithm to explaining which traffic shape you are protecting and how you would notice the choice has stopped working.

Practical value of this chapter

Traffic Profile

Start with the shape of the traffic: even flow, long requests, hot keys, and the share of stateful work.

Uneven Load

Look at what happens under long requests, spikes, and degraded nodes, not only during calm steady state.

Key Locality

Understand the price of locality: fewer cache misses, but more risk of hot keys and skewed distribution.

Decision Rationale

Explain which traffic profile the algorithm protects and which signals tell you it needs to be revisited.

Reference

Envoy Load Balancing

Practical guide to balancing algorithms and real-world usage scenarios.

A load-balancing algorithm is not a tiny implementation detail. It shapes latency, resilience, and how evenly the pool behaves under real traffic.

Round Robin, Least Connections, and Consistent Hashing solve the same routing problem in different ways: one gives a clean fairness baseline, one adapts to current pressure, and one preserves key-to-node locality.

The real choice depends on workload shape, state affinity, fairness expectations, the share of stateless versus stateful traffic, and the operational cost of pool changes.

Fairness

Round Robin gives a clean fairness baseline when nodes are similar in capacity.

Load Adaptation

Least Connections adapts better when some requests finish quickly and others stay open much longer.

Key Locality

Consistent Hashing reduces cache misses and key migration when the pool changes.

Resilience

An algorithm only works in production when health checks, slow start, and safe draining are part of the picture.
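
As a rough illustration of the passive side of health checking, the sketch below drops a server from the candidate pool after a run of consecutive failures. The class name `PassiveHealthTracker`, its methods, and the threshold of three failures are hypothetical; production balancers usually add timed ejection and active probing on top of something like this.

```python
class PassiveHealthTracker:
    """Tracks consecutive request failures per server (hypothetical helper)."""

    def __init__(self, servers, max_consecutive_failures=3):
        self._limit = max_consecutive_failures
        self._streak = {server: 0 for server in servers}

    def record(self, server, ok):
        # A success clears the failure streak; a failure extends it.
        self._streak[server] = 0 if ok else self._streak[server] + 1

    def healthy(self):
        # Servers whose failure streak is still below the ejection limit.
        return [s for s, streak in self._streak.items() if streak < self._limit]


tracker = PassiveHealthTracker(["A", "B", "C"])
for _ in range(3):
    tracker.record("B", ok=False)
ejected_view = tracker.healthy()   # B is excluded after three failures in a row
tracker.record("B", ok=True)       # for example, a successful probe
restored_view = tracker.healthy()  # B is back in the pool
print(ejected_view, restored_view)
```

Whichever algorithm is in use would then pick only from `healthy()`, so the routing logic itself stays unaware of failures.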

How to Choose an Algorithm

1. Describe the traffic profile

Capture request duration, burst behavior, key distribution, and the share of stateful operations.

2. Match the algorithm to traffic behavior

Round Robin fits even pools, Least Connections fits uneven request duration, and Consistent Hashing fits key locality and state affinity.

3. Test degradation paths

Simulate node shutdown, overload, and retries, then evaluate p95/p99 and the cost of key redistribution.

4. Add operational guardrails

Configure health policy, slow start, draining, and alerts for saturation and hot keys.
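
Slow start from step 4 can be pictured as a traffic weight that ramps up while a new instance warms its caches and connection pools. The sketch below assumes a simple linear ramp; the function name `slow_start_weight`, the 30-second window, and the 10% floor are illustrative choices, not settings taken from any particular balancer.

```python
def slow_start_weight(age_seconds, window_seconds=30.0, min_weight=0.1):
    """Fraction of normal traffic weight for an instance that joined age_seconds ago.

    Assumes a linear ramp from min_weight up to 1.0 over window_seconds.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    if age_seconds >= window_seconds:
        return 1.0
    # Clamp negative ages to zero and never drop below the floor.
    return max(min_weight, max(age_seconds, 0.0) / window_seconds)


for age in (0, 15, 30, 60):
    print(age, slow_start_weight(age))
```

A weighted variant of any of the three algorithms could multiply an instance's normal weight by this factor, so a freshly added node is not hit with a full share of traffic immediately.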

Core Algorithms and Visual Walkthrough

This visualization shows how the same request stream is distributed differently by each algorithm.

Round Robin

Requests are distributed in a cycle: S1 -> S2 -> S3 -> S1.

Pros

  • Very simple implementation and predictable behavior.
  • Works well for stateless backends with homogeneous nodes.
  • Low runtime overhead in the balancer.

Limitations

  • Does not account for current load or slow nodes.
  • Can hurt latency on heterogeneous server pools.

Best fit: Stateless APIs, even traffic, and similar instance capacity.
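
The cycle described above fits in a few lines. This is a minimal sketch, not a production balancer: the class name `RoundRobinBalancer` is hypothetical, and the pool is assumed to be fixed and healthy.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed circular order, ignoring their load."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("server pool must not be empty")
        self._cycle = cycle(list(servers))

    def pick(self):
        # The next server in the cycle, regardless of how busy it is.
        return next(self._cycle)


balancer = RoundRobinBalancer(["S1", "S2", "S3"])
order = [balancer.pick() for _ in range(4)]
print(order)  # ['S1', 'S2', 'S3', 'S1']
```

The picker holds no per-server state at all, which is why the overhead is low and also why it cannot react to a slow node.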

[Interactive simulation: a request queue (REQ-101 Web user:42, REQ-102 Mobile user:77, REQ-103 Partner tenant:acme, REQ-104 Web user:42) passes through the load balancer to Server A, Server B, and Server C, each showing handled-request and active-connection counters. In Round Robin mode, each new request is sent to the next server in circular order.]

Comparing the Algorithms

Compare the algorithms by how they behave under overload, pool churn, and uneven key distribution. For Consistent Hashing, virtual nodes, hot keys, and key skew usually matter more than the headline description of the algorithm itself.

Algorithm | What It Observes | Behavior During Degradation | Distribution Quality | Complexity | Best Fit
--- | --- | --- | --- | --- | ---
Round Robin | Almost no awareness of current node state | Removes a failed node with minimal extra logic | Solid when traffic is even and nodes are similar | Low | Homogeneous pools of stateless services
Least Connections | Tracks the number of active connections | Usually handles long requests and traffic spikes better | Higher when request duration varies a lot | Medium | Traffic with mixed short and long requests
Consistent Hashing | Routes deterministically by request key | Reassigns only part of the keys when the pool changes | Depends on virtual-node setup and key shape | High | Traffic with state affinity and strong key-locality needs
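
To make the "reassigns only part of the keys" behavior concrete, here is a small hash ring with virtual nodes. It is a sketch under assumptions: the `HashRing` class, the choice of MD5 as a stable hash, and 100 virtual nodes per server are illustrative, not a recommendation.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable 64-bit hash; MD5 is used for distribution here, not for security.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (point, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each server is placed on the ring at many pseudo-random points.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def lookup(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key's point, wrapping around.
        idx = bisect.bisect_right(self._ring, (_hash(key), "")) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["A", "B", "C"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.lookup(k) for k in keys}
ring.remove("C")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"moved {len(moved) / len(keys):.1%} of keys")
```

Only keys that were owned by the removed server change owner; keys on A and B stay where they were. Lowering `vnodes` makes the per-server share less even, which is one way to see why virtual-node setup shows up in the distribution-quality column.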

Practical Guidance

Quick Rules

  • Start with Round Robin when the pool is homogeneous and request duration is roughly even.
  • Move to Least Connections when short and long requests are mixed in the same flow.
  • Use Consistent Hashing when keeping the same key on the same node matters more than perfect evenness.
  • Validate the choice on peak load and degraded-node scenarios, not only on average traffic.
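
The second rule is easier to see with a toy picker. The sketch below assumes the balancer can count open connections per server; `LeastConnectionsBalancer`, `acquire`, and `release` are hypothetical names, and ties are broken by pool order.

```python
class LeastConnectionsBalancer:
    """Routes each request to the server with the fewest open connections."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("server pool must not be empty")
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Fewest active connections wins; ties go to the earliest server in the pool.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        if self.active[server] == 0:
            raise RuntimeError(f"no open connections on {server}")
        self.active[server] -= 1


lb = LeastConnectionsBalancer(["A", "B", "C"])
slow = lb.acquire()      # "A": a long request that stays open
first = lb.acquire()     # "B": short request
second = lb.acquire()    # "C": short request
lb.release(first)
lb.release(second)
third = lb.acquire()     # "B": A is still busy with the long request
print(slow, first, second, third)  # A B C B
```

Round Robin would have sent that fourth request back to A, stacking it behind the long request; Least Connections steers it to an idle server instead.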

Common Mistakes

  • Choosing an algorithm without profiling request duration, burst shape, and key skew.
  • Using Consistent Hashing without virtual nodes or hot-key monitoring.
  • Ignoring health checks, slow start for new instances, and safe connection draining for instances being removed.
  • Treating active connections as the only load metric while ignoring CPU, RPS, and p99 latency.
  • Mixing sticky and non-sticky routing without a clear fallback policy.
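
Hot-key monitoring does not need to be elaborate to be useful. As a minimal sketch, the helper below reports keys whose share of requests in a sampling window exceeds a threshold; `hot_keys`, the sample keys, and the 50% threshold are all assumptions made for illustration.

```python
from collections import Counter


def hot_keys(keys, threshold=0.2):
    """Return {key: share} for keys whose share of requests is at least threshold."""
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: c / total for k, c in counts.items() if c / total >= threshold}


# Hypothetical routing keys observed in one sampling window.
window = ["user:42"] * 6 + ["user:77"] * 2 + ["tenant:acme"] * 2
print(hot_keys(window, threshold=0.5))  # {'user:42': 0.6}
```

With Consistent Hashing, a key flagged this way pins its whole load onto one node, so an alert on the result is a reasonable first signal that the routing choice needs a second look.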

Checks Before Production

1. Active and passive health checks are enabled.
2. Slow start and graceful connection draining are configured.
3. Metrics cover p95/p99, node saturation, skew, and retry-storm rate.
4. The algorithm is tested for both node shutdown and degraded-node scenarios.
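
For the p95/p99 check, a nearest-rank percentile is enough to show why tail metrics matter more than the average in degraded-node tests. The latency numbers below are made up for illustration, and `percentile` is a hypothetical helper, not a library function.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up latencies in milliseconds.
healthy = [20] * 95 + [40] * 5
degraded = [20] * 90 + [400] * 10   # one slow node serving about 10% of requests

for name, samples in (("healthy", healthy), ("degraded", degraded)):
    avg = sum(samples) / len(samples)
    print(name, avg, percentile(samples, 95), percentile(samples, 99))
```

In this made-up data the average rises from 21 ms to 58 ms, while p95 jumps from 20 ms to 400 ms, which is the kind of shift an average-only dashboard would understate.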

Related chapters

  • Load Balancing - provides the L4/L7, health-management, and global-routing context on top of which concrete algorithms are chosen.
  • Design principles for scalable systems - explains how traffic-distribution decisions connect to latency, throughput, and the broader system-growth model.
  • Service Discovery - shows how to keep target pools up to date, which is required for correct balancing behavior.
  • Service Mesh Architecture - extends balancing algorithms to service-to-service traffic with retries and unhealthy-instance policies.
  • Caching strategies - helps reason about key locality and hot-key pressure, which are critical for Consistent Hashing.
  • Multi-region / Global Systems - adds the regional traffic-distribution layer and failover strategy across data centers.
  • API Gateway - demonstrates an applied L7 scenario where balancing algorithms work alongside routing policies.
