A balancing algorithm should be chosen for the workload it protects, not for the popularity of its name.
The chapter compares Round Robin, Least Connections, and Consistent Hashing across request duration, load skew, state affinity, hot keys, and the cost of redistributing traffic when the instance pool changes.
For system design interviews, this is especially useful because it shifts the discussion from 'which algorithm should we use' to 'which traffic shape are we protecting and what breaks if we choose wrong.'
Practical value of this chapter
Algorithm by workload
Match RR/LC/Hash strategies to workload shape, session duration, and affinity requirements.
Cost of choice
Call out trade-off costs: uneven load, hot spots, rebalance complexity, and behavior under node failures.
Observability
Define fairness metrics and early warning signals that indicate algorithm tuning or replacement is needed.
Interview trade-offs
Show how algorithm choice shifts when moving from stateless APIs to stateful or realtime scenarios.
Reference
Envoy Load Balancing
Practical guide to balancing algorithms and real-world usage scenarios.
A balancing algorithm directly affects latency, resilience, and resource efficiency. The same instance pool can behave very differently depending on whether you choose equal rotation, current-load adaptation, or key-based routing.
Fairness
Round Robin provides a clean baseline if nodes are close in capacity.
Load Adaptation
Least Connections reacts better when short and long requests are mixed.
Key Locality
Consistent Hashing reduces cache misses and key remap during pool changes.
Resilience
Combine the algorithm with health checks, slow-start, and draining in every rollout.
Algorithm Selection Playbook
Profile the traffic first
Step 1: Capture request duration, burst behavior, and the share of stateful operations.
Map algorithm to workload
Step 2: Round Robin for simple pools, Least Connections for mixed latency, Consistent Hashing for key locality.
Test degradation paths
Step 3: Simulate shutdown/degraded nodes and evaluate p95/p99, retry storms, and key remap effects.
Enforce operational guardrails
Step 4: Apply health policy, slow-start, draining, and alerts on saturation/hot keys.
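Step 1 can be approximated in a few lines of Python. The exponential durations below are a synthetic stand-in for real access-log data, and the `skew` heuristic is an assumption for illustration, not a standard metric:

```python
import random
import statistics

# Synthetic stand-in for request durations (ms) pulled from access logs.
# A long-tailed exponential mix mimics short and long requests coexisting.
random.seed(42)
durations = [random.expovariate(1 / 50) for _ in range(10_000)]

# statistics.quantiles with n=100 returns 99 cut points: index 49 is p50,
# index 94 is p95, index 98 is p99.
cuts = statistics.quantiles(durations, n=100)
p50, p95, p99 = cuts[49], cuts[94], cuts[98]

# A large tail-to-median ratio hints that Least Connections will
# outperform Round Robin, because slow requests pile up unevenly.
skew = p99 / p50
print(f"p50={p50:.1f}ms p95={p95:.1f}ms p99={p99:.1f}ms skew={skew:.1f}x")
```

If `skew` stays close to 1, a simple rotation is usually enough; as it grows, load-aware strategies start to pay off.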
Core Algorithms and Visualization
This visualization shows how the same request stream is distributed differently by each algorithm.
Round Robin
Requests are distributed in a cycle: S1 -> S2 -> S3 -> S1.
Pros
- Very simple implementation and predictable behavior.
- Works well for stateless backends with homogeneous nodes.
- Low runtime overhead in the balancer.
Limitations
- Does not account for current load or slow nodes.
- Can hurt latency on heterogeneous server pools.
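The rotation above can be expressed as a minimal sketch (plain Python, not tied to any particular balancer; the class and method names are illustrative):

```python
from itertools import cycle

class RoundRobinBalancer:
    """Cycles through servers in a fixed order, ignoring their current load."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._cycle = cycle(self._servers)

    def pick(self):
        # Every call advances the rotation by one; no per-request state
        # is kept, which is what makes Round Robin so cheap.
        return next(self._cycle)

lb = RoundRobinBalancer(["S1", "S2", "S3"])
print([lb.pick() for _ in range(4)])  # S1, S2, S3, then back to S1
```

Note what is missing: there is no hook for node health or load, so in practice the server list must be rebuilt when health checks exclude a node.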
[Interactive simulation: requests flow from a queue through the load balancer (Round Robin) to Server A, Server B, and Server C.]
Comparison and Trade-offs
Pick the algorithm based on workload profile and state/locality constraints, not on popularity.
| Algorithm | State awareness | Failover behavior | Balancing quality | Complexity | Best fit |
|---|---|---|---|---|---|
| Round Robin | No | Simple node exclusion | Medium | Low | Stateless, homogeneous pool |
| Least Connections | Active connections | Works well for burst + long-lived connections | High | Medium | Mixed latency workload |
| Consistent Hashing | Key-aware routing | Partial remap to neighboring nodes | Depends on virtual nodes | High | Stateful/cache-sensitive traffic |
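The "active connections" signal in the table can be sketched as a counter per server; this is an illustrative minimum, assuming the balancer is told when each request starts and finishes:

```python
class LeastConnectionsBalancer:
    """Routes each request to the server with the fewest active connections."""

    def __init__(self, servers):
        # Connection counters start at zero for every server.
        self.active = {s: 0 for s in servers}

    def acquire(self):
        # Pick the least-loaded server (ties resolve to the first listed).
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # The caller must report completion, or counters drift upward.
        self.active[server] -= 1

lb = LeastConnectionsBalancer(["S1", "S2", "S3"])
first, second, third = lb.acquire(), lb.acquire(), lb.acquire()
lb.release(second)          # one request finishes early on S2
next_pick = lb.acquire()    # S2 is now least loaded, so it wins
```

The `release` call is the weak point this table's "Medium" complexity refers to: a dropped completion event silently skews the counters, which is why production balancers pair this with timeouts and health checks.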
Selection in Practice
Quick Rules
- Start with Round Robin for simple stateless APIs.
- Move to Least Connections for long or uneven request duration.
- Use Consistent Hashing when locality and sticky state are important.
- Validate decisions against the peak traffic profile, not only the average load.
Common Mistakes
- Choosing an algorithm without profiling traffic shape (request duration, burst, key skew).
- Using Consistent Hashing without virtual nodes and hot-key monitoring.
- Ignoring health checks and slow-start for new instances.
- Treating active connections as the only load metric without CPU/RPS/p99 latency.
- Mixing sticky and non-sticky routing without an explicit fallback policy.
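The virtual-nodes mistake above is easier to see with a sketch. Assuming an MD5-based ring (an illustrative choice; real balancers typically use faster hashes and their own vnode counts):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # MD5 used purely for even distribution, not security.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Hash ring with virtual nodes; vnodes=100 is an assumed default."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node is placed on the ring vnodes times,
        # which smooths out load and spreads remapped keys on failure.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    def lookup(self, key: str) -> str:
        # A key maps to the first vnode clockwise from its hash.
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["S1", "S2", "S3"])
smaller = ConsistentHashRing(["S1", "S2"])
keys = [f"user-{i}" for i in range(1000)]
moved = sum(ring.lookup(k) != smaller.lookup(k) for k in keys)
# Only keys that lived on the removed node S3 change owner;
# every key on S1 or S2 keeps its placement.
```

This is the "partial remap to neighboring nodes" behavior from the comparison table: with vnodes=1 the same removal would dump S3's entire range onto a single neighbor, which is exactly the hot spot the checklist warns about.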
Mini Checklist Before Production
Related chapters
- Traffic load balancing - provides the L4/L7, health-check, and GSLB architecture context on top of which concrete algorithms are chosen.
- Design principles for scalable systems - explains how traffic distribution decisions connect to latency, throughput, and broader scalability trade-offs.
- Service Discovery - shows how to keep target instance pools up to date, which is required for correct balancing behavior.
- Service Mesh Architecture - extends balancing algorithms to service-to-service traffic with retries and outlier-detection policies.
- Caching strategies - helps reason about key locality and hot-key effects that are critical for consistent hashing.
- Multi-region / Global Systems - adds the regional traffic-distribution layer and failover strategy across data centers.
- API Gateway (routing and balancing) - demonstrates an applied L7 scenario where balancing algorithms work alongside routing policies.
