Envoy Load Balancing
Practical guide to balancing algorithms and real-world usage scenarios.
The choice of balancing algorithm directly affects latency, resilience, and resource efficiency. The same instance pool can behave very differently depending on whether you choose equal rotation, current-load adaptation, or key-based routing.
Core Algorithms and Visualization
Round Robin
Requests are distributed in a cycle: S1 -> S2 -> S3 -> S1.
Pros
- Very simple implementation and predictable behavior.
- Works well for stateless backends with homogeneous nodes.
- Low runtime overhead in the balancer.
Limitations
- Does not account for current load or slow nodes.
- Can hurt latency on heterogeneous server pools.
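The rotation above can be sketched in a few lines. This is a minimal illustration, not Envoy's implementation; the server names S1-S3 are placeholders.

```python
from itertools import cycle

# Fixed pool of upstream servers (illustrative names).
servers = ["S1", "S2", "S3"]
rotation = cycle(servers)

def pick_server():
    """Return the next server in the fixed rotation, ignoring load."""
    return next(rotation)

# Six requests cycle through the pool twice, regardless of how busy each node is.
order = [pick_server() for _ in range(6)]
print(order)  # ['S1', 'S2', 'S3', 'S1', 'S2', 'S3']
```

The sketch also makes the limitation visible: a slow S2 still receives every third request, which is exactly the case where Least Connections does better.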
[Interactive simulation: a request queue passes through a Round Robin load balancer to Server A, Server B, and Server C; the demo supports auto mode and single-step runs.]
Comparison and Trade-offs
| Algorithm | State awareness | Failover behavior | Balancing quality | Complexity | Best fit |
|---|---|---|---|---|---|
| Round Robin | No | Simple node exclusion | Medium | Low | Stateless, homogeneous pool |
| Least Connections | Active connections | Drains new traffic away from slow or failing nodes | High | Medium | Mixed-latency workloads with burst and long-lived connections |
| Consistent Hashing | Key-aware routing | Partial remap to neighboring nodes | Depends on virtual nodes | High | Stateful/cache-sensitive traffic |
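The "state awareness" column for Least Connections boils down to one decision: pick the node with the fewest in-flight requests. A minimal sketch, with illustrative server names and connection counts:

```python
# Current in-flight requests per server (illustrative counts).
active = {"S1": 2, "S2": 0, "S3": 1}

def pick_server():
    """Choose the server with the fewest active connections and claim a slot."""
    server = min(active, key=active.get)
    active[server] += 1
    return server

def finish(server):
    """Release the slot when the request completes."""
    active[server] -= 1

choice = pick_server()
print(choice)  # S2: it has the fewest in-flight requests
```

Because long-lived requests keep their connection counted, slow nodes naturally stop receiving new traffic; the trade-off is that connection count is only a proxy for real load (see Common Mistakes below).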
Selection in Practice
Quick Rules
- Start with Round Robin for simple stateless APIs.
- Move to Least Connections for long-lived or uneven request durations.
- Use Consistent Hashing when locality and sticky state matter.
- Validate on peak traffic profile, not only average load.
Common Mistakes
- Choosing an algorithm without profiling traffic shape (request duration, burst, key skew).
- Using Consistent Hashing without virtual nodes and hot-key monitoring.
- Ignoring health checks and slow-start for new instances.
- Treating active connections as the only load signal, ignoring CPU, RPS, and p99 latency.
- Mixing sticky and non-sticky routing without an explicit fallback policy.
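The "virtual nodes" mistake above is easier to see with a concrete ring. The sketch below is illustrative, not Envoy's ring-hash implementation: server names, the vnode count of 100, and the use of MD5 as a stable hash are all assumptions.

```python
import bisect
import hashlib

def _h(key: str) -> int:
    # Stable 64-bit hash; hashlib sidesteps Python's per-process hash randomization.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Hash ring with virtual nodes (vnode count is illustrative)."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node contributes `vnodes` points on the ring,
        # which smooths the key distribution across nodes.
        self._ring = sorted((_h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._hashes = [h for h, _ in self._ring]

    def pick(self, key: str) -> str:
        # Route the key to the first ring point at or after its hash (wrapping).
        i = bisect.bisect(self._hashes, _h(key)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["S1", "S2", "S3"])
before = {k: ring.pick(k) for k in ("user:1", "user:2", "user:3")}

# Removing S2 only remaps keys whose ring point belonged to S2;
# keys on S1 and S3 keep their server (the "partial remap" in the table).
smaller = ConsistentHashRing(["S1", "S3"])
moved = [k for k, s in before.items() if s != "S2" and smaller.pick(k) != s]
print(moved)  # [] : no key outside S2 moved
```

With too few virtual nodes the ring points cluster and one server absorbs a disproportionate key range, which is why vnode count and hot-key monitoring go together.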
