System Design Space

Updated: March 3, 2026 at 11:15 PM

Traffic Load Balancing


L4 vs L7 routing, health checks, connection draining, GSLB patterns, and service mesh balancing in Kubernetes.

Reference: Envoy LB Overview (open source). Detailed guidance on load balancing policies, outlier detection, and locality-aware traffic steering.

Load balancing is more than picking an algorithm. A production-grade design covers the routing-layer decision (L4 vs L7), a health-check strategy, a graceful rollout policy, and a global traffic-routing model across regions.

L4 vs L7: choosing the right layer

What this is about: deciding where routing decisions are made, at the transport layer (L4) or the application layer (L7).

Why this comes first: this is the base decision for the rest of the chapter. It determines available routing rules, processing cost, observability, and resilience controls.

Best-fit scenarios: L4 usually fits stateful TCP services (DB/cache/broker), while L7 fits HTTP/gRPC APIs with canary releases, header/path routing, and richer policy control.

Criteria comparison:

  • OSI layer. L4 (TCP/UDP): routing by IP/port without HTTP awareness. L7 (HTTP/gRPC): routing by path/host/header/cookie.
  • Routing flexibility. L4 is lower: hash, least-conn, or round-robin without content-aware policies. L7 is higher: canary, A/B, sticky sessions, rate limits, policy-based routing.
  • Performance profile. L4 has lower CPU/latency overhead and is strong for high-throughput TCP. L7 pays more per-request logic cost but offers finer traffic control.
  • Typical use cases. L4: databases, Redis, MQTT, binary protocols, ultra-low-latency TCP paths. L7: API gateways, web apps, gRPC meshes, product-level traffic policies.

L4 example (HAProxy, TCP)

Best for PostgreSQL/Redis and other stateful TCP services without HTTP-aware routing.

frontend ft_postgres
  bind *:5432
  mode tcp
  default_backend bk_postgres

backend bk_postgres
  mode tcp
  balance leastconn
  option tcp-check
  default-server inter 2s fall 3 rise 2
  server pg-1 10.0.1.11:5432 check
  server pg-2 10.0.1.12:5432 check

L7 example (Nginx, HTTP)

Best for API gateways, path-based routing, and request-level policies.

upstream api_pool {
  least_conn;
  server 10.0.2.11:8080 max_fails=3 fail_timeout=10s;
  server 10.0.2.12:8080 max_fails=3 fail_timeout=10s;
}

server {
  listen 80;

  location /api/ {
    proxy_pass http://api_pool;
    proxy_set_header X-Request-Id $request_id;
  }

  location /static/ {
    proxy_pass http://static-service:8080;
  }
}

Health checks, grace period, and connection draining

What this is about: backend instance lifecycle in the load balancer: when to add instances, when to eject them, and how to remove them safely.

Why this matters: even the best algorithm fails if traffic still goes to degraded or not-yet-warmed instances. This is what reduces 5xx/timeout spikes during deploys and failovers.

Best-fit scenarios: autoscaling on Kubernetes, rolling deploys, blue/green rollout, and long-lived connections (WebSocket/gRPC streams) where abrupt shutdown causes user-visible failures.

Active health checks

The load balancer probes health endpoints or TCP handshakes itself. You can tune the probe interval, timeout, and rise/fall thresholds, and remove degraded instances before they fail hard.
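As a sketch, the same probe/threshold tuning looks like this in an Envoy cluster definition (cluster name, address, and health path are illustrative):

```yaml
clusters:
  - name: app_cluster
    connect_timeout: 1s
    type: STRICT_DNS
    health_checks:
      - timeout: 2s            # per-probe timeout
        interval: 5s           # time between probes
        unhealthy_threshold: 3 # "fall": eject after 3 failed probes
        healthy_threshold: 2   # "rise": readmit after 2 passing probes
        http_health_check:
          path: /healthz
```

The unhealthy/healthy thresholds play the same role as HAProxy's fall/rise counters shown later in this chapter.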

Passive health checks

Degradation is inferred from live traffic: 5xx, timeouts, reset rate, high latency. Useful when an endpoint is technically up but practically overloaded.
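In HAProxy, passive observation of live layer-7 responses can complement active probes (a sketch; addresses and the error limit are illustrative):

```
backend app
  balance leastconn
  # Observe real responses at layer 7; after 10 consecutive errors,
  # mark the server down without waiting for the next active probe.
  server app-1 10.0.3.11:8080 check observe layer7 error-limit 10 on-error mark-down
  server app-2 10.0.3.12:8080 check observe layer7 error-limit 10 on-error mark-down
```

This catches the "up but overloaded" case: the health endpoint still answers 200 while real requests fail.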

Grace period / slow start

New pods receive limited traffic first. This reduces cold-start spikes and immediate ejection caused by cache/JIT/connection-pool warm-up.

Connection draining

When removing an instance from rotation, stop new connections but allow in-flight traffic to complete. This lowers client errors during deploy/failover.
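With HAProxy, draining can be triggered through the runtime API before a deploy (a sketch; the socket path and backend/server names are illustrative):

```
# Stop routing new connections to app-1; in-flight requests complete.
echo "set server app/app-1 state drain" | socat stdio /var/run/haproxy.sock

# After the deploy and health verification, return it to rotation.
echo "set server app/app-1 state ready" | socat stdio /var/run/haproxy.sock
```

The same pattern applies in Kubernetes via a preStop hook, as shown below.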

HAProxy: slow start + health policy

backend app
  balance leastconn
  option httpchk GET /healthz
  http-check expect status 200
  default-server inter 2s fall 3 rise 2 slowstart 30s
  server app-1 10.0.3.11:8080 check
  server app-2 10.0.3.12:8080 check

Kubernetes: readiness + graceful shutdown

spec:
  terminationGracePeriodSeconds: 40
  containers:
    - name: app
      readinessProbe:
        httpGet:
          path: /readyz
          port: 8080
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 20"]

Global Server Load Balancing (GSLB)

What this is about: balancing across regions and data centers, not just across instances inside one cluster.

Why this matters: local LB does not solve global latency or regional outages. Without GSLB, users may be routed to distant or degraded regions.

Best-fit scenarios: multi-region products, global B2C traffic, strict RTO/RPO requirements, regional compliance constraints, and active DR posture.

DNS-based GSLB

Authoritative DNS selects a region by latency/geo/weight/health and returns the closest endpoint.

Pros: Simple integration and a strong baseline for multi-region rollouts.

Limitations: Reaction speed is bounded by TTL and DNS caching behavior on clients/resolvers, which may delay failover.

When to use: Web/API traffic where seconds-to-tens-of-seconds failover is acceptable.
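As a sketch, latency-based DNS GSLB might be declared like this with Terraform for Route 53 (the zone ID, hostnames, and IP are hypothetical; a similar record would exist per region):

```hcl
# Hypothetical health check for the us-east-1 regional load balancer.
resource "aws_route53_health_check" "us_east" {
  fqdn              = "lb-us-east.example.com"
  type              = "HTTPS"
  port              = 443
  resource_path     = "/healthz"
  failure_threshold = 3
  request_interval  = 10
}

# Latency-routed record: Route 53 answers with the region closest
# (by measured latency) to the resolver, among healthy regions.
resource "aws_route53_record" "api_us_east" {
  zone_id        = "Z123EXAMPLE"     # hypothetical hosted zone
  name           = "api.example.com"
  type           = "A"
  ttl            = 30                # low TTL bounds failover delay
  records        = ["203.0.113.10"]  # regional LB VIP (example IP)
  set_identifier = "us-east-1"

  latency_routing_policy {
    region = "us-east-1"
  }

  health_check_id = aws_route53_health_check.us_east.id
}
```

Note the low TTL: as discussed above, failover speed is bounded by TTL and resolver caching, so long TTLs defeat the health check.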

Anycast

The same IP is announced from multiple PoPs/regions; BGP directs traffic to the topologically closest edge.

Pros: Fast global distribution and strong resilience for edge ingress.

Limitations: Less L7 control at DNS-answer level; requires mature networking, observability, and anti-flap ops.

When to use: Edge/L4 ingress, DNS, DDoS-resilient front doors, globally distributed APIs with minimal RTT.

Service mesh (Envoy, Istio) in Kubernetes

What this is about: balancing service-to-service traffic in microservices, where each internal RPC call is a separate load-balancing decision.

Why this appears here: after L4/L7, health policy, and GSLB, the next step is showing how the same principles scale inside Kubernetes via sidecar proxies and control-plane-managed policy.

Best-fit scenarios: dozens/hundreds of services, unified traffic policy, canary/traffic splitting, mTLS, and centralized retries, outlier detection, and locality failover controls.

In a mesh, balancing runs inside sidecar proxies (typically Envoy). The Istio control plane publishes endpoints and policy, while the data plane applies load balancing, retries, outlier detection, and locality failover for each service call.

Istio DestinationRule

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payments
spec:
  host: payments.default.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: LEAST_REQUEST
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 5s
      baseEjectionTime: 30s

Istio locality failover

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: checkout-locality
spec:
  host: checkout.default.svc.cluster.local
  trafficPolicy:
    # Locality failover only takes effect when outlier detection is enabled.
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 5s
      baseEjectionTime: 30s
    localityLbSetting:
      enabled: true
      failover:
        - from: us-east1
          to: us-central1

Recommendations

  • Start with L7 for product HTTP APIs, but keep an L4 path for stateful TCP services (DB/cache).
  • Combine active and passive health checks: active catches hard failures, passive catches degraded behavior.
  • Apply grace period + connection draining for every rollout, not only incident scenarios.
  • For global routing, split responsibilities: DNS GSLB for region choice, local LB/mesh for intra-region balancing.

Common mistakes

  • Using round-robin by default without validating traffic shape, p99, and connection duration.
  • Treating Kubernetes readiness/liveness as a complete substitute for L7 passive health checks.
  • Setting long DNS TTLs while expecting fast failover in multi-region incidents.
  • Skipping connection draining during deploys and causing 5xx spikes on long-lived requests.


© 2026 Alexander Polomodov