Knowledge graphSettings

Updated: April 8, 2026 at 2:55 PM

Rate Limiter

medium

Classic rate limiter case: choosing an algorithm, storing counters in Redis, and deciding where to enforce limits in a distributed system.

A rate limiter looks like a counter problem until burst traffic, shared state, and limiter failure turn it into a fairness and survivability problem.

The chapter helps compare Token Bucket, Leaky Bucket, and Sliding Window, choose the enforcement point, store quota state, and reason about failure behavior.

For interviews and architecture discussions, it quickly reveals the quality of engineering judgment around abuse prevention, critical-path placement, and the cost of a bad default policy.

Control Plane

Focus on policy, limits, routing, and stable edge behavior under variable load.

Data Path

Keep latency and throughput predictable while traffic and burst pressure increase.

Failure Modes

Cover fail-open/fail-close behavior, graceful degradation, and safe fallback paths.

Ops Ready

Show monitoring for saturation, retry storms, and practical operational guardrails.

Rate Limiter is a classic case about protecting a public API and distributing shared capacity fairly. On the surface it looks like a simple request counter, but in practice you have to decide where the limit should be enforced, how state is shared across instances, and what happens when the limiter itself starts failing or becomes unavailable.

That is why the problem works so well in interviews: it quickly pushes the discussion toward trade-offs between limit accuracy, latency on the critical path, and system behavior when the protective layer fails.

Chapter 4

Alex Xu: Rate Limiter

Detailed breakdown of algorithms and architectural decisions in the book

Читать обзор

Why rate limiting matters

What happens without it

  • DDoS attacks — service overload with malicious traffic
  • Cascading failures — one client kills the entire service
  • Overuse of resources — uncontrolled API consumption
  • Uneven load - “noisy neighbors” disturb others

What rate limiting gives you

  • Service protection - overload prevention
  • Fair distribution — quotas for each client
  • Saving money — control over the use of paid APIs
  • Predictability - stable performance

Typical Requirements

Functional

  • FR1
    Limit requests by IP address, user ID, or API key
  • FR2
    Customizable limits for different endpoints
  • FR3
    Informative response when the limit is exceeded (429)
  • FR4
    Operate correctly in a distributed environment

Non-functional

  • NFR1
    Low latency — <1ms overhead per request
  • NFR2
    High availability — limiter failure must not equal service failure
  • NFR3
    Limit accuracy (up to ±5% error is acceptable in a distributed setup)
  • NFR4
    Horizontal scaling

Rate limiting algorithms

Each classic algorithm makes different trade-offs around accuracy, memory usage, and burst behavior:

Interactive visualization

1. Token Bucket

A bucket is refilled at a constant rate, and each request consumes one token.

Advantages

  • Simple implementation
  • Allows burst traffic
  • Memory efficient (2 numbers per user)

Drawbacks

  • Hard to choose bucket size
  • Race conditions in distributed setup

# Parameters:

bucket_size = 10 # max tokens

refill_rate = 2 # tokens/sec

Algorithm visualization

Tokens:
5.0 / 5
Refill: 1 token/sec
t = 0.0s
Request history:
Press "Start" and send requests

Architecture choices

Related case

API Gateway

Full API Gateway breakdown: routing, security, BFF patterns, and technology comparison.

Read the case

Where should the rate limiter live?

Client-side

Easy to bypass, so it only works as a secondary optimization.

API Gateway

The most common central enforcement point in front of application services. AWS API Gateway, Kong, Envoy.

In-service

Flexible, but it duplicates logic and makes counter coordination harder.

Zhiyong Tan

Acing SDI: Rate Limiter

Comparison of deployment options for rate limiting in real systems

Читать обзор

Implementation with Redis

Redis is a common choice for a rate limiter because it is memory-backed and supports atomic operations.

Sliding Window Counter on Redis

-- Lua script for atomic updates
local key = KEYS[1]
local window = tonumber(ARGV[1]) -- window size in seconds
local limit = tonumber(ARGV[2]) -- request limit
local now = tonumber(ARGV[3]) -- current timestamp

-- Remove expired entries
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)

-- Count requests in the current window
local count = redis.call('ZCARD', key)

if count < limit then
    -- Add a new request
    redis.call('ZADD', key, now, now .. '-' .. math.random())
    redis.call('EXPIRE', key, window)
    return 1 -- allowed
else
    return 0 -- rejected
end

Race-condition protection

Problem: Concurrent requests read the same count.

Solution: Lua scripts execute atomically in Redis.

Alternative: Redis MULTI/EXEC or Redlock for distributed locking.

Synchronization model

Centralized: One Redis cluster serves all application instances.

Eventual consistency: Local counters + periodic synchronization.

Trade-off: Accuracy versus latency.

HTTP response headers

# If the request is successful:

X-RateLimit-Limit: 100

X-RateLimit-Remaining: 42

X-RateLimit-Reset: 1640995200

# If the limit is exceeded:

HTTP/1.1 429 Too Many Requests

Retry-After: 30

Deep dive: multi-tier rate limiting

Real systems usually enforce limits at several levels at once:

Limit levels

  • Global: 10M requests/day for the entire service
  • Per-user: 1000 requests/hour per user
  • Per-endpoint: 100 requests/minute on `/api/search`
  • Per-IP: protection against overload from a single source

Graceful Degradation

  • Soft limit: Warning in headers
  • Hard limit: 429 + Retry-After
  • Redis down: Fail-open (skip) or fail-close?
  • Monitoring: Alert for traffic anomalies

What to emphasize in an interview

  • 1
    Start with the requirements: what traffic profile do you expect, how much error is acceptable, and do you need distributed enforcement?
  • 2
    Select the algorithm: Token Bucket works well for bursts, while Sliding Window Counter is a better fit when you need a more accurate view of current load.
  • 3
    Make the trade-offs explicit: memory versus accuracy, latency versus stricter coordination between instances.
  • 4
    Redis + Lua: atomic operations remove counter-update races.
  • 5
    Failure behavior: what should happen if the rate limiter itself becomes unavailable?

For additional practice and implementation comparison, review System Design Primer: Rate Limiter and Cloudflare Rate Limiting Rules.

Related chapters

  • System Design Primer (short summary) - provides a baseline framework for requirements, constraints, and bottleneck analysis in rate-limiter design.
  • System Design Interview - Alex Xu - contains the classic interview breakdown of rate limiting algorithms and practical trade-offs.
  • Acing the System Design Interview - adds practical deployment options for rate limiting in distributed systems.
  • API Gateway - shows the primary production layer where rate limiting is typically enforced centrally.
  • API Security Patterns - connects rate limiting with broader API protection: auth, abuse control, and policy enforcement.
  • Caching Strategies - helps choose storage and caching approaches for counters without adding too much latency to the critical path.
  • Resilience Patterns - expands on failure behavior: when to allow requests, when to block them, and how to degrade safely during limiter outages.
  • Notification System - demonstrates how multi-tier limits protect downstream systems from burst traffic.
  • Web Crawler - illustrates why controlled request pacing is required for safe and polite external crawling.
  • Payment System - adds a critical-systems context where limits protect expensive and sensitive financial flows.

Enable tracking in Settings