A rate limiter looks like a counter problem until burst traffic, shared state, and limiter failure turn it into a fairness and survivability problem.
The chapter helps compare Token Bucket, Leaky Bucket, and Sliding Window, choose the enforcement point, store quota state, and reason about failure behavior.
For interviews and architecture discussions, it quickly reveals the quality of engineering judgment around abuse prevention, critical-path placement, and the cost of a bad default policy.
Control Plane
Focus on policy, limits, routing, and stable edge behavior under variable load.
Data Path
Keep latency and throughput predictable while traffic and burst pressure increase.
Failure Modes
Cover fail-open/fail-close behavior, graceful degradation, and safe fallback paths.
Ops Ready
Show monitoring for saturation, retry storms, and practical operational guardrails.
Rate Limiter is a classic case about protecting a public API and distributing shared capacity fairly. On the surface it looks like a simple request counter, but in practice you have to decide where the limit should be enforced, how state is shared across instances, and what happens when the limiter itself starts failing or becomes unavailable.
That is why the problem works so well in interviews: it quickly pushes the discussion toward trade-offs between limit accuracy, latency on the critical path, and system behavior when the protective layer fails.
Chapter 4
Alex Xu: Rate Limiter
Detailed breakdown of algorithms and architectural decisions in the book
Why rate limiting matters
What happens without it
- DDoS attacks — service overload with malicious traffic
- Cascading failures — one client kills the entire service
- Overuse of resources — uncontrolled API consumption
- Uneven load - “noisy neighbors” disturb others
What rate limiting gives you
- Service protection - overload prevention
- Fair distribution — quotas for each client
- Saving money — control over the use of paid APIs
- Predictability - stable performance
Typical Requirements
Functional
- FR1Limit requests by IP address, user ID, or API key
- FR2Customizable limits for different endpoints
- FR3Informative response when the limit is exceeded (429)
- FR4Operate correctly in a distributed environment
Non-functional
- NFR1Low latency — <1ms overhead per request
- NFR2High availability — limiter failure must not equal service failure
- NFR3Limit accuracy (up to ±5% error is acceptable in a distributed setup)
- NFR4Horizontal scaling
Rate limiting algorithms
Each classic algorithm makes different trade-offs around accuracy, memory usage, and burst behavior:
Interactive visualization
1. Token Bucket
A bucket is refilled at a constant rate, and each request consumes one token.
Advantages
- Simple implementation
- Allows burst traffic
- Memory efficient (2 numbers per user)
Drawbacks
- Hard to choose bucket size
- Race conditions in distributed setup
# Parameters:
bucket_size = 10 # max tokens
refill_rate = 2 # tokens/sec
Algorithm visualization
Architecture choices
Related case
API Gateway
Full API Gateway breakdown: routing, security, BFF patterns, and technology comparison.
Where should the rate limiter live?
Client-side
Easy to bypass, so it only works as a secondary optimization.
API Gateway
The most common central enforcement point in front of application services. AWS API Gateway, Kong, Envoy.
In-service
Flexible, but it duplicates logic and makes counter coordination harder.
Zhiyong Tan
Acing SDI: Rate Limiter
Comparison of deployment options for rate limiting in real systems
Implementation with Redis
Redis is a common choice for a rate limiter because it is memory-backed and supports atomic operations.
Sliding Window Counter on Redis
-- Lua script for atomic updates
local key = KEYS[1]
local window = tonumber(ARGV[1]) -- window size in seconds
local limit = tonumber(ARGV[2]) -- request limit
local now = tonumber(ARGV[3]) -- current timestamp
-- Remove expired entries
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
-- Count requests in the current window
local count = redis.call('ZCARD', key)
if count < limit then
-- Add a new request
redis.call('ZADD', key, now, now .. '-' .. math.random())
redis.call('EXPIRE', key, window)
return 1 -- allowed
else
return 0 -- rejected
endRace-condition protection
Problem: Concurrent requests read the same count.
Solution: Lua scripts execute atomically in Redis.
Alternative: Redis MULTI/EXEC or Redlock for distributed locking.
Synchronization model
Centralized: One Redis cluster serves all application instances.
Eventual consistency: Local counters + periodic synchronization.
Trade-off: Accuracy versus latency.
HTTP response headers
# If the request is successful:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1640995200
# If the limit is exceeded:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
Deep dive: multi-tier rate limiting
Real systems usually enforce limits at several levels at once:
Limit levels
- Global: 10M requests/day for the entire service
- Per-user: 1000 requests/hour per user
- Per-endpoint: 100 requests/minute on `/api/search`
- Per-IP: protection against overload from a single source
Graceful Degradation
- Soft limit: Warning in headers
- Hard limit: 429 + Retry-After
- Redis down: Fail-open (skip) or fail-close?
- Monitoring: Alert for traffic anomalies
What to emphasize in an interview
- 1Start with the requirements: what traffic profile do you expect, how much error is acceptable, and do you need distributed enforcement?
- 2Select the algorithm: Token Bucket works well for bursts, while Sliding Window Counter is a better fit when you need a more accurate view of current load.
- 3Make the trade-offs explicit: memory versus accuracy, latency versus stricter coordination between instances.
- 4Redis + Lua: atomic operations remove counter-update races.
- 5Failure behavior: what should happen if the rate limiter itself becomes unavailable?
For additional practice and implementation comparison, review System Design Primer: Rate Limiter and Cloudflare Rate Limiting Rules.
Related chapters
- System Design Primer (short summary) - provides a baseline framework for requirements, constraints, and bottleneck analysis in rate-limiter design.
- System Design Interview - Alex Xu - contains the classic interview breakdown of rate limiting algorithms and practical trade-offs.
- Acing the System Design Interview - adds practical deployment options for rate limiting in distributed systems.
- API Gateway - shows the primary production layer where rate limiting is typically enforced centrally.
- API Security Patterns - connects rate limiting with broader API protection: auth, abuse control, and policy enforcement.
- Caching Strategies - helps choose storage and caching approaches for counters without adding too much latency to the critical path.
- Resilience Patterns - expands on failure behavior: when to allow requests, when to block them, and how to degrade safely during limiter outages.
- Notification System - demonstrates how multi-tier limits protect downstream systems from burst traffic.
- Web Crawler - illustrates why controlled request pacing is required for safe and polite external crawling.
- Payment System - adds a critical-systems context where limits protect expensive and sensitive financial flows.
