A rate limiter looks like a counter problem until burst traffic, distributed state, and limiter failure turn it into a fairness and survivability problem.
The case helps compare token bucket, leaky bucket, and sliding window approaches, choose the enforcement point, model quota state, and reason about fail-open versus fail-close behavior.
For interviews and design reviews, it exposes the quality of engineering judgment around abuse prevention, critical-path placement, and the cost of a bad default.
- Control Plane: focus on policy, limits, routing, and stable edge behavior under variable load.
- Data Path: keep latency and throughput predictable while traffic and burst pressure increase.
- Failure Modes: cover fail-open/fail-close behavior, graceful degradation, and safe fallback paths.
- Ops Ready: show monitoring for saturation, retry storms, and practical operational guardrails.
Rate Limiter is one of the most popular problems in System Design interviews. It appears in the books by Alex Xu ("System Design Interview") and Zhiyong Tan ("Acing the System Design Interview") as a basic example of designing a distributed system with stringent performance requirements.
See Alex Xu, "System Design Interview", Chapter 4 ("Design a Rate Limiter"), for a detailed analysis of the algorithms and architecture.
Why is Rate Limiter needed?
Problems without Rate Limiting
- DDoS attacks: service overload by malicious traffic
- Cascading failures: one misbehaving client can take down the entire service
- Resource overuse: uncontrolled API consumption
- Uneven load: "noisy neighbors" degrade service for everyone else
Benefits of Rate Limiting
- Service protection: overload prevention
- Fair distribution: quotas for each client
- Cost control: caps on the use of paid, metered APIs
- Predictability: stable performance
Typical Requirements
Functional
- FR1: Limiting requests by IP, user ID, or API key
- FR2: Configurable limits for different endpoints
- FR3: Informative response when the limit is exceeded (429)
- FR4: Distributed environment support
Non-functional
- NFR1: Low latency: <1 ms overhead per request
- NFR2: High availability: a limiter failure must not become a service failure
- NFR3: Limit accuracy (±5% is acceptable in a distributed setup)
- NFR4: Horizontal scaling
Rate Limiting Algorithms
There are several classic algorithms, each with its own trade-offs:
1. Token Bucket
A bucket is refilled at a constant rate. Each request consumes one token.
Advantages
- Simple implementation
- Allows burst traffic
- Memory efficient (2 numbers per user)
Drawbacks
- Hard to choose bucket size
- Race conditions in distributed setup
# Parameters:
bucket_size = 10 # max tokens
refill_rate = 2 # tokens/sec
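The parameters above plug into a small single-node sketch. This is a minimal illustration, not a production implementation; the class name and the lazy-refill-on-check approach are my own choices:

```python
import time

class TokenBucket:
    """Single-node token bucket; refills lazily on each check,
    so the state is just two numbers: tokens and last_refill."""

    def __init__(self, bucket_size: int, refill_rate: float):
        self.bucket_size = bucket_size    # max tokens (burst capacity)
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(bucket_size)  # start full
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Lazy refill: credit tokens for elapsed time, capped at bucket_size
        self.tokens = min(self.bucket_size,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `bucket_size = 10` and `refill_rate = 2`, a fresh bucket admits a burst of 10 requests, then sustains 2 requests/sec; the "2 numbers per user" memory claim above corresponds to `tokens` and `last_refill`.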
System architecture
Where to place Rate Limiter?
Client-side
Easy to bypass and not trustworthy; used only as an optimization.
API Gateway
The ideal place: centralized enforcement in front of the services. AWS API Gateway, Kong, Envoy.
Middleware
Flexible (each service enforces its own limits), but logic is duplicated and synchronization across services is hard.
See Zhiyong Tan, "Acing the System Design Interview", for a comparison of approaches to placing the rate limiter.
Implementation in Redis
Redis is the standard choice for Rate Limiter due to its in-memory speed and atomic operations.
Sliding Window Log on Redis (the per-client sorted set of timestamps below is the "log" variant of the sliding window)
-- Lua script: Redis executes it atomically
local key = KEYS[1]
local window = tonumber(ARGV[1]) -- window size in seconds
local limit = tonumber(ARGV[2])  -- request limit
local now = tonumber(ARGV[3])    -- current timestamp
-- Drop entries that have fallen out of the window
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
-- Count the requests still inside the window
local count = redis.call('ZCARD', key)
if count < limit then
  -- Record the new request (random suffix keeps the member unique)
  redis.call('ZADD', key, now, now .. '-' .. math.random())
  redis.call('EXPIRE', key, window)
  return 1 -- allowed
else
  return 0 -- rejected
end
Race Condition Protection
Problem: Concurrent requests read the same count and both pass the check.
Solution: Lua scripts execute atomically in Redis.
Alternative: Redis MULTI/EXEC or Redlock for distributed locking.
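For intuition, the same sliding-window-log logic can be sketched in-process, assuming a single node (so the atomicity concerns above do not apply). The class and method names are illustrative; a deque of timestamps plays the role of the Redis sorted set:

```python
import time
from collections import deque

class SlidingWindowLog:
    """In-process analogue of the Lua script: keep request timestamps,
    evict those older than the window, compare the count to the limit."""

    def __init__(self, window: float, limit: int):
        self.window = window
        self.limit = limit
        self.log = deque()  # timestamps, oldest first (ZSET analogue)

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # ZREMRANGEBYSCORE analogue: evict entries outside the window
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:  # ZCARD < limit
            self.log.append(now)        # ZADD
            return True
        return False
```

The optional `now` argument makes the limiter testable with injected timestamps; in production you would let it default to the clock.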
Synchronization
Centralized: One Redis cluster for all instances.
Eventual consistency: Local counters + periodic synchronization.
Trade-off: Accuracy vs Latency.
HTTP Response Headers
# If the request is successful:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1640995200
# If the limit is exceeded:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
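Producing these headers can be sketched as below. The `X-RateLimit-*` names are a de facto convention rather than a formal standard, and the helper name is illustrative:

```python
import math
import time

def rate_limit_headers(limit, remaining, reset_epoch, now=None):
    """Build a (status_code, headers) pair for a rate-limited API.
    reset_epoch is the Unix time at which the current window resets."""
    now = time.time() if now is None else now
    if remaining > 0:
        return 200, {
            "X-RateLimit-Limit": str(limit),
            "X-RateLimit-Remaining": str(remaining),
            "X-RateLimit-Reset": str(reset_epoch),
        }
    # Over the limit: 429 plus a Retry-After hint derived from the reset time
    retry_after = max(0, math.ceil(reset_epoch - now))
    return 429, {"Retry-After": str(retry_after)}
```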
Deep Dive: Multi-tier Rate Limiting
In real systems, several levels of limits are used:
By levels
- Global: 10M req/day for the entire service
- Per-user: 1000 req/hour per user
- Per-endpoint: 100 req/min on /api/search
- Per-IP: DDoS protection from one IP
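The tiers above can be composed by checking them in order, coarsest first. This is a minimal sketch with invented names (`FixedWindowCounter`, `check_tiers`); real tiers would key on user, endpoint, and IP:

```python
class FixedWindowCounter:
    """Minimal fixed-window counter, just enough to demonstrate tiering."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def allow(self):
        if self.count < self.limit:
            self.count += 1
            return True
        return False

def check_tiers(tiers):
    """tiers: list of (name, limiter) pairs. A request must pass every
    tier; returns (allowed, first_violated_tier_or_None).
    Caveat: earlier tiers are charged even when a later tier rejects;
    a production system may need a check-then-commit pass to avoid that."""
    for name, limiter in tiers:
        if not limiter.allow():
            return False, name
    return True, None
```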
Graceful Degradation
- Soft limit: Warning in headers
- Hard limit: 429 + Retry-After
- Redis down: Fail-open (skip) or fail-close?
- Monitoring: Alert for traffic anomalies
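The fail-open versus fail-close decision can be captured in a small wrapper. A hedged sketch: `allow_with_fallback` is an illustrative name, and in production you would catch the specific backend exception (e.g. a Redis connection error) rather than bare `Exception`:

```python
def allow_with_fallback(check, fail_open=True):
    """Wrap a limiter check so a backend outage does not take the
    service down with it. `check` is any callable returning True/False;
    any exception is treated as 'limiter unavailable'."""
    try:
        return check()
    except Exception:
        # Fail-open: admit traffic (protects availability, drops protection).
        # Fail-close: reject traffic (protects the backend, hurts users).
        return fail_open
```

Fail-open is the common default for general APIs; fail-close fits abuse-sensitive or billing-critical paths.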
Key takeaways from the interview
- 1. Start with the requirements: What kind of traffic? Is inaccuracy acceptable? Distributed?
- 2. Select an algorithm: Token Bucket for bursts, Sliding Window Counter for accuracy.
- 3. Discuss trade-offs: memory vs. accuracy, latency vs. consistency.
- 4. Redis + Lua: atomic operations solve race conditions.
- 5. Fail-open vs. fail-close: what happens if the rate limiter itself fails?
For additional practice and implementation comparison, review System Design Primer: Rate Limiter and Cloudflare Rate Limiting Rules.
Related chapters
- System Design Primer (short summary) - provides foundational templates for requirements framing and constraint analysis in rate-limiter design.
- System Design Interview - Alex Xu - contains the classic interview breakdown of rate limiting algorithms and practical trade-offs.
- Acing the System Design Interview - adds alternative placement options for distributed rate limiting in real systems.
- API Gateway - shows the primary production layer where rate limiting is typically enforced centrally.
- API Security Patterns - connects rate limiting with broader API protection: auth, abuse control, and policy enforcement.
- Caching Strategies - helps choose cache/storage approaches for limiter counters while keeping latency overhead low.
- Resilience Patterns - expands fail-open/fail-close decisions and degradation behavior during limiter outages.
- Notification System - demonstrates how multi-tier limits protect downstream systems from burst traffic.
- Web Crawler - illustrates why controlled request pacing is required for safe and polite external crawling.
- Payment System - adds a high-stakes context where limits protect critical financial flows.
