Rate Limiter is one of the most popular problems in System Design interviews. It appears in the books by Alex Xu ("System Design Interview") and Zhiyong Tan ("Acing the System Design Interview") as a basic example of designing a distributed system with stringent performance requirements.
Chapter 4
Alex Xu: Rate Limiter
Detailed analysis of algorithms and architecture in the book
Why is Rate Limiter needed?
Problems without Rate Limiting
- DDoS attacks — service overload with malicious traffic
- Cascading failures — one client kills the entire service
- Overuse of resources — uncontrolled API consumption
- Uneven load - “noisy neighbors” disturb others
Benefits of Rate Limiting
- Service protection - overload prevention
- Fair distribution — quotas for each client
- Saving money — control over the use of paid APIs
- Predictability - stable performance
Typical Requirements
Functional
- FR1Limiting requests by IP, user ID or API key
- FR2Customizable limits for different endpoints
- FR3Informative response when the limit is exceeded (429)
- FR4Distributed environment support
Non-functional
- NFR1Low latency — <1ms overhead per request
- NFR2High Availability — limiter failure ≠ service failure
- NFR3Limit accuracy (acceptable ±5% in distributed)
- NFR4Horizontal scaling
Rate Limiting Algorithms
There are several classic algorithms, each with its own trade-offs:
Interactive visualization
1. Token Bucket
A bucket is refilled at a constant rate. Each request consumes one token.
Advantages
- Simple implementation
- Allows burst traffic
- Memory efficient (2 numbers per user)
Drawbacks
- Hard to choose bucket size
- Race conditions in distributed setup
# Parameters:
bucket_size = 10 # max tokens
refill_rate = 2 # tokens/sec
Algorithm visualization
System architecture
Related case
API Gateway
Full analysis of API Gateway: routing, security, BFF pattern and comparison of technologies.
Where to place Rate Limiter?
Client-side
Easy to get around, not reliable. Used only as an optimization.
API Gateway
The ideal place is centralized, before the services. AWS API Gateway, Kong, Envoy.
Middleware
Each service is flexible, but duplication and synchronization are difficult.
Zhiyong Tan
Acing SDI: Rate Limiter
Comparison of approaches to placing rate limiter
Implementation in Redis
Redis is the standard choice for Rate Limiter due to its in-memory speed and atomic operations.
Sliding Window Counter on Redis
-- Lua script for atomicity
local key = KEYS[1]
local window = tonumber(ARGV[1]) -- window size in seconds
local limit = tonumber(ARGV[2]) -- request limit
local now = tonumber(ARGV[3]) -- current timestamp
-- Deleting old entries
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
-- We count current requests
local count = redis.call('ZCARD', key)
if count < limit then
-- Add a new request
redis.call('ZADD', key, now, now .. '-' .. math.random())
redis.call('EXPIRE', key, window)
return 1 -- allowed
else
return 0 -- rejected
endRace Condition Protection
Problem: Concurrent requests read the same count.
Solution: Lua scripts execute atomically in Redis.
Alternative: Redis MULTI/EXEC or Redlock for distributed locking.
Synchronization
Centralized: One Redis cluster for all instances.
Eventual consistency: Local counters + periodic synchronization.
Trade-off: Accuracy vs Latency.
HTTP Response Headers
# If the request is successful:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1640995200
# If the limit is exceeded:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
Deep Dive: Multi-tier Rate Limiting
In real systems, several levels of limits are used:
By levels
- Global: 10M req/day for the entire service
- Per-user: 1000 req/hour per user
- Per-endpoint: 100 req/min on /api/search
- Per-IP: DDoS protection from one IP
Graceful Degradation
- Soft limit: Warning in headers
- Hard limit: 429 + Retry-After
- Redis down: Fail-open (skip) or fail-close?
- Monitoring: Alert for traffic anomalies
Key takeaways from the interview
- 1Start with the requirements: What kind of traffic? Is inaccuracy acceptable? Distributed?
- 2Select algorithm: Token Bucket for burst, Sliding Window Counter for accuracy.
- 3Discuss trade-offs: Memory vs Accuracy, Latency vs Consistency.
- 4Redis + Lua: Atomic operations solve race conditions.
- 5Fail-open vs Fail-close: What to do if the rate limiter fails?
For additional practice and implementation comparison, review System Design Primer: Rate Limiter and Cloudflare Rate Limiting Rules.
