System Design Space

Updated: March 2, 2026 at 9:05 AM

Rate Limiter


Classic task: Token Bucket, Sliding Window, Redis + Lua scripts, distributed rate limiting.

Rate Limiter is one of the most popular problems in System Design interviews. It appears in the books by Alex Xu ("System Design Interview") and Zhiyong Tan ("Acing the System Design Interview") as a basic example of designing a distributed system with stringent performance requirements.

Chapter 4

Alex Xu: Rate Limiter

Detailed analysis of algorithms and architecture in the book

Read the review

Why is Rate Limiter needed?

Problems without Rate Limiting

  • DDoS attacks — service overload with malicious traffic
  • Cascading failures — one client kills the entire service
  • Overuse of resources — uncontrolled API consumption
  • Uneven load — “noisy neighbors” disturb others

Benefits of Rate Limiting

  • Service protection — overload prevention
  • Fair distribution — quotas for each client
  • Saving money — control over the use of paid APIs
  • Predictability — stable performance

Typical Requirements

Functional

  • FR1
    Limiting requests by IP, user ID, or API key
  • FR2
    Customizable limits for different endpoints
  • FR3
    Informative response when the limit is exceeded (429)
  • FR4
    Distributed environment support

Non-functional

  • NFR1
    Low latency — <1ms overhead per request
  • NFR2
    High Availability — limiter failure ≠ service failure
  • NFR3
    Limit accuracy (±5% deviation acceptable in a distributed setup)
  • NFR4
    Horizontal scaling

Rate Limiting Algorithms

There are several classic algorithms, each with its own trade-offs:


1. Token Bucket

A bucket is refilled at a constant rate. Each request consumes one token.

Advantages

  • Simple implementation
  • Allows burst traffic
  • Memory efficient (2 numbers per user)

Drawbacks

  • Hard to choose bucket size
  • Race conditions in distributed setup

# Parameters:
bucket_size = 10  # max tokens (burst capacity)
refill_rate = 2   # tokens/sec
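The refill-on-read variant described above can be sketched in a few lines. This is a minimal single-process sketch (not the distributed version discussed later); the class name and method are illustrative, not from the original text.

```python
import time

class TokenBucket:
    """Token bucket: refilled at a constant rate, each request costs one token."""

    def __init__(self, bucket_size: float = 10, refill_rate: float = 2):
        self.bucket_size = bucket_size   # max tokens (burst capacity)
        self.refill_rate = refill_rate   # tokens added per second
        self.tokens = bucket_size        # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Lazily refill based on elapsed time, capped at bucket_size
        self.tokens = min(self.bucket_size,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Note how the lazy refill keeps memory at two numbers per user (token count and last-refill timestamp), which is exactly the memory-efficiency advantage listed above.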


System architecture

Related case

API Gateway

Full analysis of API Gateway: routing, security, BFF pattern and comparison of technologies.

Read the case

Where to place Rate Limiter?

Client-side

Easy to bypass and therefore unreliable. Used only as an optimization to reduce wasted requests.

API Gateway

The ideal placement: centralized, in front of the services. Examples: AWS API Gateway, Kong, Envoy.

Middleware

Embedded in each service — flexible, but rules get duplicated and are hard to keep in sync.

Zhiyong Tan

Acing SDI: Rate Limiter

Comparison of approaches to placing rate limiter

Read the review

Implementation in Redis

Redis is the standard choice for Rate Limiter due to its in-memory speed and atomic operations.

Sliding Window Log on Redis

-- Lua script for atomicity
local key = KEYS[1]
local window = tonumber(ARGV[1]) -- window size in seconds
local limit = tonumber(ARGV[2]) -- request limit
local now = tonumber(ARGV[3]) -- current timestamp

-- Remove entries older than the window
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)

-- Count requests remaining in the current window
local count = redis.call('ZCARD', key)

if count < limit then
    -- Add a new request
    redis.call('ZADD', key, now, now .. '-' .. math.random())
    redis.call('EXPIRE', key, window)
    return 1 -- allowed
else
    return 0 -- rejected
end
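To make the script's logic easier to follow, here is the same sliding-window-log check mirrored in plain Python, with a list of timestamps standing in for the Redis sorted set (function name and signature are illustrative):

```python
def sliding_window_allow(log: list, window: float, limit: int, now: float) -> bool:
    """Pure-Python mirror of the Lua script above.
    `log` plays the role of the sorted set: request timestamps for one key."""
    # Drop entries older than the window (ZREMRANGEBYSCORE)
    log[:] = [t for t in log if t > now - window]
    # Count requests in the current window (ZCARD)
    if len(log) < limit:
        log.append(now)  # ZADD
        return True
    return False
```

Unlike this single-process version, the Lua script runs atomically inside Redis, so the read-check-write sequence cannot interleave between concurrent clients.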

Race Condition Protection

Problem: Concurrent requests read the same count, so several can pass the check before any of them writes.

Solution: Lua scripts execute atomically in Redis.

Alternative: Redis MULTI/EXEC or Redlock for distributed locking.

Synchronization

Centralized: One Redis cluster for all instances.

Eventual consistency: Local counters + periodic synchronization.

Trade-off: Accuracy vs Latency.

HTTP Response Headers

# If the request is successful:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1640995200

# If the limit is exceeded:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
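A small helper can keep these headers consistent across endpoints. This is a sketch under the header names shown above; the function name is illustrative:

```python
def rate_limit_headers(limit: int, remaining: int, reset_epoch: int,
                       retry_after: int = 30) -> dict:
    """Build the standard rate-limit response headers.
    Adds Retry-After only when the client is out of quota (HTTP 429)."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),  # epoch seconds when the window resets
    }
    if remaining <= 0:
        headers["Retry-After"] = str(retry_after)
    return headers
```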

Deep Dive: Multi-tier Rate Limiting

In real systems, several levels of limits are used:

By levels

  • Global: 10M req/day for the entire service
  • Per-user: 1000 req/hour per user
  • Per-endpoint: 100 req/min on /api/search
  • Per-IP: DDoS protection from one IP
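Combining the tiers amounts to checking each limit in order and reporting which one rejected the request (useful for the 429 response body). A minimal sketch, with the tier limiters passed in as callables (names are illustrative):

```python
def check_tiers(request, limiters: dict):
    """Run limiter tiers in order (cheapest first, e.g. per-IP before global).
    `limiters` maps tier name -> callable(request) -> bool.
    Returns (allowed, rejecting_tier)."""
    for tier, allow in limiters.items():
        if not allow(request):
            return False, tier  # first tier to reject wins
    return True, None
```

Ordering the tiers cheapest-first means an abusive IP is rejected before the more expensive per-user or global counters are touched.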

Graceful Degradation

  • Soft limit: Warning in headers
  • Hard limit: 429 + Retry-After
  • Redis down: Fail-open (skip) or fail-close?
  • Monitoring: Alert for traffic anomalies
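The fail-open vs fail-close decision can be isolated in one wrapper around the limiter call. A sketch assuming the Redis client raises `ConnectionError` on outage (the wrapper name is illustrative):

```python
def guarded_allow(limiter_call, fail_open: bool = True) -> bool:
    """Keep a limiter outage from taking the whole service down.
    fail_open=True lets traffic through on error (availability over protection);
    fail_open=False rejects instead (protection over availability)."""
    try:
        return limiter_call()
    except ConnectionError:
        # Redis is down: fall back to the configured policy
        return fail_open
```

Fail-open is the common default for user-facing APIs (the rate limiter's failure should not become the service's failure, per NFR2), while fail-close suits billing-sensitive or abuse-prone endpoints.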

Key takeaways from the interview

  • 1
    Start with the requirements: What kind of traffic? Is inaccuracy acceptable? Distributed?
  • 2
    Select algorithm: Token Bucket for burst, Sliding Window Counter for accuracy.
  • 3
    Discuss trade-offs: Memory vs Accuracy, Latency vs Consistency.
  • 4
    Redis + Lua: Atomic operations solve race conditions.
  • 5
    Fail-open vs Fail-close: What to do if the rate limiter fails?

For additional practice and implementation comparison, review System Design Primer: Rate Limiter and Cloudflare Rate Limiting Rules.



© 2026 Alexander Polomodov