
Updated: March 25, 2026 at 4:52 AM

Rate Limiter

medium

Classic task: Token Bucket, Sliding Window, Redis + Lua scripts, distributed rate limiting.

A rate limiter looks like a counter problem until burst traffic, distributed state, and limiter failure turn it into a fairness and survivability problem.

The case helps compare token bucket, leaky bucket, and sliding window approaches, choose the enforcement point, model quota state, and reason about fail-open versus fail-close behavior.

For interviews and design reviews, it exposes the quality of engineering judgment around abuse prevention, critical-path placement, and the cost of a bad default.

Control Plane

Focus on policy, limits, routing, and stable edge behavior under variable load.

Data Path

Keep latency and throughput predictable while traffic and burst pressure increase.

Failure Modes

Cover fail-open/fail-close behavior, graceful degradation, and safe fallback paths.

Ops Ready

Show monitoring for saturation, retry storms, and practical operational guardrails.

Rate Limiter is one of the most popular problems in System Design interviews. It appears in the books by Alex Xu ("System Design Interview") and Zhiyong Tan ("Acing the System Design Interview") as a basic example of designing a distributed system with stringent performance requirements.

Chapter 4

Alex Xu: Rate Limiter

Detailed analysis of algorithms and architecture in the book

Read the review

Why is Rate Limiter needed?

Problems without Rate Limiting

  • DDoS attacks — service overload with malicious traffic
  • Cascading failures — one client kills the entire service
  • Overuse of resources — uncontrolled API consumption
  • Uneven load — “noisy neighbors” degrade service for others

Benefits of Rate Limiting

  • Service protection — prevents overload
  • Fair distribution — quotas for each client
  • Cost savings — control over paid API usage
  • Predictability — stable performance

Typical Requirements

Functional

  • FR1
    Limiting requests by IP, user ID or API key
  • FR2
    Customizable limits for different endpoints
  • FR3
    Informative response when the limit is exceeded (429)
  • FR4
    Distributed environment support

Non-functional

  • NFR1
    Low latency — <1ms overhead per request
  • NFR2
    High Availability — limiter failure ≠ service failure
  • NFR3
    Limit accuracy (acceptable ±5% in distributed)
  • NFR4
    Horizontal scaling

Rate Limiting Algorithms

There are several classic algorithms, each with its own trade-offs:


1. Token Bucket

Tokens are added to a bucket at a constant rate up to a fixed capacity. Each request consumes one token; when the bucket is empty, the request is rejected.

Advantages

  • Simple implementation
  • Allows burst traffic
  • Memory efficient (2 numbers per user)

Drawbacks

  • Hard to choose bucket size
  • Race conditions in distributed setup

# Parameters:
bucket_size = 10  # max tokens
refill_rate = 2   # tokens/sec
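With those two parameters, a single-node token bucket fits in a few lines of Python. This is an illustrative sketch (the `TokenBucket` class and `allow()` method are my own naming, not from any library):

```python
import time

class TokenBucket:
    """Single-node token bucket: bucket_size caps bursts, refill_rate sets the steady rate."""

    def __init__(self, bucket_size: float, refill_rate: float):
        self.bucket_size = bucket_size       # max tokens
        self.refill_rate = refill_rate       # tokens/sec
        self.tokens = bucket_size            # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at bucket capacity
        self.tokens = min(self.bucket_size,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(bucket_size=10, refill_rate=2)
results = [bucket.allow() for _ in range(12)]  # burst of 12 immediate requests
# The first 10 pass (full bucket); the rest are rejected until tokens refill.
```

Note the memory footprint: two numbers per user (token count and last-refill time), exactly the advantage listed above.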

Algorithm visualization (interactive demo: a bucket of 5 tokens refilled at 1 token/sec, with per-request history).

System architecture

Related case

API Gateway

Full analysis of API Gateway: routing, security, BFF pattern and comparison of technologies.

Read the case

Where to place Rate Limiter?

Client-side

Easy to bypass, so not reliable on its own. Used only as an optimization.

API Gateway

The ideal placement: centralized, in front of the services. Examples: AWS API Gateway, Kong, Envoy.

Middleware

Per-service enforcement is flexible, but limit logic gets duplicated and is hard to keep in sync.

Zhiyong Tan

Acing SDI: Rate Limiter

Comparison of approaches to placing rate limiter

Read the review

Implementation in Redis

Redis is the standard choice for Rate Limiter due to its in-memory speed and atomic operations.

Sliding Window Log on Redis

-- Lua script: Redis executes it atomically
local key = KEYS[1]
local window = tonumber(ARGV[1]) -- window size in seconds
local limit = tonumber(ARGV[2])  -- request limit per window
local now = tonumber(ARGV[3])    -- current timestamp

-- Drop entries that have fallen outside the window
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)

-- Count the requests still inside the window
local count = redis.call('ZCARD', key)

if count < limit then
    -- Record the new request (random suffix keeps members unique)
    redis.call('ZADD', key, now, now .. '-' .. math.random())
    redis.call('EXPIRE', key, window)
    return 1 -- allowed
else
    return 0 -- rejected
end
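For local experimentation, the same sliding-window-log logic can be mirrored in plain Python, with a list of timestamps standing in for the Redis sorted set. This is a sketch of the algorithm, not the production code path, and the class name is my own:

```python
class SlidingWindowLog:
    """In-process mirror of the Lua script: keep request timestamps, drop those outside the window."""

    def __init__(self, window: float, limit: int):
        self.window = window   # window size in seconds
        self.limit = limit     # request limit per window
        self.log = []          # timestamps (stands in for the Redis sorted set)

    def allow(self, now: float) -> bool:
        # ZREMRANGEBYSCORE equivalent: discard entries older than the window
        self.log = [t for t in self.log if t > now - self.window]
        # ZCARD equivalent: count what remains, admit if under the limit
        if len(self.log) < self.limit:
            self.log.append(now)   # ZADD equivalent
            return True
        return False

rl = SlidingWindowLog(window=60, limit=3)
res1 = [rl.allow(t) for t in (0, 1, 2, 3)]  # [True, True, True, False]
res2 = rl.allow(62)                          # True: the t=0..2 entries have expired
```

Unlike the in-process version, the Redis script is safe under concurrency only because Lua scripts run atomically, which is exactly the race-condition point discussed next.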

Race Condition Protection

Problem: Concurrent requests read the same count.

Solution: Lua scripts execute atomically in Redis.

Alternative: Redis MULTI/EXEC or Redlock for distributed locking.

Synchronization

Centralized: One Redis cluster for all instances.

Eventual consistency: Local counters + periodic synchronization.

Trade-off: Accuracy vs Latency.

HTTP Response Headers

# If the request is successful:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1640995200

# If the limit is exceeded:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
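Assembling these headers from limiter state is mechanical; a small sketch (the `rate_limit_headers` helper is hypothetical, but the header names follow the conventional X-RateLimit-* set shown above):

```python
def rate_limit_headers(limit, remaining, reset_epoch, retry_after=None):
    """Build (status, headers) for both the success and the 429 case."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(remaining, 0)),
        "X-RateLimit-Reset": str(reset_epoch),
    }
    status = 200
    if remaining <= 0:
        status = 429  # Too Many Requests
        headers["Retry-After"] = str(retry_after if retry_after is not None else 1)
    return status, headers

status, headers = rate_limit_headers(limit=100, remaining=42, reset_epoch=1640995200)
# status == 200, no Retry-After header
status429, headers429 = rate_limit_headers(limit=100, remaining=0,
                                           reset_epoch=1640995200, retry_after=30)
# status429 == 429, headers429["Retry-After"] == "30"
```

Sending Retry-After on 429 gives well-behaved clients a backoff hint and helps prevent retry storms.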

Deep Dive: Multi-tier Rate Limiting

In real systems, several levels of limits are used:

By levels

  • Global: 10M req/day for the entire service
  • Per-user: 1000 req/hour per user
  • Per-endpoint: 100 req/min on /api/search
  • Per-IP: DDoS protection from one IP
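These tiers can be composed as a chain in which a request must pass every applicable limiter. A sketch using simple fixed-window counters (the `FixedWindowCounter` class and tier wiring are illustrative; the limits follow the list above):

```python
from collections import defaultdict

class FixedWindowCounter:
    """Simplest per-key counter, reset each window; enough to sketch tiering."""

    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.counts = defaultdict(int)

    def allow(self, key, now):
        bucket = (key, int(now // self.window))  # one counter per key per window
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True

# One limiter per tier; each tier extracts its own key from the request.
tiers = {
    "global":   (FixedWindowCounter(10_000_000, 86_400), lambda req: "all"),
    "per_user": (FixedWindowCounter(1_000, 3_600),       lambda req: req["user"]),
    "per_ip":   (FixedWindowCounter(100, 60),            lambda req: req["ip"]),
}

def admit(req, now):
    # The request passes only if every tier admits it
    return all(limiter.allow(key_fn(req), now) for limiter, key_fn in tiers.values())

req = {"user": "u1", "ip": "203.0.113.7"}
print(admit(req, now=0.0))  # True: the first request passes every tier
```

One design caveat: the short-circuiting `all()` charges earlier tiers even when a later tier rejects; production limiters often check all tiers first and only then commit the counters.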

Graceful Degradation

  • Soft limit: Warning in headers
  • Hard limit: 429 + Retry-After
  • Redis down: Fail-open (skip) or fail-close?
  • Monitoring: Alert for traffic anomalies
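The fail-open versus fail-close decision can be isolated in one small wrapper around the backend call (a sketch; `check` stands for any limiter call that may raise when Redis is down):

```python
import logging

def allow_request(check, *, fail_open=True):
    """Run the backend limiter check; on backend failure, fall back per policy.

    fail_open=True  -> admit traffic when the limiter is down (availability first)
    fail_open=False -> reject traffic when the limiter is down (protection first)
    """
    try:
        return check()
    except Exception:
        logging.warning("rate limiter backend unavailable, fail_%s",
                        "open" if fail_open else "close")
        return fail_open

def broken_backend():
    raise ConnectionError("redis unreachable")

print(allow_request(broken_backend))                   # True  (fail-open default)
print(allow_request(broken_backend, fail_open=False))  # False (fail-close)
```

Fail-open is the common default for user-facing APIs (a limiter outage should not become a full outage), while fail-close fits abuse-sensitive or billing-critical endpoints; either way the fallback should be logged and alerted on.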

Key takeaways from the interview

  • 1
    Start with the requirements: What kind of traffic? Is inaccuracy acceptable? Distributed?
  • 2
    Select algorithm: Token Bucket for burst, Sliding Window Counter for accuracy.
  • 3
    Discuss trade-offs: Memory vs Accuracy, Latency vs Consistency.
  • 4
    Redis + Lua: Atomic operations solve race conditions.
  • 5
    Fail-open vs Fail-close: What to do if the rate limiter fails?

For additional practice and implementation comparison, review System Design Primer: Rate Limiter and Cloudflare Rate Limiting Rules.

Related chapters

  • System Design Primer (short summary) - provides foundational templates for requirements framing and constraint analysis in rate-limiter design.
  • System Design Interview - Alex Xu - contains the classic interview breakdown of rate limiting algorithms and practical trade-offs.
  • Acing the System Design Interview - adds alternative placement options for distributed rate limiting in real systems.
  • API Gateway - shows the primary production layer where rate limiting is typically enforced centrally.
  • API Security Patterns - connects rate limiting with broader API protection: auth, abuse control, and policy enforcement.
  • Caching Strategies - helps choose cache/storage approaches for limiter counters while keeping latency overhead low.
  • Resilience Patterns - expands fail-open/fail-close decisions and degradation behavior during limiter outages.
  • Notification System - demonstrates how multi-tier limits protect downstream systems from burst traffic.
  • Web Crawler - illustrates why controlled request pacing is required for safe and polite external crawling.
  • Payment System - adds a high-stakes context where limits protect critical financial flows.
