Uber/Lyft — System Design Space

An Uber-like service does not break on the map. It breaks when moving demand has to be matched with moving supply in a fraction of a second.

This chapter ties together the location stream, dispatch, ETA computation, trip lifecycle, surge pricing, and post-trip settlement into one architecture.

For interviews and engineering discussions, this case is useful because it quickly shows whether you can design under a hard latency budget without losing correctness across trip states.

Location Stream

Driver coordinates arrive continuously, so ingest, TTL, and geo-index freshness should be designed as their own critical path.

Dispatch

The driver-assignment path has to find candidates quickly and stay stable through timeouts, declines, and retries.

ETA & Routing

ETA depends not only on the map, but also on traffic, route recomputation, and timely delivery of updates into client apps.

Peak Demand

Demand and supply shift by zone and by minute, which is why surge logic cannot be treated as a secondary product detail.

Uber/Lyft is best framed as a ride-hailing system where latency shows up directly in the product experience: if driver assignment takes too long, the user simply leaves. That is why the problem is not about drawing cars on a map. The hard part is tying together the location stream, driver assignment, pricing, and trip lifecycle into one system that does not fall apart under real-time load.

Source

System Design Interview

Alex Xu covers proximity services, geo indexes, and moving-location pipelines in a practical interview style.


Product scale

5M+ trips per day

100M+ riders

10K+ cities

5M+ drivers

Uber

Uber Engineering Blog

Engineering write-ups on Uber internals, from geo streams to routing and pricing.


1. Functional Requirements

Passenger flow

•Enter pickup, destination, and ride type
•See price and ETA before confirming the request
•Get an assigned driver and a live map of arrival progress
•Track the trip from pickup through completion
•Pay for the trip and leave feedback afterwards

Driver flow

•Go online or offline and keep publishing location updates
•Receive trip offers and accept or reject them
•Navigate to pickup and then to destination
•Report trip status changes and complete the ride
•Receive payment and review completed trip history

2. Non-functional requirements

Low latency

Driver assignment should stay within roughly a minute, and the live map cannot lag by tens of seconds.

High availability

People depend on the service while already on the move, so ride requests, navigation, and settlement should survive local failures.

Elasticity

Weather, concerts, and rush hour create sharp demand spikes, so the system must scale under strong geographic and temporal skew.

Load estimate

Active drivers: 1M

Location updates/sec: ~300K

New ride requests/sec: ~1K

Update frequency: 3-4 sec
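
The write rate in the estimate follows directly from the driver count and the update interval. A minimal sketch of that arithmetic, assuming every active driver reports exactly once per interval (the function name is illustrative):

```python
def updates_per_second(active_drivers: int, interval_seconds: float) -> float:
    """Average write rate if every active driver reports once per interval."""
    return active_drivers / interval_seconds


# 1M active drivers, reporting every 3-4 seconds
low = updates_per_second(1_000_000, 4.0)   # slower cadence
high = updates_per_second(1_000_000, 3.0)  # faster cadence
print(f"{low:,.0f} to {high:,.0f} updates/sec")
```

The two bounds bracket the ~300K figure above. Real traffic is burstier than this average, so the ingest path is normally sized with headroom.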

3. Real-time location stream

⚠️ Main difficulty

A million drivers sending fresh coordinates every 3-4 seconds quickly turns into hundreds of thousands of writes per second. A regular transactional database is not enough for that path.

The location path usually stays in memory and is protected by a short TTL window: when the driver app stops sending updates, the system removes that driver from the pool of candidates automatically.

Driver location service

•Ingests the coordinate stream from driver apps
•Keeps current position in memory, for example in Redis
•Marks drivers offline when updates stop arriving
•Publishes the stream into Kafka for history, analytics, and downstream services
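
The TTL behaviour described above can be sketched with a last-seen timestamp per driver. This is an in-memory stand-in rather than a Redis client: the class name, the 15-second TTL, and the injected clock are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DriverPosition:
    lat: float
    lng: float
    updated_at: float  # seconds, taken from the injected clock


class DriverLocationStore:
    """Hypothetical in-memory model of the hot location path."""

    def __init__(self, ttl_seconds: float = 15.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._positions: Dict[str, DriverPosition] = {}

    def update(self, driver_id: str, lat: float, lng: float) -> None:
        # Each update overwrites the previous position and refreshes the timestamp.
        self._positions[driver_id] = DriverPosition(lat, lng, self._clock())

    def is_online(self, driver_id: str) -> bool:
        pos = self._positions.get(driver_id)
        return pos is not None and self._clock() - pos.updated_at <= self._ttl

    def evict_stale(self) -> List[str]:
        """Remove drivers whose last update is older than the TTL."""
        now = self._clock()
        stale = [d for d, p in self._positions.items()
                 if now - p.updated_at > self._ttl]
        for driver_id in stale:
            del self._positions[driver_id]
        return stale
```

Keeping an explicit timestamp matters because a Redis GEO set is a single key: its members cannot expire individually, so stale drivers have to be filtered or swept by the service itself.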

Spatial index

Nearby-driver lookup is usually built on top of Redis GEO commands, Geohash, or H3.

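# Redis GEO commands (GEOSEARCH supersedes GEORADIUS, which is deprecated since Redis 6.2)
GEOADD drivers {lng} {lat} driver_123
GEOSEARCH drivers FROMLONLAT {lng} {lat} BYRADIUS 5 km
  ASC COUNT 10 WITHDIST WITHCOORD

# Or Geohash approach: keep one set of drivers per cell
SET driver:123:location "9q8yy"
SADD geohash:9q8yy:drivers driver_123
SMEMBERS geohash:9q8yy:drivers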
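
To show where a cell ID such as 9q8yy comes from, here is a minimal pure-Python sketch of standard geohash encoding plus a per-cell driver index. The `drivers_by_cell` dictionary and `index_driver` helper are hypothetical stand-ins for the Redis sets above.

```python
from collections import defaultdict
from typing import DefaultDict, Set

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lng: float, precision: int = 5) -> str:
    """Encode a coordinate by repeatedly halving the lng/lat ranges."""
    lat_range = [-90.0, 90.0]
    lng_range = [-180.0, 180.0]
    bits = []
    use_lng = True  # bits alternate, starting with longitude
    while len(bits) < precision * 5:
        rng, value = (lng_range, lng) if use_lng else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid
        else:
            bits.append(0)
            rng[1] = mid
        use_lng = not use_lng
    chars = []
    for i in range(0, len(bits), 5):  # every 5 bits become one base32 character
        index = 0
        for bit in bits[i:i + 5]:
            index = (index << 1) | bit
        chars.append(BASE32[index])
    return "".join(chars)


# Hypothetical in-memory equivalent of the per-cell driver sets
drivers_by_cell: DefaultDict[str, Set[str]] = defaultdict(set)


def index_driver(driver_id: str, lat: float, lng: float) -> str:
    cell = geohash_encode(lat, lng)
    drivers_by_cell[cell].add(driver_id)
    return cell


cell = index_driver("driver_123", 37.7749, -122.4194)  # downtown San Francisco
print(cell, sorted(drivers_by_cell[cell]))
```

A production lookup would also query the neighboring cells, because a rider standing near a cell edge can be closer to drivers in the adjacent cell. It would also remove the driver from the old cell when they cross a boundary.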

Wikipedia

Geohash

A way to encode coordinates into strings for spatial indexing.


Location processing path

1. Driver apps send coordinates over WebSocket or UDP
2. The gateway accepts the stream and smooths write spikes with batching
3. Redis Cluster stores current coordinates and the geo index of available drivers
4. Kafka receives the stream for history, analytics, and demand estimation
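
The batching step at the gateway can be sketched as a small buffer that coalesces updates per driver and flushes when it fills up. The class name, the size threshold, and the `flush` callback are assumptions; a real gateway would typically also flush on a timer.

```python
from typing import Callable, Dict, List, Tuple

Update = Tuple[str, float, float]  # (driver_id, lat, lng)


class LocationBatcher:
    """Hypothetical gateway-side buffer for the current-position path."""

    def __init__(self, flush: Callable[[List[Update]], None], max_batch: int = 500):
        self._flush = flush
        self._max_batch = max_batch
        self._pending: Dict[str, Update] = {}

    def add(self, driver_id: str, lat: float, lng: float) -> None:
        # Only the newest position per driver matters for the live geo index.
        self._pending[driver_id] = (driver_id, lat, lng)
        if len(self._pending) >= self._max_batch:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            batch = list(self._pending.values())
            self._pending = {}
            self._flush(batch)
```

Coalescing like this suits the current-position store only. The history stream sent to Kafka would normally keep every point, since analytics and trip reconstruction need the full track.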

4. Matching passengers and drivers

The critical path here is dispatch: the system has to find nearby candidates fast and then rank them. The closest driver is not always the best pick — what also matters is how likely that driver is to accept the trip and reach pickup quickly.

How assignment works

1. The passenger creates a ride request
Pickup, destination, and ride type go through the API Gateway.
2. The dispatch service gathers candidates
The geo index returns nearby free drivers with the required vehicle type.
3. Candidates are ranked
ETA to pickup, rating, acceptance history, and travel direction all matter.
4. The best candidate receives an offer
The push or socket offer lives only for a limited window, usually 15-30 seconds.
5. If it is rejected, the system tries the next one
That keeps assignment fast even when accept rates drop.
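
The steps above can be sketched as a rank-then-offer loop. The scoring heuristic (pickup ETA divided by acceptance rate), the `Candidate` fields, and the `send_offer` callback are illustrative assumptions, not a description of any production ranking model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    driver_id: str
    pickup_eta_s: float
    acceptance_rate: float  # 0.0 to 1.0, from acceptance history


def score(c: Candidate) -> float:
    """Lower is better: pickup ETA inflated by the chance of a decline."""
    return c.pickup_eta_s / max(c.acceptance_rate, 0.05)


def dispatch_sequentially(candidates: List[Candidate],
                          send_offer: Callable[[str], bool],
                          max_attempts: int = 3) -> Optional[str]:
    """Offer the ride to one driver at a time, best-ranked first.

    send_offer returns True on acceptance and False on decline or on
    expiry of the offer window.
    """
    for candidate in sorted(candidates, key=score)[:max_attempts]:
        if send_offer(candidate.driver_id):
            return candidate.driver_id
    return None  # caller may widen the search radius and retry
```

With this heuristic, a driver 180 seconds away who accepts 90% of offers outranks one 120 seconds away who accepts only 20%, which is the "closest is not always best" point from the text.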

Sequential assignment

One best driver gets the offer, and the system waits for a response before the next attempt.

+ Easier to explain and implement
+ Less competition between drivers
- Slower under timeouts and declines

Batch broadcast

Several candidates receive the offer at once, and the first acceptance wins.

+ Lowers time to assignment
- Increases competition between drivers
- Harder to keep the process fair
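
"First acceptance wins" needs an atomic claim so that two simultaneous accepts cannot both succeed. A minimal single-process sketch, assuming a hypothetical `OfferBoard` class guarded by a lock; in a distributed setup the same role is usually played by an atomic set-if-absent operation in a shared store.

```python
import threading
from typing import Dict, Optional


class OfferBoard:
    """Hypothetical registry that lets exactly one driver claim each ride."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._winner: Dict[str, str] = {}

    def accept(self, ride_id: str, driver_id: str) -> bool:
        """Return True if this driver holds the ride (idempotent on retry)."""
        with self._lock:
            # setdefault stores the first driver and returns the stored one after that
            current = self._winner.setdefault(ride_id, driver_id)
            return current == driver_id

    def winner(self, ride_id: str) -> Optional[str]:
        with self._lock:
            return self._winner.get(ride_id)
```

Because a repeated accept from the winning driver still returns True, a client that retries after a network timeout does not lose the ride it already claimed.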

Supply positioning

To smooth imbalance before the next ride request arrives, the system nudges drivers toward zones where demand is building up.

•Demand prediction: models use time of day, weather, and city events
•Current supply: the platform tracks the live distribution of active drivers by zone
•Incentives: bonuses and surge help pull drivers toward deficit areas

5. ETA and routing

Optimization

ETA caching

ETAs for popular zone pairs and frequently repeated routes can be cached, much as a CDN caches hot content close to the client.
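
A minimal sketch of such a cache, keyed by origin and destination zone with a short TTL so that traffic changes still show up. The class name, the 60-second TTL, and the `compute` callback (standing in for a call to the routing engine) are assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class EtaCache:
    """Hypothetical zone-pair ETA cache with time-based expiry."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # (origin_zone, dest_zone) -> (eta_seconds, expires_at)
        self._entries: Dict[Tuple[str, str], Tuple[float, float]] = {}

    def get_or_compute(self, origin_zone: str, dest_zone: str,
                       compute: Callable[[str, str], float]) -> float:
        key = (origin_zone, dest_zone)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]  # fresh entry, skip the routing call
        eta = compute(origin_zone, dest_zone)
        self._entries[key] = (eta, now + self._ttl)
        return eta
```

The key is directional on purpose: one-way streets and asymmetric traffic mean the ETA from zone A to zone B can differ from the reverse trip.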


ETA matters almost as much as price. If pickup and trip-time predictions are consistently wrong, users stop trusting the product even when assignment itself is fast.

Which ETAs the system needs

Pickup ETA

Time from the current driver position to the pickup point. This is what the passenger sees before confirming the request and after driver assignment.

Trip ETA

Time from pickup to destination. It affects the fare estimate, arrival promise, and the quality of the in-trip experience.

What drives the estimate

Road graph

Speed limits
One-way streets
Turn and access restrictions

Live traffic

GPS data from active drivers
Congestion and incidents
Temporary route constraints

Historical data

Day of week and hour of day
Seasonality and repeating patterns
Systematic forecasting errors from the past

Routing engine

Inside an Uber-like system, a dedicated routing engine decides which path to use as the baseline and when to correct it on the fly.

•Contraction Hierarchies: speed up queries over a large road graph
•A*: useful for local route search with current traffic conditions
•ML correction: fixes systematic prediction errors using historical outcomes
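
To make the A* bullet concrete, here is a small sketch over a toy road graph whose edge weights are travel times in seconds. The graph, the planar coordinates in metres, and the 20 m/s speed bound are invented for illustration. The heuristic is straight-line distance divided by the maximum speed, which never overestimates as long as no edge is faster than that bound.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

Coords = Dict[str, Tuple[float, float]]      # node -> (x_m, y_m)
Edges = Dict[str, List[Tuple[str, float]]]   # node -> [(neighbor, travel_seconds)]


def a_star(coords: Coords, edges: Edges, start: str, goal: str,
           max_speed_mps: float) -> Optional[Tuple[float, List[str]]]:
    def heuristic(node: str) -> float:
        (x1, y1), (x2, y2) = coords[node], coords[goal]
        return math.hypot(x2 - x1, y2 - y1) / max_speed_mps

    best = {start: 0.0}                # cheapest known travel time per node
    came_from: Dict[str, str] = {}
    frontier = [(heuristic(start), start)]
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:
            path = [node]
            while node in came_from:   # walk back to the start
                node = came_from[node]
                path.append(node)
            return best[goal], path[::-1]
        for neighbor, seconds in edges.get(node, []):
            candidate = best[node] + seconds
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                came_from[neighbor] = node
                heapq.heappush(frontier, (candidate + heuristic(neighbor), neighbor))
    return None  # goal unreachable


coords = {"A": (0, 0), "B": (1000, 0), "C": (1000, 1000), "D": (0, 1000)}
free_flow = {"A": [("B", 60.0), ("D", 200.0)], "B": [("C", 60.0)], "D": [("C", 60.0)]}
print(a_star(coords, free_flow, "A", "C", max_speed_mps=20.0))

# Live traffic slows B -> C, so the route through D becomes faster.
congested = {**free_flow, "B": [("C", 300.0)]}
print(a_star(coords, congested, "A", "C", max_speed_mps=20.0))
```

Updating edge weights from live traffic and re-running the search is the "correct it on the fly" part. Contraction Hierarchies trade that flexibility for much faster queries on a preprocessed graph.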

6. Surge pricing

Uber Engineering

H3: Hexagonal Hierarchical Index

Uber's open-source library for hexagonal geospatial indexing.


Surge pricing is not just a revenue feature. It is an operational tool for restoring balance between demand and supply when free drivers disappear too quickly.

Why the multiplier exists

Demand side

Higher prices filter out less urgent trips and free capacity for riders who truly need the ride right now.

Supply side

Higher prices attract more drivers into deficit zones and help supply recover faster.

Example multiplier formula

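# Simplified formula: never below 1.0x, never above the cap
surge_multiplier = min(max(demand / supply, 1.0), cap)

# Where:
# demand = ride_requests in the zone during the last N minutes
# supply = available_drivers in the zone
# cap = maximum allowed multiplier, e.g. 3.0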

# Example zones (Geohash or H3 cells)
zone_9q8yy:
  demand: 50 requests/5min
  supply: 10 drivers
  surge: 5.0x → cap at 3.0x

# Surge is applied to base fare
final_price = base_fare * surge_multiplier
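
The simplified formula can be written as a runnable function. This sketch adds two guards that the example implies but does not spell out: a floor of 1.0x so that oversupply never discounts the fare, and handling for a zone with zero available drivers. The default 3.0x cap comes from the example; the zero-supply behaviour is an assumption.

```python
def surge_multiplier(demand: float, supply: float, cap: float = 3.0) -> float:
    """Demand/supply ratio clamped to the range [1.0, cap]."""
    if supply <= 0:
        # Assumption: no available drivers means maximum surge if anyone is asking.
        return cap if demand > 0 else 1.0
    return min(max(demand / supply, 1.0), cap)


def final_price(base_fare: float, multiplier: float) -> float:
    return round(base_fare * multiplier, 2)


# Zone from the example: 50 requests in 5 minutes, 10 available drivers
multiplier = surge_multiplier(demand=50, supply=10)
print(multiplier, final_price(12.0, multiplier))
```

For the example zone the raw ratio is 5.0, so the cap applies and the function returns 3.0.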

Pricing zones

•The city is divided into H3 cells or similar geo zones
•Each zone gets its own multiplier
•Recalculation runs every 1-2 minutes
•Transitions between neighboring zones are smoothed to avoid chaotic jumps
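
One simple way to smooth transitions between neighboring zones is to blend each zone's raw multiplier with the average of its neighbors. This is only one possible approach; the function name, the weight, and the adjacency map are assumptions (with H3, the adjacency would come from the cell's ring of neighbors).

```python
from typing import Dict, List


def smooth_with_neighbors(raw: Dict[str, float],
                          neighbors: Dict[str, List[str]],
                          self_weight: float = 0.75) -> Dict[str, float]:
    """Blend each zone's multiplier with the mean of its known neighbors."""
    smoothed = {}
    for zone, value in raw.items():
        adjacent = [raw[n] for n in neighbors.get(zone, []) if n in raw]
        if not adjacent:
            smoothed[zone] = value  # isolated zone keeps its raw value
            continue
        neighbor_avg = sum(adjacent) / len(adjacent)
        smoothed[zone] = self_weight * value + (1 - self_weight) * neighbor_avg
    return smoothed


raw = {"a": 3.0, "b": 1.0, "c": 1.0}
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
print(smooth_with_neighbors(raw, adjacency))
```

The hot zone drops slightly and its neighbors rise slightly, so a rider who walks across a zone boundary sees a gradient instead of a cliff.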

Price lock

•The passenger sees the price before confirming the request
•The quote stays fixed for a short window, for example 5-10 minutes
•The system protects the user from a sudden price jump after consent
•Quote ID is retained for auditing and retries
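
A price lock can be sketched as a quote record with an ID and an expiry time. The class names and the 300-second default (the lower end of the 5-10 minute window) are assumptions, and the in-memory dictionary stands in for a durable store.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Quote:
    quote_id: str
    price: float
    expires_at: float


class QuoteStore:
    """Hypothetical store that locks a quoted price for a short window."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._quotes: Dict[str, Quote] = {}

    def issue(self, price: float) -> Quote:
        quote = Quote(str(uuid.uuid4()), price, self._clock() + self._ttl)
        self._quotes[quote.quote_id] = quote
        return quote

    def locked_price(self, quote_id: str) -> Optional[float]:
        """Return the quoted price while the lock is valid, else None."""
        quote = self._quotes.get(quote_id)
        if quote is None or self._clock() >= quote.expires_at:
            return None  # caller must re-quote at the current price
        return quote.price
```

Expired quotes are deliberately left in the store, matching the point that the quote ID is retained for auditing and retries.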

7. Trip lifecycle

The trip lifecycle is easiest to reason about as a finite state machine: the order has a small number of allowed states, and transitions between them should stay explicit, predictable, and idempotent.

Trip state machine

Main trip path

Every transition should be explicit, idempotent, and visible to both sides of the order.

Step 1 of 7: Ride requested

The rider enters pickup, destination, and ride type. The system stores the quote and starts looking for a driver.

What to control

•Service-area validation and pickup sanity checks
•Rate limiting and anti-fraud checks
•Quote ID and TTL for the fare offer

Next transition: 2. Driver assigned

If the current step finishes successfully, the ride moves into the next state.

Cancellation branch

The rider or driver can cancel before the trip starts.

Cancellation is available in the ride requested, driver assigned, and driver en route states.
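
The state machine can be sketched with an explicit transition table and event IDs for idempotency. Only the first states and the cancellation rule come from the text above. The names `IN_PROGRESS` and `COMPLETED`, and the collapse of the seven steps into this shorter list, are assumptions for illustration.

```python
from enum import Enum
from typing import Dict, Set


class TripState(Enum):
    REQUESTED = "ride_requested"
    DRIVER_ASSIGNED = "driver_assigned"
    DRIVER_EN_ROUTE = "driver_en_route"
    IN_PROGRESS = "in_progress"    # assumed name
    COMPLETED = "completed"        # assumed name
    CANCELLED = "cancelled"


# Cancellation is allowed only before the trip starts.
ALLOWED: Dict[TripState, Set[TripState]] = {
    TripState.REQUESTED: {TripState.DRIVER_ASSIGNED, TripState.CANCELLED},
    TripState.DRIVER_ASSIGNED: {TripState.DRIVER_EN_ROUTE, TripState.CANCELLED},
    TripState.DRIVER_EN_ROUTE: {TripState.IN_PROGRESS, TripState.CANCELLED},
    TripState.IN_PROGRESS: {TripState.COMPLETED},
    TripState.COMPLETED: set(),
    TripState.CANCELLED: set(),
}


class Trip:
    def __init__(self) -> None:
        self.state = TripState.REQUESTED
        self._applied_events: Set[str] = set()

    def apply(self, event_id: str, target: TripState) -> bool:
        """Apply a transition once; a redelivered event_id is a no-op."""
        if event_id in self._applied_events:
            return False
        if target not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {target.name}")
        self.state = target
        self._applied_events.add(event_id)
        return True
```

Checking the event ID before the transition table means a retried "driver assigned" message is ignored quietly instead of raising an error after the trip has already moved on.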

Status updates in flight

While the order is active, the client needs almost live visibility into fresh coordinates and state transitions.

•WebSocket: the main channel for live map and status updates
•Push notifications: key transitions such as driver assignment or trip completion
•Fallback polling: clients degrade to periodic refresh when the socket path becomes unstable

Cancellation handling

•Cancelling before driver assignment is usually free
•Late passenger cancellation can lead to a pickup fee
•Driver cancellation affects assignment metrics and reliability scores
•No-show paths require refunds and explicit event auditing

8. High-level architecture and scenarios

Request paths through a ride-hailing backend

Request and Dispatch Plane

•Rider App: ride request
•API Gateway: auth and routing
•Driver App: online status
•Location Service: location stream ingest
•Dispatch Service: driver assignment
•Trip Service: trip state machine

Routing, Pricing, and Settlement Plane

•Routing / ETA: route, traffic, estimate
•Pricing Service: fare and surge
•Payment Service: charge and payout
•Notifications: push, SMS, receipt

Scenarios by plane

•Request -> dispatch (Request and Dispatch Plane): rider, gateway, matching, and trip state
•Drivers -> location (Request and Dispatch Plane): online status and live geo index
•Routing -> pricing (Routing, Pricing, and Settlement Plane): ETA, traffic, fare, and surge
•Payment -> notifications (Routing, Pricing, and Settlement Plane): charge, payout, receipt, and status

The architecture separates the fast location-and-assignment path from the stricter routing, pricing, and settlement path.


A durable ride-hailing backend separates the fast location and assignment path from the stricter pricing, payment, and notification path. That is how you keep money and state changes consistent even while the coordinate stream goes through a local spike.

Core data stores

Redis Cluster

Driver geo index, zone multipliers, active sessions, and other short-lived state

PostgreSQL

Users, trips, payments, and other transactional entities

Cassandra

Location history, trip events, and long time-series workloads

9. Key interview points

✓ Topics you should cover

•How to process a location stream with hundreds of thousands of writes per second
•How the geo index and nearby-candidate lookup work
•How to choose between sequential assignment and batch broadcast
•How ETA is computed from a road graph plus live traffic
•How surge restores balance between demand and supply by zone

💡 Good extensions

•Shared rides and extra route optimization
•Fraud prevention: fake drivers, GPS spoofing, and route anomalies
•Safety features such as trip sharing, SOS, and abnormal-route detection
•Scheduled rides and prediction of future demand
•Multimodal transport: scooters, bikes, and public transit integration

Common interview mistakes

✗Trying to store the live location stream in a regular SQL database without a dedicated geo path
✗Ignoring scale: hundreds of thousands of location updates every second
✗Ignoring live traffic and accumulated prediction error in ETA
✗Treating surge as a business ornament instead of part of supply architecture

In interviews, the key is not to list services one by one, but to explain the core trade-off clearly: the more aggressively you optimize assignment speed, the more carefully you need to handle fairness, cancellation paths, and cross-client state synchronization.

Related chapters

Airbnb - Useful for comparing two geo-heavy domains: candidate discovery, pricing, and order lifecycle in a two-sided marketplace.
System Design Interview: An Insider's Guide (short summary) - A good refresher on geo services, event pipelines, and scaling real-time systems.
Content Delivery Network (CDN) - Provides background on global delivery, regional points of presence, and latency reduction for mobile clients.
Google Maps / Proximity Service - geosearch - Shows how spatial indexes, nearby search, and geographic load skew work in practice.
Hacking the System Design Interview (short summary) - Useful for packaging the architecture and discussing trade-offs clearly in an interview.
System Design case studies overview - Places the Uber/Lyft case in the broader context of adjacent product domains and architecture choices.
Short-Term Preparation for System Design Interviews - Helps turn this architecture into a concise and convincing interview answer under time pressure.