Chat gets hard not when sending one message, but when the system must hold millions of long-lived connections, preserve message ordering, and synchronize multiple devices after reconnects.
The case ties together WebSocket gateways, server-to-server routing, durable history, presence, offline delivery, and push notifications into one working architecture.
For interviews and architecture reviews, it is useful because it forces explicit decisions about what must arrive instantly, what must preserve order, and what can be recovered later.
Latency Budget
Every hop from the WebSocket edge to the recipient needs a clear budget, or the chat stops feeling instant.
Session State
You need to know which server owns the active connection and when the system should treat a user as offline.
Offline Delivery
History storage, push notifications, and reconnect sync deserve their own design rather than being collapsed into one path.
Group Fan-out
As groups grow, message writes and message delivery should separate so the cost of fan-out stays under control.
A chat system becomes difficult not when sending one message, but when it has to hold millions of long-lived connections, stay within a tight latency budget, synchronize multiple devices, and recover message history after reconnects. That is why this case is a classic system-design interview question: it tests not only WebSocket knowledge, but also whether you can connect real-time delivery, durable storage, offline handling, and push channels into one coherent architecture.
Related chapter
Alex Xu book review
A detailed chat-system breakdown appears in chapter 12 of Alex Xu's book.
1. Functional Requirements
1-on-1 chats between users.
Group chats with an upper bound on participants.
Text and media message delivery.
Online status and typing indicators.
Read receipts and message-history sync across devices.
Push notifications for users outside an active session.
2. Non-Functional Requirements
Latency: < 100 ms
Online users should receive messages almost instantly.
Availability: 99.99% uptime
The messenger must stay available even under partial failures.
Consistency: delivery and ordering
Messages should not be lost or reordered within the same chat.
Scalability: 50M concurrent connections
The architecture must grow horizontally with the audience.
Example system scale
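As a rough back-of-envelope sketch for the 50M-connection target, the calculation below estimates the size of the gateway tier. Only the 50M figure comes from the requirements above; the per-server capacity, memory per connection, and headroom are illustrative assumptions and should be replaced with measured values.

```python
import math

TARGET_CONNECTIONS = 50_000_000   # from the non-functional requirements
CONNS_PER_SERVER = 100_000        # assumption: capacity of one tuned gateway node
MEMORY_PER_CONN_KB = 20           # assumption: buffers plus session state
HEADROOM_PERCENT = 70             # assumption: run nodes at 70% of capacity


def gateway_servers(target: int, per_server: int, headroom_percent: int) -> int:
    """Number of gateway nodes needed when each runs below full capacity."""
    usable = per_server * headroom_percent // 100
    return math.ceil(target / usable)


def memory_per_server_gb(per_server: int, kb_per_conn: int, headroom_percent: int) -> float:
    """Approximate connection memory on one node at the planned load."""
    planned_conns = per_server * headroom_percent // 100
    return planned_conns * kb_per_conn / (1024 * 1024)


servers = gateway_servers(TARGET_CONNECTIONS, CONNS_PER_SERVER, HEADROOM_PERCENT)
mem_gb = memory_per_server_gb(CONNS_PER_SERVER, MEMORY_PER_CONN_KB, HEADROOM_PERCENT)
print(f"gateway servers: {servers}")
print(f"connection memory per server: {mem_gb:.2f} GB")
```

Under these assumptions the gateway tier lands in the hundreds of nodes, which is why the session registry and server-to-server routing matter later in the design.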
3. Choosing the Communication Protocol
Related chapter
WebSocket Protocol
A deeper look at WebSocket handshake, keepalive, reconnect behavior, and production guidance.
Comparing the options
| Approach | Latency | Server load | Best fit |
|---|---|---|---|
| HTTP Polling | High | Very high | Legacy fallback |
| Long Polling | Medium | High | Simple notifications |
| WebSocket ✓ | Minimal | Low | Real-time chat and collaboration |
| Server-Sent Events | Minimal | Medium | One-way notifications |
Why WebSocket usually wins
- ✓ Bidirectional channel: both client and server can send events whenever they need to.
- ✓ One connection per session: the system does not pay for a new HTTP handshake on every message.
- ✓ Lower protocol overhead: less extra traffic and less pressure on the server tier.
- ✓ Cleaner delivery model: online delivery, acknowledgements, and reconnect handling can all sit behind the same transport.
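To make the acknowledgement point concrete, here is a minimal sketch of sender-side tracking for one connection. The `PendingAcks` class, its method names, and the 2-second retry timeout are hypothetical; the chapter does not prescribe a specific ack protocol.

```python
from dataclasses import dataclass, field


@dataclass
class PendingAcks:
    """Tracks messages sent on one connection until the peer acknowledges them."""
    retry_after: float = 2.0                      # assumption: seconds before a resend
    _pending: dict = field(default_factory=dict)  # message_id -> (payload, sent_at)

    def sent(self, message_id: str, payload: str, now: float) -> None:
        """Records a message as in flight."""
        self._pending[message_id] = (payload, now)

    def acked(self, message_id: str) -> bool:
        """Returns True if the ack matched an in-flight message."""
        return self._pending.pop(message_id, None) is not None

    def due_for_retry(self, now: float) -> list:
        """Message IDs whose ack has not arrived within the timeout."""
        return [mid for mid, (_, sent_at) in self._pending.items()
                if now - sent_at >= self.retry_after]

    def unacked(self) -> list:
        """After a reconnect, everything still in flight is a resend candidate."""
        return list(self._pending)
```

Because a resend can reach a peer that already processed the message, the receiving side would need to deduplicate by message ID.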
4. High-Level Architecture
In a production design, you need to explain how the session registry maps each user to a specific WebSocket server and how presence tells the system whether it can deliver immediately to an active session or should switch to the offline path.
Figure: Chat System: Architecture Map (connection routing, message storage, and offline delivery). Panels: Realtime Connection Layer; Storage and Offline Delivery.
Reference chat-system layout: long-lived connections, message routing, durable history, and a separate offline delivery path.
Online delivery path
- User A sends a message through a WebSocket connection.
- The WebSocket gateway forwards the event to the chat-routing layer.
- The session registry tells the system which server currently owns user B's session.
- If that session is active, the message is pushed directly to the recipient.
Offline delivery path
- The message is acknowledged and persisted in durable storage.
- The delivery queue creates a push task and retry schedule.
- The push service hands the event to APNS or FCM.
- After reconnect, the client pulls everything after its last confirmed sync point.
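The two paths above can be sketched as one routing decision. This is an in-memory illustration, not a reference implementation: `ChatRouter` and its fields are hypothetical stand-ins for the session registry, durable store, and push queue, and persisting before delivery on both paths is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ChatRouter:
    """In-memory stand-ins for the registry, message store, and push queue."""
    sessions: dict = field(default_factory=dict)    # user_id -> server_id
    history: list = field(default_factory=list)     # durable store stand-in
    push_queue: list = field(default_factory=list)  # tasks for APNS/FCM workers
    delivered: list = field(default_factory=list)   # (server_id, message) pairs

    def route(self, message: dict) -> str:
        # Persist first so the message survives a failed delivery attempt.
        self.history.append(message)
        server = self.sessions.get(message["to"])
        if server is not None:
            # Online path: hand the message to the server that owns the session.
            self.delivered.append((server, message))
            return "online"
        # Offline path: schedule a push; the client syncs history on reconnect.
        self.push_queue.append({"user": message["to"], "message_id": message["id"]})
        return "offline"


router = ChatRouter(sessions={"user_b": "server_5"})
print(router.route({"id": 1, "to": "user_b", "content": "hi"}))
print(router.route({"id": 2, "to": "user_c", "content": "hello"}))
```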
5. Message Storage
Data storage
Database Internals
The choice between SQL and NoSQL depends on write patterns, reads, and pagination.
Comparing storage options
| Database | Pros | Cons | Best fit |
|---|---|---|---|
| PostgreSQL | Transactions and a familiar stack | Harder to scale out cleanly | Smaller deployments |
| Cassandra ✓ | Scales horizontally and handles writes well | Requires careful work around eventual consistency | Large chat platforms and messengers |
| HBase | Wide-column model and Hadoop integration | Higher operational complexity | Very large analytical platforms |
Data schema in Cassandra
-- Message table (partitioned by chat_id)
CREATE TABLE messages (
chat_id UUID,
message_id TIMEUUID, -- Snowflake ID or TIMEUUID
sender_id UUID,
content TEXT,
created_at TIMESTAMP,
PRIMARY KEY ((chat_id), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
-- Fast access to recent chat messages
SELECT * FROM messages
WHERE chat_id = ?
ORDER BY message_id DESC
LIMIT 50;
Why `message_id` matters so much
- Ordering: a TIMEUUID or Snowflake ID is time-sortable, so it helps reconstruct one clear sequence of events within a chat.
- Pagination: it becomes easy to ask for messages before or after a known point.
- Idempotency: a retry that reuses the same `message_id` overwrites the same row instead of creating a duplicate.
- Sync: devices can request everything after the last known message.
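As an illustration of why such IDs sort by time, here is a minimal Snowflake-style generator. The bit layout (41-bit millisecond timestamp, 10-bit worker, 12-bit sequence) follows the commonly described Snowflake scheme; the custom epoch, the class name, and the choice to borrow the next millisecond on sequence overflow (instead of waiting) are assumptions of this sketch.

```python
import time


class SnowflakeLikeId:
    """Time-ordered 64-bit IDs: 41-bit ms timestamp | 10-bit worker | 12-bit sequence."""

    EPOCH_MS = 1_577_836_800_000  # assumption: custom epoch, 2020-01-01 UTC

    def __init__(self, worker_id: int):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0

    def next_id(self, now_ms=None) -> int:
        now_ms = int(time.time() * 1000) if now_ms is None else now_ms
        if now_ms < self.last_ms:
            now_ms = self.last_ms        # clock went backwards: hold the last timestamp
        if now_ms == self.last_ms:
            self.sequence = (self.sequence + 1) & 0xFFF
            if self.sequence == 0:       # all 4096 sequence values used this millisecond
                now_ms += 1              # simplification: borrow the next millisecond
        else:
            self.sequence = 0
        self.last_ms = now_ms
        return ((now_ms - self.EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence


gen = SnowflakeLikeId(worker_id=7)
first, second = gen.next_id(), gen.next_id()
print(first < second)  # IDs from one generator are strictly increasing
```

IDs from a single generator are strictly increasing. Across workers they are only roughly time-ordered, so strict per-chat ordering still needs a rule such as routing each chat's writes through one sequencer.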
6. Presence and Online Status
Activity checks
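A client typically sends a heartbeat every 5-30 seconds. If heartbeats stop arriving, the user is treated as offline and messages go through the offline path instead of an open session. The expiry window should be longer than the heartbeat interval; otherwise a single delayed heartbeat makes the user flicker offline.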
// Redis stores the last activity timestamp with a 30-second TTL
// (assumes heartbeats arrive more often than every 30 seconds)
SET user:{user_id}:last_active {timestamp} EX 30
// Check online status
GET user:{user_id}:last_active
// If the key exists, the user is considered online
Group fan-out pressure
Even a simple status change can explode into a large fan-out problem when a user has hundreds of contacts or participates in many groups.
- Load presence lazily when a relevant conversation is actually opened.
- Batch updates instead of pushing every tiny change immediately.
- Push only to active or high-priority conversations.
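The batching and "active conversations only" ideas can be combined in a small coalescing buffer. This is a sketch under stated assumptions: `PresenceBatcher` and the `active_watchers` input (who currently has a conversation with each user open) are hypothetical names, and a real system would call `flush` on a timer.

```python
class PresenceBatcher:
    """Coalesces status changes so each flush sends one update per user."""

    def __init__(self):
        self._latest = {}  # user_id -> most recent status in this window

    def record(self, user_id: str, status: str) -> None:
        # A later change overwrites an earlier one from the same window.
        self._latest[user_id] = status

    def flush(self, active_watchers: dict) -> list:
        """Returns (watcher, user_id, status) tuples for open conversations only.

        active_watchers maps user_id -> set of users who currently have a
        conversation with that user open (hypothetical client-layer input).
        """
        updates = [
            (watcher, user_id, status)
            for user_id, status in self._latest.items()
            for watcher in sorted(active_watchers.get(user_id, ()))
        ]
        self._latest.clear()
        return updates
```

A user who goes online and offline within one window produces a single update, and users nobody is watching produce none.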
7. Group Chats
Groups break the naive “one message, one recipient” model. As membership grows, the system increasingly needs to separate the write step from delivery and move toward a more event-driven distribution model.
How the architecture changes as groups grow
- Small groups: direct delivery over WebSocket stays simple and manageable.
- Medium groups: it is safer to separate message writes from delivery and hand distribution to background workers.
- Very large groups and channels: a channel-subscription model works better than a personal push to every member.
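The split between the write step and delivery can be sketched as two functions connected by a job queue. The function names, the `deque` used as a queue, and the in-memory membership and online sets are illustrative assumptions; in production the queue would be a durable broker.

```python
from collections import deque


def post_to_group(message: dict, history: list, jobs: deque) -> None:
    """Write path: store the message once, then enqueue a single fan-out job."""
    history.append(message)
    jobs.append({"group_id": message["group_id"], "message_id": message["id"]})


def run_fanout_worker(jobs: deque, members: dict, online: set) -> dict:
    """Delivery path: a background worker expands each job to group members."""
    result = {"pushed": [], "deferred": []}
    while jobs:
        job = jobs.popleft()
        for user_id in sorted(members[job["group_id"]]):
            # Online members get the message now; the rest use the offline path.
            bucket = "pushed" if user_id in online else "deferred"
            result[bucket].append((user_id, job["message_id"]))
    return result
```

The sender's request finishes after one write and one enqueue, so its latency does not grow with group size; only the worker's cost does.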
Data schema for groups
-- Groups
CREATE TABLE groups (
group_id UUID PRIMARY KEY,
name TEXT,
created_by UUID,
created_at TIMESTAMP
);
-- Group members (for fast lookup)
CREATE TABLE group_members (
group_id UUID,
user_id UUID,
joined_at TIMESTAMP,
role TEXT, -- admin, member
PRIMARY KEY ((group_id), user_id)
);
-- User groups (reverse index)
CREATE TABLE user_groups (
user_id UUID,
group_id UUID,
last_read TIMEUUID, -- for unread counts
PRIMARY KEY ((user_id), group_id)
);
8. Synchronization and Offline Delivery
Synchronization
Last-seen message ID
The last-seen message ID pattern makes cross-device history sync much cheaper.
Sync protocol
Each device stores `last_synced_message_id`. On reconnect, the client sends that checkpoint and the server returns everything that came after it.
1. The client sends its `last_synced_message_id`.
2. The server returns all messages after that identifier.
3. The client applies the delta and updates its sync point.
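The three steps can be sketched with sortable integer IDs. Both function names, the integer ID type, and the page-size `limit` are assumptions for illustration; the chapter only specifies the checkpoint-and-delta idea.

```python
from bisect import bisect_right


def sync_delta(message_ids: list, last_synced_message_id: int, limit: int = 100) -> list:
    """Server side: up to `limit` IDs after the checkpoint.

    `message_ids` must be sorted ascending, as time-ordered IDs are.
    """
    start = bisect_right(message_ids, last_synced_message_id)
    return message_ids[start:start + limit]


def apply_delta(local_ids: list, delta: list, checkpoint: int) -> int:
    """Client side: append new messages and return the advanced sync point."""
    for message_id in delta:
        if message_id > checkpoint:  # skip anything already applied
            local_ids.append(message_id)
            checkpoint = message_id
    return checkpoint


server_ids = [10, 20, 30, 40, 50]
local_ids = [10, 20]
checkpoint = apply_delta(local_ids, sync_delta(server_ids, 20), 20)
print(checkpoint, local_ids == server_ids)
```

Because the client skips IDs at or below its checkpoint, replaying the same delta after a dropped response is harmless.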
Offline queue
For offline users, it helps to separate durable history storage from a dedicated queue that handles delayed delivery and retries.
-- Queue of unread messages
CREATE TABLE offline_messages (
user_id UUID,
message_id TIMEUUID,
chat_id UUID,
sender_id UUID,
content TEXT,
PRIMARY KEY ((user_id), message_id)
) WITH default_time_to_live = 2592000; -- 30 days TTL
-- When a user reconnects
SELECT * FROM offline_messages WHERE user_id = ?;
-- Delivered rows are deleted after synchronization
9. Scaling WebSocket Servers
Main challenge
WebSocket connections are stateful. You cannot simply add more servers behind a load balancer without knowing which node currently owns a user's active connection.
Session registry
A centralized Redis hash stores the user → server mapping, so routing does not depend on whichever node the balancer happened to pick.
// When a user connects
HSET user_sessions user_123 server_5
// When sending a message
target_server = HGET user_sessions user_456
// When a user disconnects
HDEL user_sessions user_123
Server-to-server delivery
Redis Pub/Sub or Kafka can carry inter-server events so the target server consumes the event from its own channel and delivers it locally.
// Server 1 publishes a message to server 5's channel
// (the payload is passed as a single JSON string)
PUBLISH chat_server_5 '{"type": "message", "to": "user_456", "content": "Hello!"}'
// Server 5 receives the event and delivers it
// over its local WebSocket connection
Sticky sessions as an alternative
A load balancer can try to keep a user on the same server through sticky sessions based on IP or cookies. That can work, but it makes failover and rebalancing harder, so a dedicated session registry is usually easier to reason about.
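The registry lookup and per-server channels from this section can be sketched together in memory. `SessionRegistry`, `Bus`, and `send` are hypothetical names standing in for the Redis mapping and for Redis Pub/Sub or Kafka. Tracking one entry per device is an extension assumed here so that a user with several devices can be reached on every server that holds one of their sessions.

```python
from collections import defaultdict


class SessionRegistry:
    """Maps user -> {device_id: server_id}; stands in for the Redis mapping."""

    def __init__(self):
        self._sessions = defaultdict(dict)

    def connect(self, user_id, device_id, server_id):
        self._sessions[user_id][device_id] = server_id

    def disconnect(self, user_id, device_id):
        self._sessions[user_id].pop(device_id, None)
        if not self._sessions[user_id]:
            del self._sessions[user_id]

    def servers_for(self, user_id):
        return set(self._sessions.get(user_id, {}).values())


class Bus:
    """In-process stand-in for per-server channels (Redis Pub/Sub or Kafka)."""

    def __init__(self):
        self.channels = defaultdict(list)

    def publish(self, channel, event):
        self.channels[channel].append(event)


def send(registry, bus, to_user, content):
    """Publishes one event per server holding a session for the recipient."""
    targets = registry.servers_for(to_user)
    for server_id in sorted(targets):
        event = {"type": "message", "to": to_user, "content": content}
        bus.publish(f"chat_{server_id}", event)
    return len(targets)
```

A return value of 0 means no server owns a session for the user, which is the signal to fall back to the offline path.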
10. Key Interview Points
What you should always cover
- Why WebSocket is needed here and where fallback options remain relevant.
- How messages are routed between servers and how the recipient's active session is found.
- How ordering and delivery guarantees are enforced.
- How offline sync and push notifications are handled.
- How the long-lived connection layer scales.
Good follow-up topics
- End-to-end encryption and Signal Protocol.
- Read receipts and typing indicators.
- Separating media storage from text storage, for example with S3 and a CDN.
- Rate limits and spam prevention.
- Synchronization across multiple devices.
Common interview mistakes
- ✗ Forgetting that WebSocket connections are stateful and therefore do not scale “just like HTTP.”
- ✗ Skipping the offline path, reconnect flow, and push notifications.
- ✗ Ignoring message ordering once delivery becomes distributed across servers.
- ✗ Not discussing how fan-out changes in large group chats or channel-like workloads.
Related chapters
- WebSocket protocol - deepens the transport layer behind real-time delivery and long-lived connection handling.
- Notification System - shows how offline delivery and push channels complement chat when users are no longer connected.
- Database Internals: A Deep Dive (short summary) - helps justify storage choices, message schema design, and history-read patterns.
- Twitter/X - adds a neighboring large-scale case where fan-out and event-delivery trade-offs are especially visible.
- System design case studies examples - helps compare Chat System with other case studies and see how architecture trade-offs shift across domains.
- System Design Interview: An Insider's Guide (short summary) - contains the classic chat-system walkthrough with a clear step-by-step interview structure.
- Hacking the System Design Interview (short summary) - adds interview communication patterns and chat-specific trade-off framing.
- Leslie Lamport: causality, Paxos, and engineering mindset - strengthens the foundation for reasoning about ordering, causality, and consistent delivery.
- Short-Term Preparation for System Design Interviews - helps package a chat-system answer into a concise and interview-ready structure.
