Chat System — System Design Space

Chat gets hard not when sending one message, but when the system must hold millions of long-lived connections, preserve message ordering, and synchronize multiple devices after reconnects.

The case ties together WebSocket gateways, server-to-server routing, durable history, presence, offline delivery, and push notifications into one working architecture.

For interviews and architecture reviews, it is useful because it forces explicit decisions about what must arrive instantly, what must preserve order, and what can be recovered later.

Latency Budget

Every hop from the WebSocket edge to the recipient needs a clear budget, or the chat stops feeling instant.

Session State

You need to know which server owns the active connection and when the system should treat a user as offline.

Offline Delivery

History storage, push notifications, and reconnect sync deserve their own design rather than being collapsed into one path.

Group Fan-out

As groups grow, message writes and message delivery should separate so the cost of fan-out stays under control.

Sending one message is easy. The difficulty starts when the system has to hold millions of long-lived connections, stay within a tight latency budget, synchronize multiple devices, and recover message history after reconnects. In an interview this case tests not knowledge of one protocol, but whether you can connect real-time delivery, durable storage, offline handling, and push channels into one coherent architecture — and name where each trade-off lives.

Related chapter

Alex Xu book review

A detailed chat-system breakdown appears in chapter 12 of Alex Xu's book.


Examples of real systems

Slack

Discord

1. Functional Requirements

1-on-1 chats between users.

Group chats with an upper bound on participants.

Text and media message delivery.

Online status and typing indicators.

Read receipts and message-history sync across devices.

Push notifications for users outside an active session.

2. Non-Functional Requirements

Latency: < 100 ms

Online users should receive messages almost instantly.

Availability: 99.99% uptime

The messenger must stay available even under partial failures.

Consistency: delivery and ordering

Messages should not be lost or reordered within the same chat.

Scalability: 50M concurrent connections

The architecture must grow horizontally with the audience.

Example system scale

DAU: 500M

Messages per day: 100B

Concurrent connections: 50M

Average message size: 100 bytes
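
The scale figures above translate into rough capacity numbers. The following is a back-of-envelope sketch, not a sizing recommendation: the peak factor and the per-gateway connection capacity are hypothetical values chosen only for illustration. With the stated inputs, the average write rate comes to roughly 1.16 million messages per second and raw payload to about 10 TB per day, before replication, indexes, and metadata.

```python
import math

# Figures taken from the example scale above.
MESSAGES_PER_DAY = 100_000_000_000      # 100B
AVG_MESSAGE_BYTES = 100
CONCURRENT_CONNECTIONS = 50_000_000     # 50M
SECONDS_PER_DAY = 24 * 60 * 60

# Assumptions for illustration only (not from the text).
PEAK_TO_AVERAGE = 3                     # hypothetical peak factor
CONNECTIONS_PER_GATEWAY = 100_000       # hypothetical per-server capacity

avg_messages_per_sec = MESSAGES_PER_DAY / SECONDS_PER_DAY
peak_messages_per_sec = avg_messages_per_sec * PEAK_TO_AVERAGE
payload_bytes_per_day = MESSAGES_PER_DAY * AVG_MESSAGE_BYTES
gateways_needed = math.ceil(CONCURRENT_CONNECTIONS / CONNECTIONS_PER_GATEWAY)

print(f"average write rate: {avg_messages_per_sec:,.0f} msg/s")
print(f"assumed peak rate:  {peak_messages_per_sec:,.0f} msg/s")
print(f"raw payload:        {payload_bytes_per_day / 1e12:.0f} TB/day")
print(f"gateways needed:    {gateways_needed}")
```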

3. Choosing the Communication Protocol

Related chapter

WebSocket Protocol

A deeper look at WebSocket handshake, keepalive, reconnect behavior, and production guidance.


Comparing the options

Approach	Latency	Server load	Best fit
HTTP Polling	High	Very high	Legacy fallback
Long Polling	Medium	High	Simple notifications
WebSocket ✓	Minimal	Low	Real-time chat and collaboration
Server-Sent Events	Minimal	Medium	One-way notifications

Why WebSocket usually wins

✓ Bidirectional channel: both client and server can send events whenever they need to.
✓ One connection per session: the system does not pay for a new HTTP handshake on every message.
✓ Lower protocol overhead: less extra traffic and less pressure on the server tier.
✓ Cleaner delivery model: online delivery, acknowledgements, and reconnect handling can all sit behind the same transport.

4. High-Level Architecture

In a production design, you need to explain how the session registry maps each user to a specific WebSocket server and how presence tells the system whether it can deliver immediately to an active session or should switch to the offline path.

Chat System: Architecture Map (connection routing, message storage, and offline delivery)

Realtime Connection Layer

• Client Apps: web and mobile
• WebSocket Gateway: long-lived sockets
• Chat Router: routing and acknowledgements
• Session Registry: user -> server mapping
• Presence Service: status and heartbeats
• Clients -> Gateway -> Router: primary online-delivery path
• Registry + Presence Service: routing and online status

Storage and Offline Delivery

• Message Store: Cassandra / Scylla
• Delivery Queue: async retries
• Push Service: provider fan-out
• APNS / FCM: mobile push
• Store -> Delivery Queue: durable history and async processing
• Push Service -> APNS / FCM: notifications for offline users

Reference chat-system layout: long-lived connections, message routing, durable history, and a separate offline delivery path.

Online delivery path

User A sends a message through a WebSocket connection.
The WebSocket gateway forwards the event to the chat-routing layer.
The session registry tells the system which server currently owns user B's session.
If that session is active, the message is pushed directly to the recipient.
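
The online path above can be sketched as a single routing decision. This is a minimal in-memory illustration, not a production router: `Gateway`, `route`, and the `offline_path` callback are hypothetical names, and a plain dict stands in for the session registry.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """Stand-in for one WebSocket server; `delivered` maps user_id -> messages."""
    name: str
    delivered: dict = field(default_factory=dict)

    def push(self, user_id, message):
        self.delivered.setdefault(user_id, []).append(message)


def route(message, registry, gateways, offline_path):
    """Deliver to the server that owns the recipient's session, else go offline."""
    server = registry.get(message["to"])
    if server is None or server not in gateways:
        offline_path(message)          # no active session: hand off to the offline path
        return "offline"
    gateways[server].push(message["to"], message)
    return "online"


# Usage: user_b has an active session on server_5, user_c has none.
gateways = {"server_5": Gateway("server_5")}
registry = {"user_b": "server_5"}
offline_queue = []

result_b = route({"to": "user_b", "content": "hi"}, registry, gateways, offline_queue.append)
result_c = route({"to": "user_c", "content": "hi"}, registry, gateways, offline_queue.append)
```

Treating a registry entry that points at an unknown server as offline is a deliberate choice in this sketch: a stale mapping should degrade to the offline path rather than drop the message.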

Offline delivery path

The message is persisted in durable storage and only then acknowledged to the sender.
The delivery queue creates a push task and retry schedule.
The push service hands the event to APNS or FCM.
After reconnect, the client pulls everything after its last confirmed sync point.
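
The persist-then-enqueue order and the retry schedule from the steps above could look like the sketch below. The function names and the exponential backoff parameters are assumptions made for illustration; the text does not prescribe a specific retry policy.

```python
def retry_schedule(base_seconds=1.0, factor=2.0, max_delay=60.0, attempts=5):
    """Return the delay before each push attempt, using capped exponential backoff."""
    delays = []
    delay = base_seconds
    for _ in range(attempts):
        delays.append(min(delay, max_delay))
        delay *= factor
    return delays


def handle_offline(message, store, push_queue):
    """Persist first, then schedule the push, then acknowledge the sender."""
    store[message["message_id"]] = message            # durable history (a dict stand-in)
    push_queue.append({
        "message_id": message["message_id"],
        "to": message["to"],
        "delays": retry_schedule(),                   # hypothetical retry policy
    })
    return "acked"


# Usage
store, push_queue = {}, []
status = handle_offline({"message_id": 1, "to": "user_b", "content": "hi"}, store, push_queue)
```

Writing to the store before enqueueing means a crash between the two steps loses at most a notification, not the message itself, and the reconnect sync still recovers it.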

5. Message Storage

Data storage

Database Internals

The choice between SQL and NoSQL depends on write patterns, reads, and pagination.


Comparing storage options

Database	Pros	Cons	Best fit
PostgreSQL	Transactions and a familiar stack	Harder to scale out cleanly	Smaller deployments
Cassandra ✓	Scales horizontally and handles writes well	Requires careful work around eventual consistency	Large chat platforms and messengers
HBase	Wide-column model and Hadoop integration	Higher operational complexity	Very large analytical platforms

Data schema in Cassandra

-- Message table (partitioned by chat_id)
CREATE TABLE messages (
    chat_id       UUID,
    message_id    TIMEUUID,  -- Snowflake ID or TIMEUUID
    sender_id     UUID,
    content       TEXT,
    created_at    TIMESTAMP,
    PRIMARY KEY ((chat_id), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

-- Fast access to recent chat messages
SELECT * FROM messages
WHERE chat_id = ?
ORDER BY message_id DESC
LIMIT 50;
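
Older pages are usually fetched with a cursor rather than an offset: the client sends the oldest `message_id` it already has, and the query adds a range condition on the clustering column (something like `AND message_id < ?`). The sketch below imitates that keyset pagination over an in-memory sorted list; `page_before` is a hypothetical helper and integers stand in for time-ordered IDs.

```python
import bisect


def page_before(sorted_ids, cursor=None, limit=50):
    """Return up to `limit` ids strictly older than `cursor`, newest first.

    `sorted_ids` must be in ascending order. With no cursor, the newest page is returned.
    """
    end = len(sorted_ids) if cursor is None else bisect.bisect_left(sorted_ids, cursor)
    start = max(0, end - limit)
    return sorted_ids[start:end][::-1]


# Usage: walk a ten-message chat backwards in pages of four.
ids = list(range(1, 11))
first_page = page_before(ids, limit=4)
second_page = page_before(ids, cursor=first_page[-1], limit=4)
third_page = page_before(ids, cursor=second_page[-1], limit=4)
```

Because the cursor is a message ID rather than a row offset, new messages arriving at the head of the chat do not shift the pages a client is already scrolling through.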

Why `message_id` matters so much

• Ordering: TIMEUUID or Snowflake IDs sort by creation time, which helps reconstruct one clear sequence of events within a chat.
• Pagination: it becomes easy to ask for messages before or after a known point.
• Idempotency: if the ID is assigned once and reused on every retry, a repeated write targets the same primary key and overwrites the row instead of creating a duplicate.
• Sync: devices can request everything after the last known message.
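
To make the ordering property concrete, here is a minimal sketch of a Snowflake-style generator. It assumes the commonly described layout of 41 timestamp bits, 10 worker bits, and 12 sequence bits; the class name, the custom epoch, and the way sequence overflow is handled (borrowing the next millisecond instead of waiting) are simplifications for illustration.

```python
import threading
import time


class SnowflakeLike:
    """Time-ordered 64-bit IDs: timestamp (41 bits) | worker (10 bits) | sequence (12 bits)."""

    EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (assumption)

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.clock = clock
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            # Never move backwards, even if the wall clock does.
            now = max(self.clock(), self.last_ms)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    now += 1  # 4096 IDs used in this millisecond: borrow the next one
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence


# Usage: IDs from one generator are strictly increasing, even within one millisecond.
generator = SnowflakeLike(worker_id=7, clock=lambda: SnowflakeLike.EPOCH_MS + 1000)
sample_ids = [generator.next_id() for _ in range(5000)]
```

IDs from a single generator are strictly increasing. Across different workers they are only roughly time-ordered, so strict per-chat ordering still depends on where IDs are assigned.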

6. Presence and Online Status

Activity checks

A client typically sends a heartbeat every 5-30 seconds. If the system stops receiving it, the user is treated as offline and the design no longer relies on immediate delivery to an open session.

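// Each heartbeat refreshes the last-activity timestamp and its TTL in one atomic command.
// Keep the TTL longer than the heartbeat interval (for example 2-3x),
// so a single late heartbeat does not flip the user to offline.
SET user:{user_id}:last_active {timestamp} EX 30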

// Check online status
GET user:{user_id}:last_active
// If the key exists, the user is considered online
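
The same expiring-key idea can be modelled without Redis to show the logic in isolation. This is a sketch under stated assumptions: `PresenceTracker` is a hypothetical class, the clock is injectable so the behaviour can be checked deterministically, and the 30-second TTL mirrors the snippet above.

```python
import time


class PresenceTracker:
    """A user is online while the most recent heartbeat is younger than the TTL."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.expires_at = {}  # user_id -> deadline on the clock's timeline

    def heartbeat(self, user_id):
        # Equivalent in spirit to refreshing the key and its TTL together.
        self.expires_at[user_id] = self.clock() + self.ttl

    def is_online(self, user_id):
        deadline = self.expires_at.get(user_id)
        if deadline is None:
            return False
        if self.clock() >= deadline:
            del self.expires_at[user_id]  # lazily expire, like a TTL'd key
            return False
        return True


# Usage with a fake clock
now = [0.0]
tracker = PresenceTracker(ttl_seconds=30.0, clock=lambda: now[0])
tracker.heartbeat("user_1")
online_at_10s = tracker.is_online("user_1")
now[0] = 31.0
online_at_31s = tracker.is_online("user_1")
```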

Group fan-out pressure

Even a simple status change can explode into a large fan-out problem when a user has hundreds of contacts or participates in many groups.

• Load presence lazily when a relevant conversation is actually opened.
• Batch updates instead of pushing every tiny change immediately.
• Push only to active or high-priority conversations.
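
The batching and "active conversations only" ideas above can be combined in one small component. The sketch below is illustrative: `PresenceBatcher`, its `send` callback, and the shape of `active_subscribers` are hypothetical, and the flush would normally be driven by a timer that is omitted here.

```python
class PresenceBatcher:
    """Coalesce status changes and send them only to users who are watching."""

    def __init__(self, send):
        self.send = send      # callable(subscriber_id, {user_id: status})
        self.pending = {}     # user_id -> latest status since the last flush

    def record(self, user_id, status):
        self.pending[user_id] = status  # last write wins, intermediate flaps are dropped

    def flush(self, active_subscribers):
        """Send one batch per subscriber; `active_subscribers` maps subscriber -> watched user_ids."""
        sent = 0
        for subscriber, watched in active_subscribers.items():
            batch = {u: s for u, s in self.pending.items() if u in watched}
            if batch:
                self.send(subscriber, batch)
                sent += 1
        self.pending.clear()
        return sent


# Usage: three rapid changes for user_a collapse into one update for one watcher.
outbox = []
batcher = PresenceBatcher(send=lambda subscriber, batch: outbox.append((subscriber, batch)))
batcher.record("user_a", "online")
batcher.record("user_a", "offline")
batcher.record("user_a", "online")
batches_sent = batcher.flush({"user_b": {"user_a"}, "user_c": {"user_z"}})
```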

7. Group Chats

Groups break the naive “one message, one recipient” model. As membership grows, the system increasingly needs to separate the write step from delivery and move toward a more event-driven distribution model.

How the architecture changes as groups grow

Small

up to 100 members

Direct delivery over WebSocket still stays simple and manageable.

Medium

100-10K members

It is safer to separate message writes from delivery and hand distribution to background workers.

Very large

channels and communities 10K+

You want a channel-subscription model rather than a personal push to every member.
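
The size tiers above can be expressed as a simple strategy switch, plus a helper that groups online members by the server that holds their connection so each server receives one event instead of one per member. Both functions are hypothetical sketches; the thresholds are the ones from the tiers above, and a dict stands in for the session registry.

```python
from collections import defaultdict


def delivery_strategy(member_count, small_limit=100, medium_limit=10_000):
    """Pick a fan-out mode from the size tiers described above."""
    if member_count <= small_limit:
        return "direct_websocket"
    if member_count <= medium_limit:
        return "background_workers"
    return "channel_subscription"


def group_by_server(member_ids, registry):
    """Split members into per-server batches plus a list of offline members."""
    per_server = defaultdict(list)
    offline = []
    for user_id in member_ids:
        server = registry.get(user_id)
        if server is None:
            offline.append(user_id)
        else:
            per_server[server].append(user_id)
    return dict(per_server), offline


# Usage
registry = {"u1": "s1", "u2": "s1", "u3": "s2"}
batches, offline_members = group_by_server(["u1", "u2", "u3", "u4"], registry)
```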

Data schema for groups

-- Groups
CREATE TABLE groups (
    group_id UUID PRIMARY KEY,
    name TEXT,
    created_by UUID,
    created_at TIMESTAMP
);

-- Group members (for fast lookup)
CREATE TABLE group_members (
    group_id UUID,
    user_id UUID,
    joined_at TIMESTAMP,
    role TEXT, -- admin, member
    PRIMARY KEY ((group_id), user_id)
);

-- User groups (reverse index)
CREATE TABLE user_groups (
    user_id UUID,
    group_id UUID,
    last_read TIMEUUID, -- for unread counts
    PRIMARY KEY ((user_id), group_id)
);
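
One way the `last_read` column could drive unread counts is to count messages whose ID sorts after it. The sketch below shows that comparison over an in-memory sorted list; `unread_count` is a hypothetical helper, integers stand in for time-ordered IDs, and a real deployment would likely cap or cache the count rather than scan a large partition.

```python
import bisect


def unread_count(sorted_message_ids, last_read=None):
    """Count messages newer than `last_read` in an ascending list of message IDs."""
    if last_read is None:
        return len(sorted_message_ids)  # the user has never opened this group
    return len(sorted_message_ids) - bisect.bisect_right(sorted_message_ids, last_read)


# Usage
message_ids = [10, 20, 30, 40]
unread_after_20 = unread_count(message_ids, last_read=20)
```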

8. Synchronization and Offline Delivery

Synchronization

Last-seen message ID

The last-seen message ID pattern makes cross-device history sync much cheaper.


Sync protocol

Each device stores `last_synced_message_id`. On reconnect, the client sends that checkpoint and the server returns everything that came after it.

1. The client sends its `last_synced_message_id`.
2. The server returns all messages after that identifier.
3. The client applies the delta and updates its sync point.
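
The three steps above can be sketched as a paged pull loop. This is a simplified model: `SyncServer` and `Device` are hypothetical classes, the log is a single in-memory stream (a real client would probably keep one checkpoint per chat), and the page limit is an assumption added so that a long absence does not produce one huge response.

```python
import bisect


class SyncServer:
    """Append-only log of (message_id, payload), kept in ascending ID order."""

    def __init__(self):
        self.ids = []
        self.payloads = []

    def append(self, message_id, payload):
        self.ids.append(message_id)
        self.payloads.append(payload)

    def messages_after(self, last_synced_id, limit=500):
        start = bisect.bisect_right(self.ids, last_synced_id)
        end = start + limit
        return list(zip(self.ids[start:end], self.payloads[start:end]))


class Device:
    def __init__(self):
        self.last_synced_message_id = 0
        self.history = []

    def sync(self, server):
        """Pull pages until the server has nothing newer than the checkpoint."""
        pages = 0
        while True:
            delta = server.messages_after(self.last_synced_message_id)
            if not delta:
                return pages
            self.history.extend(delta)
            self.last_synced_message_id = delta[-1][0]  # advance only after applying
            pages += 1


# Usage: a device that missed 1200 messages catches up in three pages.
server = SyncServer()
for i in range(1, 1201):
    server.append(i, f"msg {i}")
device = Device()
pages_pulled = device.sync(server)
```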

Offline queue

For offline users, it helps to separate durable history storage from a dedicated queue that handles delayed delivery and retries.

-- Queue of undelivered messages, one partition per recipient
CREATE TABLE offline_messages (
    user_id UUID,
    message_id TIMEUUID,
    chat_id UUID,
    sender_id UUID,
    content TEXT,
    PRIMARY KEY ((user_id), message_id)
) WITH default_time_to_live = 2592000; -- 30 days TTL

-- When a user reconnects
SELECT * FROM offline_messages WHERE user_id = ?;
-- Delivered rows are deleted after synchronization
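
The drain-then-delete flow can be sketched so that rows disappear only after the client confirms them. `OfflineQueue` is a hypothetical in-memory stand-in for the table above; TTL expiry is left out for brevity.

```python
class OfflineQueue:
    """Per-user queue of undelivered messages, keyed by message_id."""

    def __init__(self):
        self.rows = {}  # user_id -> {message_id: message}

    def enqueue(self, user_id, message_id, message):
        # Re-enqueueing the same message_id overwrites the row, so retries are harmless.
        self.rows.setdefault(user_id, {})[message_id] = message

    def pending(self, user_id):
        """Return queued messages in ID order without removing them."""
        return sorted(self.rows.get(user_id, {}).items())

    def ack(self, user_id, message_ids):
        """Delete only the rows the client confirmed."""
        bucket = self.rows.get(user_id, {})
        for message_id in message_ids:
            bucket.pop(message_id, None)


# Usage: the client confirms only the first message before dropping again.
queue = OfflineQueue()
queue.enqueue("user_b", 2, "second")
queue.enqueue("user_b", 1, "first")
queue.enqueue("user_b", 1, "first")   # duplicate retry
delivered = queue.pending("user_b")
queue.ack("user_b", [1])
remaining = queue.pending("user_b")
```

Deleting on acknowledgement rather than on send means a connection that drops mid-delivery leaves the unconfirmed rows in place for the next reconnect.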

9. Scaling WebSocket Servers

Main challenge

WebSocket connections are stateful. You cannot simply add more servers behind a load balancer without knowing which node currently owns a user's active connection.

Session registry

A centralized Redis hash stores the user → server mapping, so routing does not depend on whichever node the load balancer happened to pick.

// When a user connects
HSET user_sessions user_123 server_5

// When sending a message
target_server = HGET user_sessions user_456

// When a user disconnects
HDEL user_sessions user_123
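
One subtlety with the disconnect step: if a user has already reconnected to another server, a late unconditional delete from the old server would erase the new mapping. The sketch below models a registry whose disconnect removes the entry only when the caller still owns it. `SessionRegistry` is a hypothetical in-memory class; against Redis, the same check-and-delete would need to run atomically, for example in a server-side script or transaction.

```python
class SessionRegistry:
    """user_id -> server_id mapping with an ownership check on disconnect."""

    def __init__(self):
        self.owner = {}

    def connect(self, user_id, server_id):
        self.owner[user_id] = server_id  # the latest connection wins

    def lookup(self, user_id):
        return self.owner.get(user_id)

    def disconnect(self, user_id, server_id):
        """Remove the mapping only if `server_id` still owns the session."""
        if self.owner.get(user_id) == server_id:
            del self.owner[user_id]
            return True
        return False


# Usage: a stale disconnect from server_5 must not evict the session on server_9.
sessions = SessionRegistry()
sessions.connect("user_123", "server_5")
sessions.connect("user_123", "server_9")          # the user reconnected elsewhere
stale_removed = sessions.disconnect("user_123", "server_5")
owner_after_stale = sessions.lookup("user_123")
```

This sketch keeps one session per user. Supporting several devices per user would mean storing a set of (device, server) pairs instead of a single value.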

Server-to-server delivery

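Redis Pub/Sub or Kafka can carry inter-server events: each server subscribes to its own channel or topic, consumes the events addressed to it, and delivers them over its local connections. Redis Pub/Sub is fire-and-forget, so an event published while the target server is disconnected is lost; the durable message store and reconnect sync have to cover that gap.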

// Server 1 publishes a message
PUBLISH chat_server_5 {
  "type": "message",
  "to": "user_456",
  "content": "Hello!"
}

// Server 5 receives the event and delivers it
// over its local WebSocket connection
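
The per-server channel pattern above can be modelled in-process to show who publishes and who delivers. This is a sketch only: `InMemoryBus` and `ChatServer` are hypothetical classes standing in for the broker and the WebSocket servers, and the `chat_<server>` channel naming follows the example above.

```python
from collections import defaultdict


class InMemoryBus:
    """Tiny publish/subscribe stand-in for the inter-server broker."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, handler):
        self.subscribers[channel].append(handler)

    def publish(self, channel, event):
        handlers = self.subscribers.get(channel, [])
        for handler in handlers:
            handler(event)
        return len(handlers)  # number of receivers; 0 means nobody was listening


class ChatServer:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.local_sockets = {}  # user_id -> list of messages "sent" to that socket
        bus.subscribe(f"chat_{name}", self.on_event)

    def attach(self, user_id):
        self.local_sockets[user_id] = []

    def on_event(self, event):
        # Deliver only if the recipient is still connected to this server.
        socket = self.local_sockets.get(event["to"])
        if socket is not None:
            socket.append(event["content"])

    def send_to(self, target_server, user_id, content):
        event = {"type": "message", "to": user_id, "content": content}
        return self.bus.publish(f"chat_{target_server}", event)


# Usage: server_1 forwards a message to user_456, who is connected to server_5.
bus = InMemoryBus()
server_1 = ChatServer("server_1", bus)
server_5 = ChatServer("server_5", bus)
server_5.attach("user_456")
receivers = server_1.send_to("server_5", "user_456", "Hello!")
missed = server_1.send_to("server_7", "user_789", "Hello?")
```

A receiver count of zero is the signal to fall back to the offline path, since nothing in a fire-and-forget channel will redeliver the event later.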

Sticky sessions as an alternative

A load balancer can try to keep a user on the same server through sticky sessions based on IP or cookies. That can work, but it makes failover and rebalancing harder, so a dedicated session registry is usually easier to reason about.

10. Key Interview Points

What you should always cover

• Why WebSocket is needed here and where fallback options remain relevant.
• How messages are routed between servers and how the recipient's active session is found.
• How ordering and delivery guarantees are enforced.
• How offline sync and push notifications are handled.
• How the long-lived connection layer scales.

Good follow-up topics

• End-to-end encryption and Signal Protocol.
• Read receipts and typing indicators.
• Separating media storage from text storage, for example with S3 and a CDN.
• Rate limits and spam prevention.
• Synchronization across multiple devices.

Common interview mistakes

✗ Forgetting that WebSocket connections are stateful and therefore do not scale “just like HTTP.”
✗ Skipping the offline path, reconnect flow, and push notifications.
✗ Ignoring message ordering once delivery becomes distributed across servers.
✗ Not discussing how fan-out changes in large group chats or channel-like workloads.

References

Discord Engineering - How Discord Stores Trillions of Messages (Discord blog, 2023)
Slack Engineering - Real-Time Messaging (real-time delivery architecture)
High Scalability - How WhatsApp Grew to Nearly 500M Users (Rick Reed, Erlang Factory talk)
I. Fette, A. Melnikov - RFC 6455: The WebSocket Protocol (IETF, 2011)

Related chapters

WebSocket protocol - deepens the transport layer behind real-time delivery and long-lived connection handling.
Notification System - shows how offline delivery and push channels complement chat when users are no longer connected.
Database Internals: A Deep Dive (short summary) - helps justify storage choices, message schema design, and history-read patterns.
Twitter/X - adds a neighboring large-scale case where fan-out and event-delivery trade-offs are especially visible.
System design case studies examples - helps compare Chat System with other case studies and see how architecture trade-offs shift across domains.
System Design Interview: An Insider's Guide (short summary) - contains the classic chat-system walkthrough with a clear step-by-step interview structure.
Hacking the System Design Interview (short summary) - adds interview communication patterns and chat-specific trade-off framing.
Leslie Lamport: causality, Paxos, and engineering mindset - strengthens the foundation for reasoning about ordering, causality, and consistent delivery.
Short-Term Preparation for System Design Interviews - helps package a chat-system answer into a concise and interview-ready structure.