Live Streaming — System Design Space

Live streaming does not break on the player screen. It breaks where the signal must be ingested, transcoded into several qualities, and delivered to millions in seconds while the event is still happening.

The chapter ties together stream ingest, online transcoding into a quality ladder, packaging into segments with a manifest, and CDN delivery into one system held together by the latency budget.

For interviews and engineering discussions, this case is useful because it forces you to name the target latency mode, separate the video pipeline from the real-time chat channel, and explain how to survive the peak of concurrent viewers.

Latency Budget

Each critical-path hop needs a clear latency budget and predictable fallback behavior.

Fanout Strategy

Push/pull/hybrid fanout choices determine scalability, consistency, and complexity.

Session State

Model presence, reconnect, ordering, and delivery semantics explicitly.

Graceful Degradation

Under peaks, preserve core functionality while reducing non-critical quality.

Boundary

Video Hosting (VOD)

Neighboring case on upload and on-demand video: offline transcoding and delivery of finished files. This case is about the live, real-time path.


Live streaming is not just "video people watch right away." It is a pipeline that must, in real time, ingest a signal, transcode it into several qualities, cut it into segments, and deliver it through a CDN with a fan-out from one source to millions of viewers. The central design axis here is not storage volume but the latency budget: how many seconds pass between the camera capturing a moment and the picture appearing on a viewer's screen.

It is worth separating this case from the neighboring Video Hosting study right away. That one is about upload and VOD (video on demand): a heavy write path, an offline transcoding queue, and serving the same asset far more cheaply over and over. Here the file is not sitting ready — it is being born right now, and every segment must be encoded and delivered while the event is still happening. That changes everything: transcoding becomes an online operation with a deadline, and latency turns into a product requirement.

How live differs from VOD

Source: in VOD the file is ready in advance; in live it is produced during the broadcast and has no "end."
Transcoding: VOD is offline, retryable, and tunable; live is online, with a hard deadline per segment.
Primary metric: VOD optimizes storage cost and startup time; live optimizes end-to-end latency and resilience to viewer spikes.
Seeking: VOD exposes the full duration; in live the viewer chases the live edge and is bounded by the DVR window.

Requirements

The functional list is short here — publish, watch, seek. What shapes the architecture is not that list but the non-functional constraints: latency, availability at peak, and how the player behaves when the viewer's link starts to degrade.

Functional

A streamer publishes a feed (camera/encoder → ingest).
A viewer watches the broadcast in several qualities.
Seek to the live edge and back within the DVR window.
Optional: chat, reactions, real-time viewer count.
Optional: record the broadcast to VOD after it ends.

Non-functional

Controlled latency for the target mode (seconds or sub-second).
High availability at the peak of concurrent viewers.
Network adaptivity: a trade-off between picture quality and smoothness when the link degrades.
Global delivery with an edge close to the viewer.

Pipeline: from capture to viewer

The baseline flow has four stages: signal ingest, online transcoding into a quality ladder, packaging into segments with a manifest, and delivery through a CDN. Every arrow on the diagram adds to the latency budget.

Ingest: how the signal enters the system

RTMP: the historical publishing standard from encoders; widely supported, but TCP-based, so packet loss on a weak uplink turns into retransmissions, head-of-line blocking, and growing delay.
SRT: a UDP-based transport with loss recovery — more robust on unstable contribution links.
WebRTC: publishing straight from the browser and the path to sub-second latency — but delivery no longer rides a plain CDN and needs a dedicated relay mesh.

Packaging: why segments

The stream is cut into short segment files (typically 2–6 seconds) plus a manifest playlist.
Segments are plain HTTP objects, so they are served and cached by a standard CDN.
The manifest describes available qualities and the segment list; the client decides which quality to pull.
Segment boundaries must align with keyframes, otherwise switching quality tears the picture.
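The segment-plus-manifest idea above can be illustrated with a minimal sketch that renders a sliding-window HLS media playlist. The `Segment` type, the `render_media_playlist` name, and the file naming are hypothetical; the tags are standard HLS playlist tags described in RFC 8216.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    sequence: int    # monotonically increasing media sequence number
    duration: float  # seconds
    uri: str         # hypothetical naming scheme, e.g. "seg_120.ts"


def render_media_playlist(segments: list[Segment], window: int = 3) -> str:
    """Render a live (sliding-window) HLS media playlist for the newest segments."""
    if window < 1:
        raise ValueError("window must be at least 1")
    live = segments[-window:]
    if not live:
        raise ValueError("at least one segment is required")
    # The target duration must not be smaller than any listed segment duration;
    # rounding up is a safe upper bound.
    target = max(math.ceil(s.duration) for s in live)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{live[0].sequence}",
    ]
    for s in live:
        lines.append(f"#EXTINF:{s.duration:.3f},")
        lines.append(s.uri)
    # The end-of-list tag is deliberately omitted: its absence tells the
    # player that the stream is still live and the playlist will change.
    return "\n".join(lines) + "\n"
```

Every time the packager finishes a segment, it would re-render this playlist; the media sequence number advances as old segments slide out of the window.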

Delivery protocols: HLS and MPEG-DASH

The client receives the broadcast not as one continuous stream but as a manifest plus a chain of segments over plain HTTP. The two dominant formats are HLS (Apple) and MPEG-DASH (an ISO/MPEG standard, made practical by the DASH Industry Forum). Both are built on segments and a manifest, and both rely on a CDN as the scalable delivery layer.

HLS

Manifest is an .m3u8 playlist; segments in TS or fMP4 (CMAF).
The baseline format for Apple devices and Safari; works almost everywhere via hls.js.
Formalized in RFC 8216 and extended by Apple specifications (including Low-Latency HLS).

MPEG-DASH

Manifest is an .mpd (XML); segments in fMP4/CMAF.
Codec- and vendor-agnostic; runs in the browser via Media Source Extensions and dash.js.
Interop guidelines and the low-latency profile are driven by the DASH Industry Forum.

ABR on the client

Adaptive Bitrate (ABR) means the player measures link bandwidth and buffer level, then picks which quality from the manifest to request for the next segment. The adaptation logic lives on the client; the server only publishes the available variants. That is exactly why segmentation matters so much — at every segment boundary the client can safely switch up or down the quality ladder.
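A minimal sketch of that client-side decision, assuming a simple throughput-plus-buffer rule. Real players such as hls.js and dash.js use more elaborate estimators; the `Variant` type, the `pick_variant` name, the safety factor, and the low-buffer threshold are illustrative assumptions, not values from any specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    bitrate_bps: int  # bandwidth declared for this variant in the manifest


def pick_variant(
    variants: list[Variant],
    throughput_bps: float,
    buffer_s: float,
    safety: float = 0.8,        # assumed headroom factor
    low_buffer_s: float = 4.0,  # assumed "about to stall" threshold
) -> Variant:
    """Pick the variant to request for the next segment."""
    ladder = sorted(variants, key=lambda v: v.bitrate_bps)
    if not ladder:
        raise ValueError("empty ladder")
    # With a nearly empty buffer, smoothness wins over quality.
    if buffer_s < low_buffer_s:
        return ladder[0]
    budget = throughput_bps * safety
    affordable = [v for v in ladder if v.bitrate_bps <= budget]
    # Highest rung that fits the budget; the lowest rung if nothing fits.
    return affordable[-1] if affordable else ladder[0]
```

The function is called once per segment boundary, which is the only point where a switch is safe.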

Latency budget: three modes

The central axis of live streaming is latency versus scale. The lower the latency, the harder and more expensive it is to deliver the feed to millions: shorter segments, a smaller buffer, tighter demands on the encoder and the network. So the first move is to name the target mode — sports and auctions need sub-second, a regular broadcast tolerates a few seconds, and those are different systems, not one system tuned differently.

Mode	Typical latency	Trade-off
Regular HLS / DASH	≈ 10–30 s (estimate)	Long segments and a large buffer give maximum scale and resilience but lag behind real time.
Low-Latency HLS / LL-DASH	≈ 2–5 s (estimate)	Partial segments, preload hints, and blocking playlist reload cut latency while staying over HTTP/CDN.
WebRTC	sub-second (estimate)	Real-time delivery over UDP without segment buffering gives minimal latency, but scaling to millions needs a dedicated SFU/relay mesh and costs more.

These numbers are order-of-magnitude references from engineering practice and specifications (Apple Low-Latency HLS, DASH-IF), not guaranteed values for any specific system; real latency depends on segment length, buffer size, and the network. If segment duration, manifest caching policy, or encoder speed changes, the whole latency budget needs to be recalculated.
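A rough way to recalculate the budget is to add up the hops explicitly. The sketch below is a simplified model under stated assumptions: the `LatencyBudget` type and every number in the usage example are illustrative, and the model treats the packager as adding one full segment (or partial segment) duration and the player as holding a fixed number of segments before it starts playback.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyBudget:
    capture_encode_s: float  # camera plus contribution encoder
    ingest_s: float          # uplink to the ingest point
    transcode_s: float       # online transcoding
    segment_s: float         # duration of one segment (or partial segment)
    cdn_s: float             # origin -> edge -> client transfer
    buffered_segments: int   # segments held by the player before playback

    def glass_to_glass(self) -> float:
        """Sum the hops between camera capture and the viewer's screen."""
        # A segment can be published only once it is complete.
        packaging = self.segment_s
        player_buffer = self.buffered_segments * self.segment_s
        return (self.capture_encode_s + self.ingest_s + self.transcode_s
                + packaging + self.cdn_s + player_buffer)


regular = LatencyBudget(capture_encode_s=0.5, ingest_s=0.5, transcode_s=1.0,
                        segment_s=6.0, cdn_s=0.5, buffered_segments=3)
low_latency = LatencyBudget(capture_encode_s=0.5, ingest_s=0.5, transcode_s=1.0,
                            segment_s=0.5, cdn_s=0.5, buffered_segments=3)
print(regular.glass_to_glass())      # 26.5
print(low_latency.glass_to_glass())  # 4.5
```

With these assumed inputs the segment duration dominates: the player buffer and packaging delay scale with it, which is why low-latency modes shrink the unit of delivery first.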

Delivery scale: CDN, edge, and fan-out

One streamer produces a feed that millions watch. This is the classic fan-out 1→N, and what sustains it is not the origin but a CDN with multi-tier caching. Segments and the manifest are plain HTTP objects, so most requests are served from the edge cache without reaching the source.

Multi-tier caching

Edge nodes near the viewer hold hot segments; a mid-tier (origin shield) absorbs requests to the source.
Segments are immutable, so they cache for a long time; the manifest changes on every new segment and caches briefly.
Careful TTLs are critical: too long on the manifest raises latency, too short hammers the origin at peak.
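The split between immutable segments and a short-lived manifest can be sketched as a cache policy function. The function name, the file extensions it matches, and the concrete TTL values are assumptions for illustration; `Cache-Control` with `max-age` and `immutable` is standard HTTP.

```python
def cache_control_for(path: str, segment_s: float) -> str:
    """Return a Cache-Control value for a live object (illustrative policy)."""
    if path.endswith((".m3u8", ".mpd")):
        # The manifest changes with every new segment, so cache it for only
        # a fraction of the segment duration (at least one second).
        ttl = max(1, int(segment_s / 2))
        return f"public, max-age={ttl}"
    if path.endswith((".ts", ".m4s", ".mp4")):
        # A published segment never changes, so it can be cached for long.
        return "public, max-age=86400, immutable"
    # Anything else is not part of the delivery path in this sketch.
    return "no-store"
```

For a 4-second segment this yields a 2-second manifest TTL: long enough for the edge to absorb lockstep reloads, short enough not to hide new segments for long.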

Peak concurrent viewers

Load is not spread out: a match final or a premiere produces a sharp spike the edge must be warmed for in advance.
Thundering herd on the manifest: thousands of players reload the playlist in lockstep — you need cache-stampede protection.
More on CDN tiers and origin shield is in the neighboring CDN study.
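One common form of cache-stampede protection is request coalescing: concurrent requests for the same key wait for a single origin fetch instead of each hitting the source. The sketch below is a minimal in-process version using threads; the `SingleFlight` class and its method names are hypothetical, and a real CDN would implement this inside the cache layer itself.

```python
import threading
from typing import Callable, Dict, Optional


class _Call:
    """State shared by the leader and the followers of one in-flight fetch."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: bytes = b""
        self.error: Optional[BaseException] = None


class SingleFlight:
    """Collapse concurrent fetches of the same key into one origin call."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, fetch: Callable[[], bytes]) -> bytes:
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call
        if leader:
            try:
                call.result = fetch()  # the only request that reaches the origin
            except BaseException as exc:
                call.error = exc       # followers see the same failure
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()
        else:
            call.done.wait()           # followers wait for the leader's result
        if call.error is not None:
            raise call.error
        return call.result
```

This sketch does not cache the result: once the fetch completes, the next request starts a new one, so it is meant to sit in front of a short-TTL cache rather than replace it.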

Transcoding: the bitrate ladder and its cost

From a single incoming feed the system builds a quality ladder (ABR ladder) — a set of variants from low to high resolution and bitrate. Unlike VOD, where this transcoding runs offline, in live it must encode in real time, with a deadline per segment.

Ladder and alignment

Several variants (e.g. 1080p / 720p / 480p / 360p) are encoded in parallel from one source.
Keyframes (IDR) must sit at the same points across all variants, and segment boundaries must match.
Without alignment the client cannot switch between qualities without artifacts — the exact pitfall the Twitch engineering blog dissects.
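The alignment requirement can be expressed as a simple check over keyframe timestamps. This is a sketch under assumptions: the function name and input shape are hypothetical, timestamps are presentation times in seconds, and segments have a fixed duration.

```python
def misaligned_variants(
    keyframes: dict[str, list[float]],
    segment_s: float,
    segments: int,
    tolerance_s: float = 0.001,
) -> list[str]:
    """Return the variants that lack a keyframe at some segment boundary."""
    boundaries = [i * segment_s for i in range(segments)]
    bad = []
    for name, stamps in keyframes.items():
        for boundary in boundaries:
            # Extra keyframes inside a segment are fine; a missing one at
            # the boundary is what breaks switching.
            if not any(abs(t - boundary) <= tolerance_s for t in stamps):
                bad.append(name)
                break
    return bad
```

In practice alignment is usually enforced at encode time, for example by fixing the keyframe interval to the segment duration on every rung, and a check like this serves as a monitoring guard.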

Cost and hardware encoding

Real-time transcoding is the most expensive component per active channel in CPU/GPU terms.
GPUs and hardware encoders give density and predictable latency, but at the same bitrate they usually deliver lower quality than slow software encoder presets.
One decoder per source and a shared pipeline instead of N independent processes save resources — exactly what Twitch concluded versus a set of independent FFmpeg instances.

Deep dives

Live → VOD recording and the DVR window

The same segments served live can be stored and assembled into a VOD asset after the broadcast — a bridge to the Video Hosting case.
The DVR window is how much past broadcast is available to rewind; it is set by playlist length and segment retention.
By default the viewer chases the live edge; rewinding into the DVR window gives the player more content to buffer ahead but gives up real time.
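The relationship between segment retention, the DVR window, and the distance from the live edge can be sketched with two small helpers. Segments are modeled as hypothetical `(sequence, duration_seconds)` tuples; both function names are illustrative.

```python
def trim_to_window(
    segments: list[tuple[int, float]], window_s: float
) -> list[tuple[int, float]]:
    """Keep the newest segments whose total duration fits the DVR window."""
    kept = []
    total = 0.0
    for seq, dur in reversed(segments):
        if total + dur > window_s:
            break
        kept.append((seq, dur))
        total += dur
    kept.reverse()
    return kept


def seconds_behind_live(
    segments: list[tuple[int, float]], position_seq: int
) -> float:
    """How far behind the live edge a viewer is while playing `position_seq`."""
    return sum(dur for seq, dur in segments if seq > position_seq)
```

With 4-second segments and a 60-second window, the playlist keeps the newest 15 segments, and a viewer nine segments back is 36 seconds behind the live edge.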

Content protection

Signed URLs and tokens scope access to the manifest and segments by time and user.
DRM (FairPlay for HLS, Widevine/PlayReady for DASH) encrypts segments and manages keys.
Protection against unauthorized restreaming and out-of-region access (key rotation, geo-blocking) is its own product surface.
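A signed URL scoped by time and user can be sketched with an HMAC over the path, the user id, and the expiry. The query parameter names (`uid`, `exp`, `sig`), the payload layout, and both function names are assumptions for illustration; each CDN defines its own token format.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse


def sign_url(path: str, user_id: str, expires_at: int, secret: bytes) -> str:
    """Append a user id, an expiry, and an HMAC-SHA256 signature to a path."""
    payload = f"{path}|{user_id}|{expires_at}".encode()
    sig = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    query = urlencode({"uid": user_id, "exp": expires_at, "sig": sig})
    return f"{path}?{query}"


def verify_url(url: str, secret: bytes, now: Optional[int] = None) -> bool:
    """Check the expiry and the signature; any malformed URL is rejected."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        uid = query["uid"][0]
        exp = int(query["exp"][0])
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False
    current = now if now is not None else int(time.time())
    if current > exp:
        return False
    payload = f"{parsed.path}|{uid}|{exp}".encode()
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), sig.encode())
```

The edge can verify such a token without calling back to the application, which keeps access checks off the origin at peak.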

Chat and reactions as a separate real-time channel

Chat, reactions, and the viewer count are not part of the video pipeline. They are a separate real-time channel with their own fan-out, usually over WebSocket and a message bus, with their own latency and delivery semantics. On a large broadcast their scale (messages per second) can exceed the video load in event count. That fan-out, acknowledgement, and retry flow is closer to the Distributed Message Queue study than to video packaging.
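To make the separation concrete, here is a minimal in-memory sketch of chat fan-out with explicit delivery semantics. It assumes at-most-once delivery with a bounded per-viewer queue that drops the oldest message for slow consumers; the `ChatRoom` class is hypothetical, and a production system would put WebSocket connections and a message bus behind the same interface.

```python
import asyncio


class ChatRoom:
    """In-memory fan-out of chat messages to connected viewers (sketch only)."""

    def __init__(self, max_pending: int = 100) -> None:
        self._max_pending = max_pending
        self._subscribers: set = set()

    def subscribe(self) -> asyncio.Queue:
        """Register a viewer and return the queue its connection reads from."""
        queue: asyncio.Queue = asyncio.Queue(maxsize=self._max_pending)
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)

    def publish(self, message: str) -> int:
        """Enqueue the message for every viewer; return the subscriber count."""
        delivered = 0
        for queue in self._subscribers:
            if queue.full():
                # At-most-once: a slow viewer loses its oldest pending message
                # instead of slowing down everyone else.
                queue.get_nowait()
            queue.put_nowait(message)
            delivered += 1
        return delivered
```

Nothing in this path touches segments, manifests, or the CDN, which is the point: chat can degrade or drop messages under load without affecting video playback.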

Trade-offs and common mistakes

Chasing sub-second everywhere: defaulting to WebRTC where LL-HLS would do sharply raises scaling complexity and cost.
Ignoring segment alignment: misaligned keyframes break quality switching under ABR.
Treating live like VOD: assuming offline transcoding and forgetting the per-segment deadline.
Underestimating the peak: not warming the edge and not protecting the manifest from lockstep reloads at event start.
Mixing the channels: routing chat through the same path as video instead of a separate real-time channel.

What to make explicit in interviews

What the target latency budget is and why (sports vs regular broadcast vs interactive).
Where the boundary sits between the video pipeline and the separate real-time chat/reactions channel.
How delivery works at peak: CDN tiers, edge, manifest protection against the thundering herd.
What happens after the broadcast: VOD recording, the DVR window, token and DRM protection.

References

Apple - HTTP Live Streaming: official resources, specifications, and Low-Latency HLS (Apple Inc.)
RFC 8216 - HTTP Live Streaming: playlist and segment format (IETF, 2017)
DASH Industry Forum - MPEG-DASH interop guidelines and low-latency profile (DASH-IF)
Twitch Engineering - Live Video Transmuxing/Transcoding: ABR variants and segment alignment (Twitch, 2017)
MDN - Live streaming web audio and video: HLS, MPEG-DASH, RTMP, RTSP (Mozilla)

Related chapters

Video Hosting - Neighboring case on upload and VOD: offline transcoding, source storage, and on-demand delivery of finished files.
CDN - Multi-tier caching, edge, and origin shield. Without that layer, fanning segments out to millions hits the origin on the very first spike.
Real-Time Gaming - Another real-time class with a hard latency budget, where the cost of sub-second delivery and feedback is visible.
Distributed Message Queue - Chat and reactions are not part of the video pipeline but a separate real-time channel: fan-out, acknowledgements, retries. This shows what delivering them costs.