Live streaming does not break on the player screen. It breaks where the signal must be ingested, transcoded into several qualities, and delivered to millions in seconds while the event is still happening.
The chapter ties together stream ingest, online transcoding into a quality ladder, packaging into segments with a manifest, and CDN delivery into one system held together by the latency budget.
For interviews and engineering discussions, this case is useful because it forces you to name the target latency mode, separate the video pipeline from the real-time chat channel, and explain how to survive the peak of concurrent viewers.
Latency Budget
Each critical-path hop needs a clear latency budget and predictable fallback behavior.
Fanout Strategy
Push/pull/hybrid fanout choices determine scalability, consistency, and complexity.
Session State
Model presence, reconnect, ordering, and delivery semantics explicitly.
Graceful Degradation
Under peaks, preserve core functionality while reducing non-critical quality.
Boundary
Video Hosting (VOD)
Neighboring case on upload and on-demand video: offline transcoding and delivery of finished files. This case is about the live, real-time path.
Live streaming is not just "video people watch right away." It is a pipeline that must, in real time, ingest a signal, transcode it into several qualities, cut it into segments, and deliver it through a CDN with a fan-out from one source to millions of viewers. The central design axis here is not storage volume but the latency budget: how many seconds pass between the camera capturing a moment and the picture appearing on a viewer's screen.
It is worth separating this case from the neighboring Video Hosting study right away. That one is about upload and VOD (video on demand): a heavy write path, an offline transcoding queue, and serving the same asset far more cheaply over and over. Here the file is not sitting ready — it is being born right now, and every segment must be encoded and delivered while the event is still happening. That changes everything: transcoding becomes an online operation with a deadline, and latency turns into a product requirement.
How live differs from VOD
- Source: in VOD the file is ready in advance; in live it is produced during the broadcast and has no "end."
- Transcoding: VOD is offline, retryable, and tunable; live is online, with a hard deadline per segment.
- Primary metric: VOD optimizes storage cost and startup time; live optimizes end-to-end latency and resilience to viewer spikes.
- Seeking: VOD exposes the full duration; in live the viewer chases the live edge and is bounded by the DVR window.
Requirements
The functional list is short here — publish, watch, seek. What shapes the architecture is not that list but the non-functional constraints: latency, availability at peak, and how the player behaves when the viewer's link starts to degrade.
Functional
- A streamer publishes a feed (camera/encoder → ingest).
- A viewer watches the broadcast in several qualities.
- Seek to the live edge and back within the DVR window.
- Optional: chat, reactions, real-time viewer count.
- Optional: record the broadcast to VOD after it ends.
Non-functional
- Controlled latency for the target mode (seconds or sub-second).
- High availability at the peak of concurrent viewers.
- Network adaptivity: a trade-off between picture quality and smoothness when the link degrades.
- Global delivery with an edge close to the viewer.
Pipeline: from capture to viewer
The baseline flow has four stages: signal ingest, online transcoding into a quality ladder, packaging into segments with a manifest, and delivery through a CDN. Every arrow on the diagram adds to the latency budget.
Ingest: how the signal enters the system
- RTMP: the historical publishing standard from encoders; widely supported, but TCP-based, so packet loss on the contribution link shows up as retransmission stalls instead of being recovered within a bounded delay.
- SRT: a UDP-based transport with loss recovery — more robust on unstable contribution links.
- WebRTC: publishing straight from the browser and the path to sub-second latency — but delivery no longer rides a plain CDN and needs a dedicated relay mesh.
Packaging: why segments
- The stream is cut into short segment files (typically 2–6 seconds) plus a manifest playlist.
- Segments are plain HTTP objects, so they are served and cached by a standard CDN.
- The manifest describes available qualities and the segment list; the client decides which quality to pull.
- Segment boundaries must align with keyframes, otherwise switching quality tears the picture.
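The segment-plus-manifest idea above can be sketched as a sliding-window HLS media playlist. This is a minimal illustration, not a packager: the `Segment` type, the `render_live_playlist` function, the `window` parameter, and the `seg_N.ts` names are all hypothetical, and only a handful of basic playlist tags are emitted.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    sequence: int    # monotonically increasing media sequence number
    duration: float  # seconds
    uri: str         # hypothetical object name, e.g. "seg_101.ts"


def render_live_playlist(segments: List[Segment], window: int = 3) -> str:
    """Render a media playlist that lists only the newest `window` segments."""
    live = segments[-window:]
    if not live:
        raise ValueError("need at least one segment")
    # The target duration is an integer upper bound on the listed segment durations.
    target = max(1, math.ceil(max(s.duration for s in live)))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        # Sequence number of the first listed segment; it grows as old ones drop off.
        f"#EXT-X-MEDIA-SEQUENCE:{live[0].sequence}",
    ]
    for s in live:
        lines.append(f"#EXTINF:{s.duration:.3f},")
        lines.append(s.uri)
    # No EXT-X-ENDLIST tag: its absence tells the player the stream is still live.
    return "\n".join(lines) + "\n"


segments = [Segment(i, 4.0, f"seg_{i}.ts") for i in range(100, 105)]
playlist = render_live_playlist(segments, window=3)
```

Each time the packager finishes a segment, the playlist is re-rendered with the window shifted by one, which is why the manifest is the short-lived object and the segments are the long-lived ones.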
Delivery protocols: HLS and MPEG-DASH
On the client the broadcast is delivered not as one continuous stream but as a manifest plus a chain of segments over plain HTTP. The two dominant formats are HLS (Apple) and MPEG-DASH (an ISO/MPEG standard, made practical by the DASH Industry Forum). Both are built on segments and a manifest, and both rely on a CDN as the scalable delivery layer.
HLS
- Manifest is an .m3u8 playlist; segments in TS or fMP4 (CMAF).
- The baseline format for Apple devices and Safari; works almost everywhere via hls.js.
- Formalized in RFC 8216 and extended by Apple specifications (including Low-Latency HLS).
MPEG-DASH
- Manifest is an .mpd (XML); segments in fMP4/CMAF.
- Codec- and vendor-agnostic; runs in the browser via Media Source Extensions and dash.js.
- Interop guidelines and the low-latency profile are driven by the DASH Industry Forum.
ABR on the client
Adaptive Bitrate (ABR) means the player measures link bandwidth and buffer level, then picks which quality from the manifest to request for the next segment. The adaptation logic lives on the client; the server only publishes the available variants. That is exactly why segmentation matters so much — at every segment boundary the client can safely switch up or down the quality ladder.
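A client-side selection rule of this kind might look like the sketch below. It is a deliberately simple heuristic under stated assumptions, not the algorithm of hls.js, dash.js, or any real player: the ladder values, the `safety` factor, and the `low_buffer_s` threshold are invented for illustration.

```python
from typing import List

# Hypothetical ladder, in kbit/s, as it might be advertised in the manifest.
LADDER_KBPS = [800, 1400, 2800, 5000]


def pick_variant(
    ladder_kbps: List[int],
    throughput_kbps: float,
    buffer_s: float,
    safety: float = 0.8,
    low_buffer_s: float = 4.0,
) -> int:
    """Choose the bitrate to request for the next segment."""
    ladder = sorted(ladder_kbps)
    # When the buffer is nearly empty, protect smoothness: drop to the lowest rung.
    if buffer_s < low_buffer_s:
        return ladder[0]
    # Otherwise take the highest rung that fits within a fraction of measured throughput.
    budget = throughput_kbps * safety
    affordable = [b for b in ladder if b <= budget]
    return affordable[-1] if affordable else ladder[0]


choice = pick_variant(LADDER_KBPS, throughput_kbps=4000, buffer_s=10)
```

The decision is re-evaluated before every segment request, which is where the segment boundary becomes the natural switching point.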
Latency budget: three modes
The central axis of live streaming is latency versus scale. The lower the latency, the harder and more expensive it is to deliver the feed to millions: shorter segments, a smaller buffer, tighter demands on the encoder and the network. So the first move is to name the target mode: sports and auctions need sub-second, while a regular broadcast tolerates anywhere from a few seconds to tens of seconds. Those are different systems, not one system tuned differently.
| Mode | Typical latency | Trade-off |
|---|---|---|
| Regular HLS / DASH | ≈ 10–30 s (estimate) | Long segments and a large buffer give maximum scale and resilience but lag behind real time. |
| Low-Latency HLS / LL-DASH | ≈ 2–5 s (estimate) | Partial segments, preload hints, and blocking playlist reload cut latency while staying over HTTP/CDN. |
| WebRTC | sub-second (estimate) | Direct peer delivery over UDP gives minimal latency, but scaling to millions needs a dedicated SFU/relay mesh and costs more. |
These numbers are order-of-magnitude references from engineering practice and specifications (Apple Low-Latency HLS, DASH-IF), not guaranteed values for any specific system; real latency depends on segment length, buffer size, and the network. If segment duration, manifest caching policy, or encoder speed changes, the whole latency budget needs to be recalculated.
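To make that recalculation concrete, here is a back-of-the-envelope model of the budget. Every input value is an assumption picked for illustration, and the model ignores jitter, playlist reload timing, and retries; `LatencyBudget` and its fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LatencyBudget:
    encode_s: float      # capture plus real-time transcode
    unit_s: float        # segment duration (or partial-segment duration in low-latency modes)
    cdn_s: float         # origin to edge to player transfer
    buffered_units: int  # units the player holds before it starts playback
    decode_s: float      # decode and render on the device

    def total_s(self) -> float:
        # A unit cannot be published until it is fully produced, so one full
        # unit duration is spent at the packager before delivery starts.
        packaging_wait = self.unit_s
        player_buffer = self.unit_s * self.buffered_units
        return self.encode_s + packaging_wait + self.cdn_s + player_buffer + self.decode_s


# Assumed inputs: 6 s segments for the regular profile, 0.5 s parts for low latency.
regular = LatencyBudget(encode_s=1.0, unit_s=6.0, cdn_s=0.5, buffered_units=3, decode_s=0.2)
low_latency = LatencyBudget(encode_s=0.5, unit_s=0.5, cdn_s=0.3, buffered_units=3, decode_s=0.2)
```

With these assumed inputs the totals come out at roughly 26 s and 3 s. The unit duration dominates both, because it is paid once at the packager and again for every unit the player buffers.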
Delivery scale: CDN, edge, and fan-out
One streamer produces a feed that millions watch. This is the classic fan-out 1→N, and what sustains it is not the origin but a CDN with multi-tier caching. Segments and the manifest are plain HTTP objects, so most requests are served from the edge cache without reaching the source.
Multi-tier caching
- Edge nodes near the viewer hold hot segments; a mid-tier (origin shield) absorbs requests to the source.
- Segments are immutable, so they cache for a long time; the manifest changes on every new segment and caches briefly.
- Careful TTLs are critical: too long on the manifest raises latency, too short hammers the origin at peak.
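The split between short-lived manifests and long-lived segments can be expressed as a small header policy. This is a sketch under assumptions: the half-segment manifest TTL, the one-hour segment TTL, and the function name are illustrative choices, not recommendations from a specific CDN.

```python
def cache_control_for(path: str, segment_duration_s: float, segment_ttl_s: int = 3600) -> str:
    """Pick a Cache-Control value for an object in the live path."""
    if path.endswith((".m3u8", ".mpd")):
        # The manifest changes with every new segment: cache it for about half
        # a segment duration so edges stay fresh without hitting origin per request.
        ttl = max(1, int(segment_duration_s // 2))
        return f"public, max-age={ttl}"
    if path.endswith((".ts", ".m4s", ".mp4")):
        # A published segment never changes, so it can be cached for as long
        # as it is retained (here an assumed one-hour retention).
        return f"public, max-age={segment_ttl_s}, immutable"
    # Anything else in this path is not expected to be cached.
    return "no-store"


manifest_header = cache_control_for("live/720p/index.m3u8", 4.0)
segment_header = cache_control_for("live/720p/seg_104.m4s", 4.0)
```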
Peak concurrent viewers
- Load is not spread out: a match final or a premiere produces a sharp spike the edge must be warmed for in advance.
- Thundering herd on the manifest: thousands of players reload the playlist in lockstep — you need cache-stampede protection.
- More on CDN tiers and origin shield is in the neighboring CDN study.
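One common form of cache-stampede protection is request coalescing: concurrent misses for the same key share a single upstream fetch. The sketch below shows the idea with asyncio; `SingleFlight`, `fetch`, and the fake loader are hypothetical names, and a production cache tier would add timeouts, error handling, and a real cache in front.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class SingleFlight:
    """Collapse concurrent requests for the same key into one upstream call."""

    def __init__(self) -> None:
        self._inflight: Dict[str, "asyncio.Task[bytes]"] = {}

    async def fetch(self, key: str, loader: Callable[[], Awaitable[bytes]]) -> bytes:
        task = self._inflight.get(key)
        if task is None:
            # First caller for this key starts the upstream fetch.
            task = asyncio.create_task(loader())
            self._inflight[key] = task
            # Forget the key once the fetch finishes so the next miss refetches.
            task.add_done_callback(lambda _t: self._inflight.pop(key, None))
        # shield() keeps one cancelled waiter from cancelling the shared fetch.
        return await asyncio.shield(task)


async def demo() -> tuple:
    origin_calls = 0

    async def load_manifest() -> bytes:
        nonlocal origin_calls
        origin_calls += 1
        await asyncio.sleep(0.01)  # stand-in for the round trip to origin
        return b"#EXTM3U"

    flight = SingleFlight()
    # 100 players reload the playlist at the same instant.
    bodies = await asyncio.gather(
        *(flight.fetch("index.m3u8", load_manifest) for _ in range(100))
    )
    return origin_calls, bodies


origin_calls, bodies = asyncio.run(demo())
```

In this run the hundred simultaneous reloads result in one call to the loader; every waiter receives the same body.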
Transcoding: the bitrate ladder and its cost
From a single incoming feed the system builds a quality ladder (ABR ladder) — a set of variants from low to high resolution and bitrate. Unlike VOD, where this transcoding runs offline, in live it must encode in real time, with a deadline per segment.
Ladder and alignment
- Several variants (e.g. 1080p / 720p / 480p / 360p) are encoded in parallel from one source.
- Keyframes (IDR) must sit at the same points across all variants, and segment boundaries must match.
- Without alignment the client cannot switch between qualities without artifacts — the exact pitfall the Twitch engineering blog dissects.
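The alignment requirement can be checked mechanically. The sketch below assumes keyframe timestamps per variant have already been extracted by some external probe; the function names, the tolerance, and the sample timestamps are hypothetical.

```python
from typing import Dict, List


def keyframes_aligned(
    keyframes_by_variant: Dict[str, List[float]], tolerance_s: float = 0.001
) -> bool:
    """True if every variant has keyframes at the same timestamps."""
    variants = list(keyframes_by_variant.values())
    if not variants:
        return True
    reference = variants[0]
    for timestamps in variants[1:]:
        if len(timestamps) != len(reference):
            return False
        for a, b in zip(reference, timestamps):
            if abs(a - b) > tolerance_s:
                return False
    return True


def boundaries_on_keyframes(
    boundaries: List[float], keyframes: List[float], tolerance_s: float = 0.001
) -> bool:
    """True if every segment boundary coincides with some keyframe."""
    return all(any(abs(b - k) <= tolerance_s for k in keyframes) for b in boundaries)


good = {"1080p": [0.0, 2.0, 4.0], "360p": [0.0, 2.0, 4.0]}
bad = {"1080p": [0.0, 2.0, 4.0], "360p": [0.0, 2.5, 4.0]}
```

A check like this fits naturally as a guard in the packager or as a monitoring probe, since a drifting encoder breaks switching silently until viewers see artifacts.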
Cost and hardware encoding
- Real-time transcoding is the most expensive component per active channel in CPU/GPU terms.
- GPUs and hardware encoders give density and predictable latency but compress worse than slow software.
- One decoder per source and a shared pipeline instead of N independent processes save resources — exactly what Twitch concluded versus a set of independent FFmpeg instances.
Deep dives
Live → VOD recording and the DVR window
- The same segments served live can be stored and assembled into a VOD asset after the broadcast — a bridge to the Video Hosting case.
- The DVR window is how much past broadcast is available to rewind; it is set by playlist length and segment retention.
- The viewer chases the live edge; seeking back into the DVR window gives the player more buffered content to work with but puts the viewer behind real time.
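The relationship between the DVR window, segment retention, and seeking is simple arithmetic. The helper names and the two-hour window below are assumptions for illustration.

```python
import math


def dvr_segment_count(window_s: float, segment_s: float) -> int:
    """Number of segments that must be retained to cover the DVR window."""
    return math.ceil(window_s / segment_s)


def clamp_seek(target_s: float, live_edge_s: float, window_s: float) -> float:
    """Clamp a requested position to [live edge - window, live edge]."""
    earliest = max(0.0, live_edge_s - window_s)
    return max(earliest, min(target_s, live_edge_s))


# Assumed example: a two-hour DVR window with 4-second segments.
retained = dvr_segment_count(window_s=7200, segment_s=4)
```

A seek request earlier than the window is pulled forward to the oldest retained segment, and a request past the live edge is pulled back to it.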
Content protection
- Signed URLs and tokens scope access to the manifest and segments by time and user.
- DRM (FairPlay for HLS, Widevine/PlayReady for DASH) encrypts segments and manages keys.
- Protection against stream re-capture (geo-blocking, key rotation) is its own product surface.
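The signed-URL idea from the first bullet can be sketched with an HMAC over the path, user, and expiry. This shows the general technique, not any CDN's token format: the query parameter names (`uid`, `exp`, `sig`), the payload layout, and the secret are hypothetical, and the sketch assumes user IDs do not contain the `|` separator.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse


def sign_url(path: str, user_id: str, expires_at: int, secret: bytes) -> str:
    """Return the path with an expiry, a user ID, and an HMAC signature attached."""
    payload = f"{path}|{user_id}|{expires_at}".encode()
    sig = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'uid': user_id, 'exp': expires_at, 'sig': sig})}"


def verify_url(url: str, secret: bytes, now: int) -> bool:
    """Check the signature and the expiry of a URL produced by sign_url."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        user_id = query["uid"][0]
        expires_at = int(query["exp"][0])
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    payload = f"{parsed.path}|{user_id}|{expires_at}".encode()
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sig.encode(), expected.encode())


SECRET = b"demo-secret"  # placeholder; a real key would come from a secret store
url = sign_url("/live/abc/index.m3u8", "u42", expires_at=1000, secret=SECRET)
```

Because verification needs only the shared secret and the current time, it can run at the edge without a call back to the origin.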
Chat and reactions as a separate real-time channel
Chat, reactions, and the viewer count are not part of the video pipeline. They are a separate real-time channel with their own fan-out, usually over WebSocket and a message bus, with their own latency and delivery semantics. On a large broadcast their scale (messages per second) can exceed the video load in event count. That fan-out, acknowledgement, and retry flow is closer to the Distributed Message Queue study than to video packaging.
Trade-offs and common mistakes
- Chasing sub-second everywhere: defaulting to WebRTC where LL-HLS would do sharply raises scaling complexity and cost.
- Ignoring segment alignment: misaligned keyframes break quality switching under ABR.
- Treating live like VOD: assuming offline transcoding and forgetting the per-segment deadline.
- Underestimating the peak: not warming the edge and not protecting the manifest from lockstep reloads at event start.
- Mixing the channels: routing chat through the same path as video instead of a separate real-time channel.
What to make explicit in interviews
- What the target latency budget is and why (sports vs regular broadcast vs interactive).
- Where the boundary sits between the video pipeline and the separate real-time chat/reactions channel.
- How delivery works at peak: CDN tiers, edge, manifest protection against the thundering herd.
- What happens after the broadcast: VOD recording, the DVR window, token and DRM protection.
References
Related chapters
- Video Hosting - Neighboring case on upload and VOD: offline transcoding, source storage, and on-demand delivery of finished files.
- CDN - Multi-tier caching, edge, and origin shield. Without that layer, fanning segments out to millions hits the origin on the very first spike.
- Real-Time Gaming - Another real-time class with a hard latency budget, where the cost of sub-second delivery and feedback is visible.
- Distributed Message Queue - Chat and reactions are not part of the video pipeline but a separate real-time channel: fan-out, acknowledgements, retries. This shows what delivering them costs.
