Video hosting does not break on the player screen. It breaks where the system must absorb huge upload volume, survive heavy processing, and then deliver the same asset to millions of viewers at far lower cost.
The chapter ties together video intake, the transcoding queue, storage for source and processed files, feed publication, and CDN delivery into one working system.
For interviews and engineering discussions, this case is useful because it forces a clean explanation of the heavy write path, the hot read path, feed fan-out strategy, and the cost of media traffic.
Heavy Write Path
Upload intake, validation, and video processing should not slow publication and should not interfere with the viewing path, which follows a very different load profile.
Transcoding Queue
Asynchronous processing is what spreads CPU pressure over time, absorbs upload spikes, and lets workers scale independently.
Hot Read Path
Feed lookup, metadata delivery, and segment streaming must stay fast even when a newly published video turns into a traffic spike.
Media Delivery Cost
The bill is driven not only by disks, but also by network egress, cache warmup, the number of quality variants, and repeated reads from origin.
This public interview from C++ Russia 2022 is a good example of why video hosting is harder than just embedding a player. The system has to accept user uploads, run transcoding, update feeds quickly, and then deliver the same content to millions of viewers through a CDN.
Interview recording
The full public interview is available on YouTube. It is useful because it shows not only the final architecture, but also the reasoning that leads to it.
Problem framing
Video-hosting feed
Design a service where creators upload videos and viewers watch them in a chronological feed with selectable playback quality.
At this scale, the key non-functional goals are availability, throughput, failure tolerance, and cost control. Otherwise the system will stall either on publication speed or on media delivery cost.
Functional
Fast upload experience for content creators.
Publish the video into subscriber feeds after processing is complete.
Let viewers choose playback quality from 360p to 1080p.
Provide a chronological feed with simple ordering by publication time.
Non-functional
Availability: high
Upload and playback should stay available under partial failures.
Scalability: horizontal
The system should grow with audience size and content volume.
Fault tolerance: mandatory
Losing a node must not stop serving traffic or destroy video data.
Cost efficiency: controlled
Compute, storage, and media delivery costs must stay predictable.
Load estimation (back of the envelope)
Key takeaway: the dominant pressure comes not from raw request count, but from storage and media delivery volume. The real cost drivers are quality variants, caching strategy, and network egress.
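The takeaway above can be made concrete with a small calculator. This is only a sketch: every input value below is a hypothetical placeholder, not a figure from the interview, and the field names (`variants_ratio`, `cdn_hit_ratio`, and so on) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadAssumptions:
    """All fields are hypothetical inputs, not figures from the interview."""
    uploads_per_day: int
    avg_source_gb: float      # average size of one uploaded source file
    variants_ratio: float     # total size of all quality variants / source size
    views_per_day: int
    avg_gb_per_view: float    # data streamed during one average view
    cdn_hit_ratio: float      # share of bytes served from edge caches


def estimate_daily_load(a: LoadAssumptions) -> dict[str, float]:
    """Return rough daily volumes in terabytes (1 TB = 1000 GB)."""
    source_tb = a.uploads_per_day * a.avg_source_gb / 1000
    variants_tb = source_tb * a.variants_ratio
    egress_tb = a.views_per_day * a.avg_gb_per_view / 1000
    return {
        "new_storage_tb": source_tb + variants_tb,
        "viewer_egress_tb": egress_tb,
        # Only cache misses reach origin storage.
        "origin_egress_tb": egress_tb * (1 - a.cdn_hit_ratio),
    }


# Placeholder numbers chosen only to exercise the formula.
example = LoadAssumptions(
    uploads_per_day=100_000, avg_source_gb=1.0, variants_ratio=1.5,
    views_per_day=10_000_000, avg_gb_per_view=0.2, cdn_hit_ratio=0.95,
)
print(estimate_daily_load(example))
```

Changing `variants_ratio` or `cdn_hit_ratio` shows how strongly quality variants and caching move the storage and egress totals, while request count never appears in the formula.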
High-level architecture
The architecture separates the write path, which accepts and processes uploads, from the read path, which assembles feeds, serves metadata, and streams the final video.
Figure: Video Hosting: High-Level Map (uploads, processing, feed generation, and playback delivery). Panels: Ingestion and Processing Plane; Storage and Serving Plane.
Upload and processing can scale independently from the viewing path. That lets the system add workers, cache capacity, and delivery throughput separately.
Write Path and Read Path
Write path: key steps
- The creator uploads the source file and receives a `video_id` plus processing status.
- The system splits the job into separate transcoding tasks and produces 360p, 480p, 720p, and 1080p variants.
- Finished files and segments land in origin storage, while metadata tracks processing state and publication readiness.
- Once the video reaches `ready`, the system updates follower feeds and emits a publication event.
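The write-path steps above can be sketched as a small status tracker. Only the `ready` status and the four quality profiles come from the text; the `processing` status, the `VideoRecord` class, and its method names are assumptions made for this illustration.

```python
PROFILES = ("360p", "480p", "720p", "1080p")


class VideoRecord:
    """Tracks which quality variants are finished for one video.

    Hypothetical model: the video starts in "processing" and moves to
    "ready" once every profile has a stored output.
    """

    def __init__(self, video_id: str) -> None:
        self.video_id = video_id
        self.status = "processing"
        self.versions: dict[str, str] = {}

    def mark_variant_done(self, profile: str, location: str) -> bool:
        """Record a finished variant.

        Returns True exactly once, at the moment the video becomes ready,
        so the caller knows when to update feeds and emit the event.
        """
        if profile not in PROFILES:
            raise ValueError(f"unknown profile: {profile}")
        self.versions[profile] = location
        all_done = all(p in self.versions for p in PROFILES)
        if self.status == "processing" and all_done:
            self.status = "ready"
            return True
        return False
```

Returning `True` only on the transition keeps publication idempotent: a worker that reports the same variant twice does not trigger a second feed update.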
Read path: key steps
- A viewer opens the feed, and the service returns `video_id` entries from the viewer's subscriptions, ordered by publication time.
- The metadata service returns the HLS/DASH playback manifest and the available quality variants.
- The CDN usually serves segments from an edge cache close to the user.
- On a cache miss, the CDN fetches the segment from origin storage and warms the local edge cache.
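The last two read-path steps can be sketched as a toy edge cache. This is a simplified model, not how any particular CDN works: the `EdgeCache` class is hypothetical, and LRU eviction is an assumption chosen for brevity.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """Toy edge node: serve from local cache, fall back to origin on a miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], capacity: int) -> None:
        self._fetch = fetch_from_origin
        self._capacity = capacity
        self._cache: OrderedDict[str, bytes] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, segment_key: str) -> bytes:
        if segment_key in self._cache:
            # Cache hit: refresh recency and avoid touching origin.
            self._cache.move_to_end(segment_key)
            self.hits += 1
            return self._cache[segment_key]
        # Cache miss: pull from origin and warm the edge for later viewers.
        self.misses += 1
        data = self._fetch(segment_key)
        self._cache[segment_key] = data
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict least recently used
        return data
```

The ratio of `hits` to `hits + misses` is the number that decides how much traffic, and therefore cost, reaches origin storage.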
Key components
The critical pieces are the task queue for heavy work, blob storage for intermediate and final files, the origin layer for segment delivery, and the playback manifest in metadata that tells the player how to start streaming.
Upload API
Accepts files from creators, validates limits, and places the source video into temporary storage.
POST /v1/videos → {video_id, upload_url}
Transcoding task queue
RabbitMQ or a similar broker spreads heavy processing across workers. One upload usually expands into several jobs, one per quality profile.
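As a rough illustration of how one upload expands into several jobs, the sketch below builds one JSON message per quality profile. The message fields and the `build_transcode_jobs` helper are hypothetical; the output path pattern mirrors the `versions` entries in the metadata example, and actually publishing the messages to a broker is left out.

```python
import json

# Profile name -> target frame height in pixels.
QUALITY_PROFILES = {"360p": 360, "480p": 480, "720p": 720, "1080p": 1080}


def build_transcode_jobs(video_id: str, source_url: str) -> list[str]:
    """Return one serialized job message per quality profile."""
    jobs = []
    for profile, height in QUALITY_PROFILES.items():
        job = {
            "video_id": video_id,
            "profile": profile,
            "height": height,
            "source": source_url,
            "output": f"s3://videos/{video_id}/{profile}.mp4",
        }
        jobs.append(json.dumps(job, sort_keys=True))
    return jobs
```

Keeping each profile in its own message lets workers process variants in parallel and retry a single failed variant without redoing the others.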
Video processing workers
These stateless workers read jobs from the queue, run FFmpeg, generate quality variants, and scale horizontally when upload pressure rises.
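A worker's core step might look like the sketch below. It assumes the source file has already been downloaded to a local path and that an `ffmpeg` binary with the `libx264` encoder is installed; the helper names are hypothetical, and real encoding settings (bitrate, segmenting for HLS/DASH) would need more options than shown here.

```python
import subprocess


def build_ffmpeg_command(src_path: str, dst_path: str, height: int) -> list[str]:
    """Build an FFmpeg argument list that rescales video to the given height."""
    return [
        "ffmpeg",
        "-y",                          # overwrite the output file if it exists
        "-i", src_path,                # input file
        "-vf", f"scale=-2:{height}",   # keep aspect ratio, force an even width
        "-c:v", "libx264",             # H.264 video
        "-c:a", "aac",                 # AAC audio
        dst_path,
    ]


def run_transcode(src_path: str, dst_path: str, height: int) -> None:
    """Run FFmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_ffmpeg_command(src_path, dst_path, height), check=True)
```

Because the function holds no state between calls, any number of identical workers can run it side by side, which is what makes horizontal scaling straightforward.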
Blob storage
An S3-compatible store such as MinIO or Ceph keeps source videos, finished files, and stream segments. In practice, teams often split it into temporary and permanent layers.
Feed store
A NoSQL database such as MongoDB (document-oriented) or Cassandra (wide-column) stores precomputed feeds. Data is usually sharded by `user_id`, and each feed is a time-ordered list of video IDs.
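Sharding by `user_id` can be illustrated with a stable hash. This is a minimal sketch with a hypothetical `shard_for_user` helper; databases such as MongoDB and Cassandra handle partitioning internally, so application code like this is only needed when routing is done by hand.

```python
import hashlib


def shard_for_user(user_id: str, num_shards: int) -> int:
    """Map a user ID to a shard index in [0, num_shards).

    Uses SHA-256 rather than Python's built-in hash(), which is salted
    per process and would route the same user differently after a restart.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo routing moves most users when `num_shards` changes, which is why production systems tend to prefer consistent hashing or fixed virtual partitions.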
CDN
The CDN keeps popular segments close to viewers and removes pressure from the origin layer. It is essential for low playback startup time and for protecting the storage layer during traffic spikes.
Data models
Video metadata
{
  "id": "uuid",
  "author_id": "user_123",
  "title": "My Video",
  "description": "...",
  "created_at": "2024-03-15T10:00:00Z",
  "status": "ready",
  "versions": {
    "360p": "s3://videos/uuid/360p.mp4",
    "480p": "s3://videos/uuid/480p.mp4",
    "720p": "s3://videos/uuid/720p.mp4",
    "1080p": "s3://videos/uuid/1080p.mp4"
  }
}
User feed
{
  "user_id": "user_456",
  "feed": [
    "video_id_1",
    "video_id_2",
    "video_id_3",
    ...
  ],
  "last_updated": "2024-03-15T12:00:00Z"
}
A precomputed feed stores only video IDs and the latest refresh timestamp, which keeps the read path cheap and predictable.
Feed delivery strategies
The main product trade-off is where to perform feed fan-out: at publish time or at read time.
Fan-out on write (push)
When a video is published, the system immediately writes it into subscriber feeds.
Fan-out on read (pull)
The feed is assembled on demand from recent uploads across subscriptions.
Hybrid approach
Push usually works better for ordinary creators because it makes reads cheap. For celebrity-scale accounts, pull or a mixed strategy is often safer so the publish path does not explode under millions of feed updates.
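A hybrid strategy could be sketched as follows. Everything here is illustrative: the `FeedService` class, the follower-count `threshold`, and the in-memory dictionaries standing in for the feed store are assumptions, not details from the interview.

```python
from collections import defaultdict


class FeedService:
    """Push for ordinary creators, pull for accounts at or above a threshold."""

    def __init__(self, followers: dict[str, set[str]], threshold: int) -> None:
        self.followers = followers          # author_id -> follower user_ids
        self.threshold = threshold          # hypothetical "celebrity" cutoff
        self.following: dict[str, set[str]] = defaultdict(set)
        for author, subs in followers.items():
            for user in subs:
                self.following[user].add(author)
        self.pushed: dict[str, list[tuple[int, str]]] = defaultdict(list)
        self.by_author: dict[str, list[tuple[int, str]]] = defaultdict(list)

    def _is_celebrity(self, author_id: str) -> bool:
        return len(self.followers.get(author_id, ())) >= self.threshold

    def publish(self, author_id: str, video_id: str, published_at: int) -> str:
        self.by_author[author_id].append((published_at, video_id))
        if self._is_celebrity(author_id):
            return "pull"  # skip fan-out; readers merge at read time
        for user in self.followers.get(author_id, ()):
            self.pushed[user].append((published_at, video_id))
        return "push"

    def read_feed(self, user_id: str, limit: int = 20) -> list[str]:
        items = list(self.pushed.get(user_id, ()))
        for author in self.following.get(user_id, ()):
            if self._is_celebrity(author):
                items.extend(self.by_author.get(author, ()))
        items.sort(reverse=True)  # newest first, matching a chronological feed
        return [video_id for _, video_id in items[:limit]]
```

With this split, publishing from a large account costs one write regardless of audience size, while readers pay a small merge cost only for the celebrity accounts they follow.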
Infrastructure services
L4/L7 Load Balancer
Distributes traffic across public APIs and serving services.
Service Discovery
Helps services find healthy peers through systems like Consul or etcd.
Autoscaling
Adjusts the number of workers based on actual processing pressure.
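One simple way to express "actual processing pressure" is queue depth divided by how fast the backlog should drain. The formula and the `desired_workers` function below are an illustrative assumption, not a rule stated in the source.

```python
import math


def desired_workers(
    queue_depth: int,
    avg_job_seconds: float,
    target_drain_seconds: float,
    min_workers: int,
    max_workers: int,
) -> int:
    """Workers needed to drain the current backlog within the target time."""
    if target_drain_seconds <= 0:
        raise ValueError("target_drain_seconds must be positive")
    needed = math.ceil(queue_depth * avg_job_seconds / target_drain_seconds)
    # Clamp to keep a warm minimum and a cost ceiling.
    return max(min_workers, min(max_workers, needed))
```

For example, 1200 queued jobs at 60 seconds each with a 600 second drain target call for 120 workers, while an empty queue falls back to `min_workers`.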
Monitoring
Collects metrics, alerts, and logs through tools such as Prometheus and Grafana.
Rate Limiting
Protects the platform against abuse and volumetric attacks.
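A token bucket is one common way to implement this kind of protection; the source does not say which algorithm the platform uses, so the sketch below is only an example. The clock is passed in explicitly to keep the class deterministic and easy to test.

```python
class TokenBucket:
    """Allow short bursts up to `capacity`, refilling at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)   # start full
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Return True and spend tokens if the request fits the budget."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In practice there would be one bucket per client key (user, API token, or IP), and a larger `cost` could be charged for expensive calls such as starting an upload.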
Circuit Breaker
Turns instability into graceful degradation instead of a cascading failure.
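The idea can be sketched with a minimal breaker that returns a fallback while a dependency is unhealthy. The class, its parameters, and the single-trial recovery rule are simplifying assumptions for illustration; production libraries add concurrency control and richer state.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Open after repeated failures, then retry once the cool-down has passed."""

    def __init__(self, failure_threshold: int, reset_after: float) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after      # seconds to stay open
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        return self.opened_at is not None

    def call(self, func: Callable[[], T], fallback: Callable[[], T], now: float) -> T:
        if self.opened_at is not None and now - self.opened_at < self.reset_after:
            return fallback()               # fail fast, do not touch the dependency
        try:
            result = func()                 # normal call, or a trial after cool-down
        except Exception:
            self.failures += 1
            if self.is_open or self.failures >= self.failure_threshold:
                self.opened_at = now        # open, or re-open after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None               # success closes the breaker
        return result
```

For a feed endpoint, the fallback might return a slightly stale cached feed, which is the kind of graceful degradation the description refers to.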
Key takeaways from the interview
Separate writes from reads
Upload and processing grow under one set of constraints, while feed serving and media delivery grow under another. Keep those paths independent.
Push heavy work into queues
Transcoding should never sit on the user request path. Asynchronous processing keeps load manageable and makes worker scaling predictable.
CDN is mandatory for media delivery
With hundreds of terabytes of new content per day, a global cache is the only way to keep delivery cost and origin pressure under control.
Precompute what users read most often
A prepared feed is usually cheaper and faster than rebuilding it from multiple sources every time the app opens.
Related chapters
- Content Delivery Network (CDN) - helps explain how global caching and distributed delivery keep playback startup fast under heavy traffic.
- System Design Interview: An Insider's Guide (short summary) - provides a classic structure for media-system answers: requirements, sizing, write path, read path, and trade-offs.
- Hacking the System Design Interview (short summary) - deepens the trade-off discussion around feed fan-out, transcoding queues, storage tiers, and resilience.
- System design case studies examples - places the video-hosting problem in the broader case-study landscape and makes architecture comparisons easier.
- Short-Term Preparation for System Design Interviews - shows how to package the solution for an interview, from requirements and sizing to architecture, risks, and system evolution.
