At the C++ Russia 2022 conference we conducted a public System Design interview with the task of designing a video-hosting feed system similar to YouTube. This case study demonstrates asynchronous processing, video transcoding, and building personalized feeds.
Video recording of the interview
The full recording of the public interview is available on YouTube. I recommend watching it to see the iterative design process.
Statement of the problem
Video Hosting Feed
Design an application that lets content creators upload videos and viewers watch them in a chronological feed with selectable playback quality.
Functional
Fast video upload experience for content creators.
Publish videos into subscriber feeds after processing is completed.
Playback quality selection from 360p to 1080p.
Chronological feed with simple ordering by publication time.
Non-functional
Availability: high availability
Viewing and upload flows should stay available under partial failures.
Scalability: horizontal growth
The system should scale with growing audience and content volume.
Fault tolerance: failure resilience
Single-node failures should not cause video loss or serving downtime.
Cost efficiency: infrastructure cost control
Architecture should use storage, CDN, and compute resources efficiently.
Load Estimation (Back of the Envelope)
Key Takeaway: The dominant load is video storage and delivery, not the number of requests. Effective use of a CDN and storage optimization are critical.
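To make the takeaway concrete, here is a rough back-of-envelope sketch. The ~300 TB/day figure appears later in this article; the transcoding overhead and retention period are illustrative assumptions, not numbers from the interview.

```python
# Back-of-envelope sketch. Only the ~300 TB/day figure comes from the article;
# the overhead factor and retention period are assumptions for illustration.
TB = 10**12

new_content_per_day_tb = 300      # original uploads per day (from the article)
transcode_overhead = 2.0          # assumption: all renditions roughly double raw size
retention_years = 5               # assumption

daily_stored_tb = new_content_per_day_tb * transcode_overhead
yearly_stored_pb = daily_stored_tb * 365 / 1000
total_stored_pb = yearly_stored_pb * retention_years

# Average ingest bandwidth implied by the daily upload volume.
avg_ingest_gbit = new_content_per_day_tb * TB * 8 / 86400 / 10**9

print(f"stored per year: ~{yearly_stored_pb:.0f} PB")
print(f"stored over {retention_years} years: ~{total_stored_pb:.0f} PB")
print(f"average ingest bandwidth: ~{avg_ingest_gbit:.1f} Gbit/s")
```

Even under these crude assumptions, storage grows by hundreds of petabytes per year, which is why the design centers on blob storage and CDN economics rather than request throughput.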
High-level architecture
The system is divided into two main paths: the Write Path (video upload and processing) and the Read Path (feed and playback).
Video Hosting: High-Level Map
Ingestion + transcoding + feed + video delivery
Ingestion + Processing Plane
Storage + Serving Plane
The upload/transcoding loop scales independently of read serving, so processing workers and edge CDN capacity can be expanded separately.
Read/Write Flow
Write Path: operational notes
- The client uploads the original file via the upload endpoint, after which the API returns a `video_id` and the processing status.
- The video is broken down into transcoding tasks; workers generate several quality profiles.
- Ready renditions and manifests are published to origin storage, and metadata is recorded in the DB.
- Once the status becomes `ready`, fan-out into feed storage is triggered and subscribers are notified.
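The write-path steps above can be sketched as a small state machine. This is a minimal in-memory sketch: `queue.Queue` stands in for the message broker, a dict for the metadata DB, and the function names are hypothetical.

```python
import queue
import uuid

RESOLUTIONS = ["360p", "480p", "720p", "1080p"]
transcode_queue = queue.Queue()   # stand-in for RabbitMQ
videos = {}                       # stand-in for the metadata DB

def handle_upload(author_id: str) -> str:
    """Upload API: register the video and enqueue one task per rendition."""
    video_id = str(uuid.uuid4())
    videos[video_id] = {"author_id": author_id, "status": "processing", "versions": {}}
    for res in RESOLUTIONS:
        transcode_queue.put({"video_id": video_id, "resolution": res})
    return video_id

def on_rendition_done(video_id: str, resolution: str, s3_path: str) -> None:
    """Worker callback: record the rendition; publish once all four are ready."""
    video = videos[video_id]
    video["versions"][resolution] = s3_path
    if len(video["versions"]) == len(RESOLUTIONS):
        video["status"] = "ready"
        fan_out_to_subscribers(video_id)

def fan_out_to_subscribers(video_id: str) -> None:
    pass  # push into subscriber feeds; see the fan-out section
```

The key property: the client gets its `video_id` back immediately, while the expensive transcoding happens asynchronously behind the queue.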
Read Path: operational notes
- The viewer requests the feed; the service returns a list of `video_id`s based on subscriptions and personal signals.
- The Metadata API provides the playback manifest (HLS/DASH) and the available resolutions.
- CDN serves segments from edge nodes; hot content is almost always read from cache.
- On a cache miss, the edge fetches the segment from origin storage and immediately warms its local cache.
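The cache-miss behavior described above is a classic read-through cache. A minimal sketch with dicts standing in for the edge cache and origin storage (real CDNs do this internally; all names here are illustrative):

```python
# Read-through edge cache sketch; dicts stand in for the edge node and origin.
edge_cache = {}  # segment_url -> bytes
origin = {"s3://videos/uuid/720p/seg1.ts": b"<segment bytes>"}  # stand-in origin

def serve_segment(segment_url: str) -> bytes:
    if segment_url in edge_cache:
        return edge_cache[segment_url]     # cache hit: served from the edge
    data = origin[segment_url]             # cache miss: fetch from origin storage
    edge_cache[segment_url] = data         # warm the local cache for the next viewer
    return data
```

After the first viewer in a region requests a segment, every subsequent viewer is served from the edge, which is what keeps origin load manageable for hot content.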
Key Components
Upload API
Accepts binary files from creators and saves them to temporary storage.
POST /v1/videos → {video_id, upload_url}
Message Queue (Task Broker)
RabbitMQ or an equivalent broker for managing transcoding tasks. A parent task spawns four child tasks, one per resolution.
Video Workers
Stateless transcoding workers: they read tasks from the queue, process the video (FFmpeg), and save the result to Blob Storage. Easily scaled horizontally.
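A worker's core job is assembling an FFmpeg invocation per rendition. A sketch of how that command might be built; the bitrate ladder and encoder flags are illustrative assumptions, not settings from the interview.

```python
# Hypothetical bitrate ladder: resolution -> (frame size, video bitrate).
LADDER = {
    "360p":  ("640x360",   "800k"),
    "480p":  ("854x480",  "1400k"),
    "720p":  ("1280x720", "2800k"),
    "1080p": ("1920x1080", "5000k"),
}

def build_ffmpeg_cmd(src: str, dst: str, rendition: str) -> list:
    """Build an FFmpeg command line for one quality profile."""
    size, bitrate = LADDER[rendition]
    return [
        "ffmpeg", "-i", src,        # input: the uploaded original
        "-s", size,                 # target frame size
        "-b:v", bitrate,            # target video bitrate
        "-c:v", "libx264",          # H.264 video codec
        "-c:a", "aac",              # AAC audio codec
        dst,
    ]

# A worker would then run: subprocess.run(build_ffmpeg_cmd(...), check=True)
```

Because each task carries everything it needs (`video_id`, resolution, source path), workers hold no state and can be added or removed freely as the queue depth changes.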
Blob Storage
S3-compatible storage (MinIO, Ceph) for video files, divided into temporary (for uploads) and permanent (processed videos) buckets.
Feed Database
Document-oriented NoSQL database (MongoDB, Cassandra) for storing pre-computed feeds, sharded by user_id. A feed is an array of video IDs sorted by time.
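To keep feed documents bounded, each update should prepend the new video and trim the array. A minimal sketch of that update; the cap value and helper name are assumptions (in MongoDB the same effect is achievable atomically with `$push` plus `$each`, `$position`, and `$slice`).

```python
FEED_CAP = 1000  # assumption: keep only the newest N entries per user

def push_to_feed(feed_doc: dict, video_id: str) -> dict:
    """Prepend the new video ID and trim so documents stay bounded in size."""
    feed_doc["feed"] = [video_id] + feed_doc["feed"]
    feed_doc["feed"] = feed_doc["feed"][:FEED_CAP]
    return feed_doc
```

Capping the array keeps document size predictable, which matters both for storage and for the read path, since a feed fetch is a single document read on one shard.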
CDN
Critical for video delivery: caches popular content on edge servers closer to users, reducing both the load on Blob Storage and latency.
Data Models
Video Meta
{
"id": "uuid",
"author_id": "user_123",
"title": "My Video",
"description": "...",
"created_at": "2024-03-15T10:00:00Z",
"status": "ready",
"versions": {
"360p": "s3://videos/uuid/360p.mp4",
"480p": "s3://videos/uuid/480p.mp4",
"720p": "s3://videos/uuid/720p.mp4",
"1080p": "s3://videos/uuid/1080p.mp4"
}
}
User Feed
{
"user_id": "user_456",
"feed": [
"video_id_1",
"video_id_2",
"video_id_3",
...
],
"last_updated": "2024-03-15T12:00:00Z"
}
A pre-calculated feed is an array of video IDs from subscriptions, sorted by time.
Strategies for building a feed (Fan-out)
→Fan-out on Write (Push)
When we publish a video, we add it to all subscribers’ feeds.
←Fan-out on Read (Pull)
When requesting a feed, we collect the latest videos from all subscriptions.
Hybrid approach
For ordinary authors we use Push (fan-out on write); for authors with millions of subscribers, Pull on read. This is the classic trade-off Twitter/X makes.
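The hybrid decision can be sketched in a few lines. The 10k threshold, function names, and in-memory dicts are illustrative assumptions; a real merge would also sort by timestamp.

```python
CELEBRITY_THRESHOLD = 10_000  # assumption: above this, switch from push to pull

def publish_video(video_id, author, subscribers, feeds, celebrity_posts):
    """Push for ordinary authors, pull bookkeeping for celebrities."""
    if len(subscribers) < CELEBRITY_THRESHOLD:
        # Fan-out on write: push into each subscriber's precomputed feed.
        for user_id in subscribers:
            feeds.setdefault(user_id, []).insert(0, video_id)
    else:
        # Fan-out on read: record once; merged into feeds at request time.
        celebrity_posts.setdefault(author, []).insert(0, video_id)

def read_feed(user_id, followed_celebrities, feeds, celebrity_posts):
    """Merge the precomputed feed with celebrity posts at read time."""
    merged = list(feeds.get(user_id, []))
    for author in followed_celebrities:
        merged = celebrity_posts.get(author, []) + merged
    return merged
```

The write cost of a celebrity publish drops from millions of feed updates to one record, at the price of a small merge on every read.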
Infrastructure services
Load Balancer L4/L7
Traffic distribution between API servers
Service Discovery
Service registration and discovery (Consul, etcd)
Auto-scaling
Automatic scaling of workers based on load
Monitoring
Metrics, alerts, logging (Prometheus, Grafana)
Rate Limiting
Protection against abuse and DDoS
Circuit Breaker
Graceful degradation during failures
Key takeaways from the interviews
Separate Write and Read paths
Asynchronous video processing (Write) and synchronous feed reading (Read) have different requirements and scale differently.
Use queues for heavy operations
Transcoding is a resource-intensive operation. A message queue lets you manage the load and scale workers easily.
CDN is a must have for media content
With 300 TB of new content per day, the system will not survive without a CDN. Caching on edge reduces load and latency.
Precompute where possible.
Pre-calculated feeds are significantly faster than on-the-fly assembly from multiple sources on every request.
