System Design Space

Updated: April 11, 2026 at 11:51 PM

Video hosting feed

medium

Public interview at C++ Russia 2022: video intake, transcoding, CDN delivery, and feed publication strategy for subscribers.

Video hosting does not break on the player screen. It breaks where the system must absorb huge upload volume, survive heavy processing, and then deliver the same asset to millions of viewers at far lower cost.

The chapter ties together video intake, the transcoding queue, storage for source and processed files, feed publication, and CDN delivery into one working system.

For interviews and engineering discussions, this case is useful because it forces a clean explanation of the heavy write path, the hot read path, feed fan-out strategy, and the cost of media traffic.

Heavy Write Path

Upload intake, validation, and video processing should not slow publication and should not interfere with the viewing path, which follows a very different load profile.

Transcoding Queue

Asynchronous processing is what spreads CPU pressure over time, absorbs upload spikes, and lets workers scale independently.

Hot Read Path

Feed lookup, metadata delivery, and segment streaming must stay fast even when a newly published video turns into a traffic spike.

Media Delivery Cost

The bill is driven not only by disks, but also by network egress, cache warmup, the number of quality variants, and repeated reads from origin.

This public interview from C++ Russia 2022 is a good example of why video hosting is harder than just embedding a player. The system has to accept user uploads, run transcoding, update feeds quickly, and then deliver the same content to millions of viewers through a CDN.

Interview recording

The full public interview is available on YouTube. It is useful because it shows not only the final architecture, but also the reasoning that leads to it.


Problem framing

Video-hosting feed

Design a service where creators upload videos and viewers watch them in a chronological feed with selectable playback quality.

At this scale, the key non-functional goals are availability, throughput, failure tolerance, and cost control. Otherwise the system will stall either on publication speed or on media delivery cost.

Functional

Fast upload experience for content creators.

Publish the video into subscriber feeds after processing is complete.

Let viewers choose playback quality from 360p to 1080p.

Provide a chronological feed with simple ordering by publication time.

Non-functional

Availability: high

Upload and playback should stay available under partial failures.

Scalability: horizontal

The system should grow with audience size and content volume.

Fault tolerance: mandatory

Losing a node must not stop serving traffic or destroy video data.

Cost efficiency: controlled

Compute, storage, and media delivery costs must stay predictable.

Load estimation (back of the envelope)

DAU (Daily Active Users): 10 million
Views per day: 100 million
Video uploads per day: 1 million (~12 RPS)
Feed requests per day: 50 million (~580 RPS)
Storage growth per day: ~300 TB

Key takeaway: the dominant pressure comes not from raw request count, but from storage and media delivery volume. The real cost drivers are quality variants, caching strategy, and network egress.
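
The per-second figures above follow from dividing daily totals by 86,400 seconds. The sketch below reproduces that arithmetic. It assumes traffic is spread evenly over the day (real peaks will be higher) and uses decimal units for the storage estimate. The per-upload footprint is only what the ~300 TB figure implies, not a number stated in the interview.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(events_per_day: float) -> float:
    """Average rate, assuming events are spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


uploads_rps = per_second(1_000_000)    # about 11.6
feed_rps = per_second(50_000_000)      # about 579
views_rps = per_second(100_000_000)    # about 1,157

# Storage growth divided by uploads gives the implied footprint per video,
# covering whatever the ~300 TB/day figure includes (decimal units).
TB = 10**12
MB = 10**6
mb_per_upload = 300 * TB / 1_000_000 / MB  # 300.0
```

Request rates stay in the hundreds or low thousands per second, while every upload adds roughly 300 MB of stored data, which is why storage and delivery dominate the design.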

High-level architecture

The architecture separates the write path, which accepts and processes uploads, from the read path, which assembles feeds, serves metadata, and streams the final video.

Video Hosting: High-Level Map

uploads, processing, feed generation, and playback delivery

Ingestion and Processing Plane

Creator: video upload
API Gateway: auth and routing
Upload API: source file intake
Processing Queue: transcoding jobs
Video Workers: HLS/DASH variants

Storage and Serving Plane

Origin Storage: files and segments
Metadata Store: video state and manifest
Feed Store: personalized feeds
Viewer -> Feed API -> CDN: main playback path
The map shows how uploads, processing, metadata, and playback delivery are separated into distinct planes.

Upload and processing can scale independently from the viewing path. That lets the system add workers, cache capacity, and delivery throughput separately.

Write Path and Read Path

Write path, step by step:

1. Creator: uploads source video
2. Upload API: validation and temporary storage
3. Queue and Workers: transcoding
4. Origin and Metadata: files and playback manifest
5. Feed Update: fan-out and publish event

Write path: key steps

  • The creator uploads the source file and receives a `video_id` plus processing status.
  • The system splits the job into separate transcoding tasks and produces 360p, 480p, 720p, and 1080p variants.
  • Finished files and segments land in origin storage, while metadata tracks processing state and publication readiness.
  • Once the video reaches `ready`, the system updates follower feeds and emits a publication event.
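
The transition to `ready` in the write path can be sketched as a small piece of state tracking. This is a minimal in-memory illustration, not the interview's implementation. `VideoRecord` and `mark_variant_done` are hypothetical names, and the assumption is that a video becomes `ready` only after all four variants are finished.

```python
from dataclasses import dataclass, field

PROFILES = ("360p", "480p", "720p", "1080p")


@dataclass
class VideoRecord:
    video_id: str
    status: str = "processing"
    done: set = field(default_factory=set)


def mark_variant_done(video: VideoRecord, profile: str) -> bool:
    """Record a finished variant.

    Returns True exactly once, at the moment the video becomes ready,
    so the caller knows when to update feeds and emit the publish event.
    """
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile}")
    if video.status == "ready":
        return False  # duplicate completion message, nothing to publish
    video.done.add(profile)
    if video.done == set(PROFILES):
        video.status = "ready"
        return True
    return False
```

Returning the "just became ready" signal only once keeps the publish step idempotent when a queue redelivers a completion message.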

Read path: key steps

  • A viewer opens the feed, and the service returns `video_id` entries based on subscriptions and simple ranking signals.
  • The metadata service returns the HLS/DASH playback manifest and the available quality variants.
  • CDN usually serves segments from an edge cache close to the user.
  • On a cache miss, CDN fetches the segment from origin storage and warms the local edge cache.
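
The hit and miss behavior in the last two steps can be illustrated with a small cache model. This is a sketch under assumptions: `EdgeCache` is a hypothetical class, eviction is LRU (real CDNs use their own policies), and `fetch_from_origin` stands in for a read from origin storage.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """Toy edge cache: serve from memory, fall back to origin on a miss."""

    def __init__(self, capacity: int, fetch_from_origin: Callable[[str], bytes]):
        self.capacity = capacity
        self.fetch_from_origin = fetch_from_origin
        self.segments = OrderedDict()  # segment key -> bytes, oldest first
        self.origin_reads = 0

    def get(self, key: str) -> bytes:
        if key in self.segments:
            self.segments.move_to_end(key)  # mark as recently used
            return self.segments[key]
        data = self.fetch_from_origin(key)  # cache miss: go to origin
        self.origin_reads += 1
        self.segments[key] = data           # warm the edge cache
        if len(self.segments) > self.capacity:
            self.segments.popitem(last=False)  # evict least recently used
        return data
```

Counting `origin_reads` makes the cost argument visible: every repeated request for a cached segment is traffic the origin layer never sees.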

Key components

The critical pieces are the task queue for heavy work, blob storage for intermediate and final files, the origin layer for segment delivery, and the playback manifest in metadata that tells the player how to start streaming.

Upload API

Accepts files from creators, validates limits, and places the source video into temporary storage.

POST /v1/videos → {video_id, upload_url}
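
A handler for this endpoint might look like the sketch below. Only the response shape `{video_id, upload_url}` comes from the text. The size limit, the `upload.example.com` host, and the function name `create_video` are assumptions made for illustration.

```python
import uuid

MAX_UPLOAD_BYTES = 10 * 1024**3  # hypothetical limit of 10 GiB


def create_video(author_id: str, declared_size: int) -> dict:
    """Validate the request and hand back an ID plus a place to upload to."""
    if declared_size <= 0 or declared_size > MAX_UPLOAD_BYTES:
        raise ValueError("file size outside allowed limits")
    video_id = str(uuid.uuid4())
    # Placeholder URL pointing at temporary storage; a real system would
    # more likely return a signed, expiring upload URL.
    upload_url = f"https://upload.example.com/tmp/{video_id}"
    return {"video_id": video_id, "upload_url": upload_url}
```

Returning an upload URL instead of accepting the file body in the same request keeps large transfers off the API servers.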

Transcoding task queue

RabbitMQ or a similar broker spreads heavy processing across workers. One upload usually expands into several jobs, one per quality profile.

360p
480p
720p
1080p
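
Expanding one upload into per-profile jobs can be sketched as follows. The job fields and function names are hypothetical, and a `deque` stands in for the broker so the example stays self-contained. With RabbitMQ or a similar broker, the append would be a publish call instead.

```python
import json
from collections import deque

# Target heights per quality profile, taken from the profile names in the text.
PROFILES = {"360p": 360, "480p": 480, "720p": 720, "1080p": 1080}


def expand_into_jobs(video_id: str, source_uri: str) -> list:
    """Build one transcoding job per quality profile for a single upload."""
    return [
        {"video_id": video_id, "source": source_uri, "profile": name, "height": height}
        for name, height in PROFILES.items()
    ]


def enqueue_all(queue: deque, jobs: list) -> None:
    """Serialize each job and append it to the queue (stand-in for a broker publish)."""
    for job in jobs:
        queue.append(json.dumps(job))
```

Because each profile is its own message, a slow 1080p encode does not block the 360p variant from finishing.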

Video processing workers

These stateless workers read jobs from the queue, run FFmpeg, generate quality variants, and scale horizontally when upload pressure rises.
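
As an illustration of what a worker might run for one job, the sketch below builds an FFmpeg command that scales the source to a target height and writes an HLS playlist. The codec choices (`libx264`, `aac`), the 6-second segment length, and the output naming are assumptions, not settings from the interview, and `libx264` is only available if the FFmpeg build includes it.

```python
def ffmpeg_hls_command(source_path: str, out_dir: str, height: int) -> list:
    """Build an FFmpeg argument list for one quality variant (not executed here)."""
    return [
        "ffmpeg", "-y",
        "-i", source_path,
        "-vf", f"scale=-2:{height}",   # keep aspect ratio, force an even width
        "-c:v", "libx264",
        "-c:a", "aac",
        "-f", "hls",
        "-hls_time", "6",              # target segment duration in seconds
        "-hls_playlist_type", "vod",
        f"{out_dir}/{height}p.m3u8",
    ]
```

A worker would pass the list to `subprocess.run(cmd, check=True)` and report success or failure back to the metadata store.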

Blob storage

An S3-compatible store such as MinIO or Ceph keeps source videos, finished files, and stream segments. In practice, teams often split it into temporary and permanent layers.

Temporary storage
Permanent storage

Feed store

A NoSQL database such as MongoDB (document-oriented) or Cassandra (wide-column) stores precomputed feeds. Data is usually sharded by user_id, and each feed is a time-ordered list of video IDs.
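
Sharding by user_id can be sketched as a stable hash mapped onto a fixed number of shards. The function name and shard count are hypothetical, and databases such as MongoDB and Cassandra provide their own partitioning, so this only illustrates the idea.

```python
import hashlib


def shard_for(user_id: str, shard_count: int) -> int:
    """Map a user to a shard with a hash that is stable across processes."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Plain modulo hashing moves most keys when `shard_count` changes, so systems that reshard often tend to use consistent hashing instead.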

CDN

CDN keeps popular segments close to viewers and removes pressure from the origin layer. It is essential for low playback startup time and for protecting the storage layer during traffic spikes.

Data models

Video metadata

{
  "id": "uuid",
  "author_id": "user_123",
  "title": "My Video",
  "description": "...",
  "created_at": "2024-03-15T10:00:00Z",
  "status": "ready",
  "versions": {
    "360p": "s3://videos/uuid/360p.mp4",
    "480p": "s3://videos/uuid/480p.mp4",
    "720p": "s3://videos/uuid/720p.mp4",
    "1080p": "s3://videos/uuid/1080p.mp4"
  }
}

User feed

{
  "user_id": "user_456",
  "feed": [
    "video_id_1",
    "video_id_2",
    "video_id_3",
    ...
  ],
  "last_updated": "2024-03-15T12:00:00Z"
}

A precomputed feed stores only video IDs and the latest refresh timestamp, which keeps the read path cheap and predictable.

Feed delivery strategies

The main product trade-off is where to perform feed fan-out: at publish time or at read time.

Fan-out on write (push)

When a video is published, the system immediately writes it into subscriber feeds.

Fast feed reads
Expensive publish path for creators with huge audiences

Fan-out on read (pull)

The feed is assembled on demand from recent uploads across subscriptions.

Simple publish flow
More expensive and slower reads

Hybrid approach

Push usually works better for ordinary creators because it makes reads cheap. For celebrity-scale accounts, pull or a mixed strategy is often safer so the publish path does not explode under millions of feed updates.
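
A hybrid strategy can be sketched in memory as follows. The follower threshold, class name, and data layout are assumptions for illustration. Creators at or below the threshold are pushed into follower feeds at publish time, while larger accounts are merged in at read time.

```python
from collections import defaultdict


class FeedService:
    def __init__(self, push_follower_limit: int = 10_000):  # hypothetical threshold
        self.limit = push_follower_limit
        self.followers = defaultdict(set)        # author -> users following them
        self.following = defaultdict(set)        # user -> authors they follow
        self.pushed = defaultdict(list)          # user -> [(ts, video_id)]
        self.celebrity_posts = defaultdict(list) # author -> [(ts, video_id)]

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def publish(self, author: str, video_id: str, ts: int) -> None:
        if len(self.followers[author]) > self.limit:
            # Pull path: one write, regardless of audience size.
            self.celebrity_posts[author].append((ts, video_id))
        else:
            # Push path: write into every follower's precomputed feed.
            for user in self.followers[author]:
                self.pushed[user].append((ts, video_id))

    def read_feed(self, user: str, limit: int = 20) -> list:
        items = list(self.pushed[user])
        for author in self.following[user]:
            items.extend(self.celebrity_posts.get(author, []))
        items.sort(reverse=True)  # newest first
        return [video_id for _, video_id in items[:limit]]
```

The read path pays a small merge cost only for the large accounts a user follows, which is usually a short list.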

Infrastructure services

L4/L7 Load Balancer

Distributes traffic across public APIs and serving services.

Service Discovery

Helps services find healthy peers through systems like Consul or etcd.

Autoscaling

Adjusts the number of workers based on actual processing pressure.
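
One simple way to turn processing pressure into a worker count is to size the pool so the current backlog drains within a target time. The formula and parameter names below are an illustrative assumption, not the policy described in the interview.

```python
import math


def desired_workers(
    queued_jobs: int,
    avg_job_seconds: float,
    target_drain_seconds: float,
    min_workers: int = 1,
    max_workers: int = 500,
) -> int:
    """Workers needed to clear the backlog within the target, clamped to bounds."""
    needed = math.ceil(queued_jobs * avg_job_seconds / target_drain_seconds)
    return max(min_workers, min(max_workers, needed))
```

For example, 1,200 queued jobs at 90 seconds each with a 10-minute drain target gives `desired_workers(1200, 90, 600)`, which is 180 workers.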

Monitoring

Collects metrics, alerts, and logs through tools such as Prometheus and Grafana.

Rate Limiting

Protects the platform against abuse and volumetric attacks.
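
A token bucket is one common way to implement per-client rate limiting. The sketch below is a minimal single-process version with hypothetical parameters. The caller passes the current time explicitly so the behavior is easy to test.

```python
class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: float, now: float = 0.0):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity  # start full so short bursts are allowed
        self.last = now

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In a multi-node deployment the bucket state would need to live in a shared store, which this sketch does not cover.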

Circuit Breaker

Turns instability into graceful degradation instead of a cascading failure.
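
The idea can be sketched as a wrapper that stops calling a failing dependency for a while and serves a fallback instead. The thresholds and names are hypothetical, and production libraries add details such as per-endpoint state and metrics.

```python
class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback, now: float):
        if self.opened_at is not None and now - self.opened_at < self.reset_after:
            return fallback()  # open: skip the dependency entirely
        # Closed, or half-open after the cool-down: let the call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = now  # open (or re-open) the circuit
            return fallback()
        self.failures = 0
        self.opened_at = None
        return result
```

For a feed service, the fallback might return a slightly stale cached feed rather than an error.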

Key takeaways from the interview

1. Separate writes from reads

Upload and processing grow under one set of constraints, while feed serving and media delivery grow under another. Keep those paths independent.

2. Push heavy work into queues

Transcoding should never sit on the user request path. Asynchronous processing keeps load manageable and makes worker scaling predictable.

3. CDN is mandatory for media delivery

With hundreds of terabytes of new content per day, a global cache is the only way to keep delivery cost and origin pressure under control.

4. Precompute what users read most often

A prepared feed is usually cheaper and faster than rebuilding it from multiple sources every time the app opens.
