Content Delivery Network (CDN) — System Design Space

A content delivery network is not just a cache closer to the user. It is a system about traffic geography, cache hierarchy, content freshness, and origin protection under global load.

The chapter connects DNS routing, points of presence, an intermediate shielding layer, and cache lifetime policy into one design where response speed constantly competes with freshness.

For interviews and engineering discussions, this case quickly shows whether you can think beyond a single region, price the cost of a cache miss, and protect the origin under heavy traffic.

Traffic geography

You need to control where a user request actually lands through routing, nearest-PoP selection, and safe fallback paths when a region degrades.

Cache hierarchy

The edge cache, shielding layer, and origin store should behave like one chain rather than a set of unrelated nodes.

Content freshness

Decide in advance where TTL is enough, where purge is necessary, and where versioned URLs are the only safe option.

Origin protection

During cache misses and burst traffic, you need to cap fan-in to the origin, coalesce identical requests, and define a clear degraded mode.

A content delivery network (CDN) is a geographically distributed layer of servers that caches content and serves it from the nearest point of presence (PoP). When the audience is spread across continents, every round trip to the origin adds noticeable network delay from distance alone. A CDN moves the response closer to the user: it reduces latency and removes a large share of traffic from origin servers.

Source

Acing the System Design Interview

A detailed breakdown of CDN architecture, cache invalidation, and the main design trade-offs.

Read the review

What a CDN solves

Lower latency: content is served from the nearest edge location
Less origin pressure: a large share of requests ends before it reaches the origin layer
Scalability: traffic can be spread across regions and points of presence
Fault tolerance: traffic can be rerouted when one location fails
Attack absorption: distributed infrastructure handles spikes and hostile traffic better than a single origin

Functional requirements

Core capabilities

Static content caching
Geographic traffic routing
Cache invalidation and refresh
Origin failover

Extended capabilities

Dynamic content acceleration
Edge-side computation
Secure connection termination
Request and response transformation

Non-functional requirements

Requirement	Target value	Rationale
Latency	< 50 ms (p99)	The user should not wait for the page to begin loading
Cache hit ratio	> 95%	Keep origin pressure as low as possible
Availability	99.99%	The CDN sits on the critical external path
Throughput	Tbps+	The system must handle global traffic and burst load
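The targets in the table translate into concrete budgets. The sketch below works through two of them: the yearly downtime allowed by an availability target, and the request rate that still reaches the origin at a given cache hit ratio. The figure of 1,000,000 requests per second is an assumption chosen for illustration, not a number from the chapter.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Yearly downtime budget implied by an availability target (0.0-1.0)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def origin_requests_per_second(total_rps: float, hit_ratio: float) -> float:
    """Requests per second that still reach the origin at a given hit ratio."""
    return total_rps * (1.0 - hit_ratio)


if __name__ == "__main__":
    # 99.99% availability leaves roughly 52.56 minutes of downtime per year.
    print(round(downtime_minutes_per_year(0.9999), 2))
    # Assumed load of 1,000,000 rps: compare a 95% and a 90% hit ratio.
    print(round(origin_requests_per_second(1_000_000, 0.95)))
    print(round(origin_requests_per_second(1_000_000, 0.90)))
```

Under these assumptions, a drop from 95% to 90% hit ratio doubles the load on the origin, which is why small changes in hit ratio matter more than they appear to.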

CDN architecture

System components

1. DNS-based routing

The entry point is DNS. In a global setup it is often combined with GeoDNS and Anycast so users land on the nearest point of presence with the lowest practical latency.

2. Edge nodes at each PoP

These nodes accept user traffic, serve content from local cache, and only forward requests deeper into the stack when necessary.

3. Intermediate origin shield

This layer aggregates cache misses from many PoPs and prevents the origin from receiving a flood of identical requests.

4. Origin server

The server or object store that is only accessed when content is not found in the intermediate caches.
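The chain formed by components 2 to 4 can be sketched as a tiered lookup. This is a minimal in-memory illustration, not a real CDN implementation: `CacheTier`, `fetch`, and the `origin` callable are hypothetical names, and eviction, TTLs, and concurrency are deliberately left out.

```python
from typing import Callable, Dict, Optional, Tuple


class CacheTier:
    """A named in-memory cache standing in for an edge node or the shield."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._store: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._store.get(key)

    def put(self, key: str, body: bytes) -> None:
        self._store[key] = body


def fetch(
    key: str,
    edge: CacheTier,
    shield: CacheTier,
    origin: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Return the body and the tier that served it: edge, shield, or origin."""
    body = edge.get(key)
    if body is not None:
        return body, "edge"

    body = shield.get(key)
    if body is not None:
        edge.put(key, body)  # fill the edge on the way back
        return body, "shield"

    body = origin(key)  # only reached when both cache tiers miss
    shield.put(key, body)
    edge.put(key, body)
    return body, "origin"
```

With two edge tiers sharing one shield, the first request from one PoP reaches the origin, while the first request for the same object from another PoP is answered by the shield. That is the fan-in reduction the shield exists for.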

CDN request path

User → DNS → Edge (PoP) → Shield (on edge miss) → Origin (on shield miss)

Response source	Typical latency
Edge cache hit	10-50 ms
Shield cache hit	50-150 ms
Origin fetch	200-500 ms+
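The latency bands above can be combined into an average latency once hit ratios are known. The sketch below is a worked example under stated assumptions: a 95% edge hit ratio, an 80% shield hit ratio on edge misses, and the midpoints of each band (30 ms, 100 ms, 350 ms) as representative values. None of these inputs come from measurements.

```python
def expected_latency_ms(
    edge_hit: float,
    shield_hit: float,
    edge_ms: float,
    shield_ms: float,
    origin_ms: float,
) -> float:
    """Mean latency across the three outcomes: edge hit, shield hit, origin fetch.

    shield_hit is the hit ratio among requests that already missed at the edge.
    """
    edge_miss = 1.0 - edge_hit
    return (
        edge_hit * edge_ms
        + edge_miss * shield_hit * shield_ms
        + edge_miss * (1.0 - shield_hit) * origin_ms
    )


if __name__ == "__main__":
    # 0.95 * 30 + 0.05 * 0.80 * 100 + 0.05 * 0.20 * 350 = 28.5 + 4.0 + 3.5
    print(expected_latency_ms(0.95, 0.80, 30, 100, 350))
```

This is a mean, not a percentile. Tail latency is dominated by the miss path, so a small share of origin fetches can push high percentiles well above the average.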

Preloaded vs on-demand caching

The choice comes down to one question: do we know in advance what will be requested? A content set that is known and critical for the first load can be pushed to the edge ahead of time. A large library that changes constantly and is mostly user-generated is not worth distributing in full; there, on-demand (lazy) caching is usually the better fit. Its cost is plain: the first user to request an object pays the cold-start penalty.

Preloaded content

Content is distributed to edge nodes before the first user request ever arrives.

Advantages:

No first-request penalty
Predictable performance
Strong control over distribution timing

Limitations:

Requires explicit rollout control
Rare content still consumes edge capacity
Cross-region synchronization is harder

Best for: static sites, software distribution, critical frontend assets

On-demand caching

Content is cached only after the first real request reaches the edge node.

Advantages:

Adapts automatically to real demand
Uses storage more efficiently
Is simpler to operate

Limitations:

The first request pays for the cache miss
Misses increase origin pressure
Latency is less predictable

Best for: dynamic sites and large libraries of user-generated content

Cache invalidation

Freshness on edge nodes depends on three levers, and neglecting any one of them is expensive. With too long a TTL, users see stale data. Without a clear purge path, an urgent fix cannot ship. Without a rule for what is refreshed in the background, every expiry turns into a miss that travels all the way to the origin.

TTL expiry

Content expires automatically after a configured Time-To-Live (TTL).

Advantages:

Simple setup via HTTP headers
No provider API integration required
Predictable cache behavior

Drawbacks:

Updates wait until the TTL expires
Picking the right value is tricky
No instant invalidation

Use case: static content with infrequent updates
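TTL expiry can be sketched as a cache that stores an expiry time next to each object and treats expired entries as misses. This is a minimal illustration: `TTLCache` is a hypothetical class, and the injectable `clock` parameter is an assumption added so the behavior can be demonstrated without waiting for real time to pass.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache where every entry expires ttl_seconds after it is stored."""

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def put(self, key: str, body: bytes) -> None:
        self._entries[key] = (self._clock() + self._ttl, body)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, body = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller refetches from upstream.
            del self._entries[key]
            return None
        return body
```

The drawback listed above is visible in the structure: nothing except the clock can remove an entry, so an update published one second after `put` stays invisible until the TTL runs out.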

Caching strategy

What is worth caching?

Content type	Cacheability	Recommended TTL
Static files (JS, CSS)	High	1 year with versioning
Images	High	1 month to 1 year
HTML pages	Medium	5 minutes to 1 hour
Public API responses	Medium	1 minute to 1 hour
Personalized content	Low	Usually do not cache
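One way to apply the table is to map each content class to a `Cache-Control` response header at the origin. The sketch below uses the lower bound of each TTL range from the table. The class names, the `cache_control_for` helper, and the conservative `no-store` default for unknown classes are assumptions made for illustration.

```python
# Cache-Control values per content class; max-age is in seconds.
CACHE_POLICIES = {
    # 1 year; "immutable" is safe only because the URL changes with each version.
    "static_versioned": "public, max-age=31536000, immutable",
    # 30 days, the lower bound of "1 month to 1 year".
    "image": "public, max-age=2592000",
    # 5 minutes, the lower bound of "5 minutes to 1 hour".
    "html": "public, max-age=300",
    # 1 minute, the lower bound of "1 minute to 1 hour".
    "public_api": "public, max-age=60",
    # Personalized responses must not be stored by shared caches.
    "personalized": "private, no-store",
}


def cache_control_for(content_class: str) -> str:
    """Return the Cache-Control header value for a content class.

    Unknown classes fall back to "no-store" (an assumed conservative default).
    """
    return CACHE_POLICIES.get(content_class, "no-store")


if __name__ == "__main__":
    for name in ("static_versioned", "html", "personalized", "unknown"):
        print(name, "->", cache_control_for(name))
```

Defaulting to `no-store` trades hit ratio for safety: a forgotten content class costs origin traffic rather than leaking a cached personalized response.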

Cache key design

The cache key decides which content variants are treated as different objects. Mistakes here lead either to cache pollution or a weak hit ratio.

# Simple key (URL only):
cache_key = hash(url)

# Extended key:
cache_key = hash(url + headers["Accept-Encoding"] +
                 headers["Accept-Language"] +
                 query_params["version"])

# The Vary response header tells the CDN which request headers belong in the key:
Vary: Accept-Encoding, Accept-Language
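Using raw header values in the key is a common source of a weak hit ratio, because equivalent requests produce different keys. The sketch below normalizes the inputs first. The tracking parameter list, the preference order of `br` over `gzip`, and the function names are assumptions; the encoding parser is simplified and ignores `q=0` exclusions.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit

# Assumed list of query parameters that never change the response body.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}


def normalize_encoding(accept_encoding: str) -> str:
    """Collapse an Accept-Encoding header into one of three variants."""
    offered = {
        part.split(";")[0].strip().lower()
        for part in accept_encoding.split(",")
    }
    if "br" in offered:
        return "br"
    if "gzip" in offered:
        return "gzip"
    return "identity"


def cache_key(url: str, accept_encoding: str = "") -> str:
    """Build a key that is stable across parameter order and header spelling."""
    parts = urlsplit(url)
    params = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_PARAMS
    )
    canonical = "|".join(
        [
            parts.netloc.lower(),
            parts.path or "/",
            urlencode(params),
            normalize_encoding(accept_encoding),
        ]
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this normalization, `?b=2&a=1` and `?a=1&b=2&utm_source=mail` share one cached object, while a gzip-only client and a Brotli-capable client still get separate variants.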

Security and origin protection

The external CDN path is the first place where both attack traffic and encrypted traffic awaiting TLS termination arrive, so it has to be designed for both at once. That gives four pillars: DDoS defense, TLS termination, strict HTTPS policy, and certificate-status handling.

Attack absorption

Rate limiting on edge nodes
Anycast to spread burst traffic
Traffic scrubbing centers
Bot and anomaly detection
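Rate limiting on edge nodes is often described in terms of a token bucket. The sketch below is one possible single-process version, not how any particular CDN implements it: `TokenBucket` is a hypothetical class, and the injectable `clock` is an assumption that keeps the example deterministic.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow short bursts up to `burst` requests, then `rate_per_second` sustained."""

    def __init__(
        self,
        rate_per_second: float,
        burst: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._rate = rate_per_second
        self._capacity = float(burst)
        self._tokens = float(burst)  # start full so an initial burst is allowed
        self._clock = clock
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the client is over limit."""
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self._tokens = min(
            self._capacity, self._tokens + (now - self._last) * self._rate
        )
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

In practice one bucket would be kept per client key (for example per IP address or API token), and a rejected request would typically be answered with HTTP 429.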

Connection security

Terminate secure sessions at the edge
Shared and dedicated certificates
Encrypted origin connections
Strict HTTPS policy (HSTS) and certificate-status stapling (OCSP stapling)

Access control

Signed URLs and cookies
Access tokens
IP allow-lists
Geographic restrictions
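Signed URLs from the list above can be sketched with an HMAC over the path and an expiry timestamp. The query parameter names `expires` and `sig`, the message layout, and both function names are assumptions for illustration; real CDN providers each define their own signing format.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def sign_url(path: str, secret: bytes, expires_at: int) -> str:
    """Append an expiry timestamp and an HMAC-SHA256 signature to a path."""
    message = f"{path}?expires={expires_at}".encode("utf-8")
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires_at}&sig={signature}"


def verify_url(signed_url: str, secret: bytes, now: Optional[int] = None) -> bool:
    """Check the signature and the expiry; intended to run on the edge node."""
    parts = urlsplit(signed_url)
    query = parse_qs(parts.query)
    try:
        expires_at = int(query["expires"][0])
        signature = query["sig"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed parameters

    current = int(time.time()) if now is None else now
    if current > expires_at:
        return False  # link has expired

    message = f"{parts.path}?expires={expires_at}".encode("utf-8")
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Because the edge can verify the signature with only the shared secret, access control does not require a round trip to the origin for every request.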

Origin protection

Intermediate origin shield
Request coalescing
Hidden origin hostname
Firewall rules limited to CDN IP ranges
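Request coalescing from the list above can be sketched as a "single flight" wrapper: the first request for a key fetches from the origin, and concurrent requests for the same key wait for that result instead of issuing their own fetch. `Coalescer` and `_Flight` are hypothetical names, and this thread-based version is a simplified illustration rather than a production design.

```python
import threading
from typing import Callable, Dict, Optional


class _Flight:
    """State shared between the leader and the followers of one in-flight fetch."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Optional[bytes] = None
        self.error: Optional[BaseException] = None


class Coalescer:
    def __init__(self, fetch_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_origin
        self._lock = threading.Lock()
        self._in_flight: Dict[str, _Flight] = {}

    def get(self, key: str) -> bytes:
        with self._lock:
            flight = self._in_flight.get(key)
            leader = flight is None
            if leader:
                flight = _Flight()
                self._in_flight[key] = flight

        if leader:
            try:
                flight.result = self._fetch(key)  # the only origin call for this key
            except Exception as exc:
                flight.error = exc  # followers see the same failure
            finally:
                with self._lock:
                    del self._in_flight[key]
                flight.done.set()
        else:
            flight.done.wait()  # follower: reuse the leader's result

        if flight.error is not None:
            raise flight.error
        return flight.result
```

The wrapper only deduplicates requests that overlap in time. It is usually paired with a cache fill, so that requests arriving after the fetch completes are served from cache rather than starting a new flight.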

Metrics and observability

Three metrics tell you whether the network is actually working for the user: TTFB reflects perceived speed, the cache hit ratio reflects origin offload, and the volume of traffic still reaching the origin shows how much load the protection layers failed to absorb.

Cache hit ratio

The share of requests completed without touching the origin

TTFB

How quickly the user receives the first byte of the response

Traffic volume

Total bytes served and regional traffic spikes

Key alerts:

Cache hit ratio < 90% → review TTLs and cache key structure
Origin errors > 1% → inspect the origin or the shielding layer
TTFB p99 > 100 ms → review routing and the path to origin
Traffic spike → possible attack or sudden content popularity
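The first three alert rules above have explicit thresholds and can be expressed as a simple check over a metrics snapshot. The `Snapshot` fields and the `evaluate_alerts` function are hypothetical names. The traffic spike rule is left out because the text gives no threshold for it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    cache_hit_ratio: float    # fraction of requests served from cache, 0.0-1.0
    origin_error_rate: float  # fraction of origin responses that failed, 0.0-1.0
    ttfb_p99_ms: float        # 99th percentile time to first byte, milliseconds


def evaluate_alerts(snapshot: Snapshot) -> List[str]:
    """Return the alert messages triggered by one metrics snapshot."""
    alerts: List[str] = []
    if snapshot.cache_hit_ratio < 0.90:
        alerts.append("cache hit ratio < 90%: review TTLs and cache key structure")
    if snapshot.origin_error_rate > 0.01:
        alerts.append("origin errors > 1%: inspect the origin or the shielding layer")
    if snapshot.ttfb_p99_ms > 100:
        alerts.append("TTFB p99 > 100 ms: review routing and the path to origin")
    return alerts


if __name__ == "__main__":
    degraded = Snapshot(cache_hit_ratio=0.85, origin_error_rate=0.002, ttfb_p99_ms=140)
    for message in evaluate_alerts(degraded):
        print(message)
```

A real alerting setup would evaluate these conditions over a time window rather than a single snapshot, to avoid firing on momentary dips.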

Interview questions

How do you keep cache invalidation consistent?

Use versioned URLs for immutable content, a purge API for urgent updates, and stale-while-revalidate where a short window of stale data is acceptable.
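Versioned URLs are commonly produced by embedding a content hash in the file name, so that any change to the content yields a new URL and the old cached object never has to be purged. The sketch below shows one such naming scheme. The `versioned_path` helper and the 8-character digest length are assumptions, not a convention taken from the chapter.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_path(path: str, content: bytes, digest_length: int = 8) -> str:
    """Insert a short content hash before the file extension.

    "/static/app.js" becomes "/static/app.<hash>.js".
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_length]
    original = PurePosixPath(path)
    return str(original.with_name(f"{original.stem}.{digest}{original.suffix}"))


if __name__ == "__main__":
    print(versioned_path("/static/app.js", b"console.log('v1');"))
    print(versioned_path("/static/app.js", b"console.log('v2');"))
```

Since the URL itself changes on every release, such assets can be served with a very long TTL, and only the HTML that references them needs a short one.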

How do you protect the origin from a flood of identical misses?

Combine request coalescing, an origin shield, a circuit breaker, and selective cache pre-warming for the hottest objects.

When do you choose preloading over on-demand caching?

Preloading fits a limited set of critical assets. On-demand caching works better for large libraries and long-tail content that is not worth pushing everywhere in advance.

How do you handle dynamic content?

Use ESI, fragment caching, short TTLs with stale-while-revalidate, or edge computing to assemble personalized responses closer to the user.

Key takeaways

1. At global scale a CDN does two jobs at once: it keeps latency close to the user and keeps the origin layer from drowning in traffic.
2. Cache invalidation stays hard, so TTLs, versioning, and purge flows should be designed together rather than separately.
3. An origin shield reduces fan-in to the origin and helps the system survive mass cache misses.
4. The choice between preloading and on-demand caching depends on content shape, freshness requirements, and the cost of a miss.
5. If you watch only two metrics, watch cache hit ratio and TTFB: they tell you the most about how the CDN changes the user path.


Related chapters

Acing the System Design Interview (short summary) - provides a clean frame for the case: requirements, scale, architecture, and the main trade-offs.
Object Storage (S3) - covers the origin layer: object durability, lifecycle rules, and what happens during cache misses.
Designing Data-Intensive Applications, 2nd Edition (short summary) - strengthens the foundation behind replication, consistency, and distributed trade-offs in a global delivery network.
Caching strategies: Cache-Aside, Read-Through, Write-Through, Write-Back - extends the discussion of TTL selection, invalidation, and cache hit ratio management on edge nodes.
Video Feed (YouTube/TikTok) - shows a heavy media workload where geo-distributed delivery and origin protection become critical.
Domain Name System (DNS) - explains DNS-based routing and nearest-PoP selection through GeoDNS and Anycast.
System design case studies examples - places the CDN case in broader interview context and makes it easier to compare with other architecture problems.
Uber/Lyft - adds a global system with hard latency requirements, where regional traffic placement is critical.
URL Shortener (TinyURL) - covers a related redirect-heavy workload where serving responses closer to the user reduces origin pressure.