System Design Space

Updated: March 24, 2026 at 11:23 AM

Domain Name System (DNS)

medium

DNS server hierarchy, zones and delegation, caching and resolution process.

This chapter matters because DNS in a real system is not just a name directory; it is a control layer for routing, failover, and change propagation through caches.

In practice, it helps you see the whole resolution chain (zones, delegation, TTL, stale records, and propagation delay), so you can recognize when a problem lives in the name and its caches rather than in the application.

In interviews and design discussions, it makes one of the most common hidden bottlenecks in distributed systems visible.

Practical value of this chapter

Resolution path

Helps analyze the full lookup chain and caching side effects on expected behavior.

Availability risks

Makes TTL, stale records, and propagation delay explicit in resilience planning.

Global behavior

Shows DNS impact on latency, geo-routing, and traffic balancing strategies.

Interview scenarios

Improves case discussions where DNS becomes the hidden reliability bottleneck.

RFC

RFC 1035 (DNS)

Classic DNS spec: message format, record types, and recursive resolution behavior.

DNS is a critical Internet control plane. It maps names to addresses, delegates zone authority, and through TTL and cache behavior directly affects latency, availability, and infrastructure cost.

Core DNS properties

Hierarchical model

DNS splits responsibility along the tree root -> TLD -> authoritative zones.

Recursive resolution

A recursive resolver traverses the DNS hierarchy on the client's behalf and returns a final answer to the caller.

TTL and caching

Caching reduces latency and authoritative pressure, but makes change rollout strategy more complex.

Multiple record types

A/AAAA, CNAME, NS, MX, TXT and other records define service addressing behavior.

Critical control plane

DNS impacts availability of almost every external call, so resolution incidents escalate quickly.

DNS message content visualization

Header fields and section composition define cache behavior, authoritative pressure, and transport overhead.

DNS Message Header (12 bytes, followed by variable-length sections)

  • ID: 16 bits
  • Flags: 16 bits
  • QDCOUNT: 16 bits
  • ANCOUNT: 16 bits
  • NSCOUNT: 16 bits
  • ARCOUNT: 16 bits
  • Question section: variable length
  • Answer/Authority/Additional sections: variable length

The DNS header is fixed at 12 bytes and is followed by variable-length question, answer, authority, and additional sections. Response size and shape affect transport behavior and resolution latency.
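To make the fixed header concrete, here is a minimal Python sketch that unpacks the six 16-bit fields from the first 12 bytes of a message. The field order and the QR, TC, and RCODE bit positions follow RFC 1035; the names `DnsHeader` and `parse_header` are illustrative and not part of any library.

```python
import struct
from typing import NamedTuple


class DnsHeader(NamedTuple):
    """The six 16-bit fields of the fixed 12-byte DNS header."""
    id: int
    flags: int
    qdcount: int  # entries in the question section
    ancount: int  # records in the answer section
    nscount: int  # records in the authority section
    arcount: int  # records in the additional section

    @property
    def is_response(self) -> bool:
        return bool(self.flags & 0x8000)  # QR bit

    @property
    def truncated(self) -> bool:
        return bool(self.flags & 0x0200)  # TC bit

    @property
    def rcode(self) -> int:
        return self.flags & 0x000F  # 0 = NOERROR, 3 = NXDOMAIN


def parse_header(message: bytes) -> DnsHeader:
    if len(message) < 12:
        raise ValueError("DNS message is shorter than the 12-byte header")
    # "!6H" = network byte order, six unsigned 16-bit integers
    return DnsHeader(*struct.unpack("!6H", message[:12]))
```

The counts tell a parser how many variable-length entries to expect in each section that follows the header.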

DNS query lifecycle

Client query

The stub resolver sends a query to a recursive resolver (a local DNS server or a public resolver).

Recursive hierarchy traversal

On a cache miss, the resolver walks root -> TLD -> authoritative, following referrals at each step.

Response and cache

The answer is returned to the client and cached for the duration of its TTL to speed up subsequent lookups.

Related chapter

OSI model

DNS belongs to the application layer (Layer 7) and should be analyzed with layered diagnostics.

Hierarchy of DNS servers

The DNS namespace is a tree: root → TLD → domain. Each zone is served by authoritative servers, and a recursive resolver caches responses.

DNS server hierarchy

  • Recursive Resolver: caching and recursive queries
  • Root Name Servers: delegation to TLD
  • TLD Name Servers: .com, .org, .ru, etc.
  • Authoritative Servers: domain zone records

Root and TLD servers delegate responsibility down the hierarchy.
Authoritative servers are responsible for a specific domain zone.

Interactive resolve process

A recursive resolver walks the hierarchy of DNS servers, follows referrals to the authoritative servers, returns the response to the client, and stores it in its cache.
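The walk described above can be sketched with a toy in-memory model. This is not a real resolver: the `SERVERS` table, the server labels, and the address (taken from the 192.0.2.0/24 documentation range) are all assumptions made for illustration, and the sketch ignores TTLs, CNAMEs, and network I/O.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical hierarchy: each server either refers to a child zone or holds records.
SERVERS = {
    "root": {"referrals": {"com.": "tld-com"}},
    "tld-com": {"referrals": {"example.com.": "auth-example"}},
    "auth-example": {"records": {"www.example.com.": "192.0.2.10"}},
}


def resolve(name: str, cache: Dict[str, str]) -> Tuple[Optional[str], List[str]]:
    """Return (address or None, list of servers contacted)."""
    if name in cache:
        return cache[name], []  # cache hit: no server is contacted
    contacted = []
    server = "root"
    while True:
        contacted.append(server)
        zone = SERVERS[server]
        records = zone.get("records", {})
        if name in records:
            cache[name] = records[name]  # store the answer for later lookups
            return records[name], contacted
        referrals = zone.get("referrals", {})
        # Follow the referral for the most specific zone that contains the name.
        matches = [z for z in referrals if name == z or name.endswith("." + z)]
        if not matches:
            return None, contacted  # nothing to follow: the name is unknown
        server = referrals[max(matches, key=len)]
```

A first lookup of `www.example.com.` contacts all three levels; a second lookup with the same cache dictionary contacts none.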

Domain name resolution

Participants: Client → Recursive Resolver → Root → TLD → Authoritative

Cache

Responses are cached by the resolver and client to reduce latency of subsequent queries.
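As a rough illustration of TTL-based caching, the sketch below stores each answer with an expiry time and treats expired entries as misses. The class name `TtlCache` and the injectable `clock` parameter are assumptions made for this example; real resolver caches also handle record types, eviction, and prefetching.

```python
import time


class TtlCache:
    """Minimal name -> value cache where each entry expires after its TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}       # name -> (value, expires_at)

    def put(self, name, value, ttl_seconds):
        self._entries[name] = (value, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None          # never cached: caller must resolve upstream
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]
            return None          # expired: treated as a cache miss
        return value
```

Until an entry expires, callers keep receiving the old value, which is why a record change can take up to one TTL to reach every client.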

DNS cache and latency dynamics under load

Step through how TTL, cache hit ratio, and authoritative pressure impact resolution time.

Chart: cache hit (%), authoritative load (%), and lookup latency (ms) across six step intervals. Interval 1 of 6:

  • Phase: Warm cache
  • Cache hit ratio: 93.0%
  • Average lookup: 12 ms
  • Authoritative load: 0.8k QPS
  • NXDOMAIN: 0.3%

TTL policy: Default TTL

What is happening: Most requests are served from recursive cache and authoritative servers stay lightly loaded.

Abbreviations

  • QPS (queries per second) — number of DNS queries served each second.
  • NXDOMAIN — the requested domain name does not exist.

Metric decoding

  • Cache hit ratio: share of requests served from the recursive cache without a full hierarchy traversal.
  • Average lookup: average end-to-end DNS resolution time observed by client-side callers.

Related chapter

Load Balancing

DNS often acts as the first traffic steering layer before L4/L7 balancing.

How network and routing shape DNS behavior

Cache miss and latency

Each cache miss adds extra network round trips to the authoritative servers and increases user-visible delay.

TTL trade-off

Low TTL improves rollout speed but increases authoritative QPS and DNS infrastructure cost.
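The trade-off can be estimated with a simple model. The sketch below assumes independent resolver caches, Poisson query arrivals at each resolver, and no prefetching; under those assumptions each cache misses once per cycle of length TTL + 1/rate. The function names and parameters are illustrative.

```python
def authoritative_qps(resolvers: int, qps_per_resolver: float, ttl_seconds: float) -> float:
    """Estimated authoritative load for one record under the stated assumptions."""
    # One miss per cycle of (TTL + mean wait for the next query after expiry).
    miss_rate = qps_per_resolver / (1.0 + qps_per_resolver * ttl_seconds)
    return resolvers * miss_rate


def cache_hit_ratio(qps_per_resolver: float, ttl_seconds: float) -> float:
    """Share of queries answered from cache in the same model."""
    x = qps_per_resolver * ttl_seconds
    return x / (1.0 + x)
```

For example, with 1,000 resolvers each seeing 10 QPS for the name, the model gives roughly 3.3 authoritative QPS at a 300-second TTL and roughly 33 QPS at a 30-second TTL, so a tenfold shorter TTL costs about ten times the authoritative load.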

Anycast and geography

Global distribution of resolver/authoritative nodes reduces lookup latency and tail spikes.

Packet loss and TCP fallback

Truncated UDP responses make the resolver retry over TCP, and lost packets force retransmissions after a timeout; both increase response time and overhead.
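The truncation case can be sketched as a retry decision based on the TC bit in the response header flags. The `send_udp` and `send_tcp` callables are hypothetical transports supplied by the caller, so the example stays independent of real sockets.

```python
import struct

TC_BIT = 0x0200  # "truncated" flag in the 16-bit header flags field


def is_truncated(response: bytes) -> bool:
    if len(response) < 12:
        raise ValueError("DNS message is shorter than the 12-byte header")
    (flags,) = struct.unpack("!H", response[2:4])  # flags follow the 16-bit ID
    return bool(flags & TC_BIT)


def query_with_fallback(query: bytes, send_udp, send_tcp):
    """Try UDP first; repeat the query over TCP if the answer was truncated."""
    response = send_udp(query)
    if is_truncated(response):
        return send_tcp(query), "tcp"  # costs an extra connection setup and round trips
    return response, "udp"
```

Keeping responses small enough to fit in a single UDP datagram avoids this second exchange.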

DDoS and anomalous traffic

Without strict rate controls, NXDOMAIN storms and amplification attacks can overload DNS infrastructure.

Where DNS matters most

  • Service discovery for clients and backend services
  • Traffic steering (geo/latency routing, weighted failover)
  • CDN routing and nearest edge selection
  • Domain ownership checks and email routing (MX, TXT, SPF/DKIM)
  • Resolver-level security and filtering policies
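Weighted failover from the traffic steering item above can be sketched as a weighted random choice among healthy endpoints. The endpoint dictionaries, their keys, and the function name `pick_endpoint` are assumptions for illustration; managed DNS providers implement this with their own health checks and policies.

```python
import random


def pick_endpoint(endpoints, rng=random):
    """Choose one address, proportionally to weight, among healthy endpoints."""
    healthy = [e for e in endpoints if e["healthy"] and e["weight"] > 0]
    if not healthy:
        raise LookupError("no healthy endpoints to return")
    weights = [e["weight"] for e in healthy]
    # random.choices draws with replacement according to the given weights.
    return rng.choices(healthy, weights=weights, k=1)[0]["address"]
```

Because resolvers cache the chosen answer for its TTL, the observed traffic split only approximates the configured weights.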

Why this matters in System Design

  • DNS lookup contributes latency before the first network call and affects p95/p99 user-facing metrics.
  • TTL policy controls rollout speed versus authoritative QPS and infrastructure cost.
  • DNS configuration mistakes often look like app-level incidents, so dedicated DNS observability is essential.
  • Robust DNS design reduces incident blast radius and improves multi-region resilience.

Common mistakes

Using very low TTL without estimating resulting authoritative load under traffic spikes.

Ignoring negative caching (NXDOMAIN) and causing unnecessary retry storms.

Mistaking DNS incidents for application incidents because there is no dedicated cache/lookup observability.

Skipping multi-provider or DR strategy for DNS and creating a single point of failure.
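The negative caching mistake can be illustrated with a small sketch that remembers NXDOMAIN answers so repeated lookups of a missing name do not reach the authoritative servers. Per RFC 2308, the negative TTL is the minimum of the SOA record's TTL and its MINIMUM field; the class name `NegativeCache` and the injectable `clock` are assumptions made for this example.

```python
import time


class NegativeCache:
    """Remembers names known not to exist until their negative TTL expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expires = {}  # name -> expiry timestamp

    def remember_nxdomain(self, name, soa_ttl, soa_minimum):
        # RFC 2308: cache the negative answer for min(SOA TTL, SOA MINIMUM).
        self._expires[name] = self._clock() + min(soa_ttl, soa_minimum)

    def is_known_missing(self, name) -> bool:
        expires_at = self._expires.get(name)
        if expires_at is None:
            return False
        if self._clock() >= expires_at:
            del self._expires[name]
            return False  # entry expired: the name must be looked up again
        return True
```

A client that checks `is_known_missing` before retrying avoids turning one missing name into a storm of identical upstream queries.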
