System Design Space

Updated: April 21, 2026 at 3:20 PM

Domain Name System (DNS)

medium

DNS hierarchy, zone delegation, caching, TTL, and the resolution path from client to authoritative server.

This chapter matters because in a real architecture DNS is not just a name directory. It is an early routing, caching, and delegation layer.

In practice, it helps you trace the whole resolution chain, from local and recursive resolvers to TTLs, stale records, and propagation delay, and recognize when a problem lives in caches rather than in the application.

In interviews and design discussions, it makes a hidden layer visible, one that almost every external call depends on and where resilience often breaks first.

Practical value of this chapter

Resolution path

Helps analyze the full lookup chain and understand when caching accelerates the system versus when it hides stale answers.

Availability risks

Makes TTL, stale records, and propagation delay explicit in resilience planning.

Global behavior

Shows how DNS decisions influence latency, geographic routing, and the path clients take to a service.

Interview scenarios

Improves case discussions where the hidden reliability bottleneck sits in name resolution rather than in application code.

RFC

RFC 1035 (DNS)

Foundational DNS specification: message format, record types, zone delegation, and resolver behavior.


DNS acts as a critical Internet control plane. It maps names to addresses, routes queries through delegated zones, and through TTL and cache behavior directly affects latency, availability, and infrastructure cost.

The resolution chain includes the local stub resolver, the recursive resolver, zone delegation, and often anycast for geographic distribution. As a result, DNS shapes not just correctness, but also cache behavior, failure handling, and how quickly operational changes become visible.

Core properties of DNS

Hierarchical delegation

DNS distributes responsibility down the tree: root servers point to TLDs, and TLDs point to authoritative servers for a specific zone.
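The delegation path can be made concrete by listing the suffixes of a name from the root downward. The sketch below is only an illustration: the function name `delegation_chain` is hypothetical, and not every suffix is necessarily a separate zone in practice, since zone cuts depend on how each domain is actually delegated.

```python
def delegation_chain(name: str) -> list[str]:
    """List candidate zones from the root down to the full name.

    Illustrative only: real zone cuts depend on actual NS delegations.
    """
    labels = [label for label in name.rstrip(".").split(".") if label]
    chain = ["."]  # the root zone
    # Build suffixes from the rightmost label (the TLD) toward the full name.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


print(delegation_chain("www.example.com"))
# ['.', 'com.', 'example.com.', 'www.example.com.']
```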

Recursive resolution

A recursive resolver walks the delegation chain, gathers the needed references, and returns a final answer to the caller.

TTL and caching

Caching reduces latency and authoritative load, but makes change propagation slower and operationally trickier.

Multiple record types

A/AAAA, CNAME, NS, MX, TXT and other records define service addressing, delegation, and supporting domain policies.

Critical control plane

DNS affects availability of almost every external call, so resolution failures escalate into visible incidents very quickly.

How a DNS message header is structured

The fixed header declares how many questions, answers, authority records, and additional records the message carries. Its flags and counts influence caching, response size, and transport behavior.

DNS Message Header

12 bytes + variable sections

  • ID: 16 bits
  • Flags: 16 bits
  • QDCOUNT (question count): 16 bits
  • ANCOUNT (answer count): 16 bits
  • NSCOUNT (authority record count): 16 bits
  • ARCOUNT (additional record count): 16 bits
  • Question section: variable length
  • Answer, authority, and additional sections: variable length

DNS uses a fixed 12-byte header followed by variable question and answer sections. That structure affects cache behavior, response size, and the transport path.
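As a rough illustration of the layout above, the six 16-bit header fields can be packed and unpacked with Python's standard `struct` module. The names `DnsHeader`, `pack_header`, and `unpack_header` are hypothetical helpers for this sketch, and the sample field values are arbitrary. It handles only the header, not the variable sections that follow.

```python
import struct
from typing import NamedTuple

HEADER_FORMAT = "!HHHHHH"  # six unsigned 16-bit fields, network byte order
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 12 bytes


class DnsHeader(NamedTuple):
    id: int
    flags: int
    qdcount: int
    ancount: int
    nscount: int
    arcount: int


def pack_header(header: DnsHeader) -> bytes:
    """Serialize the fixed header into its 12-byte wire form."""
    return struct.pack(HEADER_FORMAT, *header)


def unpack_header(message: bytes) -> DnsHeader:
    """Read the fixed header from the start of a DNS message."""
    if len(message) < HEADER_SIZE:
        raise ValueError("message shorter than the 12-byte DNS header")
    return DnsHeader(*struct.unpack(HEADER_FORMAT, message[:HEADER_SIZE]))


# Arbitrary sample: one question, recursion desired (RD bit = 0x0100).
query = DnsHeader(id=0x1234, flags=0x0100, qdcount=1, ancount=0, nscount=0, arcount=0)
wire = pack_header(query)
print(len(wire), unpack_header(wire) == query)
```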

DNS query lifecycle

Client query

A local stub resolver sends the query to a recursive resolver, usually provided by the ISP or a public DNS service.

Hierarchy traversal

On a cache miss, the recursive resolver walks from root to TLD and then to the authoritative server for the target zone.

Response and cache

The final answer is returned to the client and cached for the record TTL so the next lookup can skip the full chain.
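The caching step can be sketched as a small map whose entries expire after their TTL. This is a minimal illustration, not how any particular resolver is implemented: the class name `TtlCache` and the injectable `clock` parameter are assumptions made so the behavior is easy to test.

```python
import time
from typing import Callable, Optional


class TtlCache:
    """Toy answer cache: entries disappear once their TTL has elapsed."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so tests can control time
        self._entries: dict[str, tuple[str, float]] = {}

    def put(self, name: str, address: str, ttl_seconds: float) -> None:
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None  # cache miss: caller must resolve upstream
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: treat as a miss
            return None
        return address


# Usage with a fake clock (all values are illustrative).
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("example.com.", "192.0.2.10", ttl_seconds=300)
print(cache.get("example.com."))  # served from cache while the TTL is valid
now[0] = 301.0
print(cache.get("example.com."))  # None after expiry
```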

Related chapter

OSI model

DNS lives at the application layer and benefits from layered troubleshooting.


DNS server hierarchy

The DNS namespace forms a tree: root, top-level domains, and then the zone for a specific domain. Authoritative servers answer for their zone, while recursive resolvers keep recent answers in cache.


Recursive resolver

Accepts the client query, walks the hierarchy, and caches the answer

Root name servers

Point the resolver to the correct top-level domain

TLD name servers

Show which server is authoritative for the target zone

Authoritative servers

Hold the final zone records and return the answer

Root and TLD servers delegate responsibility further down the hierarchy.
Authoritative servers answer only for the specific zone they own.

How domain name resolution works

The recursive resolver starts with known root servers, follows the chain of referrals, and only then reaches the authoritative server for the target zone. A successful answer is cached for the record TTL.
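The referral-following logic described above can be sketched with an in-memory stand-in for the server hierarchy. Everything in `SERVERS` is hypothetical sample data (including the server labels and the documentation address `192.0.2.10`); a real resolver sends network queries and handles many more cases such as CNAMEs, glue records, and timeouts.

```python
# Hypothetical delegation data standing in for real name servers.
SERVERS = {
    "root": {"referrals": {"com.": "tld-com"}},
    "tld-com": {"referrals": {"example.com.": "ns-example"}},
    "ns-example": {"answers": {"www.example.com.": ("192.0.2.10", 300)}},
}


def resolve(name: str, start: str = "root", max_hops: int = 8) -> tuple[str, int, list[str]]:
    """Follow referrals from the root until a server holds the answer.

    Returns (address, ttl, servers_visited).
    """
    server = start
    path: list[str] = []
    for _ in range(max_hops):
        path.append(server)
        data = SERVERS[server]
        answer = data.get("answers", {}).get(name)
        if answer is not None:
            address, ttl = answer
            return address, ttl, path
        referrals = data.get("referrals", {})
        # Pick the most specific zone that is a suffix of the queried name.
        matches = [z for z in referrals if name == z or name.endswith("." + z)]
        if not matches:
            raise LookupError(f"{server} has no answer or referral for {name}")
        server = referrals[max(matches, key=len)]
    raise LookupError("too many referrals")


print(resolve("www.example.com."))
# ('192.0.2.10', 300, ['root', 'tld-com', 'ns-example'])
```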

Domain name resolution (diagram): Client -> Recursive resolver -> Root servers -> TLD servers -> Authoritative server

Cache

Both the client and the recursive resolver keep recent answers in cache so they do not have to walk the tree on every query.

It is useful to look not only at the resolution path itself, but also at cache dynamics: record TTL, cache hit ratio, and the pressure that builds up on authoritative servers during traffic spikes or frequent changes.

DNS cache and latency dynamics under load

Chart: cache hit (%), authoritative load (%), and lookup latency (ms) across six intervals, showing how TTL, cache hit ratio, and authoritative pressure change resolution time. Interval 1 of 6:

  • Phase: warm cache
  • Cache hit ratio: 93.0%
  • Average lookup: 12 ms
  • Authoritative load: 0.8k QPS
  • NXDOMAIN: 0.3%
  • TTL policy: default TTL

What is happening: Most requests are served from recursive cache and authoritative servers stay lightly loaded.

Abbreviations

  • QPS (queries per second) — number of DNS queries served each second.
  • NXDOMAIN — the requested domain name does not exist.

Metric decoding

  • Cache hit ratio: share of requests served from the recursive cache without full hierarchy traversal.
  • Average lookup: average end-to-end DNS resolution time observed by the caller.
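The relationship between these metrics can be approximated with simple arithmetic: average lookup time is a weighted mix of hit and miss latency, and only misses reach authoritative servers. The sketch below is a simplified model, and every number in it is an illustrative assumption rather than a value taken from the chart.

```python
def expected_lookup_ms(hit_ratio: float, hit_ms: float, miss_ms: float) -> float:
    """Weighted average of cache-hit and cache-miss latency."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


def authoritative_qps(total_qps: float, hit_ratio: float) -> float:
    """Only cache misses travel on to authoritative servers."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return total_qps * (1.0 - hit_ratio)


# Illustrative assumptions: 2 ms on a hit, 80 ms on a full traversal.
for hit_ratio in (0.95, 0.90, 0.50):
    latency = expected_lookup_ms(hit_ratio, hit_ms=2.0, miss_ms=80.0)
    load = authoritative_qps(total_qps=10_000, hit_ratio=hit_ratio)
    print(f"hit={hit_ratio:.0%} latency={latency:.1f} ms authoritative={load:.0f} QPS")
```

Under this model, a drop in hit ratio raises both average latency and authoritative load at the same time, which is why the two tend to degrade together during cache-cold periods.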

Related chapter

Load Balancing

DNS often acts as the first region or site selection layer before L4/L7 balancing.


How network and routing shape DNS behavior

Cache miss and extra traversal

Every cache miss adds more network hops toward the authoritative zone and increases user-visible delay.

TTL trade-off

Low TTL speeds up changes, but increases authoritative QPS and the cost of running DNS infrastructure.
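A back-of-the-envelope estimate makes the trade-off visible. Assuming each recursive resolver that serves a popular record re-fetches it about once per TTL, the steady-state refresh rate is roughly the resolver count divided by the TTL. The function name and the resolver count below are hypothetical, and the model ignores prefetching, cold caches, and uneven traffic.

```python
def refresh_qps(resolver_count: int, ttl_seconds: float, record_count: int = 1) -> float:
    """Rough steady-state authoritative QPS from periodic cache refreshes.

    Assumes every resolver re-fetches every record once per TTL.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return resolver_count * record_count / ttl_seconds


# Hypothetical population of 50,000 resolvers querying one popular record.
for ttl in (3600, 300, 30):
    print(f"TTL={ttl:>4}s -> ~{refresh_qps(50_000, ttl):.1f} QPS")
```

In this model, cutting the TTL from 300 s to 30 s multiplies the estimated refresh load by ten while shortening the worst-case propagation window by the same factor.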

Anycast and geography

Global distribution of resolver and authoritative nodes reduces lookup latency and makes tail behavior more predictable.

Packet loss and TCP fallback

Packet loss and response truncation can push part of DNS traffic to TCP, increasing response time and overhead.
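Truncation is signaled by the TC bit in the header's flags field, which a client can check before deciding to retry over TCP. The sketch below only inspects the first four bytes of a response; the function names and the sample flag values are illustrative assumptions, and no network I/O is performed.

```python
import struct

TC_FLAG = 0x0200  # truncation bit within the 16-bit flags field


def is_truncated(response: bytes) -> bool:
    """Return True if the response header has the TC bit set."""
    if len(response) < 12:
        raise ValueError("response shorter than the 12-byte DNS header")
    _message_id, flags = struct.unpack("!HH", response[:4])
    return bool(flags & TC_FLAG)


def choose_transport(udp_response: bytes) -> str:
    """Decide whether the query should be repeated over TCP."""
    return "tcp" if is_truncated(udp_response) else "udp"


# Sample headers: same ID and counts, differing only in flags.
truncated = struct.pack("!HHHHHH", 0x1234, 0x8300, 1, 0, 0, 0)  # QR + TC + RD
complete = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 1, 0, 0)   # QR + RD + RA
print(choose_transport(truncated), choose_transport(complete))
```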

DDoS and anomalous traffic

Without strict rate controls, NXDOMAIN storms and amplification attacks can overload DNS infrastructure.

Where DNS matters most

  • Service discovery for client-facing and internal services
  • Regional steering, weighted balancing, and failover decisions
  • CDN routing and nearest edge selection
  • Domain ownership checks and email routing (MX, TXT, SPF, DKIM)
  • Resolver-level security and filtering policies

Why this matters for system design

  • Name resolution adds latency before the first network call and therefore affects user-facing p95/p99 metrics.
  • TTL policy determines the balance between change propagation speed and authoritative infrastructure load.
  • DNS configuration mistakes often look like application incidents, so DNS needs dedicated observability.
  • Robust DNS design reduces incident blast radius and improves resilience in multi-region systems.

Common mistakes

Using very low TTL without estimating the resulting load on authoritative servers during traffic spikes.

Ignoring negative caching and causing unnecessary retry storms for missing names.

Confusing application incidents with DNS incidents because there is no dedicated visibility into cache behavior and lookup time.

Keeping DNS with a single provider and leaving a hidden single point of failure in place.
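The negative caching mistake listed above can be illustrated with a small sketch: remembering that a name is missing for a short period keeps repeated lookups from hitting upstream servers again. The class, the `lookup` helper, and the fixed 60-second negative TTL are assumptions for illustration; real resolvers derive the negative TTL from the zone's SOA record.

```python
import time
from typing import Callable, Optional


class NegativeCache:
    """Remembers names that were recently reported as nonexistent."""

    def __init__(self, negative_ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = negative_ttl_seconds  # assumed fixed value for this sketch
        self._clock = clock
        self._missing_until: dict[str, float] = {}

    def mark_missing(self, name: str) -> None:
        self._missing_until[name] = self._clock() + self._ttl

    def is_known_missing(self, name: str) -> bool:
        expires_at = self._missing_until.get(name)
        if expires_at is None:
            return False
        if self._clock() >= expires_at:
            del self._missing_until[name]  # entry expired; ask upstream again
            return False
        return True


def lookup(name: str, cache: NegativeCache,
           upstream: Callable[[str], Optional[str]]) -> Optional[str]:
    """Resolve via upstream unless the name is already known to be missing."""
    if cache.is_known_missing(name):
        return None  # answered locally, no upstream query
    address = upstream(name)
    if address is None:
        cache.mark_missing(name)
    return address


# Usage: the second lookup for a missing name never reaches upstream.
calls = []
def fake_upstream(name: str) -> Optional[str]:
    calls.append(name)
    return None  # pretend every name is NXDOMAIN

cache = NegativeCache()
lookup("nope.example.", cache, fake_upstream)
lookup("nope.example.", cache, fake_upstream)
print(len(calls))  # 1
```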

Related chapters

  • OSI model - positions DNS as an application-layer protocol and improves layer-by-layer troubleshooting.
  • IPv4 and IPv6: evolution of IP addressing - shows how A/AAAA records and addressing behavior shape DNS publication and routing strategy.
  • UDP protocol - explains why DNS usually rides over UDP and why packet loss immediately affects lookup time.
  • TCP protocol - explains when DNS falls back to TCP, for example on truncation or zone transfer.
  • HTTP protocol - reminds you that every HTTP flow starts with name resolution and depends on DNS before it depends on the application.
  • Load Balancing - shows how DNS often becomes the first region or site selection layer before L4/L7 balancing.
  • Case study: CDN infrastructure - shows the practical side of global steering through DNS and edge infrastructure.
  • Why distributed systems and consistency matter - connects DNS decisions to resilience, traffic distribution, and distributed-system blast radius.
