HTTP protocol — System Design Space

This chapter is useful because it presents HTTP evolution as a sequence of trade-offs around simplicity, caching, connection cost, and behavior under load.

In real engineering work, it helps you choose between HTTP/1.1, HTTP/2, and HTTP/3 based on traffic shape, reason about idempotency and caching semantics, and avoid confusing API design with transport behavior.

In interviews and design discussions, it gives you a structured language for discussing web-system performance and protocol-level trade-offs rather than only talking about endpoints.

Practical value of this chapter

Protocol to product

Connects HTTP behavior to UX metrics: latency, retries, caching, and API stability.

Version-aware design

Supports HTTP/1.1 vs HTTP/2 vs HTTP/3 decisions by traffic and network profile.

Performance tactics

Applies keep-alive, compression, caching, and multiplexing deliberately instead of by habit.

Interview articulation

Provides clear structure for discussing protocol-level web optimization trade-offs.

RFC

RFC 9110 (HTTP Semantics)

The current HTTP semantics specification: methods, status codes, and header fields. It defines what the protocol is actually allowed to do; caching behavior is specified in the companion RFC 9111 (HTTP Caching).

HTTP is more than the language of the web — it is the contract that defines how clients ask for data and the form in which services answer. Version choice, caching rules, and retry policy decide what latency you end up with, whether the system holds at peak, and how much each request actually costs.

In system design, HTTP is the default request-response contract between clients and services. It is stateless by design, so session state and resilience have to be assembled outside the protocol: from caching, token-based identity, database state, and application logic.

The behavior of a concrete request is shaped by persistent connections, keep-alive, cache control, entity tags, and the chosen timeout and retry policy. Those choices are what produce the final latency, throughput, and real processing cost of the path.

As traffic grows, multiplexing, cache-hit ratio, and head-of-line blocking move to center stage. In parallel, the request path usually runs through load balancing, a CDN, or an API gateway — so the choice between HTTP/1.1, HTTP/2, and HTTP/3 turns into a choice tailored to specific network conditions and a specific traffic shape.

Core properties of HTTP

Client-server interaction

The client sends an HTTP request and the server returns a status code, headers, and optionally a body — an asymmetric exchange where initiative always sits on the client side.

Stateless processing

The protocol itself remembers nothing between requests, so state has to be assembled outside it — in caches, databases, tokens, and sessions. That is convenient for horizontal scaling and painful when forgotten.
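
One way to carry identity without server-side session state is a signed token that any replica can verify on its own. The sketch below is a simplified illustration, not a standard token format such as JWT; the names SECRET, issue_token, and verify_token are hypothetical, and a real deployment would load the key from a secret store.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # assumption: placeholder key, for illustration only


def issue_token(user_id, ttl_seconds=300, now=None):
    """Return 'payload.signature', both parts base64url-encoded."""
    now = time.time() if now is None else now
    payload = json.dumps({"sub": user_id, "exp": int(now) + ttl_seconds}).encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(signature).decode()
    )


def verify_token(token, now=None):
    """Return the user id if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):  # constant-time comparison
        return None
    claims = json.loads(payload)
    if claims["exp"] <= now:
        return None
    return claims["sub"]
```

Because verification needs only the shared key, any instance behind the balancer can handle the next request, which is what makes stateless processing convenient for horizontal scaling.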

Header-driven behavior

Caching, content type, authorization, and security policy all travel in headers. Much of the subtle behavior of HTTP lives there, not in the body.
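
As one illustration of header-driven behavior, cache validation with ETag and If-None-Match can be sketched in a few lines of Python. The helper names make_etag and conditional_get are hypothetical, the ETag is assumed to be a truncated content hash, and splitting the header on commas is a simplification that real parsers handle more carefully.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # Assumption: derive a strong ETag from a hash of the representation.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, if_none_match=None, max_age=60):
    """Return (status, headers, body) for a GET with optional If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": f"max-age={max_age}"}
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so a W/ prefix is ignored.
        normalized = {tag[2:] if tag.startswith("W/") else tag for tag in candidates}
        if "*" in normalized or etag in normalized:
            # The client's copy is still valid: send headers only, no body.
            return 304, headers, b""
    return 200, headers, body
```

The first response carries the ETag; a client that sends it back in If-None-Match gets 304 Not Modified with an empty body as long as the representation has not changed, so the payload is not transferred again.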

Intermediary layers

Between client and service you usually find proxies, CDNs, and API gateways. They absorb load and latency but add their own layer of rules, caches, and failure modes.

Version evolution

HTTP/1.1, HTTP/2, and HTTP/3 share the same application semantics, but under load they behave very differently — from connection profile to resilience under packet loss.

How an HTTP message is structured

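Whichever version is on the wire, an HTTP message keeps the same logical shape: a request or status line, headers, a blank line, and a body when one is needed. HTTP/1.1 writes these parts as text, while HTTP/2 and HTTP/3 carry the same parts in binary frames. That shape is where client and service negotiate.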

HTTP request

Request line

METHOD /path HTTP/version

Headers

Host, Authorization, Content-Type, Cache-Control...

Blank line

Separates headers from the body

Optional body

JSON, HTML, or binary data

HTTP response

Status line

HTTP/version status-code reason

Headers

Content-Type, Cache-Control, ETag, Set-Cookie...

Blank line

Separates headers from the body

Optional body

API payload, HTML page, file, or data stream
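
To make the layout above concrete, here is a minimal Python sketch that splits a raw HTTP/1.1 message at the blank line. The function name parse_http_message and the sample host api.example.com are illustrative assumptions; the parser deliberately ignores repeated headers and chunked bodies.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP/1.1 message into start line, headers, and body."""
    # The first blank line (CRLF CRLF) separates the header block from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lowercase.
        # Simplification: a repeated header overwrites the earlier value.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


SAMPLE_REQUEST = (
    b"POST /orders HTTP/1.1\r\n"
    b"Host: api.example.com\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: 10\r\n"
    b"\r\n"
    b'{"qty": 2}'
)

SAMPLE_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/json\r\n"
    b"Cache-Control: max-age=60\r\n"
    b"\r\n"
    b'{"ok": true}'
)
```

The same function handles both samples because requests and responses differ only in the start line: method, path, and version for a request; version, status code, and reason for a response.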

Lifecycle of an HTTP request

Request preparation

Before anything hits the wire, the client resolves the name through DNS, picks an endpoint, and assembles method, path, headers, and an optional body. That preparation shapes both the first latency hop and whether the request reaches the right service.

Traversing the request path

From there the request rides through a load balancer, proxy, or API gateway and only then meets the service and its dependencies. Each hop adds its own latency, its own rules, and its own way to fail.

Response and connection reuse

The client receives status and data, applies cache rules, and reuses the open connection whenever it can — cold handshakes are expensive, and a good client avoids them.
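
Connection reuse can be observed locally with the Python standard library. The sketch below is an assumption-laden demo rather than a benchmark: it starts a throwaway server on 127.0.0.1, sends two requests through one http.client.HTTPConnection, and records the client port the server saw for each request. The names Handler, seen_client_ports, and fetch_twice_on_one_connection are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

seen_client_ports = []  # one entry per request, recorded by the server


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep connections open

    def do_GET(self):
        seen_client_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        # A known length lets the client find the end of the body and reuse the socket.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


def fetch_twice_on_one_connection():
    """Send two GETs over one connection; return (statuses, client ports seen)."""
    seen_client_ports.clear()
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        statuses = []
        for path in ("/a", "/b"):
            conn.request("GET", path)
            response = conn.getresponse()
            response.read()  # drain the body before sending the next request
            statuses.append(response.status)
        conn.close()
        return statuses, list(seen_client_ports)
    finally:
        server.shutdown()
        server.server_close()
```

If both requests arrive from the same client port, they shared one TCP connection, so only the first paid for a handshake.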

What an HTTP exchange looks like

The same model returns in REST APIs, edge gateways, and almost every synchronous integration — anywhere a client waits for a concrete answer to a concrete request.

HTTP request ↔ response

HTTP is built around a clear pair: a request from the client and a response from the server.

Message structure

Method (GET, POST, PUT)
URI or resource path
Headers
Request body (optional)

The client sends a request, and the server keeps no protocol-level state between requests.

How HTTP behaves under load

The chart for this section tracks how cache-hit rate, error rate, and p95 latency move as traffic grows. Snapshot of interval 1 of 6:

Phase: Stable load
Load: 2.4k RPS
p95 latency: 85 ms
Error rate: 0.2%
Cache hits: 76.0%
Connection reuse: 92.0%
Mitigation: Baseline keep-alive
What is happening: Caching and connection reuse keep latency low and predictable.

Abbreviations

RPS (requests per second) — number of HTTP requests served per second.
p95 — response time threshold under which 95% of requests complete.

What the metrics mean

Connection reuse — share of requests served on existing persistent connections.
Cache hit — share of responses returned without expensive backend processing.
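
The definitions above can be turned into a small calculation. This is a sketch under stated assumptions: the RequestRecord fields and the summarize helper are hypothetical, and p95 uses the nearest-rank method, which is only one of several common percentile definitions.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    latency_ms: float
    cache_hit: bool
    reused_connection: bool


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def summarize(records):
    """Compute p95 latency plus cache-hit and connection-reuse shares."""
    total = len(records)
    return {
        "p95_ms": percentile([r.latency_ms for r in records], 95),
        "cache_hit_ratio": sum(r.cache_hit for r in records) / total,
        "connection_reuse_ratio": sum(r.reused_connection for r in records) / total,
    }
```

Fed with one record per request from an access log, summarize yields the same three quantities the section tracks: p95 latency, cache hit share, and connection reuse share.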

Related chapter

Load Balancing

HTTP traffic rarely flows straight to its destination: L7/L4 balancers and a layer of routing and access policy almost always sit between client and service.

How network and routing affect HTTP

Connection reuse

Cold connections and extra handshakes raise p95/p99 even when the application itself is fast. On the charts this often shows up as mysterious latency tails with no obvious cause.

Cache hit share

A drop in cache hit ratio instantly shifts traffic onto dependencies — user-facing latency degrades before any backend alert has a chance to fire.

Timeout and retry policy

Aggressive timeouts and retries are quite capable of manufacturing their own overload: a single dependency hiccup propagates across the call graph and turns into a full incident.
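
A common way to keep retries from amplifying an outage is to cap the attempts, retry only idempotent methods, and spread the retries with randomized exponential backoff. The sketch below assumes a hypothetical send callable that performs one request and returns its status code; the retryable status set and the default delays are illustrative choices, not recommendations.

```python
import random
import time

# Methods defined as idempotent by HTTP semantics; POST and PATCH are not.
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}
RETRYABLE_STATUSES = {502, 503, 504}  # assumption: treated as transient upstream failures


def call_with_retries(send, method, max_attempts=3, base_delay=0.1,
                      max_delay=2.0, sleep=time.sleep):
    """Call send() until it succeeds or the attempt budget is spent.

    Returns (last_status, attempts_made). Non-idempotent methods get one attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    allowed = max_attempts if method.upper() in IDEMPOTENT_METHODS else 1
    attempt = 0
    while True:
        attempt += 1
        status = send()
        if status not in RETRYABLE_STATUSES or attempt >= allowed:
            return status, attempt
        # Full jitter: wait a random time up to an exponentially growing ceiling,
        # so many clients do not retry in lockstep.
        ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, ceiling))
```

Passing sleep as a parameter keeps the function testable without real waiting, and the attempt cap bounds how much extra load one failing dependency can generate per caller.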

L4 and L7 balancing

The balancer changes latency, the shape of the request path, and client flow behavior. It shows up most clearly when traffic needs stickiness or content-aware routing.

MTU, loss, and protocol version

Packet loss and unstable links hit HTTP/2 and HTTP/3 in very different ways — the difference is most visible on mobile and long-haul paths.

Source

Evolution of HTTP (MDN)

Key HTTP/1.1, HTTP/2, and HTTP/3 milestones and the engineering trade-offs each version is built on.

HTTP evolution

Each new version addresses an already-visible pain of the previous one: it cuts latency and tries to make behavior predictable where too many requests start competing on a single connection.

HTTP/1.1

1997/1999

Text protocol over TCP

OSI mapping: Application layer semantics over TCP.

Persistent connections and chunked transfer
Head-of-line blocking within a single TCP connection
Often needs multiple connections to the same host

HTTP/2

2015

Binary framing and multiplexing

OSI mapping: The same application semantics with more efficient framing over TCP; TCP-level head-of-line blocking remains.

Multiple streams inside one connection
Header compression with HPACK
Fewer connections per host, so lower connection setup overhead

HTTP/3

2022

HTTP over QUIC (UDP)

OSI mapping: The same application semantics over QUIC, where a lost packet stalls only the streams whose data it carried instead of every stream on the connection.

Faster connection establishment and recovery
Lower transport-level head-of-line impact
Better behavior in mobile and unstable networks

Where HTTP matters most

Public and internal APIs, including REST and browser-facing gRPC proxies
Web applications, SPAs, and SSR frontends
BFF and API gateway layers that compose responses from multiple services
Synchronous service-to-service calls that need predictable request-response flow
Edge access layers for authorization, rate limiting, and observability

Why this matters for system design

HTTP defines API contracts — and along with them latency, retry behavior, and request-processing cost.
Version choice changes the load profile: connection count and character, multiplexing, resilience under packet loss.
A careful cache, timeout, and retry strategy shrinks the blast radius of an incident and shields dependencies from a request avalanche.
HTTP metrics usually light up first — they are the earliest signal that something is going wrong on the user side.

Common mistakes

Treating HTTP as nearly free and skipping the cost of connections, caches, and retries in capacity planning — at peak that is exactly what eats the headroom.

Applying one timeout and retry policy to every method and endpoint without splitting it by SLA, idempotency, and how critical the operation actually is.

Skipping cache semantics like ETag and Cache-Control and, request after request, pushing avoidable load onto services when a client or intermediary cache could have answered instead.

Folding 4xx and 5xx into one metric: that view drowns out the earliest signal of server-side degradation.
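
To illustrate the last point, the following Python sketch keeps client errors and server errors as separate rates instead of one combined error metric. The function names status_class and error_rates are hypothetical, and the sample numbers in the usage note are invented for illustration.

```python
from collections import Counter


def status_class(status: int) -> str:
    """Map a status code to the class that matters for alerting."""
    if 400 <= status <= 499:
        return "client_error"  # usually a caller problem: bad input, auth, missing resource
    if 500 <= status <= 599:
        return "server_error"  # the service or one of its dependencies failed
    return "ok"  # 1xx, 2xx, and 3xx


def error_rates(statuses):
    """Return separate 4xx and 5xx shares of all responses."""
    counts = Counter(status_class(s) for s in statuses)
    total = len(statuses)
    return {
        name: (counts[name] / total if total else 0.0)
        for name in ("client_error", "server_error")
    }
```

For 100 responses with eight 404s and two 503s, a combined metric would report 10% errors, while the split shows 8% client errors and 2% server errors, and only the second number points at server-side degradation.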

Related chapters

OSI model - helps locate HTTP at the application layer and connects it to transport and network behavior.
IPv4 and IPv6: evolution of IP addressing - explains how addressing and routing shape HTTP request paths and their latency.
TCP protocol - covers the main transport under HTTP/1.1 and HTTP/2 — and the source of most request delays along the way.
UDP protocol - shows the transport foundation under QUIC and HTTP/3, where delay and packet loss take the front seat.
Domain Name System (DNS) - a reminder that every HTTP request starts with name resolution and rides on the quality of DNS.
WebSocket protocol - shows how, after an HTTP Upgrade, request-response turns into a long-lived bidirectional channel.
Load Balancing - breaks down L7 routing, sticky sessions, and the ways a balancer reshapes HTTP behavior.
Remote call approaches - helps compare application protocols and tune timeout and retry policy at service boundaries.
Why distributed systems and consistency matter - moves the discussion from HTTP mechanics into the trade-offs without which a distributed architecture does not hold together.