This chapter is useful because it presents HTTP evolution as a sequence of trade-offs around simplicity, caching, connection cost, and behavior under load.
In real engineering work, it helps you choose between HTTP/1.1, HTTP/2, and HTTP/3 based on traffic shape, reason about idempotency and caching semantics, and avoid confusing API design with transport behavior.
In interviews and design discussions, it gives you a structured language for discussing web-system performance and protocol-level trade-offs rather than only talking about endpoints.
Practical value of this chapter
Protocol to product
Connects HTTP behavior to UX metrics: latency, retries, caching, and API stability.
Version-aware design
Supports HTTP/1.1 vs HTTP/2 vs HTTP/3 decisions by traffic and network profile.
Performance tactics
Applies keep-alive, compression, caching, and multiplexing deliberately instead of by habit.
Interview articulation
Provides clear structure for discussing protocol-level web optimization trade-offs.
RFC
RFC 9110 (HTTP Semantics)
Current HTTP semantics specification: methods, status codes, header fields, and conditional requests. Caching behavior is specified separately in RFC 9111 (HTTP Caching).
HTTP matters not only as the language of the web, but as the contract that defines how clients ask for data and how services answer. Version choice, caching, and retry policy directly shape latency, resilience, and the cost of every request.
In system design, HTTP is the default request-response contract between clients and services. The protocol itself is stateless, so session continuity, performance, and resilience have to be assembled from caching, token-based identity, database state, and application logic.
Request behavior is shaped by persistent connections, keep-alive, cache control, entity tags, and the chosen timeout and retry policy. Those choices directly influence latency, throughput, and the real processing cost of the path.
Under growth, multiplexing, cache-hit ratio, and head-of-line blocking start to matter immediately. At the same time, the request path often runs through load balancing, a CDN, or an API gateway, so the choice between HTTP/1.1, HTTP/2, and HTTP/3 is really a choice about network conditions and traffic shape.
Core properties of HTTP
Client-server interaction
The client sends an HTTP request, and the server returns a status code, headers, and optionally a body.
Stateless processing
Each request is handled independently, so state is pushed into caches, databases, tokens, and sessions.
Header-driven behavior
Headers define caching, content type, authorization, and security policy.
Intermediary layers
Proxies, CDNs, and API gateways help scale delivery, protect services, and reduce latency.
Version evolution
HTTP/1.1, HTTP/2, and HTTP/3 change runtime behavior under load while keeping the same application semantics.
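As an illustration of header-driven behavior, here is a minimal sketch of a conditional GET driven by `ETag` and `If-None-Match`. The function names (`make_etag`, `conditional_get`), the hash-based ETag, and the `max-age` default are assumptions for the example, not part of any specific framework.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the body bytes (hypothetical scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _normalize(tag: str) -> str:
    """If-None-Match uses weak comparison, so drop any W/ prefix."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get(request_headers: dict, body: bytes, max_age: int = 60):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": f"max-age={max_age}"}
    raw = request_headers.get("If-None-Match", "")
    candidates = [_normalize(t) for t in raw.split(",")]
    if "*" in candidates or etag in candidates:
        # The client copy is still valid: send headers only, no body.
        return 304, headers, b""
    headers["Content-Type"] = "application/json"
    return 200, headers, body


# First request has no validator, so the full body is returned.
status, headers, _ = conditional_get({}, b'{"id": 1}')
# The client replays the ETag; an unchanged resource yields 304.
status_again, _, body_again = conditional_get(
    {"If-None-Match": headers["ETag"]}, b'{"id": 1}'
)
```

The 304 path skips the body entirely, which is where the bandwidth and backend savings come from when the resource has not changed.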
How an HTTP message is structured
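Regardless of version, an HTTP message carries the same logical parts: a start line (request line or status line), header fields, and a body when one is needed. HTTP/1.1 writes them as text with a blank line between headers and body; HTTP/2 and HTTP/3 carry the same information in binary frames.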
HTTP request
Request line
METHOD /path HTTP/version
Headers
Host, Authorization, Content-Type, Cache-Control...
Blank line
Separates headers from the body
Optional body
JSON, HTML, or binary data
HTTP response
Status line
HTTP/version status-code reason
Headers
Content-Type, Cache-Control, ETag, Set-Cookie...
Blank line
Separates headers from the body
Optional body
API payload, HTML page, file, or data stream
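The layout above can be sketched for the HTTP/1.1 text format as a pair of small helpers. This is a simplified illustration, not a full parser: `build_request` and `parse_response` are hypothetical names, and chunked transfer, repeated headers, and malformed input are deliberately ignored.

```python
def build_request(method: str, path: str, headers: dict, body: bytes = b"") -> bytes:
    """Serialize request line, headers, blank line, and optional body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"  # the blank line ends the header section
    return head.encode("ascii") + body


def parse_response(raw: bytes):
    """Split a raw response into (version, status, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)
    version, status = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, status, reason, headers, body


request_bytes = build_request(
    "GET", "/items/1", {"Host": "example.com", "Accept": "application/json"}
)
parsed = parse_response(
    b'HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nETag: "abc"\r\n\r\n{"id": 1}'
)
```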
Lifecycle of an HTTP request
Request preparation
The client resolves the name through DNS, chooses an endpoint, and builds method, path, headers, and an optional body.
Path traversal
The request travels through a load balancer, proxy, or API gateway before it reaches the service and its dependencies.
Response and connection reuse
The client receives status and data, applies cache rules, and reuses the open connection whenever it can.
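Connection reuse in the last step can be observed with the Python standard library alone. The sketch below starts a throwaway local server that echoes the client's source port, then issues several requests either on one persistent connection or on a fresh connection each time. `EchoPortHandler` and `client_ports_seen` are names made up for this example, and it assumes loopback sockets are available.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoPortHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        # The client's source port identifies the underlying TCP connection.
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def client_ports_seen(n_requests: int, reuse: bool) -> set:
    """Return the set of client source ports the server observed."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoPortHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    conn = http.client.HTTPConnection(host, port, timeout=5)
    ports = set()
    try:
        for _ in range(n_requests):
            if not reuse:
                conn.close()  # the next request() reconnects automatically
            conn.request("GET", "/")
            ports.add(conn.getresponse().read().decode())
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return ports


reused = client_ports_seen(3, reuse=True)
```

With reuse enabled, all three requests arrive from the same source port, meaning a single TCP handshake served them all. With `reuse=False`, each request pays for its own connection setup.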
What an HTTP exchange looks like
The same model shows up across REST APIs, edge gateways, and most synchronous integrations where a client waits for a concrete answer to a concrete request.
HTTP request ↔ response
HTTP is built around a clear pair: request from the client and response from the server.
Message structure
- Method (GET, POST, PUT)
- URI or resource path
- Headers
- Request body (optional)
How HTTP behaves under load
The snapshot below shows load, p95 latency, error rate, cache-hit rate, and connection reuse for one phase of a load scenario.
Phase: Stable load
- Load: 2.4k RPS
- p95 latency: 85 ms
- Error rate: 0.2%
- Cache hits: 76.0%
- Connection reuse: 92.0%
- Mitigation: Baseline keep-alive
- What is happening: Caching and connection reuse keep latency low and predictable.
Abbreviations
- RPS (requests per second) — number of HTTP requests served per second.
- p95 — response time threshold under which 95% of requests complete.
What the metrics mean
- Connection reuse — share of requests served on existing persistent connections.
- Cache hit — share of responses returned without expensive backend processing.
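To make these definitions concrete, here is a small sketch that computes them from a list of request records. The record fields (`latency_ms`, `cache_hit`, `reused_conn`, `status`) are assumed for the example, and p95 uses the nearest-rank method; monitoring systems often use interpolation or histograms instead.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile: the smallest value covering 95% of samples."""
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def summarize(requests, window_seconds):
    """Aggregate per-request records into the metrics described above."""
    total = len(requests)
    return {
        "rps": total / window_seconds,
        "p95_ms": p95([r["latency_ms"] for r in requests]),
        "cache_hit_ratio": sum(r["cache_hit"] for r in requests) / total,
        "connection_reuse": sum(r["reused_conn"] for r in requests) / total,
        "error_rate": sum(r["status"] >= 500 for r in requests) / total,
    }


# Synthetic sample: 20 requests observed over a 10-second window.
sample = [
    {
        "latency_ms": i * 10,
        "cache_hit": i <= 15,
        "reused_conn": i <= 18,
        "status": 500 if i == 20 else 200,
    }
    for i in range(1, 21)
]
summary = summarize(sample, window_seconds=10)
```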
Related chapter
Load Balancing
HTTP traffic almost always crosses L7/L4 balancers and policy layers before it reaches the service.
How network and routing affect HTTP
Connection reuse
Cold connections and extra handshakes raise p95/p99 even when the application itself is fast.
Cache hit share
As cache misses grow, more traffic reaches dependencies and user-facing latency deteriorates quickly.
Timeout and retry policy
Overly aggressive timeouts and retries can manufacture overload and spread the incident across dependencies.
L4 and L7 balancing
The balancer changes latency, request path shape, and client behavior, especially when flows need stickiness.
MTU, loss, and protocol version
Packet loss and unstable links affect HTTP/2 and HTTP/3 differently, especially in mobile scenarios.
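The timeout and retry point above can be sketched as a bounded retry loop that only retries idempotent methods and spreads retries with jittered exponential backoff. The helper names, the retryable status set (502, 503, 504), and the default delays are assumptions chosen for illustration; real policies should follow the SLA of each endpoint.

```python
import random
import time

# Methods defined as idempotent in RFC 9110.
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}
# Assumed set of statuses worth retrying; tune per service.
RETRYABLE_STATUSES = {502, 503, 504}


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 2.0) -> float:
    """Full jitter: a random delay between 0 and min(cap, base * 2**attempt)."""
    return random.random() * min(cap, base * 2 ** attempt)


def call_with_retries(send, method: str, max_attempts: int = 3, sleep=time.sleep):
    """Call send() until success or the retry budget is spent.

    send is a hypothetical callable returning an HTTP status code.
    Returns (final_status, attempts_made).
    """
    attempts = 0
    while True:
        attempts += 1
        status = send()
        retryable = (
            method.upper() in IDEMPOTENT_METHODS and status in RETRYABLE_STATUSES
        )
        if not retryable or attempts >= max_attempts:
            return status, attempts
        sleep(backoff_delay(attempts - 1))


# A GET that fails twice and then succeeds is retried up to the budget.
get_responses = iter([503, 503, 200])
get_result = call_with_retries(lambda: next(get_responses), "GET", sleep=lambda s: None)

# A POST is not idempotent, so the first failure is returned to the caller.
post_responses = iter([503, 200])
post_result = call_with_retries(lambda: next(post_responses), "POST", sleep=lambda s: None)
```

The hard cap on attempts and the jitter are what keep a brief dependency failure from turning into a synchronized retry storm.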
Source
Evolution of HTTP (MDN)
Key HTTP/1.1, HTTP/2, and HTTP/3 milestones and the engineering trade-offs behind each version.
HTTP evolution
Protocol evolution has focused on lower latency and more predictable behavior under heavy request concurrency.
HTTP/1.1
1997/1999: Text protocol over TCP
OSI mapping: Application layer semantics over TCP.
- Persistent connections and chunked transfer
- Head-of-line blocking within a single TCP connection
- Often needs multiple connections to the same host
HTTP/2
2015: Binary framing and multiplexing
OSI mapping: The same application semantics with more efficient transport over TCP.
- Multiple streams inside one connection
- Header compression with HPACK
- Lower connection setup overhead
HTTP/3
2022: HTTP over QUIC (UDP)
OSI mapping: The same application semantics over QUIC, where packet loss blocks neighboring streams less aggressively.
- Faster connection establishment and recovery
- Lower transport-level head-of-line impact
- Better behavior in mobile and unstable networks
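The head-of-line differences between the three versions can be illustrated with a deliberately crude toy model. It assumes equal-size responses, no bandwidth contention between parallel streams, and a single packet loss that costs a fixed retransmission delay; none of these numbers come from measurements, and `completion_times` is a name invented for this sketch.

```python
def completion_times(n_streams: int, transfer_ms: int, lost_stream: int, retransmit_ms: int):
    """Toy completion times (ms) per response when one packet is lost."""
    # HTTP/1.1 on one connection: responses are served one after another,
    # so a delay on one response pushes back every response behind it.
    http11, clock = [], 0
    for i in range(n_streams):
        clock += transfer_ms + (retransmit_ms if i == lost_stream else 0)
        http11.append(clock)

    # HTTP/2: streams run in parallel, but they share one ordered TCP byte
    # stream, so in this model the loss stalls all of them.
    http2 = [transfer_ms + retransmit_ms] * n_streams

    # HTTP/3: QUIC delivers streams independently, so only the stream
    # that lost the packet waits for the retransmission.
    http3 = [
        transfer_ms + (retransmit_ms if i == lost_stream else 0)
        for i in range(n_streams)
    ]
    return {"http/1.1": http11, "http/2": http2, "http/3": http3}


model = completion_times(n_streams=3, transfer_ms=100, lost_stream=0, retransmit_ms=50)
```

Under these assumptions the gap between HTTP/2 and HTTP/3 appears only when there is loss, which is consistent with the point about mobile and unstable networks.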
Where HTTP matters most
- Public and internal APIs, including REST and browser-facing gRPC proxies
- Web applications, SPAs, and SSR frontends
- BFF and API gateway layers that compose responses from multiple services
- Synchronous service-to-service calls that need predictable request-response flow
- Edge access layers for authorization, rate limiting, and observability
Why this matters for system design
- HTTP defines API contracts and directly affects latency, retries, and request-processing cost.
- Version choice changes connection behavior, multiplexing, and resilience under packet loss.
- A sound cache, timeout, and retry strategy reduces blast radius and protects dependencies.
- HTTP metrics often provide the earliest signal of user-visible degradation.
Common mistakes
- Treating HTTP as nearly free and ignoring the cost of connections, caching, and retries in capacity planning.
- Using the same timeout and retry policy for every method and endpoint regardless of SLA, idempotency, and business criticality.
- Ignoring cache semantics such as ETag and Cache-Control and pushing avoidable load onto services.
- Combining 4xx and 5xx into one metric and losing an early signal of server-side degradation.
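The last mistake is cheap to avoid. A minimal sketch, with hypothetical function and metric names, that keeps client errors and server errors as separate rates:

```python
from collections import Counter


def status_class(status: int) -> str:
    """Map a status code to its class label, e.g. 404 -> '4xx'."""
    return f"{status // 100}xx"


def error_rates(statuses):
    """Report 4xx and 5xx as separate rates instead of one combined number."""
    total = len(statuses)
    if total == 0:
        return {"client_error_rate": 0.0, "server_error_rate": 0.0}
    counts = Counter(status_class(s) for s in statuses)
    return {
        "client_error_rate": counts["4xx"] / total,  # usually caller mistakes
        "server_error_rate": counts["5xx"] / total,  # usually service degradation
    }


rates = error_rates([200] * 6 + [404] * 3 + [503])
```

In this sample a combined error metric would read 40% and hide the fact that only one request in ten failed on the server side.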
Related chapters
- OSI model - shows where HTTP sits at the application layer and how it connects to transport and network behavior.
- IPv4 and IPv6: evolution of IP addressing - explains how addressing and routing reshape HTTP request paths and latency.
- TCP protocol - covers the main transport underneath HTTP/1.1 and HTTP/2 and the source of many request delays.
- UDP protocol - shows the transport foundation for QUIC and HTTP/3, where delay and packet loss matter most.
- Domain Name System (DNS) - reminds you that every HTTP request starts with name resolution and depends on DNS quality.
- WebSocket protocol - shows how a long-lived bidirectional channel appears after HTTP Upgrade.
- Load Balancing - breaks down L7 routing, sticky sessions, and balancing effects on HTTP behavior.
- Remote call approaches - helps compare application protocols and timeout/retry policy at service boundaries.
- Why distributed systems and consistency matter - moves the discussion from HTTP mechanics into distributed-architecture trade-offs.
