This chapter is useful because it shows HTTP evolution as a sequence of trade-offs around simplicity, caching, multiplexing, and the cost of connections.
In real engineering work, it helps you choose between HTTP/1.1, HTTP/2, and HTTP/3 based on traffic shape, reason about idempotency and caching semantics, and avoid confusing API design with transport behavior.
In interviews and design reviews, it gives you a structured language for discussing web performance and protocol-level trade-offs rather than only talking about endpoints.
Practical value of this chapter
Protocol to product
Connects HTTP behavior to UX metrics: latency, retries, caching, and API stability.
Version-aware design
Supports HTTP/1.1 vs HTTP/2 vs HTTP/3 decisions by traffic and network profile.
Performance tactics
Applies keep-alive, compression, caching, and multiplexing with explicit intent.
Interview articulation
Provides clear structure for discussing protocol-level web optimization trade-offs.
RFC
RFC 9110 (HTTP Semantics)
Current HTTP semantics spec: methods, status codes, headers, and cache behavior.
HTTP is the primary application protocol for web and API integration. It defines request-response contracts between clients and services, and version/caching/retry choices directly impact latency and reliability.
Core HTTP properties
Client-server model
Client initiates an HTTP request, and server returns a response with status and payload.
Stateless interaction
Each request is independent; state lives in caches, databases, tokens, and sessions.
Header-based extensibility
Headers control caching, content negotiation, authorization, and security policies.
Intermediate nodes
Proxies, CDNs, and gateways help scale delivery and reduce latency.
Transport evolution
HTTP/1.1, HTTP/2, and HTTP/3 change the performance profile while keeping application-level semantics.
HTTP message content visualization
Regardless of protocol version, HTTP interaction remains centered around request and response structure.
HTTP Request
Request line
METHOD /path HTTP/version
Headers
Host, Authorization, Content-Type, Cache-Control...
Empty line
Header/body separator
Body (optional)
JSON/HTML/binary payload
HTTP Response
Status line
HTTP/version status-code reason
Headers
Content-Type, Cache-Control, ETag, Set-Cookie...
Empty line
Header/body separator
Body (optional)
API response, page, file, stream
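The request structure above (request line, headers, empty line, optional body) can be made concrete with a small sketch that builds and parses a raw HTTP/1.1 message. The path, host, and payload are illustrative, not from the chapter:

```python
# Build a raw HTTP/1.1 request: request line, headers, empty line, body.
def build_request(method, path, headers, body=b""):
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{k}: {v}" for k, v in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"   # empty line separates headers from body
    return head.encode("ascii") + body

# Parse it back into its structural parts.
def parse_request(raw):
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    method, path, version = lines[0].split(" ")
    headers = dict(line.split(": ", 1) for line in lines[1:])
    return method, path, version, headers, body

raw = build_request(
    "POST", "/api/orders",
    {"Host": "api.example.com", "Content-Type": "application/json"},
    b'{"id": 1}',
)
method, path, version, headers, body = parse_request(raw)
# method == "POST", headers["Host"] == "api.example.com", body == b'{"id": 1}'
```

The response message follows the same shape, with a status line (`HTTP/version status-code reason`) in place of the request line.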
HTTP request lifecycle
Request preparation
Client resolves DNS, picks endpoint, and builds method/path/headers and optional body.
Delivery and processing
Request traverses LB/proxy/gateway and is processed by service plus downstream dependencies.
Response and reuse
Client receives status/data, applies cache policy, and reuses persistent connections.
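The value of the "response and reuse" step can be illustrated with rough handshake arithmetic: a cold request pays for DNS, TCP, and TLS setup, while a request on a reused persistent connection pays only the round trip. The millisecond figures below are illustrative assumptions, not measurements:

```python
# Assumed per-step costs (illustrative): DNS lookup, TCP handshake,
# TLS handshake, and one request/response round trip.
DNS_MS, TCP_MS, TLS_MS, RTT_MS = 20, 30, 60, 40

def request_latency_ms(reused_connection):
    # A reused persistent (keep-alive) connection skips all setup cost.
    setup = 0 if reused_connection else DNS_MS + TCP_MS + TLS_MS
    return setup + RTT_MS

cold = request_latency_ms(reused_connection=False)   # 150 ms
warm = request_latency_ms(reused_connection=True)    # 40 ms
```

Under these assumptions, connection reuse removes roughly three quarters of per-request latency, which is why the keep-alive and reuse metrics later in the chapter matter so much for p95/p99.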
How HTTP exchange looks in practice
The basic request-response model stays the same across REST APIs, edge gateways, and most synchronous integrations.
HTTP request ↔ response
HTTP is a strict pair: the client sends a request, and the server returns a response.
Message structure
- Method (GET, POST, PUT)
- URI / resource path
- Headers
- Body (optional)
HTTP dynamics under load
The snapshot below shows baseline metrics under stable load; cache hit, error rate, and p95 latency shift away from these values as traffic grows.
Phase: Stable load
- Load: 2.4k RPS
- p95 latency: 85 ms
- Error rate: 0.2%
- Cache hit: 76.0%
- Connection reuse: 92.0%
Mitigation: Baseline keep-alive
What is happening: Caching and connection reuse keep latency low and predictable.
Abbreviations
- RPS (requests per second) — number of HTTP requests served per second.
- p95 — response time threshold under which 95% of requests complete.
Metric decoding
- Connection reuse — share of requests served on existing persistent connections.
- Cache hit — share of responses returned without expensive backend processing.
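Both metrics decoded above are straightforward to compute from raw samples. A sketch with illustrative data (the nearest-rank method is one common way to compute a percentile):

```python
import math

def p95(latencies_ms):
    # Nearest-rank p95: smallest value with at least 95% of samples at or below it.
    s = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(s))   # 1-based rank
    return s[rank - 1]

def cache_hit_ratio(hits, total):
    # Share of responses served without backend processing.
    return hits / total if total else 0.0

samples = list(range(1, 101))         # 1..100 ms, illustrative latencies
# p95(samples) == 95
# cache_hit_ratio(76, 100) == 0.76
```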
Related chapter
Load Balancing
HTTP traffic almost always traverses L7/L4 balancing and policy layers.
How network and routing affect HTTP
Connection reuse and warm-up
Cold connections and extra handshakes increase p95/p99 latency even with fast backends.
Cache hit ratio
A falling cache-hit ratio quickly increases backend pressure and error rate at peak traffic.
Timeout/retry policy
Aggressive retries without budgets can trigger self-induced congestion and cascading failures.
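One way to keep retries from causing self-induced congestion is a retry budget: retries are allowed only while they stay under a fixed fraction of total requests. The 10% ratio here is an illustrative choice, not a value from the chapter:

```python
class RetryBudget:
    """Allow retries only while retries stay under `ratio` of total requests."""

    def __init__(self, ratio=0.1):
        self.ratio = ratio
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def can_retry(self):
        # Deny retries once the budget is exhausted, so a failing backend
        # is not hammered with amplified traffic.
        return self.retries < self.ratio * self.requests

    def record_retry(self):
        self.retries += 1

budget = RetryBudget(ratio=0.1)
for _ in range(100):
    budget.record_request()

allowed = 0
while budget.can_retry():
    budget.record_retry()
    allowed += 1
# allowed == 10: at most 10% of 100 requests may be retried
```

Unlike a fixed per-request retry count, the budget caps total retry amplification across the whole client, which is what protects a degraded backend.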
L4/L7 balancing path
Proxy/LB routing changes latency profile and sticky behavior for stateful client flows.
MTU, loss, and protocol version
Packet loss and unstable networks affect HTTP/2 and HTTP/3 differently, especially in mobile scenarios.
Source
Evolution of HTTP (MDN)
Major milestones of HTTP/1.1, HTTP/2, and HTTP/3 with engineering trade-offs.
HTTP evolution
Protocol evolution has focused on lower latency and better behavior under high request concurrency.
HTTP/1.1
1997/1999: Text protocol over TCP
OSI mapping: OSI L7 over TCP (L4).
- Persistent connections and chunked transfer
- Head-of-line blocking within a connection
- Often requires multiple client connections
HTTP/2
2015: Binary framing and multiplexing
OSI mapping: Same L7 semantics, more efficient transport on top of TCP.
- Multiplexing streams on one connection
- Header compression with HPACK
- Reduced connection setup overhead
HTTP/3
2022: HTTP over QUIC (UDP)
OSI mapping: HTTP L7 over QUIC transport with lower blocking under packet loss.
- Faster handshake and recovery
- Avoids TCP head-of-line blocking at the transport layer
- Better behavior in mobile networks
Where HTTP is most used
- Public and internal APIs (REST/gRPC-web/proxy patterns)
- Web apps and SPA/SSR frontends
- BFF and API Gateway orchestration layer
- Synchronous service integration with request-response semantics
- Edge access control, observability, and rate limiting
Why this matters in System Design
- HTTP defines API contracts and directly affects latency, retries, and request-processing cost.
- Protocol version choice changes connection behavior, multiplexing, and resilience under packet loss.
- Correct cache/timeout/retry strategy lowers incident blast radius and protects backend systems.
- HTTP metrics usually provide the fastest signal of user-facing degradation.
Common mistakes
Treating HTTP as 'free' and ignoring transport overhead, caching, and retries in capacity planning.
Reusing one timeout/retry profile for all endpoints regardless of SLA and criticality.
Ignoring cache semantics (ETag, Cache-Control) and generating avoidable backend load.
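The ETag mechanism mentioned here can be sketched from the server side: the server tags each representation with an ETag, and when the client's `If-None-Match` header matches it, the server returns `304 Not Modified` with no body instead of regenerating and resending the response. The handler shape and the hash-based ETag are illustrative assumptions:

```python
import hashlib

def make_etag(body):
    # One common choice: a quoted hash of the representation (illustrative).
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def handle_get(body, if_none_match=None):
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""   # not modified: no payload sent
    return 200, {"ETag": etag}, body      # full response with validator

status, headers, payload = handle_get(b"hello")                 # first fetch: 200
status2, _, payload2 = handle_get(b"hello", headers["ETag"])    # revalidation: 304
```

A conditional GET still costs a round trip, but it skips body generation and transfer, which is exactly the "avoidable backend load" this mistake produces.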
Combining 4xx and 5xx into one metric and losing clear server-degradation signals.
Related chapters
- OSI model - maps HTTP to the application layer and links it to transport/network behavior.
- IPv4 and IPv6: evolution of IP addressing - how routing properties influence HTTP delivery and latency profile.
- TCP protocol - main transport foundation for HTTP/1.1 and HTTP/2 performance characteristics.
- UDP protocol - transport foundation for QUIC/HTTP/3 and low-latency behavior.
- Domain Name System (DNS) - every HTTP call depends on DNS resolution and related latency/stability trade-offs.
- WebSocket protocol - upgrade from HTTP to bidirectional realtime communication channel.
- Load Balancing - L7 routing, sticky sessions, and balancing effects on HTTP traffic behavior.
- Remote call approaches - protocol choice and timeout/retry policy at app-level communication boundaries.
- Why distributed systems and consistency matter - moves from HTTP mechanics to distributed architecture trade-offs.
