TCP protocol — System Design Space

This chapter matters because it makes the cost of reliable delivery visible: connection setup, byte ordering, flow control, and congestion control are always paid for in latency, memory, and state.

In practice, it helps you design retries, connection reuse, streaming, and backpressure with a clear sense of how the transport layer reshapes application latency and throughput.

In interviews and design discussions, it lets you explain transport choices and load behavior through actual TCP mechanisms rather than vague claims about reliability.

Practical value of this chapter

Reliable delivery

Builds intuition for acknowledgments, retransmissions, and the cost of preserving order.

Congestion behavior

Shows how congestion control shapes data-flow decisions and retry policy.

Connection lifecycle

Shows how handshake, keep-alive, and queueing shape request-path performance.

Interview depth

Strengthens transport-layer discussion in reliability-critical design cases.

RFC

RFC 9293 (TCP)

The current TCP source of truth: segment format, connection states, and protocol behavior — worth citing instead of older overview articles.


The label “reliable transport” covers only half the story. In practice, TCP is the layer that turns network quality into concrete latency, memory cost, connection price, and service behavior under load.

In system design, it helps to understand how TCP creates connection state, why it relies on ACKs, and how retransmission, flow control, and congestion control shape service stability before application logic even gets involved.

Real transfer speed depends on MSS, cwnd, the receive window, RTT, and bandwidth-delay product. That is why nominal link bandwidth alone does not guarantee high throughput or low latency.

For architects, this turns into practical choices: when to keep connections warm, how to tune timeout and retry policy, where to apply backpressure, and why slow start or head-of-line blocking can quietly distort the user-visible request path.

Key properties of TCP

Reliable delivery

The byte stream stays in order: segments are acknowledged, lost ones get retransmitted — without any help from the application.

Connection state

No payload moves until both peers agree on shared connection state and basic session parameters. That negotiation is the price of a predictable stream.

Acknowledgments and retransmissions

ACKs give the sender feedback on delivery and let the stream recover from network disruption, often before the user notices anything.

Flow control

The receive window keeps the sender from overrunning the receiver's buffers: the sender may keep in flight only as much data as the receiver has advertised room for.

Congestion control

The sending rate is set by signals from the network, not by the sender: packet loss, rising latency, and explicit congestion marks (ECN). Sending at full speed regardless degrades the other flows that share the path.

How the TCP segment header is structured

Nothing in the header is decorative: each field carries a specific mechanism — ordering, acknowledgments, window negotiation, error checking, and the reaction to loss.

TCP segment header (base)

20-60 bytes

Source Port: 16 bits
Destination Port: 16 bits
Sequence Number: 32 bits
Acknowledgment Number: 32 bits
Data Offset: 4 bits
Reserved: 4 bits
Flags: 8 bits
Window Size: 16 bits
Checksum: 16 bits
Urgent Pointer: 16 bits
Options + Padding (optional): 0-320 bits, always a multiple of 32 bits

The base header is always 20 bytes. Options such as MSS, Window Scale, SACK Permitted, and Timestamps can extend the full header to at most 60 bytes.
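To make the field layout concrete, here is a minimal sketch that decodes the fixed 20-byte part of a header with Python's `struct` module. The function name `parse_tcp_header` and the sample SYN segment (ports, sequence number, window) are hypothetical and chosen only for illustration; options are measured but not decoded.

```python
import struct

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR"]  # bit 0 .. bit 7

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (network byte order)."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    (src_port, dst_port, seq, ack,
     offset_reserved, flags, window, checksum, urgent) = struct.unpack(
        "!HHIIBBHHH", segment[:20])
    header_len = (offset_reserved >> 4) * 4  # Data Offset counts 32-bit words
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,         # 20..60 bytes
        "options_len": header_len - 20,   # 0..40 bytes of options and padding
        "flags": [n for bit, n in enumerate(FLAG_NAMES) if flags & (1 << bit)],
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }

# Hypothetical SYN segment: data offset 6 words = 20-byte base + one 4-byte MSS option.
sample = struct.pack("!HHIIBBHHH", 54321, 443, 1000, 0, 6 << 4, 0x02, 65535, 0, 0)
sample += bytes([2, 4]) + struct.pack("!H", 1460)  # option kind=2 (MSS), length=4
header = parse_tcp_header(sample)
print(header["flags"], header["header_len"], header["options_len"])
```

The Data Offset field is the reason the header tops out at 60 bytes: four bits can express at most 15 words of 4 bytes each.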

TCP connection lifecycle

Connection setup

SYN -> SYN-ACK -> ACK. Peers agree on initial sequence numbers, MSS, and window settings — none of the actual payload moves until that is done.

Data transfer

The byte stream is sliced into segments, ACKs confirm delivery, the receive window grows and shrinks with the receiver's free buffer space, and lost segments are retransmitted.

Connection teardown

FIN and ACK travel in both directions; the side that closes first then waits in TIME_WAIT. That wait is insurance: a stale segment from the old connection cannot attach itself to the next connection on the same address and port pair.

The TCP three-way handshake

The handshake is not just “opening a socket”: it synchronizes sequence numbers, confirms the other side is reachable, and creates connection state. Setup costs a round trip on every new connection, and the resulting state costs memory on the server.

Figure: TCP three-way handshake between client and server (interactive diagram). After all three steps the connection becomes established.
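The sequence-number synchronization can be sketched as a small model. It builds no real packets; the function name and the initial sequence numbers in the usage line are hypothetical. The point it illustrates is that SYN consumes one sequence number, so each side acknowledges the peer's initial sequence number plus one.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits wide and wrap around

def three_way_handshake(client_isn: int, server_isn: int) -> list[dict]:
    """Return the three handshake segments with their seq/ack numbers."""
    syn = {"from": "client", "flags": "SYN",
           "seq": client_isn, "ack": None}
    syn_ack = {"from": "server", "flags": "SYN-ACK",
               "seq": server_isn, "ack": (client_isn + 1) % SEQ_SPACE}
    ack = {"from": "client", "flags": "ACK",
           "seq": (client_isn + 1) % SEQ_SPACE,
           "ack": (server_isn + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]

# Hypothetical initial sequence numbers; real stacks pick them unpredictably.
for step in three_way_handshake(client_isn=1000, server_isn=5000):
    print(step)
```

The client learns that the server is reachable after one round trip (SYN out, SYN-ACK back), which is why a new connection adds roughly one RTT before the first request can be sent.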

Once the connection is up, TCP starts trading off — push for more speed, or back off because the network is already signaling stress. This is the phase where you can see what you are actually hitting: the receiver, the queues along the way, or raw link capacity.

Flow and congestion control dynamics

Interactive chart: cwnd and rwnd over eight RTT steps. Values at step RTT 1 (phase: slow start):

Congestion window (cwnd): 2 MSS
Receive window (rwnd): 18 MSS
Effective window (min): 2 MSS
RTT: 18 ms
Loss: 0.0%
Estimated throughput: 1.3 Mbps
MSS: 1460 bytes

Queue pressure: higher queue occupancy raises tail latency even when packet loss remains modest.

What is happening in this step: the connection has just started, and the sender ramps up cwnd quickly while the path is still uncongested.

Term decoding

cwnd = congestion window - Sender-side window: adjusted by congestion signals and limits in-flight data volume.
rwnd = receive window - Receiver-advertised window in ACKs that reflects available buffer capacity.
send window = min(cwnd, rwnd) - Effective sending limit is always the smaller value of these two windows.
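The estimated throughput shown for RTT 1 follows from these definitions: at most one send window of data can be delivered per round trip. A minimal sketch of that arithmetic, using the values from the first step (cwnd 2 MSS, rwnd 18 MSS, MSS 1460 bytes, RTT 18 ms); the function name is an assumption for illustration.

```python
def window_limited_throughput_bps(cwnd_mss: int, rwnd_mss: int,
                                  mss_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput when one send window is delivered per RTT."""
    send_window_bytes = min(cwnd_mss, rwnd_mss) * mss_bytes
    return send_window_bytes * 8 / rtt_s

estimate = window_limited_throughput_bps(cwnd_mss=2, rwnd_mss=18,
                                         mss_bytes=1460, rtt_s=0.018)
print(f"{estimate / 1e6:.1f} Mbps")  # prints "1.3 Mbps"
```

In this step cwnd is the binding limit; once cwnd grows past 18 MSS, the receiver's window becomes the limit instead.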

Related chapter

IPv4 and IPv6: evolution of IP addressing

Why IP routing properties set a ceiling on TCP throughput and stability — usually more than people expect.


How routing and network conditions shape TCP behavior

RTT and BDP

The higher the RTT and bandwidth-delay product, the wider the effective window has to be — otherwise the link sits idle waiting on acknowledgments.
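A quick way to see this is to compute the bandwidth-delay product directly. The sketch below assumes a hypothetical 1 Gbps path with an 80 ms RTT, and compares the window needed to fill it with what a 16-bit window field allows when Window Scale is not in use; the function names are illustrative.

```python
import math

def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> float:
    """Bytes that must be in flight to keep the path full."""
    return bandwidth_bps * rtt_ms / 1000 / 8

def segments_to_fill_pipe(bandwidth_bps: int, rtt_ms: int, mss_bytes: int = 1460) -> int:
    return math.ceil(bdp_bytes(bandwidth_bps, rtt_ms) / mss_bytes)

def max_throughput_bps(window_bytes: int, rtt_ms: int) -> float:
    """Throughput ceiling when the window, not the link, is the limit."""
    return window_bytes * 8 * 1000 / rtt_ms

# Hypothetical path: 1 Gbps, 80 ms RTT.
print(bdp_bytes(1_000_000_000, 80))              # 10000000.0 bytes (10 MB)
print(segments_to_fill_pipe(1_000_000_000, 80))  # 6850 segments of 1460 bytes
print(max_throughput_bps(65_535, 80))            # 6553500.0 bps with an unscaled window
```

On this path an unscaled 64 KB window would cap a single connection at about 6.5 Mbps, no matter how fast the link is.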

Loss and queue growth

Loss shrinks cwnd, and so does queue buildup when the congestion control algorithm reacts to delay or ECN marks. Tail latency rises, and recovery stretches across many RTTs.
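Why recovery takes many RTTs can be seen in a deliberately simplified, Reno-style model: the window doubles per RTT in slow start, grows by one MSS per RTT in congestion avoidance, and halves on loss. This is a sketch under those assumptions, not a model of any particular stack; production algorithms are more elaborate. The initial window, threshold, and loss round are hypothetical.

```python
def simulate_cwnd(rounds: int, loss_rounds: set[int],
                  initial_cwnd: int = 2, ssthresh: int = 16) -> list[int]:
    """Return cwnd (in MSS) at the start of each RTT round, 1-based."""
    cwnd = initial_cwnd
    history = []
    for rtt in range(1, rounds + 1):
        history.append(cwnd)
        if rtt in loss_rounds:
            ssthresh = max(cwnd // 2, 2)      # multiplicative decrease
            cwnd = ssthresh                   # resume from half the window
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)    # slow start: double per RTT
        else:
            cwnd += 1                         # congestion avoidance: +1 MSS per RTT
    return history

history = simulate_cwnd(rounds=10, loss_rounds={6})
print(history)  # [2, 4, 8, 16, 17, 18, 9, 10, 11, 12]
```

A single loss at round 6 halves the window from 18 to 9 MSS, and at one MSS per RTT it takes nine more rounds to get back to 18.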

Asymmetric routing

Forward and reverse paths often diverge: segments can arrive out of order, duplicate ACKs pile up, and TCP reacts to loss signals that do not reflect real loss.
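The effect of reordering on ACKs can be sketched with a toy receiver that acknowledges cumulatively. Segments are numbered 0, 1, 2, ... instead of by byte offset to keep the model small, and the arrival order is hypothetical. Three duplicate ACKs is the standard threshold at which a sender starts fast retransmit.

```python
def cumulative_acks(arrival_order: list[int]) -> list[int]:
    """ACK sent after each arrival; an ACK names the next segment expected."""
    received = set()
    next_expected = 0
    acks = []
    for seg in arrival_order:
        received.add(seg)
        while next_expected in received:   # advance over contiguous data
            next_expected += 1
        acks.append(next_expected)
    return acks

def max_duplicate_acks(acks: list[int]) -> int:
    """Longest run of repeated ACK values, not counting the first of each run."""
    longest = run = 0
    for prev, cur in zip(acks, acks[1:]):
        run = run + 1 if cur == prev else 0
        longest = max(longest, run)
    return longest

# Segment 1 is delayed behind 2, 3, and 4, but nothing is lost.
acks = cumulative_acks([0, 2, 3, 4, 1, 5])
print(acks)                      # [1, 1, 1, 1, 5, 6]
print(max_duplicate_acks(acks))  # 3
```

Here nothing was dropped, yet the sender sees three duplicate ACKs for segment 1 and would treat that as a loss signal.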

MTU and fragmentation

Broken Path MTU Discovery means fragmentation, blackholes, and heavy retransmission loops — in service logs this just shows up as “sometimes it hangs.”
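The link between MTU and segment size is simple arithmetic: the MSS is what remains of the MTU after the IP and TCP headers. A minimal sketch, assuming headers without options or extension headers; the 1400-byte tunnel MTU in the usage lines is a hypothetical value.

```python
IPV4_HEADER = 20  # bytes, without IP options
IPV6_HEADER = 40  # bytes, fixed header without extension headers
TCP_HEADER = 20   # bytes, without TCP options

def mss_for_mtu(mtu: int, ipv6: bool = False) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER

print(mss_for_mtu(1500))             # 1460, the common Ethernet case
print(mss_for_mtu(1500, ipv6=True))  # 1440
print(mss_for_mtu(1400))             # 1360 on a hypothetical tunnel with a smaller MTU
```

If a hop on the path has a smaller MTU than the endpoints assume and the feedback that should report it never arrives, full-size segments vanish while small ones pass, which matches the "sometimes it hangs" symptom.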

Idle timeouts in NAT and load balancers

Long-lived TCP sessions get silently cut by middleboxes. The service sees a connection drop and blames the network, when the real cause is a NAT or load-balancer policy.
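One common mitigation is TCP keep-alive with probes sent more often than the middlebox idle timeout. The sketch below uses Python's `socket` module; the helper name and the timing values are assumptions, and the per-socket tuning constants are platform-specific, so the code sets only those the platform exposes.

```python
import socket

def enable_keepalive(sock: socket.socket, idle_s: int = 60,
                     interval_s: int = 10, probes: int = 5) -> None:
    """Turn on keep-alive probes; tune timing where the platform allows it."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (("TCP_KEEPIDLE", idle_s),       # idle time before the first probe
              ("TCP_KEEPINTVL", interval_s),  # gap between probes
              ("TCP_KEEPCNT", probes))        # failed probes before giving up
    for name, value in tuning:
        option = getattr(socket, name, None)  # absent on some platforms
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)  # hypothetical values: probe after 60 s of idle time
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
sock.close()
```

The idle time should sit below the shortest idle timeout on the path, which has to be looked up for the specific NAT or load balancer. Application-level heartbeats are an alternative when socket options are out of reach.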

Why this matters in System Design

Handshake cost and connection warm-up land directly in p95 and p99 at the service entry point — they are the first thing the user notices.
Real throughput is set by window size, RTT, and congestion control, not by the nominal link bandwidth on the provider contract.
Under load, sloppy timeout and retry policy turns a single degradation into a cascading failure faster than the alert can fire.
When transport behavior is understood, separating a network incident from a business-logic bug becomes a matter of minutes instead of hours.
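The first point can be put in numbers with a back-of-the-envelope model. It assumes one RTT for the handshake and one RTT per request/response exchange, and ignores TLS, slow start, and server processing time; the function name, request count, and RTT are hypothetical.

```python
def total_latency_ms(requests: int, rtt_ms: float, reuse_connection: bool) -> float:
    """Network time for sequential requests, with or without connection reuse."""
    handshake = rtt_ms  # SYN and SYN-ACK take one round trip before data can flow
    exchange = rtt_ms   # one request/response round trip
    if reuse_connection:
        return handshake + requests * exchange
    return requests * (handshake + exchange)

cold = total_latency_ms(requests=10, rtt_ms=40, reuse_connection=False)
warm = total_latency_ms(requests=10, rtt_ms=40, reuse_connection=True)
print(cold, warm)  # 800 440
```

Even in this simplified model, a connection per request nearly doubles the network time, and the gap widens as the RTT grows.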

Common mistakes

Treating the transport layer as stable by default and looking past tail latency on network segments.

Lumping congestion signals and application timeouts into one metric — later, nobody can tell which one actually broke.

Skipping connection pools and keep-alive: every request pays for a handshake, and p95 grows for no reason.

Using identical timeout and retry policies for intra-DC and inter-region traffic — a reliable shortcut to cascading failures.
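The last mistake is easier to avoid when the policy is an explicit value that differs per path. The sketch below defines a small policy type with exponential backoff and full jitter; the class, the two profiles, and every number in them are hypothetical and would need to be derived from measured RTT and latency percentiles.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class RetryPolicy:
    timeout_s: float       # per-attempt timeout
    max_attempts: int
    base_backoff_s: float
    max_backoff_s: float

    def backoff_s(self, attempt: int, rng: random.Random) -> float:
        """Full-jitter exponential backoff before retry number `attempt` (1-based)."""
        ceiling = min(self.max_backoff_s, self.base_backoff_s * 2 ** (attempt - 1))
        return rng.uniform(0, ceiling)

# Hypothetical profiles: short timeouts inside a data center, longer ones across regions.
INTRA_DC = RetryPolicy(timeout_s=0.05, max_attempts=3,
                       base_backoff_s=0.01, max_backoff_s=0.1)
INTER_REGION = RetryPolicy(timeout_s=1.0, max_attempts=2,
                           base_backoff_s=0.2, max_backoff_s=2.0)

rng = random.Random(0)  # seeded only to make the demo repeatable
delays = [INTER_REGION.backoff_s(attempt, rng) for attempt in (1, 2)]
```

Jitter spreads retries out in time, so clients that failed together do not retry together and push an already congested path into further loss.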

Related chapters

OSI model - places TCP at the transport layer and gives you a way to debug network incidents layer by layer instead of guessing.
IPv4 and IPv6: evolution of IP addressing - unpacks how IP routing and the addressing stack underneath TCP change its behavior under load.
UDP protocol - the alternative when TCP's built-in guarantees cost more than the application can afford — and those guarantees move into application code.
HTTP protocol - the upper floor over TCP in most services: shows how an application protocol reuses connections and where it amplifies transport-layer bottlenecks.
WebSocket protocol - the long-lived TCP case, where NAT drop-offs and per-connection memory on the server become the dominant concern.
Load Balancing - L4/L7 balancing, stickiness, and middlebox timeouts — three places where TCP sessions break silently from the application's point of view.
Remote call approaches - the application-layer counterpart: timeouts, retries, and backoff that should work with TCP's behavior instead of fighting it.
Why distributed systems and consistency matter - the next step: from a single connection's mechanics to the trade-offs an architect makes across the whole distributed system.