CAP theorem — System Design Space

CAP matters not as a classroom triangle, but as a reminder of an ugly fact: once the network partitions, a clean architecture turns into a forced choice between conflicting properties.

In practice, this chapter helps teams leave the abstract debate behind and decide what will be lost first: some writes, some reads, part of the user flow, or operational predictability.

In interviews and architecture reviews, a CAP argument is strongest when it focuses not on the definition of CAP, but on the consequences for APIs, UX, service degradation, and recovery rules under network faults.

Practical value of this chapter

Design in practice

Helps choose CP or AP behavior by domain scenario instead of treating the decision as pure theory.

Decision quality

Clarifies when stale answers are acceptable and when freshness is critical to the business.

Interview articulation

Makes it easier to explain why a specific trade-off fits the workload in front of you.

Risk and trade-offs

Makes partition consequences explicit: API degradation, user impact, and operational guardrails.

Original

Telegram: Book Cube

Original post with a walkthrough of the CAP theorem.


The CAP theorem is more than twenty-five years old, and it is still pitched as the game of “pick two out of three.” Using Eric Brewer's own retrospective, this chapter separates the original idea from the meme and shows why CAP still matters when you are deciding on replication, quorums, or API behavior during an outage.

Foundation

TCP protocol

Where connectivity breaks and latency accumulates — the material the “P” in CAP is made of.


What the CAP theorem says

The classical formulation describes the limits of a distributed system precisely at the moment a network partition appears. It is not a permanent label on the whole architecture, and it is not a lifelong choice between two properties out of three. It is a reminder: while one part of the cluster cannot see another, you have to decide explicitly what you are willing to give up right here and now.

"The CAP theorem states that any networked shared-data system can have at most two of three desirable properties"

C: Consistency

Any read sees the same up-to-date state — as if the system only ever had a single copy of the data.

A: Availability

Every request to a healthy node gets a response. Freshness is not promised — what matters is that an answer comes back at all.

P: Partition tolerance

The system keeps operating even when one part of the cluster temporarily loses contact with another part of the network — instead of going down as a whole.

Foundation

OSI model

Helps walk through the network layers and see at which one the thing we later call a partition is actually born.


CAP visualization

[Interactive CAP theorem diagram. Properties shown: Consistency (linearizable view of data), Availability (responds to every request), Partition tolerance (keeps working through network splits). The diagram also lists system types.]

Important nuance

In real distributed systems partition tolerance is not optional. Network partitions are inevitable, so the practical question is how the system behaves on the CP/AP spectrum.

Presentation

PODC 2000

Eric Brewer's original symposium presentation.


How the theorem emerged

1998

Eric Brewer formulates the idea that will later become known as the CAP theorem.

1999

The idea enters wider technical discussion through "Harvest, Yield, and Scalable Tolerant Systems".

2000

Brewer presents the thesis at the Symposium on Principles of Distributed Computing.

2002

Seth Gilbert and Nancy Lynch formally prove the theorem and sharpen the meaning of consistency through linearizability.

Related chapter

DDIA

Takes consistency and replication down to the operational consequences of network failures — where CAP stops being a triangle.


Common misconceptions

The “pick two out of three” line is convenient for a slide and poor at describing a real system. Brewer's own retrospective spells out several corrections without which the theorem is easy to apply badly.

Partitions are rare, but not optional

In steady state the system happily holds both consistency and availability. The theorem only kicks in at the moment the network splits, and that is exactly when you want a prepared answer rather than improvisation in prod.

The trade-off is local, not global

The choice is made per operation, per data path, or per user flow. Inside one service, some data may insist on fresh reads while other data is content to answer from stale replicas.
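As a rough illustration of a per-operation choice, the sketch below keeps a small policy table inside one service. The operation names and the three policy values are hypothetical, invented for this example and not taken from any particular system.

```python
from enum import Enum


class PartitionPolicy(Enum):
    REJECT = "reject"            # CP-style: fail rather than risk inconsistency
    SERVE_STALE = "serve_stale"  # AP-style: answer from the local replica
    QUEUE = "queue"              # AP-style: accept now, reconcile later


# Hypothetical operations of a single service, each with its own choice.
POLICIES = {
    "transfer_money": PartitionPolicy.REJECT,
    "read_balance": PartitionPolicy.REJECT,
    "read_product_catalog": PartitionPolicy.SERVE_STALE,
    "add_to_cart": PartitionPolicy.QUEUE,
}


def policy_for(operation: str) -> PartitionPolicy:
    # Unknown operations default to the safe side: refuse during a partition.
    return POLICIES.get(operation, PartitionPolicy.REJECT)
```

Defaulting unknown operations to REJECT is one possible design choice; a team that values availability more could default the other way, as long as the default is written down before the incident.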

The properties are not binary switches

Availability is not a flag, it is a share of successful responses. Consistency comes in levels, and a partition is sometimes an outright cable break and sometimes a link so slow it is indistinguishable from one.
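To make "a share of successful responses" concrete, here is a minimal calculation. The traffic numbers are made up for illustration: they assume a partition in which the majority side keeps serving and the minority side rejects part of its requests to stay consistent.

```python
def availability(successful: int, total: int) -> float:
    """Share of requests that received a successful response."""
    if total <= 0:
        raise ValueError("total must be positive")
    return successful / total


# Hypothetical traffic during a partition.
majority = availability(successful=9_000, total=9_000)   # 1.0
minority = availability(successful=400, total=1_000)     # 0.4
overall = availability(successful=9_400, total=10_000)   # 0.94
```

The overall figure hides the fact that users routed to the minority side see a very different system, which is one reason to track availability per partition side or per user flow.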

What to do during a network partition

While the network is healthy, the system holds both consistency and availability at once. The moment connectivity breaks, you need a playbook agreed in advance — otherwise the trade-off gets picked by random clients in the middle of an incident.

Detect the problem

Catch the signal that communication between parts of the cluster is lost or too unstable to trust with quorum decisions.

Constrain operations

Decide ahead of time which reads and writes are rejected, simplified, or allowed in degraded mode — instead of leaving the call to the on-call engineer's gut.

Recover the state

Once connectivity returns, the unpleasant part is left: reconcile replicas, resolve conflicts, and run compensations for whatever went through in degraded mode.
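The three steps above can be sketched as a toy replica. Everything here is an assumption made for illustration: the heartbeat threshold, the `critical` flag, and especially the last-write-wins merge by caller-supplied logical timestamp, which is only one of several possible reconciliation strategies.

```python
from dataclasses import dataclass, field


@dataclass
class Versioned:
    value: str
    timestamp: int  # logical timestamp supplied by the caller (assumption)


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)
    partitioned: bool = False
    degraded_log: list = field(default_factory=list)

    # 1. Detect: too many missed heartbeats means the peer cannot be trusted.
    def observe_peer(self, missed_heartbeats: int, threshold: int = 3) -> None:
        self.partitioned = missed_heartbeats >= threshold

    # 2. Constrain: critical writes are rejected, others run in degraded mode.
    def write(self, key: str, value: str, timestamp: int,
              critical: bool = False) -> bool:
        if self.partitioned and critical:
            return False
        self.data[key] = Versioned(value, timestamp)
        if self.partitioned:
            self.degraded_log.append(key)  # remember what needs reconciling
        return True

    # 3. Recover: merge both sides, newest timestamp wins, report conflicts.
    def reconcile(self, peer: "Replica") -> list:
        conflicts = []
        for key in set(self.data) | set(peer.data):
            mine, theirs = self.data.get(key), peer.data.get(key)
            if mine and theirs and mine.value != theirs.value:
                conflicts.append(key)
            winner = max((v for v in (mine, theirs) if v),
                         key=lambda v: v.timestamp)
            self.data[key] = peer.data[key] = winner
        self.partitioned = peer.partitioned = False
        self.degraded_log.clear()
        peer.degraded_log.clear()
        return sorted(conflicts)
```

Last-write-wins silently discards the losing write, so the returned conflict list is what a compensation step would work from. Systems that cannot afford to lose writes typically keep both versions or use mergeable data types instead.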

Related chapter

Database Internals

Drops into transactions, isolation, and storage internals — where CAP and ACID turn into what they actually mean in practice.


How CAP relates to ACID and BASE

The word “consistency” means different things in CAP, in ACID, and in BASE, and that's why these discussions regularly go in circles. Lining the three up side by side untangles the meaning before anyone argues about the trade-offs.

ACID

A: Atomicity

C: Consistency

I: Isolation

D: Durability

In ACID, consistency is about invariants and transaction correctness, not about freshness across replicas the way CAP uses the word. Sharing the letter C here hurts more than it helps.

BASE

BA: Basic Availability

S: Soft state

E: Eventual consistency

BASE is no longer about transactions — it is an architectural stance: design the system to hold availability during failures, knowingly tolerating temporary divergence between copies of the data.

Why latency still matters

Latency is not named explicitly in the original theorem, but in practice it is how a system finds out that the network is behaving like a partition. The decision is almost always wired into timeouts: they are what turns a slow link into a “partition” as far as CAP is concerned.

Reject the operation

Give up some availability rather than let an inconsistent state slip into the system.

Continue the operation

Answer the client now and accept up front that data may be stale and replicas will need reconciling later.

Key insight

At the level of actual code, a partition almost always shows up as a timeout. So a timeout value is not just a performance knob; it decides where the system lands between consistency and availability. The smaller the timeout, the more often the system declares a “partition” on a slow but otherwise alive link.
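A small simulation shows how the timeout turns latency into a CAP decision. The function names, the `on_timeout` values, and the latency sample are all hypothetical; no real network call is made.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    status: str            # "ok", "stale", or "unavailable"
    value: Optional[str]


def read(peer_latency_ms, timeout_ms, fresh, local_copy, on_timeout):
    """Simulated read that needs a peer's confirmation to be fresh."""
    if peer_latency_ms <= timeout_ms:
        return Response("ok", fresh)
    # From here on the caller cannot tell a slow link from a broken one.
    if on_timeout == "reject":
        return Response("unavailable", None)   # give up availability
    return Response("stale", local_copy)       # give up freshness


def declared_partitions(latencies_ms, timeout_ms):
    """How many calls in a latency sample would be treated as a partition."""
    return sum(1 for latency in latencies_ms if latency > timeout_ms)


SAMPLE_MS = [20, 35, 80, 150, 400]  # made-up latencies of a slow but alive link
```

With this sample, a 50 ms timeout treats three of the five calls as a partition, while a 200 ms timeout treats only one that way. Nothing about the network changed between the two runs, only the threshold.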

What to remember

A network partition is not a global flag: different nodes see the situation differently, and the decision ends up being local.
During a partition, the system does not pick “two properties out of three” in the abstract — it picks a degradation strategy for a specific operation.
Aggressive timeouts raise sensitivity to slow networks: the system declares a “partition” more often, even when nothing has physically broken.
CAP is useful when it drives a conversation about API behavior, user impact, and recovery procedures — not as a decorative triangle on a slide.

Original

Telegram: Book Cube

Post covering the formal proof of the CAP theorem.


Formal proof

In 2002, Seth Gilbert and Nancy Lynch published the paper that turned Brewer's conjecture into a formal theorem — with sharp definitions and an actual proof.

"It is impossible for a web service to provide the following three guarantees: consistency, availability, partition-tolerance"

How the properties are formalized

1. Consistency

Consistency here is defined through linearizability: every operation looks as if it executed instantaneously, in one total order, somewhere between its invocation and its response.

"Under this consistency guarantee, there must exist a total order on all operations such that each operation looks as if it were completed at a single instant."

In other words, from the outside the system looks like a single node processing operations one by one — no matter how many replicas and how much replication sit under the hood.
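One way to get a feel for the definition is a brute-force check on a tiny history of a single register. This is a sketch for intuition only: it tries every ordering, so it is usable for a handful of operations, and the `Op` record with integer invocation and response times is an assumption of this example, not notation from the paper.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Optional


@dataclass(frozen=True)
class Op:
    kind: str              # "write" or "read"
    value: Optional[int]   # value written, or value the read returned
    invoked: int           # time the client sent the request
    responded: int         # time the client got the response


def respects_real_time(order) -> bool:
    # An operation that finished before another started must come first.
    for i, earlier in enumerate(order):
        for later in order[i + 1:]:
            if later.responded < earlier.invoked:
                return False
    return True


def obeys_register(order, initial=None) -> bool:
    # Replayed one by one, every read must return the latest written value.
    current = initial
    for op in order:
        if op.kind == "write":
            current = op.value
        elif op.value != current:
            return False
    return True


def is_linearizable(history) -> bool:
    return any(
        respects_real_time(order) and obeys_register(order)
        for order in permutations(history)
    )
```

For example, a write of 1 that completes at time 10 followed by a read at time 20 that returns the initial value is not linearizable, whereas the same stale read is acceptable if it overlaps the write in time, because it can be ordered before it.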

2. Availability

Every request that reaches a non-failing node has to complete with a response. Formally the definition is weak — there is no upper bound on response time. Under a partition, though, it bites hard: even with lost messages, the request still has to terminate rather than hang forever.

3. Partition tolerance

The model assumes the worst case: an arbitrary number of messages between parts of the cluster can be lost, and the proof has to hold even there.

"When a network is partitioned, all messages sent from nodes in one component of the partition to nodes in another component are lost."

Proof

1. Theorem for asynchronous systems

Theorem 1:

It is impossible in the asynchronous network model to implement a read/write data object that guarantees Availability and Atomic consistency in all fair executions (including those in which messages are lost).

In plain terms: if messages can be lost, an asynchronous system cannot guarantee both availability and atomic consistency in every fair execution.

The contradiction argument is straightforward:

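Assume such an algorithm exists.
Consider a system with two nodes, G₁ and G₂, that both hold the initial value v₀.
Assume all messages between them are lost.
A write of a new value v₁ happens on G₁ and, because the system is available, it completes.
A later read happens on G₂.
G₂ has received nothing from G₁, so it cannot tell this execution apart from one with no write, and it cannot return v₁.
If G₂ answers, it returns v₀ and violates consistency; if it waits for G₁, it never answers and violates availability.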
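The argument can be replayed with a toy two-node model. This is a deliberately simplified sketch, not the formal model from the paper: replication is a direct assignment, a lost message is a flag, and the two read methods stand in for the two possible behaviors of G₂.

```python
class Node:
    def __init__(self, name: str, initial: str = "v0"):
        self.name = name
        self.value = initial

    def write(self, value: str, peer: "Node", link_up: bool) -> None:
        self.value = value
        if link_up:
            peer.value = value  # replication message delivered
        # With the link down the message is lost and the peer learns nothing.

    def read_available(self) -> str:
        # Always answers from local state, so the answer may be stale.
        return self.value

    def read_consistent(self, link_up: bool) -> str:
        # Refuses to answer when it cannot confirm freshness with the peer.
        if not link_up:
            raise TimeoutError(f"{self.name} cannot reach its peer")
        return self.value


g1, g2 = Node("G1"), Node("G2")
g1.write("v1", peer=g2, link_up=False)  # partition: the update never arrives

stale = g2.read_available()             # answers, but with the old value
try:
    g2.read_consistent(link_up=False)
    answered = True
except TimeoutError:
    answered = False                    # stays correct, but gives no answer
```

Both outcomes are visible at once: `stale` holds the old value, and the consistent read produces no answer at all. The model also lets G₁ accept the write unconditionally, which is itself a choice a stricter system might not make.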

2. Theorem for partially synchronous systems

Theorem 2:

It is impossible in the partially synchronous network model to implement a read/write data object that guarantees Availability and Atomic consistency in all executions (even those in which messages are lost).

Real systems live closer to partially synchronous ones: nodes have clocks, timeouts, and reasonable expectations about delay. It would be convenient if that rescued us, but the verdict holds: under a partition, you still have to choose between availability and atomic consistency.

Note: the rest of the paper covers Delayed-t consistency for partially synchronous systems — a compromise that deliberately relaxes freshness guarantees in order to win back some availability.

Sources and materials

CAP Twelve Years Later (Eric Brewer)
Original post about CAP (Telegram)
Post about the CAP proof (Telegram)
Gilbert & Lynch (2002)
PODC 2000 Presentation
Harvest, Yield, and Scalable Tolerant Systems

Related chapters

Why distributed systems and consistency matter - Explains where partial failures come from and why invariants matter — without that, a CAP discussion turns into an argument about letters.
Scalable system design principles - Shows what the CAP choice actually costs at the level of sharding, replication, and service-degradation strategy.
PACELC theorem - Continues the CAP discussion beyond the outage: what a system pays for consistency in steady state, when the network is healthy.
Designing Data-Intensive Applications, 2nd Edition (short summary) - Unfolds CAP into replication, consistency models, and the real architectural choices behind distributed systems.
Distributed Systems, 4th Edition (short summary) - Academic foundation for failure, communication, and coordination models — useful when you want to know where a partition even comes from.
Database Internals: A Deep Dive (short summary) - Drops down to storage and replication, where CAP and PACELC trade-offs turn into specific config lines and write paths.
Cassandra: The Definitive Guide (short summary) - A living AP example (PA/EL in PACELC terms): tunable consistency and explicit knobs for picking the trade-off under load.
Multi-region / Global Systems - Moves the CAP discussion into cross-region replication, where hundreds of milliseconds sit between replicas.