CAP matters not as a classroom triangle, but as a reminder of an ugly fact: once the network partitions, a clean architecture turns into a forced choice between conflicting properties.
In practice, this chapter helps teams leave the abstract debate behind and decide what will be lost first: parts of writes, parts of reads, parts of the user flow, or operational predictability.
In interviews and architecture reviews, it is strongest when you focus not on the definition of CAP, but on the consequences for APIs, UX, service degradation, and recovery rules under network faults.
Practical value of this chapter
Design in practice
Helps choose CP or AP behavior by domain scenario instead of treating the decision as pure theory.
Decision quality
Clarifies when stale answers are acceptable and when freshness is critical to the business.
Interview articulation
Makes it easier to explain why a specific trade-off fits the workload in front of you.
Risk and trade-offs
Makes partition consequences explicit: API degradation, user impact, and operational guardrails.
Original
Telegram: Book Cube
Original post with a walkthrough of the CAP theorem.
The CAP theorem is more than twenty-five years old, yet it is still often reduced to the shallow slogan of “pick two out of three.” Using Eric Brewer's own retrospective, this chapter separates the original idea from the meme and connects it back to practical distributed-system behavior.
Foundation
TCP protocol
Network partitions and latency directly shape the system behavior discussed by CAP.
What the CAP theorem says
The classical formulation describes the limits of a distributed system at the moment a network partition appears. It does not say that the whole system permanently picks two properties forever. It says that during a loss of connectivity, the system must make a concrete choice about what it will protect first.
"The CAP theorem states that any networked shared-data system can have at most two of three desirable properties"
Reads observe one up-to-date state, as if the system were working from a single coherent copy of the data.
Every request to a healthy node receives a response, even if the answer may not reflect the newest state.
The system keeps operating even when one part of the cluster temporarily loses contact with another part of the network.
Foundation
OSI model
Useful for reasoning about the network layers and the mechanisms that can produce partitions.
CAP visualization
Interactive CAP theorem diagram
Click a property or system type to inspect how the trade-offs connect.
Linearizable view of data
Responds to every request
Keeps working through network splits
System types
Important nuance
In real distributed systems partition tolerance is not optional. Network partitions are inevitable, so the practical question is how the system behaves on the CP/AP spectrum.
Presentation
PODC 2000
Eric Brewer's original symposium presentation.
How the theorem emerged
Eric Brewer formulates the idea that will later become known as the CAP theorem.
The idea enters wider technical discussion through "Harvest, Yield, and Scalable Tolerant Systems".
Brewer presents the thesis at the Symposium on Principles of Distributed Computing.
Seth Gilbert and Nancy Lynch formally prove the theorem and sharpen the meaning of consistency through linearizability.
Related chapter
DDIA
Adds the deeper context on consistency, replication, and the operational consequences of network failures.
Common misconceptions
CAP is often remembered as “pick two out of three,” but Brewer points out several important corrections that make the theorem far more useful in practice.
Partitions are rare, but not optional
Most of the time a system can aim to preserve both consistency and availability. The theorem becomes relevant the moment connectivity breaks and the system must choose what to protect first.
The trade-off is local, not global
The choice is made per operation, per data path, or per user flow. One part of the system may insist on fresh reads while another can survive on stale data.
The properties are not binary switches
Availability is measured across a range, consistency comes in multiple levels, and partitions may look like an outright disconnection or simply an intolerably slow network.
What to do during a network partition
In normal operation the system tries to preserve both consistency and availability. Once connectivity breaks, it needs a clear failure-mode playbook.
Detect the problem
Recognize that communication between parts of the cluster is lost or too unstable.
Constrain operations
Decide explicitly which reads and writes should be rejected, degraded, or simplified.
Recover the state
Reconcile data, resolve conflicts, and run compensations after connectivity returns.
Related chapter
Database Internals
Provides the deeper background on transactions, isolation, and storage behavior.
How CAP relates to ACID and BASE
CAP is often mixed together with ACID and BASE. The comparison is useful, but the three concepts answer different questions.
ACID
In ACID, consistency is about business invariants and transaction correctness, not the distributed-systems meaning used by CAP.
BASE
BASE describes an architectural style that favors availability and accepts temporary divergence between copies of data.
Why latency still matters
Latency is not named explicitly in the original theorem, but in practice it is how a system discovers that the network is behaving like a partition. The real decision is often made through timeouts and response deadlines.
Reject the operation
Reduce availability in order to preserve stricter consistency.
Continue the operation
Preserve availability while accepting the risk of stale or divergent data.
Key insight
From an engineering perspective, a partition often shows up as a timeout. The stricter the response-time boundary, the more often the system will behave as if the network were partitioned, even if the real problem is “just” a very slow link.
What to remember
- A network partition is not a global flag: different nodes may observe the situation differently.
- During a partition, the system is not choosing “two properties out of three” in the abstract. It is choosing a degradation strategy for a specific operation.
- Aggressive timeouts make the system more sensitive to slow networks, even when there is no physical disconnection.
- CAP is useful when it helps reason about API behavior, user impact, and recovery procedures, not as a decorative diagram.
Original
Telegram: Book Cube
Post covering the formal proof of the CAP theorem.
Formal proof
In 2002, Seth Gilbert and Nancy Lynch published the paper that turned Brewer's conjecture into a formal theorem.
"It is impossible for a web service to provide the following three guarantees: consistency, availability, partition-tolerance"
How the properties are formalized
1. Consistency
The authors define it through linearizability: all operations look as if they happened instantaneously in one total order between invocation and response.
"Under this consistency guarantee, there must exist a total order on all operations such that each operation looks as if it were completed at a single instant."
In other words, the system behaves as though it were running on one node and processing operations one at a time, even if multiple replicas exist internally.
2. Availability
Every request that reaches a non-failing node must complete with a response. This is formally a weak definition because there is no upper bound on response time. Yet under partitions it becomes strong in practice: even with lost messages, the request is still expected to terminate.
3. Partition tolerance
The model allows an arbitrary number of messages to be lost between parts of the cluster:
"When a network is partitioned, all messages sent from nodes in one component of the partition to nodes in another component are lost."
Proof
1Theorem for asynchronous systems
Theorem 1:
It is impossible in the asynchronous network model to implement a read/write data object that guarantees Availability and Atomic consistency in all fair executions (including those in which messages are lost).
In an asynchronous system, availability and atomic consistency cannot both be guaranteed for all fair executions if message loss is possible.
The contradiction argument is straightforward:
- Assume such an algorithm exists.
- Consider a system with two nodes, G₁ and G₂.
- Assume all messages between them are lost.
- A write happens on G₁.
- A later read happens on G₂.
- G₂ cannot return the value written on G₁.
- The system violates either consistency or availability.
2Theorem for partially synchronous systems
Theorem 2:
It is impossible in the partially synchronous network model to implement a read/write data object that guarantees Availability and Atomic consistency in all executions (even those in which messages are lost).
Real systems are closer to partially synchronous ones: nodes have clocks, timeouts, and operational expectations about delay. Even in that stronger model, the result remains the same.
Note: The rest of the paper discusses Delayed-t consistency in partially synchronous systems, a compromise that relaxes freshness guarantees.
Related chapters
- Why distributed systems and consistency matter - Introduces the section and explains why consistency, availability, and failure-handling trade-offs appear in real systems.
- Scalable system design principles - Shows how CAP influences sharding, replication, and service-degradation choices in practical scaling work.
- PACELC theorem - Extends the CAP discussion into normal operation and explains latency-versus-consistency choices outside partition events.
- Designing Data-Intensive Applications, 2nd Edition (short summary) - Connects CAP to replication, consistency models, and the real architecture of distributed systems.
- Distributed Systems, 4th Edition (short summary) - Provides the academic foundation for failure models, communication, and coordination in distributed environments.
- Database Internals: A Deep Dive (short summary) - Shows how storage and replication internals embody CAP and PACELC trade-offs at implementation level.
- Cassandra: The Definitive Guide (short summary) - Offers a practical AP/PA-EL example with tunable consistency and explicit consistency-control choices.
- Multi-region / Global Systems - Extends the discussion to cross-region replication, global routing, and recovery procedures.
