Original
Telegram: book_cube
Original post with analysis of the CAP theorem.
The famous CAP theorem is more than 25 years old. Let's figure out what it is, why it is needed and how it appeared. There is a great one about this article by Eric Brewer, the author of the theorem, which I wanted to remember because it is good.
Foundation
TCP protocol
Network splits and latency directly impact CAP trade-offs.
Statement of the theorem
"The CAP theorem states that any networked shared-data system can have at most two of three desirable properties"
Equivalent to having a single, up-to-date copy of the data
High data availability for updates
Resilience to network splits
Foundation
OSI model
Useful for parsing network layers and reasons for partition.
CAP visualization
Interactive CAP Diagram
Click a property or system type to explore relationships.
Atomic or linearizable consistency
Responds to every request
Tolerance to network partitions
System types
Important nuance
In real distributed systems, Partition Tolerance is not optional, it is mandatory. Network partitions are inevitable, so the practical choice is between CP and AP strategies.
Presentation
PODC 2000
Original presentation by Eric Brewer at the symposium.
History of appearance
The CAP theorem was formulated in the fall
Published in an article "Harvest, Yield, and Scalable Tolerant Systems" at ACM
Presented at the Symposium on Principles of Distributed Computing
Proved formally (consistency turned into linearizability)
Related chapter
DDIA
A detailed analysis of consistency and replication in Kleppmann's book.
Common Misconceptions
The theorem spread to the masses and became a simplified “choose 2 out of 3”, which is a strong simplification for three reasons that Eric Brewer points out:
Rarity of divisions
Due to the rarity of partition, there is no point in choosing between C and A all the time. Most of the time the system can provide both properties.
Solution granularity
The decision about C or A is not made once for the entire system, but at the level of individual operations or data types. Different parts of the system may follow different strategies.
Continuous spectrum
C, A, P are not binary properties, but rather continuous ones. Availability varies from 0 to 100%, consistency levels are also different, and even partitions have nuances.
Partition Mode
In the absence of system separation, we can provide both A and C. During problems, a clear algorithm is needed:
Definition
Detecting the fact of network separation
Transition
Explicit transition to partition mode with limited operations
Recovery
The process of restoring consistency and compensating for errors
Related chapter
Database Internals
Deep dive into transactions and isolation levels.
Communication with ACID and BASE
ACID
⚠️ Consistency in ACID is about data integrity relative to business rules, not about CAP.
BASE
The first two properties help achieve accessibility when dividing the system into parts.
The Role of Latency
Latency is absent in the classical formulation, but is implicitly present. When performing an operation on a partitioned system, we must make a decision:
Cancel operation
Reduce availability in favor of consistency
Continue operation
Accept the risk of data inconsistency
Key insight
From a pragmatic point of view, separation is a time limit (timeout). The stricter the time bounds, the higher the probability of getting into partition mode, even with a slow network without real partitioning.
Key Findings
- There is no global concept of partition - some nodes can detect it, others cannot
- Nodes that discover partition enter partition-mode - the part where you need to choose between C and A
- Designers set time bounds to meet target response speeds
- CAP is not a 2-out-of-3 choice, it's about managing trade-offs during network problems
Original
Telegram: book_cube
A post with an analysis of the formal proof of the CAP theorem.
Formal proof of CAP
In 2002, Seth Gilbert and Nancy Lynch published whitepaper, in which Brewer's conjecture became a formal theorem.
"It is impossible for a web service to provide the following three guarantees: consistency, availability, partition-tolerance"
Formalization of properties
1. Consistency
The authors define consistency as atomic data objects (linearizable consistency):
"Under this consistency guarantee, there must exist a total order on all operations such that each operation looks as if it were completed at a single instant."
The system behaves as if it were running on a single node, processing transactions one at a time.
2. Availability
Every request that reaches a non-failing node should receive a response. The algorithm must terminate (eventually terminate). This weak definition — there is no upper limit on response time. But from the point of view of partition-tolerance this is strong definition — even if there are network failures, each request must complete.
3. Partition-Tolerance
An arbitrary number of messages between nodes can be lost on the network:
"When a network is partitioned, all messages sent from nodes in one component of the partition to nodes in another component are lost."
Proof
1Theorem for asynchronous systems
Theorem 1:
It is impossible in the asynchronous network model to implement a read/write data object that guarantees Availability and Atomic consistency in all fair executions (including those in which messages are lost).
Proof by contradiction:
- Let's assume that such an algorithm exists
- Let's imagine a system of two nodes: G₁ and G₂
- Let's assume that all messages between G₁ and G₂ are lost (partition)
- Writing to G₁ occurs
- Later reading from G₂ occurs
- Reading from G₂ will not be able to return a value written to G₁
- Contradiction: either consistency or accessibility is violated
2Theorem for partially synchronous systems
Theorem 2:
It is impossible in the partially synchronous network model to implement a read/write data object that guarantees Availability and Atomic consistency in all executions (even those in which messages are lost).
In the real world, partially synchronous systems are used - nodes have clocks to measure time and timeouts. But even in this more powerful setup, the result is similar: the proof follows the same logic.
Note: The rest of the whitepaper is devoted to Delayed-t consistency in partially synchronous systems - a compromise approach that relaxes consistency requirements.
