Jepsen matters because it tests what a system actually does under failure, not what the documentation promises. In distributed systems, that is the real moment of truth.
In day-to-day work, this chapter helps teams define testable consistency properties for concrete workloads and stop blindly trusting vendor claims where the cost of being wrong is high.
In interviews and architecture discussions, it is especially strong when you need to show the gap between claimed and actual guarantees under network faults, delay, and coordination loss.
Practical value of this chapter
Design in practice
Promotes guarantee validation before incidents instead of trusting vendor claims.
Decision quality
Shows how to define testable consistency properties for real workloads.
Interview articulation
Strengthens answers with practical linearizability/serializability testing strategies.
Risk and trade-offs
Exposes gaps between claimed and actual guarantees under network failures.
Official website
Jepsen.io
Project for testing distributed systems for correctness.
Jepsen is an independent distributed systems analysis and testing project created by Kyle Kingsbury (aka "Aphyr"). The project identified critical errors in dozens of popular databases and became the de facto standard for testing vendor consistency claims.
Foundation
TCP protocol
Jepsen models network failures and splits at the transport layer.
What is Jepsen?
Testing tool
Jepsen is a Clojure library for testing distributed systems. It generates load, introduces failures (network partitions, process crashes, clock skew) and checks whether the stated guarantees are met.
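Conceptually, a Jepsen test records a history of invoke/ok/fail/info events for concurrent client operations and hands that history to a checker. A minimal sketch of that event shape in Python (real Jepsen histories are Clojure maps; the field names here only loosely mirror them):

```python
def record(history, process, call, op):
    """Run one client operation and record it Jepsen-style:
    an 'invoke' event, then 'ok', 'fail', or 'info' (indeterminate)."""
    history.append({'process': process, 'type': 'invoke', **op})
    try:
        value = call(op)  # hit the system under test
        history.append({'process': process, 'type': 'ok', **op,
                        'value': value})
    except TimeoutError:
        # Indeterminate: the operation may or may not have taken
        # effect, so the checker must consider both possibilities.
        history.append({'process': process, 'type': 'info', **op})
    except Exception:
        history.append({'process': process, 'type': 'fail', **op})

h = []
record(h, 0, lambda op: 42, {'f': 'read'})
# h now holds an 'invoke' event followed by an 'ok' event.
```

A checker then decides whether some legal execution of the claimed model could have produced exactly this history; indeterminate ('info') operations are the hard part, since both outcomes must be explored.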
Series of reports
Each analysis is published as a detailed report: testing settings, detected anomalies, vendor response. The reports have become required reading for distributed system architects.
Related chapter
CAP theorem
Fundamental limitation of distributed systems.
Why is Jepsen important?
Debunking Marketing Claims
Many databases claim "strong consistency" or "ACID" but in practice do not adhere to these guarantees. Jepsen found that MongoDB was losing committed writes, RethinkDB was allowing dirty reads, and Redis Cluster could lose data even without crashing.
Standardization of terminology
The project created a clear hierarchy of consistency models, eliminating confusion between terms from the world of RDBMS (isolation levels) and distributed systems (linearizability).
Improving the quality of systems
After publishing reports, vendors fix bugs. For example, CockroachDB, TiDB, and YugabyteDB have worked extensively with Jepsen to achieve true serializability.
Source
Jepsen: Consistency Models
Interactive diagram of consistency models.
Hierarchy of consistency models
Jepsen has created a complete hierarchy of consistency models, showing how transactional isolation (RDBMS) and linearizability (distributed systems) converge at the top.
Consistency Models Hierarchy
Two parallel branches: Serializable (RDBMS) and Linearizable (distributed systems)
Source: Jepsen.io
Unavailable: nodes pause operations during network faults to preserve safety guarantees.
Sticky available: available on healthy nodes if clients keep working with the same servers.
Total available: available on all healthy nodes, even during full network partitions.
Key Insight
Serializable comes from transactional SQL systems (transaction isolation). Linearizable comes from distributed systems (atomic reads/writes). They converge at the top in Strict Serializable, the strictest consistency model.
About Jepsen: Jepsen runs aggressive failure-oriented tests for distributed databases and validates consistency guarantees. Many popular systems (Cassandra, MongoDB, CockroachDB, Redis) have gone through Jepsen analysis.
Related chapter
PACELC theorem
Tradeoffs between latency and consistency.
Two branches of consistency
Serializable (RDBMS)
Comes from the world of relational DBMS. Describes transaction isolation levels: Read Uncommitted → Read Committed → Repeatable Read → Serializable.
Focus:
How transactions interact with each other. What anomalies are allowed (dirty reads, phantom reads, etc.)
Linearizable (Distributed)
Comes from the world of distributed systems. Describes the atomicity of individual read and write operations across multiple nodes.
Focus:
Does a distributed system look like a single node? Operations are instantaneous and atomic in global time.
Strict Serializable = Linearizable + Serializable
At the top of the hierarchy sits Strict Serializable. It combines both models: transactions execute serializably AND in real-time order (linearizability). Systems like Google Spanner achieve this through TrueTime.
Key consistency models
Linearizable
Unavailable during partition. Every operation appears to take effect instantaneously at some point between its invocation and response. All observers see the same sequence of operations. The most stringent model for single operations.
Serializable
Unavailable during partition. Transactions execute as if sequentially in some order, but this order may not correspond to real time! The highest level of isolation in SQL.
Causal Consistency
Sticky available. Causally related operations are visible in the correct order: if A → B (A happened before B), everyone sees them in that order. Achievable in AP systems.
Eventual Consistency
Total available. If there are no new writes, all replicas eventually converge to the same value. It does not guarantee what you will read at any particular moment. The weakest model that is still useful.
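The difference between these models becomes concrete when you check a history against them. Below is a brute-force linearizability check for a single register, in the spirit of the Wing and Gong algorithm; it is illustrative only, since permutation search explodes combinatorially (Jepsen's real checkers, such as Knossos, use far smarter search):

```python
from itertools import permutations

def linearizable(history, initial=None):
    """history: list of (invoke_time, return_time, kind, value),
    where kind is 'w' (wrote value) or 'r' (observed value).
    True iff some total order of the operations respects both
    real time and single-register semantics."""
    for order in permutations(history):
        # Real time: an op that completed before another was invoked
        # must come first in the linearization.
        if any(later[1] < earlier[0]
               for i, earlier in enumerate(order)
               for later in order[i + 1:]):
            continue
        # Register semantics: every read sees the latest write.
        value, legal = initial, True
        for _, _, kind, v in order:
            if kind == 'w':
                value = v
            elif v != value:
                legal = False
                break
        if legal:
            return True
    return False

# A read overlapping a write may legally see the new value.
print(linearizable([(0, 3, 'w', 1), (1, 2, 'r', 1)]))     # True
# A read AFTER a completed write must not see the stale value.
print(linearizable([(0, 1, 'w', 1), (2, 3, 'r', None)]))  # False
```

The second history is exactly the kind of anomaly Jepsen hunts for: the write was acknowledged, yet a later read missed it.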
Notable Jepsen finds
| System | Statement | Reality | Status |
|---|---|---|---|
| MongoDB | Durable writes | Loss of acknowledged writes | Fixed |
| Cassandra | LWT atomicity | Lost and duplicate transactions | Fixed |
| Redis Cluster | Consistency | Data loss without network failures | By design |
| etcd | Linearizable | Confirmed ✓ | Verified |
| CockroachDB | Serializable | Confirmed ✓ | Verified |
| TiDB | Snapshot Isolation | Anomalies found | Fixed |
Full list of reports: jepsen.io/analyses
How Jepsen testing works
Setup
Deploying a cluster on N nodes
Generate
Generation of operations (read, write, CAS)
Nemesis
Introducing faults (partitions, kills, clock skew)
Record
Recording the history of all transactions
Check
Check: does the recorded history satisfy the claimed consistency model?
Nemesis is a key component. It simulates real failures: it breaks the network between nodes, kills processes, and shifts clocks. If a system claims linearizability, it must withstand all of these scenarios.
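A nemesis can be sketched as a scheduler that alternately injects and heals faults. In the sketch below the inject/heal callables are hypothetical hooks you would supply for your environment (dropping packets with iptables, SIGKILLing a process, skewing a clock); this is not Jepsen's actual API:

```python
import random

class Nemesis:
    """Toy fault scheduler in the spirit of Jepsen's nemesis.
    faults maps a fault name to an (inject_fn, heal_fn) pair."""
    def __init__(self, faults, seed=None):
        self.faults = faults
        self.rng = random.Random(seed)
        self.log = []

    def step(self):
        name = self.rng.choice(sorted(self.faults))
        inject, heal = self.faults[name]
        inject()                        # e.g. partition the network
        self.log.append(('inject', name))
        heal()                          # e.g. restore connectivity
        self.log.append(('heal', name))

events = []
nemesis = Nemesis({'partition': (lambda: events.append('split'),
                                 lambda: events.append('heal'))}, seed=0)
nemesis.step()
print(events)  # ['split', 'heal']
```

In a real Jepsen run the fault stays active for a while between injection and healing: the workload keeps issuing operations throughout, so the checker can observe what the system does while broken, not just after recovery.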
Practical conclusions
1. Don't believe the marketing
"Strongly consistent", "ACID", and "linearizable" are specific terms with precise definitions. Check Jepsen reports or vendor documentation for the exact guarantees and known limitations.
2. Understand trade-offs
More stringent consistency models come at a cost: unavailability during failures (CAP) or high latency (PACELC). Choose a model based on application requirements.
3. Test under failure conditions
The correctness of the system is tested not under ideal conditions, but during failures. Use chaos engineering tools (Jepsen, Chaos Monkey, Toxiproxy) to check the behavior of your system.
4. Distinguish isolation from consistency
Serializable isolation (RDBMS) ≠ Linearizable consistency (distributed). The first is about transactions, the second is about individual operations. For complete correctness, both are needed: Strict Serializable.
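The gap is easy to see on a two-transaction history. In this illustrative sketch (timestamps are made up), a stale read is legal under plain serializability, because any serial order counts, but illegal under linearizability, which must also respect real time:

```python
# T1 writes x = 1 and commits; T2 then reads x and sees the OLD value.
t1 = {'invoke': 0, 'ret': 1, 'op': ('write', 'x', 1)}
t2 = {'invoke': 2, 'ret': 3, 'op': ('read', 'x', 0)}  # stale read: x = 0

# Serializable: the serial order T2 -> T1 explains the history
# (x was still 0 when T2 "ran"), so the stale read is permitted.
serializable = True

# Linearizable (strict serializable): T1 returned before T2 was
# invoked, so T1 must precede T2 and T2 must see x = 1. It did not.
violates_real_time = t1['ret'] < t2['invoke'] and t2['op'][2] != 1
print(violates_real_time)  # True -> not linearizable
```

This is why "we are serializable" alone does not rule out reading stale data: only strict serializability pins the serial order to real time.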
Resources to Learn
Jepsen is your ally
Before choosing a database for a critical system, check the Jepsen reports. If a system is not listed, that does not mean it is reliable; it means no one has publicly tested it. No evidence of bugs ≠ evidence of no bugs.
Related chapters
- Why distributed systems and consistency matter - Section context for why consistency guarantees must be validated under failures, not only documented.
- CAP theorem - Foundational availability/consistency trade-off under partition that Jepsen validates in real systems.
- PACELC theorem - CAP extension for normal operation where latency/consistency trade-offs shape production behavior.
- Consensus: Paxos and Raft - Core mechanisms for strong consistency through quorum writes, replicated logs, and leader-based protocols.
- Leslie Lamport: causality, Paxos, and engineering mindset - Conceptual foundation of happens-before and causality required to reason about Jepsen consistency models.
- Testing distributed systems - Fault-injection and chaos-testing practices to reproduce and diagnose distributed-system anomalies.
- Designing Data-Intensive Applications (short summary) - Deep reference on consistency, replication, and consensus that underpins Jepsen-style validation.
- Distributed Systems: Principles and Paradigms (short summary) - Theoretical baseline on failure models and distributed algorithms behind Jepsen findings.
- Cassandra: The Definitive Guide (short summary) - Practical case of tunable consistency and anomaly/fix cycles validated by Jepsen reports.
- MongoDB: architecture and usage scenarios - How replica set guarantees and write concerns evolved with public Jepsen feedback.
