Official website
Jepsen.io
Project for testing distributed systems for correctness.
Jepsen is an independent distributed systems analysis and testing project created by Kyle Kingsbury (aka "Aphyr"). The project has identified critical bugs in dozens of popular databases and has become the de facto standard for testing vendors' consistency claims.
Foundation
TCP protocol
Jepsen models network failures and partitions at the transport layer.
What is Jepsen?
Testing tool
Jepsen is a Clojure library for testing distributed systems. It generates load, introduces failures (network partitions, process crashes, clock skew) and checks whether the stated guarantees are met.
Series of reports
Each analysis is published as a detailed report: testing settings, detected anomalies, vendor response. The reports have become required reading for distributed system architects.
Related chapter
CAP theorem
Fundamental limitation of distributed systems.
Why is Jepsen important?
Debunking Marketing Claims
Many databases claim "strong consistency" or "ACID" but in practice do not adhere to these guarantees. Jepsen found that MongoDB was losing committed writes, RethinkDB was allowing dirty reads, and Redis Cluster could lose data even without crashing.
Standardization of terminology
The project created a clear hierarchy of consistency models, eliminating confusion between terms from the world of RDBMS (isolation levels) and distributed systems (linearizability).
Improving the quality of systems
After publishing reports, vendors fix bugs. For example, CockroachDB, TiDB, and YugabyteDB have worked extensively with Jepsen to achieve true serializability.
Source
Jepsen: Consistency Models
Interactive diagram of consistency models.
Hierarchy of consistency models
Jepsen has created a complete hierarchy of consistency models, showing how transactional isolation (RDBMS) and linearizability (distributed systems) converge at the top.
Consistency Models Hierarchy
Two parallel branches: Serializable (RDBMS) and Linearizable (distributed systems)
Source: Jepsen.io
Unavailable: not available during network faults. Nodes pause operations to preserve safety guarantees.
Sticky available: available on healthy nodes if clients keep working with the same servers.
Totally available: available on all healthy nodes, even during full network partitions.
Key Insight
Serializable comes from transactional SQL systems (transaction isolation). Linearizable comes from distributed systems (atomic reads/writes). They converge at the top in Strict Serializable, the strictest consistency model.
About Jepsen: Jepsen runs aggressive failure-oriented tests for distributed databases and validates consistency guarantees. Many popular systems (Cassandra, MongoDB, CockroachDB, Redis) have gone through Jepsen analysis.
Related chapter
PACELC theorem
Tradeoffs between latency and consistency.
Two branches of consistency
Serializable (RDBMS)
Comes from the world of relational DBMS. Describes transaction isolation levels: Read Uncommitted → Read Committed → Repeatable Read → Serializable.
Focus:
How transactions interact with each other, and which anomalies are allowed (dirty reads, phantom reads, etc.).
Linearizable (Distributed)
Comes from the world of distributed systems. Describes the atomicity of read and write operations across multiple nodes.
Focus:
Does a distributed system look like a single node? Each operation appears to take effect instantaneously and atomically at a single point in global time.
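To make the "looks like a single node" question concrete, here is a minimal sketch of a brute-force linearizability check for a single register. The `Op` record and all function names are assumptions made for this illustration, not part of Jepsen. The search tries every ordering, so it only suits tiny histories; real checkers use far more efficient search.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Optional, Sequence


@dataclass(frozen=True)
class Op:
    """One completed client operation (hypothetical record format)."""
    kind: str              # "write" or "read"
    value: Optional[int]   # value written, or value the read returned
    invoke: float          # time the client sent the request
    complete: float        # time the client received the response


def respects_real_time(order: Sequence[Op]) -> bool:
    # An op that completed before another was invoked must come first.
    for i, earlier in enumerate(order):
        for later in order[i + 1:]:
            if later.complete < earlier.invoke:
                return False
    return True


def legal_for_register(order: Sequence[Op], initial: Optional[int]) -> bool:
    # Replay the ops one by one against a single-copy register.
    current = initial
    for op in order:
        if op.kind == "write":
            current = op.value
        elif op.value != current:
            return False
    return True


def is_linearizable(history: Sequence[Op], initial: Optional[int] = None) -> bool:
    # Linearizable if some total order is both real-time-respecting and legal.
    return any(
        respects_real_time(order) and legal_for_register(order, initial)
        for order in permutations(history)
    )


# A read that overlaps the write may see the new value.
overlapping = [Op("write", 1, 0.0, 2.0), Op("read", 1, 1.0, 3.0)]
# A read that starts after the write completed must not see the old value.
stale = [Op("write", 1, 0.0, 1.0), Op("read", None, 2.0, 3.0)]
```

Under these definitions `is_linearizable(overlapping)` holds and `is_linearizable(stale)` does not: the stale read began after the write was acknowledged, so no valid ordering can place it before the write.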
Strict Serializable = Linearizable + Serializable
Strict Serializable sits at the top of the hierarchy. It combines both models: transactions execute in some serial order AND that order is consistent with real time (linearizability). Systems such as Google Spanner achieve this through TrueTime.
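The difference between Serializable and Strict Serializable can be sketched with a small brute-force classifier. This is an illustrative toy under stated assumptions: the `Txn` record, its fields, and `classify` are hypothetical names, and each transaction simply records the values it observed and the values it wrote.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Txn:
    """A committed transaction (hypothetical record format)."""
    name: str
    reads: tuple    # pairs of (key, value the transaction observed)
    writes: tuple   # pairs of (key, value the transaction wrote)
    start: float
    end: float


def explains(order, initial):
    # Replay the transactions one at a time; every read must match the state.
    state = dict(initial)
    for txn in order:
        if any(state.get(key) != value for key, value in txn.reads):
            return False
        state.update(txn.writes)
    return True


def real_time_ok(order):
    # A transaction that ended before another started must come first.
    return all(
        not (later.end < earlier.start)
        for i, earlier in enumerate(order)
        for later in order[i + 1:]
    )


def classify(txns, initial):
    serial_orders = [o for o in permutations(txns) if explains(o, initial)]
    if not serial_orders:
        return "not serializable"
    if any(real_time_ok(o) for o in serial_orders):
        return "strict serializable"
    return "serializable"


initial = {"x": 0, "y": 0}

# T2 starts after T1 committed, yet still reads the old x.
stale_read = [
    Txn("T1", (), (("x", 1),), 0.0, 1.0),
    Txn("T2", (("x", 0),), (), 2.0, 3.0),
]

# Write skew: both read the same snapshot, each writes a different key.
write_skew = [
    Txn("T1", (("x", 0), ("y", 0)), (("x", 1),), 0.0, 2.0),
    Txn("T2", (("x", 0), ("y", 0)), (("y", 1),), 1.0, 3.0),
]
```

Here `classify(stale_read, initial)` yields "serializable": the order T2, T1 explains the reads, but it contradicts real time. `classify(write_skew, initial)` yields "not serializable", because neither serial order explains both sets of reads.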
Key consistency models
Linearizable
Unavailable during partition. Every operation appears to take effect instantaneously at some point between its invocation and its response. All observers see the same sequence of operations. The strictest model for single operations.
Serializable
Unavailable during partition. Transactions execute as if they ran one at a time in some sequential order, but this order need not correspond to real time. The highest isolation level in SQL.
Causal Consistency
Sticky available. Causally related operations are visible in the correct order: if A → B (A happened before B), everyone sees them in that order. Achievable in AP systems.
Eventual Consistency
Totally available. If no new writes arrive, all replicas eventually converge to the same value. It makes no guarantee about what you will read at any particular time. The weakest useful model.
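As a rough illustration of the weakest model above, the sketch below shows two replicas that diverge and then converge once they exchange state. It assumes a last-writer-wins register, which is only one of several possible convergence strategies; `LwwRegister` and `anti_entropy` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class LwwRegister:
    """Last-writer-wins register: the highest (timestamp, replica id) wins."""
    value: object = None
    stamp: tuple = (0, "")   # (logical timestamp, replica id as tie-breaker)

    def write(self, value, timestamp, replica_id):
        candidate = (timestamp, replica_id)
        if candidate > self.stamp:
            self.value, self.stamp = value, candidate

    def merge(self, other):
        # Adopt the other replica's state only if it is strictly newer.
        if other.stamp > self.stamp:
            self.value, self.stamp = other.value, other.stamp


def anti_entropy(replicas):
    # Every replica merges state from every other replica once.
    for target in replicas:
        for source in replicas:
            if source is not target:
                target.merge(source)


a, b = LwwRegister(), LwwRegister()
a.write("red", timestamp=1, replica_id="a")
b.write("blue", timestamp=2, replica_id="b")
diverged = (a.value, b.value)     # replicas disagree before they sync

anti_entropy([a, b])
converged = (a.value, b.value)    # both hold the write with the higher stamp
```

Between the writes and the `anti_entropy` call, a client may read either value depending on which replica it contacts. That window is exactly what eventual consistency leaves unspecified.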
Notable Jepsen findings
| System | Claim | Finding | Status |
|---|---|---|---|
| MongoDB | Durable writes | Loss of acknowledged writes | Fixed |
| Cassandra | LWT atomicity | Lost and duplicate transactions | Fixed |
| Redis Cluster | Consistency | Data loss without network failures | By design |
| etcd | Linearizable | Confirmed ✓ | Verified |
| CockroachDB | Serializable | Confirmed ✓ | Verified |
| TiDB | Snapshot Isolation | Anomalies found | Fixed |
Full list of reports: jepsen.io/analyses
How Jepsen testing works
Setup
Deploying a cluster on N nodes
Generate
Generating operations (read, write, CAS)
Nemesis
Introducing faults (partitions, kills, clock skew)
Record
Recording the history of all operations
Check
Verifying that the history matches the model
Nemesis is a key component. It simulates real failures: it breaks the network between nodes, kills processes, and shifts clocks. If a system claims linearizability, it must withstand all of these scenarios.
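The five stages can be sketched as a toy simulation. This is not the Jepsen API (Jepsen itself is a Clojure library); `ToyCluster`, `run_test`, and `check_set` are hypothetical names, and the history format is a simplification. The simulated store acknowledges writes before they are replicated, so a partition followed by a failover loses acknowledged data, and the checker detects it.

```python
class ToyCluster:
    """Hypothetical two-node store with asynchronous replication."""

    def __init__(self):
        self.primary = set()
        self.replica = set()
        self.partitioned = False

    def add(self, element):
        self.primary.add(element)
        if not self.partitioned:
            self.replica.add(element)
        return "ok"   # acknowledged even if the replica never saw the write

    def fail_over(self):
        # The replica is promoted; unreplicated writes on the old primary vanish.
        self.primary = set(self.replica)

    def read(self):
        return set(self.primary)


def run_test(n_ops=10, partition_at=4, heal_at=8):
    cluster = ToyCluster()                        # Setup
    history = []
    for element in range(n_ops):                  # Generate
        if element == partition_at:               # Nemesis: cut replication
            cluster.partitioned = True
        if element == heal_at:                    # Nemesis: fail over and heal
            cluster.fail_over()
            cluster.partitioned = False
        history.append(("invoke", "add", element))    # Record
        result = cluster.add(element)
        history.append((result, "add", element))
    history.append(("ok", "read", cluster.read()))
    return history


def check_set(history):                           # Check
    # Every acknowledged add must be present in the final read.
    acked = {v for status, f, v in history if status == "ok" and f == "add"}
    final = next(
        v for status, f, v in reversed(history)
        if status == "ok" and f == "read"
    )
    return {"ok": acked <= final, "lost": sorted(acked - final)}


result = check_set(run_test())
```

With the default arguments, elements 4 through 7 are acknowledged while replication is cut and then discarded by the failover, so `result["lost"]` is `[4, 5, 6, 7]`. Running with `partition_at=-1, heal_at=-1` disables the nemesis and the check passes, which illustrates why testing only under ideal conditions hides this class of bug.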
Practical conclusions
1. Don't believe the marketing
"Strongly consistent", "ACID", and "linearizable" are specific terms with precise definitions. Check Jepsen reports or vendor documentation for the specific guarantees and known limitations.
2. Understand trade-offs
More stringent consistency models come at a cost: unavailability during failures (CAP) or high latency (PACELC). Choose a model based on application requirements.
3. Test under failure conditions
A system's correctness is proven not under ideal conditions but during failures. Use fault-injection and chaos engineering tools (Jepsen, Chaos Monkey, Toxiproxy) to check how your system behaves.
4. Distinguish isolation from consistency
Serializable isolation (RDBMS) ≠ Linearizable consistency (distributed). The first is about transactions, the second is about individual operations. For complete correctness, both are needed: Strict Serializable.
Jepsen is your ally
Before choosing a database for a critical system, check the Jepsen reports. If a system is not listed, that does not mean it is reliable; it means that no one has publicly tested it. No evidence of bugs ≠ evidence of no bugs.
