Jepsen and consistency models — System Design Space

Jepsen matters because it tests what a system actually does under failure, not what the documentation promises. In distributed systems, that is the real moment of truth.

In real work, this chapter helps teams define testable consistency properties for concrete workloads and avoid trusting vendor claims blindly where the cost of being wrong is high.

In interviews and architecture discussions, it is especially useful when you need to show the gap between claimed and actual guarantees under network faults, delay, and coordination loss.

Practical value of this chapter

Design in practice

Promotes guarantee validation before incidents instead of trusting vendor claims.

Decision quality

Shows how to define testable consistency properties for real workloads.

Interview articulation

Strengthens answers with practical linearizability/serializability testing strategies.

Risk and trade-offs

Exposes gaps between claimed and actual guarantees under network failures.

Official website

Jepsen.io

A project that tests distributed systems for correctness under failure.

Jepsen is an independent distributed-systems analysis and testing project created by Kyle Kingsbury, also known as Aphyr. It has uncovered critical correctness bugs in dozens of popular databases and has become a de facto standard for validating consistency claims.

Foundation

TCP protocol

Jepsen injects network failures and partitions beneath the application, by dropping or delaying traffic between nodes.

What is Jepsen?

Testing tool

Jepsen is a Clojure library that treats the system as a black box: it generates load, injects network partitions, kills processes, shifts clocks, and then checks the operation history against the stated guarantee. No access to internals — only what a client can observe.

Clojure, Open source, Black-box testing

Report series

Each analysis ships as a detailed report: test setup, detected anomalies, vendor response, and follow-up fixes. What an architect takes from a report is not a “safe/unsafe” verdict but the concrete conditions under which a guarantee breaks.

MongoDB, PostgreSQL, Cassandra, etcd, +40 others

Related chapter

CAP theorem

A fundamental limitation of distributed systems.

Why Jepsen matters

Testing marketing claims

The words “strong consistency” and “ACID” on a landing page cost nothing until someone tests them under failure. Jepsen has shown confirmed-write loss in MongoDB, dirty reads in RethinkDB, and data loss in Redis Cluster even without process crashes — right where a user already assumed the data was safe.

A shared language for guarantees

Before Jepsen, engineers argued about “reliability” without agreeing on terms. The project built a hierarchy of consistency models and separated two worlds that used to get conflated: transaction isolation in relational databases and linearizability of distributed operations.

Better systems

A public report with a reproducible anomaly pushes harder than any ticket: it is cheaper for a vendor to fix the bug than to argue. CockroachDB, TiDB, and YugabyteDB worked with Jepsen themselves to substantiate their stated isolation guarantees rather than settle for a promise.

Source

Jepsen: Consistency Models

Interactive consistency-model hierarchy.

Consistency-model hierarchy

Jepsen collects consistency models into a hierarchy and shows where two traditions meet: transaction isolation in relational databases and linearizability for distributed operations.

Figure: Consistency-model hierarchy, with two branches: transaction serializability and linearizability for distributed operations. Source: Jepsen.io

Top of both branches: Strict Serializable

Transaction isolation branch, listed top to bottom: Serializable, Repeatable Read, Snapshot Isolation, Cursor Stability, Monotonic Atomic View, Read Committed, Read Uncommitted

Distributed reads/writes branch, listed top to bottom: Linearizable, Sequential, Causal, PRAM, Writes Follow Reads, Monotonic Reads, Monotonic Writes, Read Your Writes

Legend (availability during network faults):

Unavailable under partition: unavailable during network faults. Nodes pause operations to preserve safety guarantees.

Sticky available: available on healthy nodes if clients keep working with the same servers.

Available on healthy nodes: available on all healthy nodes, even during full network partitions.

Key insight

Serializable comes from transactional SQL systems (transaction isolation). Linearizable comes from distributed systems (atomic reads/writes). They converge at the top in Strict Serializable, the strictest consistency model.

About Jepsen: Jepsen runs failure-oriented tests for distributed databases and validates their stated consistency guarantees. Many popular systems (Cassandra, MongoDB, CockroachDB, Redis) have gone through Jepsen analysis.

Related chapter

PACELC theorem

Trade-offs between latency and consistency.

Two branches of consistency

Transaction serializability

This branch is rooted in relational databases. It defines transaction isolation levels, from reading uncommitted data all the way up to fully serializable execution.

Focus:

How transactions interact and which anomalies are allowed: dirty reads, phantom reads, and lost updates.

Operation linearizability

The second branch grew up in distributed systems. Its question is the atomicity of individual reads and writes spread across multiple nodes.

Focus:

Whether a distributed system looks like a single node where each operation has a precise place between invocation and response.

Strict serializability = linearizability + serializability

The strongest guarantee sits at the top of the hierarchy: strict serializability. It combines both models: transactions execute as if in some serial order, and that order respects the real-time order of operations. The price is steep in coordination and latency, so few systems offer it; Google Spanner does so with TrueTime.
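The extra real-time requirement can be illustrated with a small sketch. The Txn type and the respects_real_time helper are hypothetical names chosen for this example, and the timestamps are assumed to come from a single ideal clock, which real systems do not have.

```python
from typing import NamedTuple


class Txn(NamedTuple):
    name: str
    start: float  # when the client began the transaction
    end: float    # when the client saw it commit


def respects_real_time(serial_order: list[Txn]) -> bool:
    """False if the order places a transaction after one that started only after it had finished."""
    return not any(
        later.end < earlier.start
        for i, earlier in enumerate(serial_order)
        for later in serial_order[i + 1:]
    )


t1 = Txn("T1", start=0.0, end=1.0)
t2 = Txn("T2", start=2.0, end=3.0)  # begins after T1 has committed

print(respects_real_time([t1, t2]))  # True
print(respects_real_time([t2, t1]))  # False
```

A merely serializable system may explain its results with the order T2, T1, for example when T2 read from a stale snapshot. Strict serializability rules that order out because T1 had already committed when T2 began.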

Key consistency models

Linearizability

Unavailable during partition

Every operation appears instantaneous between invocation and response. All observers see the same sequence of operations. The strictest model for single operations.
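As a rough illustration of the definition, the sketch below brute-forces a linearizability check for a single register. The Op type and is_linearizable function are hypothetical names for this example. It tries every ordering, so it is only practical for tiny histories, and it is not how Jepsen's own checkers are implemented.

```python
from itertools import permutations
from typing import NamedTuple, Optional


class Op(NamedTuple):
    kind: str             # "write" or "read"
    value: Optional[int]  # value written, or value the read returned
    invoke: float         # time the client sent the request
    complete: float       # time the client received the response


def is_linearizable(history: list[Op], initial: Optional[int] = None) -> bool:
    """Search for a total order that respects real time and replays correctly."""
    for order in permutations(history):
        # Real-time constraint: an operation that completed before another
        # was invoked must not be placed after it.
        if any(later.complete < earlier.invoke
               for i, earlier in enumerate(order)
               for later in order[i + 1:]):
            continue
        # Replay the candidate order against a single-copy register.
        state, ok = initial, True
        for op in order:
            if op.kind == "write":
                state = op.value
            elif op.value != state:
                ok = False
                break
        if ok:
            return True
    return False


# The read overlaps the write, so it may take effect before it.
overlapping = [Op("write", 1, 0.0, 2.0), Op("read", 0, 0.5, 1.5)]
# The read starts after the write was acknowledged but still returns 0.
stale = [Op("write", 1, 0.0, 1.0), Op("read", 0, 2.0, 3.0)]

print(is_linearizable(overlapping, initial=0))  # True
print(is_linearizable(stale, initial=0))        # False
```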

Serializability

Unavailable during partition

Transactions behave as if they executed sequentially in some order, but that order does not have to match real time. The strongest SQL isolation level.
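One way to make "as if they executed sequentially in some order" concrete is to enumerate every serial order and compare outcomes. The sketch below is a simplification under stated assumptions: it checks only the final state, not what each transaction read, and the transfer and add_interest transactions are hypothetical examples.

```python
from itertools import permutations
from typing import Callable

State = dict[str, int]
Txn = Callable[[State], State]


def transfer(src: str, dst: str, amount: int) -> Txn:
    def txn(state: State) -> State:
        new = dict(state)
        new[src] -= amount
        new[dst] += amount
        return new
    return txn


def add_interest(account: str, percent: int) -> Txn:
    def txn(state: State) -> State:
        new = dict(state)
        new[account] += new[account] * percent // 100
        return new
    return txn


def serial_outcomes(initial: State, txns: list[Txn]) -> list[State]:
    """Final states reachable by running the transactions one at a time."""
    outcomes: list[State] = []
    for order in permutations(txns):
        state = dict(initial)
        for txn in order:
            state = txn(state)
        if state not in outcomes:
            outcomes.append(state)
    return outcomes


def is_serializable_outcome(initial: State, txns: list[Txn], observed: State) -> bool:
    return observed in serial_outcomes(initial, txns)


initial = {"a": 100, "b": 0}
txns = [transfer("a", "b", 50), add_interest("a", 10)]

print(is_serializable_outcome(initial, txns, {"a": 55, "b": 50}))   # True
print(is_serializable_outcome(initial, txns, {"a": 60, "b": 50}))   # True
print(is_serializable_outcome(initial, txns, {"a": 110, "b": 50}))  # False
```

The last state is what a lost update would produce: the interest transaction overwrote the debit. No serial order yields it, so it is not a serializable outcome.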

Causal consistency

Sticky available

Causally related operations are observed in the correct order. If event A happened before B, the system should not expose B without its cause. Achievable in AP systems.
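Vector clocks are one common way to track the happened-before relation. The sketch below assumes each write carries a clock of per-node counters; the function names and the alice/bob example are illustrative and not taken from any particular system.

```python
VectorClock = dict[str, int]  # node name -> number of events seen from that node


def _leq(a: VectorClock, b: VectorClock) -> bool:
    """True if every counter in a is at most the matching counter in b."""
    return all(count <= b.get(node, 0) for node, count in a.items())


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    return _leq(a, b) and not _leq(b, a)


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    return not _leq(a, b) and not _leq(b, a)


def safe_to_expose(dependencies: VectorClock, seen: VectorClock) -> bool:
    """A replica may expose a write only after seeing everything it depends on."""
    return _leq(dependencies, seen)


post = {"alice": 1}               # Alice writes a post
reply = {"alice": 1, "bob": 1}    # Bob replies after reading the post

print(happened_before(post, reply))            # True
print(concurrent({"alice": 1}, {"bob": 1}))    # True
print(safe_to_expose(post, seen={}))           # False
print(safe_to_expose(post, seen={"alice": 1})) # True
```

In this sketch the reply depends on the post, so a replica that has not yet received the post must hold the reply back rather than show an effect without its cause.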

Eventual consistency

Available on healthy nodes

If no new writes arrive, all replicas eventually converge. The model does not promise that any particular read observes the latest value. The weakest useful guarantee.
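Convergence can be illustrated with a grow-only counter, a simple replicated data type whose merge takes the per-replica maximum. This is a sketch of one technique that yields eventual consistency, chosen for illustration; the replica names are hypothetical.

```python
Counter = dict[str, int]  # replica name -> increments accepted by that replica


def increment(counter: Counter, replica: str) -> Counter:
    new = dict(counter)
    new[replica] = new.get(replica, 0) + 1
    return new


def merge(a: Counter, b: Counter) -> Counter:
    """Per-replica maximum: commutative, associative, and idempotent."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def value(counter: Counter) -> int:
    return sum(counter.values())


# Two replicas accept writes independently, for example during a partition.
replica_a = increment(increment({}, "a"), "a")
replica_b = increment({}, "b")

print(value(replica_a), value(replica_b))  # 2 1

# Once writes stop and state is exchanged, merge order does not matter.
print(merge(replica_a, replica_b) == merge(replica_b, replica_a))  # True
print(value(merge(replica_a, replica_b)))                          # 3
```

Before the exchange, the two replicas return different values, which matches the model: no particular read is promised the latest state, only that replicas agree after updates stop and propagate.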

Notable Jepsen findings

System	Claim	Observed behavior	Status
MongoDB	Durable writes	Confirmed writes could be lost	Fixed
Cassandra	Lightweight transaction (LWT) atomicity	Lost and duplicated operations	Fixed
Redis Cluster	Consistency	Data loss without a network fault	By design
etcd	Linearizability	Confirmed ✓	Verified
CockroachDB	Serializability	Confirmed ✓	Verified
TiDB	Snapshot isolation	Anomalies found	Fixed

Full list of reports: jepsen.io/analyses

How Jepsen testing works

Setup

Deploy a cluster on N nodes

Load

Run reads, writes, and CAS operations

Nemesis

Partitions, process kills, and clock shifts

History

Record every call, response, and error

Check

Compare the history with the chosen model

Nemesis is the failure-injection component. It breaks connectivity between nodes, kills processes, and shifts clocks. If a system claims linearizability, the history recorded during those faults must still be linearizable.
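The five stages above can be sketched as a toy, in-process simulation. Nothing here uses Jepsen's actual API: ToyCluster, run_test, and check are hypothetical names, the "cluster" is two integers, and the history records only completed operations from a single sequential client.

```python
from dataclasses import dataclass


@dataclass
class ToyCluster:
    """Hypothetical system under test: a primary replicating to one follower."""
    primary: int = 0
    follower: int = 0
    partitioned: bool = False

    def write(self, value: int) -> str:
        self.primary = value
        if not self.partitioned:
            self.follower = value  # replication only works while connected
        return "ok"                # acknowledged even if replication failed

    def read_from_follower(self) -> int:
        return self.follower


def run_test(nemesis_enabled: bool) -> list[tuple[str, int]]:
    """Setup, load, nemesis, history: returns recorded (operation, value) pairs."""
    cluster = ToyCluster()                     # Setup
    history: list[tuple[str, int]] = []        # History
    for step, value in enumerate([1, 2, 3]):   # Load
        if nemesis_enabled and step == 1:
            cluster.partitioned = True         # Nemesis: cut replication
        cluster.write(value)
        history.append(("write", value))
        history.append(("read", cluster.read_from_follower()))
    return history


def check(history: list[tuple[str, int]]) -> list[int]:
    """Check: with one sequential client, each read must return the last acknowledged write."""
    stale, last_written = [], None
    for kind, value in history:
        if kind == "write":
            last_written = value
        elif value != last_written:
            stale.append(value)
    return stale


print(check(run_test(nemesis_enabled=False)))  # []
print(check(run_test(nemesis_enabled=True)))   # [1, 1]
```

Without faults the toy system passes. With the partition enabled, the follower keeps serving the value 1 after writes 2 and 3 were acknowledged, which is the kind of gap between claim and behavior that only shows up under failure.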

Practical conclusions

1. Do not trust claims without evidence

Strong consistency, ACID, and linearizability are precise technical guarantees, not marketing adjectives. Before you build a system on one, read its Jepsen report and the vendor documentation for the concrete caveats and limitations.

2. Understand the cost of a model

Stricter consistency models have a price: unavailability during network partitions under CAP or higher latency under PACELC. Choose the model from application requirements.

3. Test under failure

Correctness is not established in ideal conditions; it is tested during failures. Use chaos-engineering tools such as Jepsen, Chaos Monkey, and Toxiproxy to observe actual system behavior.

4. Separate isolation from consistency

Serializable isolation in a database is not the same as linearizable consistency in a distributed system. The first is about transactions; the second is about individual operations. Full correctness needs both sides: strict serializability.

What to study next

Jepsen consistency models

Interactive hierarchy with definitions and relationships between guarantees.

Jepsen reports

Analyses of tested systems and vendor responses to observed anomalies.

GitHub: jepsen-io/jepsen

Framework source code for custom distributed-system tests.

DDIA Book

Chapter 9, "Consistency and Consensus", gives a deeper treatment of models and consensus.

How to read the reports soberly

Before choosing a database for a critical system, check its Jepsen reports. But a green check in a report is a snapshot of one version under one set of faults, not a permanent guarantee. And a system's absence from the list is not a mark of quality, only a sign that no one has publicly broken it. Absence of evidence of bugs is not evidence of their absence.

Related chapters

Why distributed systems and consistency matter - Section context for why consistency guarantees need failure-time validation, not just documentation.
CAP theorem - The baseline availability-versus-consistency choice under network partition that Jepsen exposes in real systems.
PACELC theorem - The CAP extension for normal operation, where latency and consistency shape database behavior before a partition.
Consensus: Paxos and Raft - Mechanisms for strong guarantees through quorums, replicated logs, and leader-oriented protocols.
Leslie Lamport: causality, Paxos, and engineering mindset - Causality and happens-before reasoning needed to understand Jepsen consistency models.
Testing distributed systems - Fault injection and chaos experiments for reproducing distributed-system anomalies.
Designing Data-Intensive Applications, 2nd Edition (short summary) - A deep reference on consistency, replication, and consensus that supports Jepsen-style validation.
Distributed Systems, 4th Edition (short summary) - Theoretical background on failure models and distributed algorithms behind Jepsen reports.
Cassandra: The Definitive Guide (short summary) - A practical example of tunable consistency and fix cycles validated by public Jepsen tests.
MongoDB: document model, replication, and consistency - How replica-set guarantees and write concerns evolved after public Jepsen feedback.