Designing Data-Intensive Applications
Authors: Martin Kleppmann
Publisher: O'Reilly Media, 2017 (1st Edition), 2025 (2nd Edition)
Length: 616 pages
Analysis of the book by Martin Kleppmann: data models, replication, partitioning, transactions, batch and stream processing.
Primary source
Official page of Designing Data-Intensive Applications by Martin Kleppmann.
Book structure
The book is divided into three parts, each widening the scope of discussion: from a single machine to globally distributed systems:
Part I: Foundations of Data Systems
Data models, storage, encoding. How data is represented and written to disk.
Part II: Distributed Data
Replication, partitioning, transactions, consensus. Scaling across multiple machines.
Part III: Derived Data
Batch and stream processing. Construction of data processing pipelines.
Part I: Foundations of Data Systems
Chapters 1-2: Reliability, Scalability, and Data Models
Three pillars of a well-designed system:
- Reliability — the system continues to work correctly even when faults occur
- Scalability — the ability to cope with growing load
- Maintainability — ease of operation and of making changes
Data models:
- Relational — tables, SQL, ACID
- Document — JSON, nesting, schema flexibility
- Graph — nodes and edges, relationships of arbitrary complexity
Chapter 3: Data Storage and Retrieval
One of the key chapters of the book, covering how data is physically stored on disk.
LSM-Tree (Log-Structured Merge)
- Optimized for writes
- Used in Cassandra, RocksDB, LevelDB
- Memtable → SSTable → Compaction
B-Tree
- Optimized for reading
- Used in PostgreSQL, MySQL, Oracle
- Fixed size pages, update-in-place
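The LSM-tree write path above (memtable → SSTable) can be illustrated with a toy Python sketch. This is a deliberately simplified model for intuition, not how RocksDB or Cassandra are actually implemented: real engines add a write-ahead log, bloom filters, binary search within SSTables, and background compaction.

```python
class ToyLSM:
    """Toy LSM-tree: in-memory memtable + sorted immutable SSTables.
    No WAL, no compaction, no bloom filters — illustration only."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}   # recent writes, held in memory
        self.sstables = []   # flushed, sorted, immutable runs (newest last)
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush: dump the memtable as a sorted, immutable SSTable.
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:          # newest data first
            return self.memtable[key]
        for table in reversed(self.sstables):  # then SSTables, newest to oldest
            for k, v in table:            # real engines binary-search here
                if k == key:
                    return v
        return None

db = ToyLSM(memtable_limit=2)
db.put("a", 1); db.put("b", 2)   # second put triggers a flush to an SSTable
db.put("a", 3)                   # newer value shadows the flushed one
print(db.get("a"))               # → 3
print(db.get("b"))               # → 2 (read from the SSTable)
```

Note how writes are always sequential appends (fast), while a read may have to consult several SSTables — exactly the write-optimized trade-off the chapter describes.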
Chapter 4: Encoding and Schema Evolution
How to serialize data and ensure backward/forward compatibility:
JSON/XML
Human readable, large size
Thrift/Protocol Buffers
Binary, schema-based
Avro
Schema evolution, Hadoop-friendly
Part II: Distributed Data
Chapter 5: Replication
Single-Leader
- A single leader accepts all writes
- Simple model
- Problem: single point of failure
Multi-Leader
- Multiple leaders accept writes
- Useful for multi-datacenter deployments
- Problem: Write conflicts
Leaderless
- All nodes are equal (Dynamo-style)
- Quorum reads/writes
- W + R > N for consistency
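The W + R > N condition guarantees that every read quorum intersects every write quorum, so at least one of the replicas a reader contacts has seen the latest write. A small Python simulation (a sketch with made-up node and version structures, not any real client library):

```python
import random

def write(replicas, w, key, value, version):
    """Write to w randomly chosen replicas; the other n-w lag behind."""
    for node in random.sample(range(len(replicas)), w):
        replicas[node][key] = (value, version)

def read(replicas, r, key):
    """Read from r replicas, keep the value with the highest version."""
    answers = [replicas[node].get(key)
               for node in random.sample(range(len(replicas)), r)]
    return max((a for a in answers if a), key=lambda a: a[1], default=None)

n, w, r = 5, 3, 3                 # w + r = 6 > n = 5, so quorums overlap
replicas = [{} for _ in range(n)]
write(replicas, w, "k", "v1", version=1)
print(read(replicas, r, "k"))     # always ('v1', 1): overlap is guaranteed
```

With w = r = 1 on the same 5 nodes, the read could easily hit only stale replicas — which is exactly the trade-off Dynamo-style systems let you tune per request.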
Chapter 6: Sharding
Partitioning strategies:
- By key hash — hash(key) mod N
- By key range — time-series or geographic data
- Consistent hashing — minimizes data movement during rebalancing
Problems:
- Hot spots — uneven load across partitions
- Scatter/gather — queries that must fan out to all shards
- Rebalancing — data redistribution
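The rebalancing problem with naive hash(key) mod N partitioning is easy to demonstrate: adding a single node changes the modulus, so most keys map to a different partition and must be moved. A quick Python check (md5 is used here only as a convenient uniform hash):

```python
import hashlib

def bucket(key, n):
    """Naive partitioning: hash(key) mod N."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"user:{i}" for i in range(10_000)]
before = {k: bucket(k, 4) for k in keys}
after = {k: bucket(k, 5) for k in keys}   # grow the cluster: 4 → 5 nodes
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 80%, not the ideal ~20%
```

Consistent hashing (or fixed-count virtual partitions, as the book recommends) brings the moved fraction down to about 1/N, touching only the data that actually belongs on the new node.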
Chapter 7: Transactions
An in-depth discussion of ACID and isolation levels is one of the strongest parts of the book:
| Isolation level | Protects against | Does not protect against |
|---|---|---|
| Read Committed | Dirty reads, dirty writes | Non-repeatable reads, phantoms, write skew |
| Snapshot Isolation | Non-repeatable reads, read skew | Write skew, phantoms |
| Serializable | All anomalies | — |
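Write skew is the subtlest of these anomalies, and the book illustrates it with an on-call doctors example. The sketch below simulates it in plain Python: two transactions each read a consistent snapshot, each passes its check, and each updates a *different* row, so no write-write conflict is ever detected. The dict-based "database" and snapshots are a toy model, not real MVCC.

```python
# Invariant we want to preserve: at least one doctor stays on call.
on_call = {"alice": True, "bob": True}

# Under snapshot isolation, both transactions read from snapshots
# taken before either one commits.
snapshot_t1 = dict(on_call)
snapshot_t2 = dict(on_call)

if sum(snapshot_t1.values()) >= 2:   # T1: "someone else is still on call"
    on_call["alice"] = False         # Alice goes off call
if sum(snapshot_t2.values()) >= 2:   # T2: same check, against its own snapshot
    on_call["bob"] = False           # Bob goes off call

print(on_call)  # {'alice': False, 'bob': False} — the invariant is broken
```

Serializable isolation (or an explicit lock such as `SELECT ... FOR UPDATE` on the rows read) would force one transaction to wait or abort, preserving the invariant.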
Chapters 8-9: The Trouble with Distributed Systems, and Consensus
What can go wrong:
- Network partitions
- Asymmetric failures
- Unreliable clocks (clock skew)
- Byzantine faults
Consensus algorithms:
- Paxos — the classic, notoriously hard to understand
- Raft — designed for understandability, used in etcd
- Zab — used by ZooKeeper
A related theoretical result is the FLP impossibility theorem: in a fully asynchronous system, no deterministic consensus algorithm is guaranteed to terminate if even one node may crash.
Part III: Derived Data
Chapter 10: Batch Processing
Unix philosophy:
Kleppmann draws a parallel between Unix pipes and modern batch processing:
    cat log.txt | grep ERROR | sort | uniq -c

MapReduce and its evolution:
- MapReduce - simple model, lots of I/O
- Spark — in-memory, DAG execution
- Flink — unified batch/stream
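The MapReduce programming model itself is small enough to sketch in a few lines of Python: a map phase emits key-value pairs, the framework shuffles (groups) them by key, and a reduce phase aggregates each group. This toy in-memory version, with the classic word-count mapper, omits everything that makes real MapReduce interesting (distribution, disk spills, fault tolerance):

```python
from collections import defaultdict

def word_count_mapper(line):
    """map: emit (word, 1) for every word in the input line."""
    return [(word, 1) for word in line.split()]

def mapreduce(lines, mapper, reducer):
    # Shuffle: group intermediate values by key (the framework's job).
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # Reduce: one reducer call per key, over all of that key's values.
    return {key: reducer(values) for key, values in groups.items()}

log = ["ERROR disk full", "INFO ok", "ERROR disk full"]
print(mapreduce(log, word_count_mapper, sum))
# {'ERROR': 2, 'disk': 2, 'full': 2, 'INFO': 1, 'ok': 1}
```

Note the structural resemblance to the Unix pipeline above: map ≈ per-line transformation, shuffle ≈ `sort`, reduce ≈ `uniq -c`.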
Chapter 11: Stream Processing
Real-time data processing is a key topic for modern systems:
Message Brokers
Kafka, RabbitMQ, Pulsar
Event Sourcing
Immutable event log as a source of truth
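The core idea of event sourcing — current state is just a fold over an immutable, append-only event log — fits in a short sketch. The bank-account event types below are an illustrative invention, not an API from the book:

```python
# Event sourcing sketch: state is derived by replaying the event log.
events = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]

def apply(balance, event):
    """Pure function: old state + event -> new state."""
    if event["type"] == "deposited":
        return balance + event["amount"]
    if event["type"] == "withdrawn":
        return balance - event["amount"]
    return balance  # ignore unknown event types

balance = 0
for event in events:   # replaying the log always rebuilds the same state
    balance = apply(balance, event)
print(balance)  # 75
```

Because the log is the source of truth, you can rebuild the state from scratch, derive new read models later, and keep a full audit history for free.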
Change Data Capture
Debezium, Maxwell
Chapter 12: The Future of Data Systems
Kleppmann concludes the book with philosophical reflections on how to build correct, robust, and ethical data systems. He discusses:
- Composition of services and data flow
- End-to-end correctness guarantees
- Ethical aspects of data processing
Key Concepts for System Design Interview
DDIA does not contain ready-made solutions to interview problems, but it provides the deep understanding needed to confidently answer the "why" questions:
Selecting a Database
Understanding trade-offs between SQL and NoSQL, LSM vs B-Tree
Replication Strategies
When to use synchronous vs asynchronous replication
Partitioning
Selecting partition key, avoiding hot spots
Isolation levels
Explaining anomalies and preventing them
Exactly-once semantics
Idempotency and deduplication in stream processing
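In practice, "exactly-once" is usually approximated as at-least-once delivery plus idempotent processing: the broker may redeliver a message, but a handler that deduplicates by event ID produces the same result anyway. A minimal sketch (the event shape and in-memory ID set are assumptions for illustration; in production the set of processed IDs lives in durable storage, updated atomically with the effect):

```python
processed_ids = set()   # durable in a real system, in memory here
total = 0

def handle(event):
    """Idempotent handler: redeliveries of the same event id are no-ops."""
    global total
    if event["id"] in processed_ids:
        return          # duplicate delivery — already processed, skip
    processed_ids.add(event["id"])
    total += event["amount"]

for event in [{"id": 1, "amount": 10},
              {"id": 2, "amount": 5},
              {"id": 1, "amount": 10}]:   # id=1 redelivered by the broker
    handle(event)
print(total)  # 15, not 25: the duplicate had no effect
```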
Consensus
Understanding Raft/Paxos for distributed locking
📚 Verdict
✅ Strengths
- Deep understanding of the “why”, not just the “how”
- Great visualizations and examples
- Covering the entire stack: from bytes to business logic
- Lots of references to real systems
- Honest discussion of trade-offs
⚠️ Caveats
- Large book (~600 pages)
- No ready-made interview solutions
- Takes time to digest
- Some sections may be too academic
🎯 Recommendation:
DDIA is a must-read for any engineer working with distributed systems. For interview preparation, pair it with practical books (Alex Xu, Stanley Chiang): DDIA gives you the "why", and the practical books give you the "how".
