DDIA matters because it turns data systems from scattered terminology into one coherent engineering language about requirements, storage, failures, and growth.
In real work, the second edition helps teams connect data models, storage engines, replication, sharding, transactions, and stream processing as one chain of design choices.
In interviews and architecture discussions, the book is especially useful because it lets you explain growth effects: data redistribution, backpressure, schema evolution, and the price of consistency.
Practical value of this chapter
Design in practice
Systematizes storage, replication, sharding, and event-processing choices in data systems.
Decision quality
Helps choose indexes, storage engines, and replication models by workload and guarantee profile.
Interview articulation
Gives language for reliability, latency, consistency, maintainability, and the cost of future change.
Risk and trade-offs
Highlights partial failures, data redistribution, backpressure, and schema evolution.
Designing Data-Intensive Applications, 2nd Edition
Authors: Martin Kleppmann, Chris Riccomini
Publisher: O'Reilly Media, 2026
Length: 650 pages
A summary of DDIA's second edition: architecture trade-offs, data models, storage, sharding, consistency, streams, derived state, and data privacy.
Second edition
Official O’Reilly page for Designing Data-Intensive Applications, 2nd Edition: Martin Kleppmann and Chris Riccomini, released in 2026.
The second edition of DDIA is best read not as a database catalog, but as a map for data-intensive applications. It connects nonfunctional requirements, reliability, scalability, and maintainability with how storage, replication, transactions, and event processing actually behave.
The book is most useful when you need to explain why a particular storage engine fits a workload, where replication boundaries belong, how sharding changes failure modes, and why consistency cannot be discussed separately from latency, network partitions, and observed system behavior.
How the second edition is organized
The second edition expands the original arc: alongside storage, replication, and consensus, it connects architecture trade-offs, cloud operations, local-first applications, derived state, and responsibility in data systems.
Chapters 1-2
Architecture trade-offs, reliability, scalability, maintainability, and requirements.
Chapters 3-5
Data models, query languages, storage engines, indexes, encoding, and schema evolution.
Chapters 6-10
Replication, sharding, transactions, partial failures, consistency, and consensus.
Chapters 11-13
Batch and stream processing, derived state, data integration, and end-to-end correctness.
Chapter 14
Law, privacy, social consequences, and engineering responsibility for data systems.
Architecture, data models, and storage
Chapters 1-2: trade-offs and requirements
What changes in the second edition:
- System design starts with requirements, workload, and expected failures.
- Reliability, scalability, and maintainability are treated as observable properties.
- Cloud services and managed infrastructure add new trade-offs instead of removing old ones.
What to take away:
- Architecture is not a list of technologies; it is a set of choices under constraints.
- A good design explains the cost of growth, failure, and future change.
- Metrics and observed behavior matter more than elegant abstractions.
Chapter 3: data models and query languages
Relational model
SQL, normalization, relationships, and mature transactional semantics.
Document model
Flexible structure, read locality, and the risk of hidden relationships between documents.
Graph model
Nodes, edges, and queries where relationships matter more than isolated entities.
Chapter 4: storage and retrieval
This chapter explains why the same query can be cheap in one database and expensive in another: the answer is often hidden in the index, log, on-disk format, and read/write profile.
LSM tree
- Optimizes write-heavy workloads and sequential flushes to disk.
- Requires file merging and careful control of background compaction.
- Fits log-oriented and write-heavy systems well.
B-tree
- Keeps keys sorted in fixed-size pages and supports efficient point reads and range scans.
- Often updates data in place and depends heavily on page cache behavior.
- Remains a default index structure in many relational databases.
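To make the LSM side of this comparison concrete, here is a minimal in-memory sketch, not a real storage engine. The class name `TinyLSM`, the `memtable_limit` threshold, and the use of sorted Python lists in place of on-disk segment files are all assumptions made for illustration.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: an in-memory memtable plus immutable sorted segments."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}
        self.segments = []  # oldest first; each is a sorted list of (key, value)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # One sequential write of a sorted run; stands in for a segment file.
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Search the newest segment first so that later writes win.
        for segment in reversed(self.segments):
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None

    def compact(self):
        # Merge all segments, keeping only the newest value per key.
        merged = {}
        for segment in self.segments:  # oldest to newest
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)  # memtable is full: flushed as the first segment
db.put("a", 3)
db.put("c", 4)  # flushed as the second segment
segments_before = len(db.segments)
db.compact()
segments_after = len(db.segments)
```

Writes only touch the memtable and append whole sorted runs, which is why the structure favors write-heavy workloads. Reads may have to check several segments until compaction merges them, which is the background cost the list above mentions.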
Chapter 5: encoding and evolution
Data formats are contracts between code versions, services, and long-lived data. That makes JSON, Avro, Protocol Buffers, and schema evolution part of the same design conversation.
JSON/XML
Readable for humans, but compatibility still needs discipline.
Thrift/Protocol Buffers
Compact and fast, but compatibility depends on keeping numeric field tags stable.
Avro
Works well when schemas evolve together with stored data.
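The idea of a format as a contract between code versions can be sketched with numbered field tags. This is a simplified illustration, not the wire format of any real library: the schemas `SCHEMA_V1` and `SCHEMA_V2`, the field names, and the dict-based "encoding" are assumptions.

```python
# Hypothetical schemas: field tag -> (field name, default value).
SCHEMA_V1 = {1: ("user_id", None), 2: ("name", "")}
SCHEMA_V2 = {1: ("user_id", None), 2: ("name", ""), 3: ("email", "")}


def encode(record, schema):
    """Encode a record as {tag: value}, keyed by field number rather than name."""
    name_to_tag = {name: tag for tag, (name, _) in schema.items()}
    return {name_to_tag[name]: value for name, value in record.items()}


def decode(encoded, schema):
    """Decode with the reader's schema: skip unknown tags, default missing ones."""
    record = {name: default for name, default in schema.values()}
    for tag, value in encoded.items():
        if tag in schema:  # unknown tags were written by newer code
            record[schema[tag][0]] = value
    return record


new_data = encode({"user_id": 7, "name": "Ada", "email": "ada@example.com"}, SCHEMA_V2)
old_data = encode({"user_id": 8, "name": "Bob"}, SCHEMA_V1)

forward = decode(new_data, SCHEMA_V1)   # old code reads data written by new code
backward = decode(old_data, SCHEMA_V2)  # new code reads data written by old code
```

In this sketch, forward compatibility comes from ignoring unknown tags and backward compatibility from defaults for missing fields. Reusing or renumbering a tag would break both directions.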
Distributed data
Chapter 6: replication
Single leader
- All writes go through the leader replica.
- The model is easier to reason about and debug.
- The hard part is failover when the leader is unavailable.
Multi-leader
- Writes are accepted in multiple regions.
- Useful for geographically distributed clients.
- The main cost is conflict detection and resolution.
Leaderless
- Clients talk to multiple replicas directly.
- Read and write quorums make guarantees tunable.
- The system must repair divergence between copies.
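The tunable guarantee in the leaderless model is usually stated as the quorum condition w + r > n. The sketch below checks that condition by brute force for small clusters; the function names are illustrative, and it deliberately ignores concurrent writes, sloppy quorums, and failed nodes.

```python
from itertools import combinations


def quorums_overlap(n, w, r):
    """Quorum condition: with w + r > n, read and write sets must intersect."""
    return w + r > n


def every_read_meets_every_write(n, w, r):
    """Brute-force check that each r-sized read set touches each w-sized write set."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


strict = every_read_meets_every_write(n=3, w=2, r=2)  # overlap is guaranteed
loose = every_read_meets_every_write(n=3, w=1, r=1)   # a read can miss the write
```

Overlap only guarantees that a read contacts at least one replica holding the latest acknowledged write. The system still needs version information to pick that value and a repair mechanism to bring the other copies up to date.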
This part of the second edition is also where local-first applications, CRDTs, and multi-device synchronization become especially relevant.
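As a small illustration of the CRDT idea, here is a grow-only counter (G-Counter), one of the simplest CRDTs. The device names are hypothetical, and the sketch covers only increments, not decrements or deletion.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged with element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other):
        # max() is commutative, associative, and idempotent, so replicas
        # converge regardless of merge order or repeated delivery.
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)

    def value(self):
        return sum(self.counts.values())


phone, laptop = GCounter("phone"), GCounter("laptop")
phone.increment()
phone.increment()
laptop.increment()   # both devices were offline while counting

phone.merge(laptop)
laptop.merge(phone)
laptop.merge(phone)  # merging the same state again changes nothing
```

Each device only ever writes its own slot, so concurrent increments never conflict and no coordination is needed at merge time.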
Chapter 7: sharding
Partitioning strategies:
- By hash of the key, for even distribution across shards.
- By range, for time, geography, and naturally ordered data.
- Consistent hashing reduces data movement when the cluster changes.
Operational risks:
- Hot keys and uneven workload distribution.
- Queries that must fan out across all shards.
- Rebalancing work that competes with user traffic.
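The claim that consistent hashing reduces data movement can be checked with a small experiment. This is a sketch under assumptions: the shard names, the 64 virtual nodes per shard, and the use of MD5 as a stable hash are illustrative choices, not a recommendation.

```python
import bisect
import hashlib


def stable_hash(text):
    # MD5 is used only as a stable, well-spread hash, not for security.
    return int(hashlib.md5(text.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each node owns several points on the ring to smooth the distribution.
        self.ring = sorted(
            (stable_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    def owner(self, key):
        # The first ring point clockwise from the key's hash owns the key.
        i = bisect.bisect(self.points, stable_hash(key)) % len(self.ring)
        return self.ring[i][1]


keys = [f"user:{i}" for i in range(2000)]
nodes = ["shard-a", "shard-b", "shard-c", "shard-d"]


def moved_fraction(assign_before, assign_after):
    moved = sum(assign_before(k) != assign_after(k) for k in keys)
    return moved / len(keys)


before, after = HashRing(nodes), HashRing(nodes + ["shard-e"])
ring_moved = moved_fraction(before.owner, after.owner)
mod_moved = moved_fraction(lambda k: stable_hash(k) % 4, lambda k: stable_hash(k) % 5)
```

When going from four to five shards, the ring is expected to move about one fifth of the keys, all of them to the new shard, while `hash % N` is expected to move about four fifths. The exact fractions depend on the keys and the number of virtual nodes.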
Chapter 8: transactions
DDIA is valuable here because it explains isolation levels through real anomalies rather than through dry database documentation.
| Level | What it gives | Where caution remains |
|---|---|---|
| Read Committed | Does not expose uncommitted writes. | Still allows non-repeatable reads (read skew) and lost updates. |
| Snapshot Isolation | Provides a consistent read snapshot. | Can still allow write skew. |
| Serializable | Guarantees a result equivalent to some serial execution. | Costs more in latency, aborts, and retries under contention. |
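Write skew is easier to recognize with a concrete run. The toy store below imitates snapshot isolation with a first-committer-wins check on written keys; the class, the on-call scenario, and the doctor names are assumptions for illustration, not how any specific database implements it.

```python
class SnapshotStore:
    """Toy store: transactions read a private snapshot; commit fails only on write-write conflict."""

    def __init__(self, data):
        self.data = dict(data)
        self.versions = {key: 0 for key in data}

    def begin(self):
        return {"snapshot": dict(self.data), "seen": dict(self.versions), "writes": {}}

    def commit(self, txn):
        # First-committer-wins: abort if a key we wrote changed since our snapshot.
        for key in txn["writes"]:
            if self.versions[key] != txn["seen"][key]:
                return False
        for key, value in txn["writes"].items():
            self.data[key] = value
            self.versions[key] += 1
        return True


def go_off_call(txn, doctor):
    # Application invariant: at least one doctor must stay on call.
    on_call = [name for name, status in txn["snapshot"].items() if status]
    if len(on_call) >= 2:
        txn["writes"][doctor] = False


store = SnapshotStore({"alice": True, "bob": True})
t1, t2 = store.begin(), store.begin()
go_off_call(t1, "alice")  # sees two doctors on call, so leaving looks safe
go_off_call(t2, "bob")    # sees the same snapshot and decides the same
results = (store.commit(t1), store.commit(t2))
```

Both transactions commit because they write different keys, yet together they break the invariant that each checked against its own snapshot. Serializable isolation would abort one of them.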
Chapters 9-10: failures, consistency, and consensus
Why distribution is hard:
- Network partitions break the simple idea that a node is either alive or dead: a timeout cannot tell a crashed node from a slow one or a lost message.
- Partial failures create different views of reality across participants.
- Clock skew makes event ordering less obvious than it looks.
Where consensus appears:
- Paxos and Raft let nodes agree on a single order of writes.
- Quorums define when a decision can be considered accepted.
- The price of consensus is latency, recovery complexity, and dependence on a majority.
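The dependence on a majority can be made concrete with a little arithmetic. The helper names below are illustrative.

```python
def majority(n):
    """Smallest number of nodes that forms a majority of an n-node cluster."""
    return n // 2 + 1


def tolerated_failures(n):
    """Nodes that can be lost while a majority can still be formed."""
    return n - majority(n)


# cluster size -> (majority size, failures tolerated)
table = {n: (majority(n), tolerated_failures(n)) for n in range(1, 8)}
```

Going from three to four nodes raises the majority from two to three but still tolerates only one failure, which is why majority-based clusters are usually sized with an odd number of nodes.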
Processing, derived state, and responsibility
Chapter 11: batch processing
Unix pipeline idea:
Large-scale processing becomes easier to reason about when each step reads an input stream, writes an output stream, and remains reusable.
cat log.txt | grep ERROR | sort | uniq -c
Platform evolution:
- MapReduce makes processing distributed, but pays for it with extra disk I/O between stages.
- Spark speeds up iterative jobs through memory and DAG execution.
- Flink brings batch and stream scenarios closer together.
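The Unix pipeline shown earlier in this chapter summary maps closely onto the map, shuffle, and reduce phases. The sketch below runs in a single process on hypothetical sample log lines; it shows only the shape of the computation, not distribution or fault tolerance.

```python
from collections import defaultdict


def map_phase(lines):
    # Like `grep ERROR`: emit (line, 1) for each matching line.
    for line in lines:
        if "ERROR" in line:
            yield line.strip(), 1


def shuffle(pairs):
    # Like `sort`: bring records with the same key together.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Like `uniq -c`: count the records in each group.
    return {key: sum(values) for key, values in sorted(groups.items())}


log_lines = [
    "ERROR disk full\n",
    "INFO started\n",
    "ERROR timeout\n",
    "ERROR disk full\n",
]
counts = reduce_phase(shuffle(map_phase(log_lines)))
```

Each phase reads one input and produces one output, so the stages stay independently reusable, which is the property the pipeline idea is about.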
Chapter 12: stream processing
Stream processing matters not only for real-time features. It lets systems treat data changes as an explicit event log.
Message brokers
Kafka, RabbitMQ, and Pulsar as different delivery and storage models.
Event sourcing
An immutable event log as the source of system state.
Change data capture
Moving database changes into analytics and search systems.
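A minimal sketch of state derived from an event log follows. The `Event` fields, the account names, and the offset-based deduplication are assumptions chosen for illustration; real brokers and event stores expose their own offset and delivery semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    offset: int   # position in the log
    account: str
    delta: int    # positive for a deposit, negative for a withdrawal


def apply_events(events, state=None, last_offset=-1):
    """Fold events into balances, skipping offsets that were already applied."""
    balances = dict(state or {})
    for event in events:
        if event.offset <= last_offset:
            continue  # redelivered event: skipping it keeps replay idempotent
        balances[event.account] = balances.get(event.account, 0) + event.delta
        last_offset = event.offset
    return balances, last_offset


log = [Event(0, "acct-1", 100), Event(1, "acct-2", 50), Event(2, "acct-1", -30)]
balances, offset = apply_events(log)

# Redelivering the tail of the log leaves the derived state unchanged.
balances_again, offset_again = apply_events(log[1:], balances, offset)
```

The log stays immutable and the balances can be rebuilt at any time by replaying it, which is what makes the derived state disposable.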
Chapters 13-14: derived state, law, and society
Derived state
The second edition connects materialized views, GraphQL, workflows, and data integration into one theme: how to derive new state safely from recorded facts.
Data privacy
The final chapter moves beyond performance: data systems must account for law, user consent, explainability, and the consequences of automation.
How to use DDIA in system design
DDIA does not give canned interview answers. Its strength is that it teaches you to explain designs through workload, guarantees, failures, and the cost of future change.
Database choice
Compare SQL, NoSQL, indexes, and storage engines by read, write, and schema-change profile.
Replication and regions
Explain where synchrony is required, where lag is acceptable, and how the system recovers.
Sharding
Choose partition keys, identify hot keys early, and plan data redistribution.
Transactions and isolation
Name the isolation level and explain which anomalies it still leaves possible.
Events and streams
Build processing around idempotency, deduplication, and an explicit change log.
Responsibility for data
Account for privacy, retention, explainability, and the risk of incorrect automated decisions.
Verdict
Strengths
- Frames system design as a set of testable trade-offs.
- Connects data models, storage, replication, transactions, and streams into one picture.
- Updates the focus for modern cloud, local-first, and event-driven systems.
- Adds law, privacy, and social consequences to the data-systems conversation.
- Gives engineers a language for mature architecture discussions, not a set of templates.
Caveats
- It is dense; reading by topic works better than trying to rush through it.
- It explains why choices work, but does not replace practice designing concrete systems.
- Many chapters become most useful when tied back to examples from your own work.
- For interviews, pair it with case studies and workload-estimation practice.
Recommendation:
DDIA remains essential reading for engineers who design data systems. The second edition is especially useful as a bridge between classic distributed systems and modern products where data lives in the cloud, on devices, in event streams, and under regulatory constraints.
Related chapters
- Why are distributed systems and consistency needed? - A section map for turning DDIA ideas into concrete architecture decisions.
- CAP theorem - The foundational consistency-availability choice when a network partition occurs.
- PACELC theorem - An extension of partition-mode reasoning for normal operation: how latency changes data guarantees.
- Consensus: Paxos and Raft - A practical continuation of DDIA topics around quorums, replicated logs, and leader election.
- Jepsen and consistency models - How to test database guarantees against real failures and anomaly patterns.
- Testing Distributed Systems - How to turn DDIA-style reasoning into verifiable failure and recovery scenarios.
- Replication and sharding: growth strategies - The operational layer for data copies, shards, and load redistribution.
- Consistency and idempotency - Implementation patterns that preserve correctness under retries and failures.
- Why understanding storage systems matters - A map of databases and storage engines that complements DDIA's system-level reasoning.
- Database Internals: A Deep Dive (short summary) - A lower-level view of indexes, logs, and storage structures as a continuation of DDIA.
- Distributed Systems, 4th Edition (short summary) - Distributed-systems theory alongside DDIA's engineering perspective.
