A book on database internals is not about academic prestige. It matters because without that understanding, teams make architectural decisions based on surface signals and vendor labels.
In engineering practice, it helps you see how B-Trees, LSMs, transactions, replication, and consensus change the real write path, read amplification, recovery behavior, and concurrency model of a system.
In interviews and architecture discussions, this material is especially valuable as a differentiator because it lets you explain not only what to choose, but why a mechanism behaves the way it does.
Practical value of this chapter
Storage-engine literacy
Deep B-Tree/LSM understanding improves architectural choices for read/write paths and workload behavior.
Isolation intuition
Internal transaction mechanics make isolation-level and concurrency decisions explicit and defensible.
Replication and consensus
Tie replication models directly to availability targets, read freshness, and recovery requirements.
Interview deep dive
Use internals-level explanation as a differentiator: explain not only what to choose, but why it works.
Related chapter
PostgreSQL from the inside
Deep dive into MVCC, WAL, locks, and PostgreSQL indexes by Egor Rogov.
Database Internals
Author: Alex Petrov
Publisher: O'Reilly Media, Inc.
Length: 370 pages
Analysis of the book by Alex Petrov: B-Trees, LSM-Trees, transactions, replication, consensus and the internal structure of the DBMS.
Detailed analysis
Code of Architecture
Detailed analysis of the first part by Alexander and the Code of Architecture club
Part I: Storage Engines
B-Trees and their variants
Data structures
Disk optimizations
Key insight: B-Trees are optimized for reads and in-place updates, which makes them a good fit for OLTP workloads. PostgreSQL and MySQL InnoDB use B+ Trees for indexes.
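To make the read path concrete, here is a minimal in-memory sketch of a B+ Tree style lookup. The `Node` class, the `search` function, and the sample tree are hypothetical names for illustration only; a real engine stores nodes as fixed-size disk pages and handles splits, merges, and latching, all of which are omitted here.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Sorted keys. In a leaf they pair with values; in an internal
    # node they act as separators between child subtrees.
    keys: List[int]
    values: List[str] = field(default_factory=list)       # leaves only
    children: List["Node"] = field(default_factory=list)  # internal nodes only

    @property
    def is_leaf(self) -> bool:
        return not self.children


def search(node: Node, key: int) -> Optional[str]:
    """Descend from the root to a leaf with one binary search per node."""
    while not node.is_leaf:
        # Assumed convention: keys equal to a separator live in the right child.
        node = node.children[bisect_right(node.keys, key)]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i]
    return None


# A two-level tree: one root with two separators and three leaves.
leaf1 = Node(keys=[1, 3], values=["a", "c"])
leaf2 = Node(keys=[5, 7], values=["e", "g"])
leaf3 = Node(keys=[9, 11], values=["i", "k"])
root = Node(keys=[5, 9], children=[leaf1, leaf2, leaf3])

print(search(root, 7))  # found in the middle leaf
print(search(root, 4))  # absent key
```

Each loop iteration corresponds to one page read in a disk-based tree, so the number of reads per lookup grows with the height of the tree rather than with the number of keys.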
Detailed analysis
Code of Architecture
Detailed analysis of the chapter on LSM-Trees by Alexander and the Code of Architecture club
LSM-Trees (Log-Structured Merge Trees)
Components
Compaction
Reading optimizations
Key insight: LSM-Trees are optimized for writes (sequential I/O), but require compaction to keep reads fast. Used in Cassandra, RocksDB, LevelDB, and HBase.
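The trade-off above can be sketched with a toy LSM store. Everything here (`TinyLSM`, `memtable_limit`, the list-of-tuples "SSTable") is a hypothetical simplification, not the design of any engine named above: there is no write-ahead log, no Bloom filter, no leveled layout, and tables are scanned linearly.

```python
from typing import Dict, List, Optional, Tuple

SSTable = List[Tuple[str, Optional[str]]]  # sorted (key, value); None marks a delete


class TinyLSM:
    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable: Dict[str, Optional[str]] = {}
        self.sstables: List[SSTable] = []  # oldest first, newest last
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: Optional[str]) -> None:
        # Writes only touch memory; disk work is deferred to flush().
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key: str) -> None:
        self.put(key, None)  # a tombstone shadows older values

    def flush(self) -> None:
        # The memtable becomes one immutable, sorted run (a sequential write).
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items(), key=lambda kv: kv[0]))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        # Reads may have to check every run, newest first.
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self) -> None:
        # Merge all runs; newer entries overwrite older ones, tombstones are dropped.
        merged: Dict[str, Optional[str]] = {}
        for table in self.sstables:
            merged.update(table)
        live = sorted((k, v) for k, v in merged.items() if v is not None)
        self.sstables = [live] if live else []


db = TinyLSM(memtable_limit=2)
db.put("a", "1")
db.put("b", "2")   # triggers the first flush
db.put("a", "3")
db.delete("b")     # triggers the second flush
runs_before = len(db.sstables)
db.compact()
runs_after = len(db.sstables)
```

Before compaction a read for a cold key has to consult both runs; afterwards there is a single run with only live data, which is the read-performance benefit that compaction buys at the cost of rewriting data.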
B-Tree vs LSM-Tree: choosing a data structure
B-Tree architecture
✓ Advantages
- Fast reads: O(log N)
- Efficient range queries
- In-place updates
✗ Drawbacks
- Write amplification
- Random I/O on writes
Used in: PostgreSQL, MySQL InnoDB
Transaction Processing
Concurrency control
Recovery
Part II: Distributed Systems
Detailed analysis
Code of Architecture
Detailed analysis of the chapter on replication and partitioning by Alexander and the Code of Architecture club
Replication & Partitioning
Replication
Partitioning
Detailed analysis
Code of Architecture
Detailed analysis of the chapter on consensus protocols by Alexander and the Code of Architecture club
Consensus Protocols
Paxos
- Classical Lamport algorithm
- Prepare → Promise → Accept
- Difficult to implement
- Multi-Paxos: a stable leader avoids repeating the Prepare phase
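The Prepare → Promise → Accept flow can be sketched as a single-decree Paxos round over in-process objects. The `Acceptor` class and `propose` function are hypothetical names, and the sketch assumes synchronous calls with no message loss, no persistence, and no learners, so it only illustrates the safety rule, not a usable implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                    # highest proposal number promised
    accepted_n: int = -1                  # number of the accepted proposal, if any
    accepted_value: Optional[str] = None

    def on_prepare(self, n: int) -> Optional[Tuple[int, Optional[str]]]:
        """Phase 1: promise to ignore proposals below n and report any accepted value."""
        if n > self.promised:
            self.promised = n
            return (self.accepted_n, self.accepted_value)
        return None  # rejected: a higher or equal number was already promised

    def on_accept(self, n: int, value: str) -> bool:
        """Phase 2: accept unless a higher-numbered promise has been made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run one round; return the chosen value, or None if no majority was reached."""
    majority = len(acceptors) // 2 + 1
    promises = []
    for a in acceptors:
        reply = a.on_prepare(n)
        if reply is not None:
            promises.append(reply)
    if len(promises) < majority:
        return None
    # Safety rule: if any acceptor already accepted a value, adopt the one
    # with the highest proposal number instead of our own.
    prior_n, prior_value = max(promises, key=lambda p: p[0])
    if prior_n >= 0 and prior_value is not None:
        value = prior_value
    acks = sum(1 for a in acceptors if a.on_accept(n, value))
    return value if acks >= majority else None


cluster = [Acceptor() for _ in range(3)]
first = propose(cluster, n=1, value="x")
second = propose(cluster, n=2, value="y")   # must re-propose the already chosen value
stale = propose(cluster, n=1, value="z")    # lower number than what was promised
```

The second proposer ends up confirming the first value rather than its own, which is the property that makes a chosen value stable once a majority has accepted it.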
Raft
- Designed for understandability
- Leader election + Log replication
- etcd, Consul, CockroachDB
- Easier to implement
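Two rules at the core of Raft's leader election and log replication can be sketched as pure functions. The function and parameter names below are hypothetical, and the sketch leaves out RPCs, timers, persistence, and the additional rule that a leader only commits entries from its own term by counting replicas.

```python
from typing import List, Optional


def grant_vote(current_term: int, voted_for: Optional[str],
               my_last_term: int, my_last_index: int,
               cand_id: str, cand_term: int,
               cand_last_term: int, cand_last_index: int) -> bool:
    """Election rule: decide whether a node votes for a candidate."""
    if cand_term < current_term:
        return False  # stale candidate
    # Within one term a node votes at most once; a newer term resets the vote.
    if cand_term == current_term and voted_for not in (None, cand_id):
        return False
    # The candidate's log must be at least as up to date as ours:
    # a later last term wins, and equal terms fall back to log length.
    return (cand_last_term, cand_last_index) >= (my_last_term, my_last_index)


def majority_commit_index(match_index: List[int]) -> int:
    """Replication rule: highest log index stored on a majority of nodes.

    match_index holds one entry per node, including the leader itself.
    """
    ranked = sorted(match_index, reverse=True)
    return ranked[len(ranked) // 2]


ok = grant_vote(current_term=2, voted_for=None, my_last_term=2, my_last_index=5,
                cand_id="B", cand_term=3, cand_last_term=2, cand_last_index=5)
short_log = grant_vote(current_term=2, voted_for=None, my_last_term=2, my_last_index=5,
                       cand_id="B", cand_term=3, cand_last_term=2, cand_last_index=4)
commit = majority_commit_index([5, 5, 4, 2, 1])
```

The log comparison in `grant_vote` is what prevents a node with a stale log from becoming leader, and `majority_commit_index` shows why a five-node cluster can keep committing with two slow followers.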
Zab
- Zookeeper Atomic Broadcast
- Primary-backup model
- FIFO ordering guarantees
- Optimized for writes
Distributed Transactions
Atomic Commit Protocols
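As an illustration of what an atomic commit protocol does, here is a minimal sketch of two-phase commit, chosen here as an assumed example because it is the best-known protocol of this family. The `Participant` class and `two_phase_commit` function are hypothetical, and the sketch ignores coordinator failure, timeouts, and durable logging, which are the hard parts in practice.

```python
from typing import List


class Participant:
    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Coordinator logic: commit everywhere only if every participant votes yes."""
    votes = [p.prepare() for p in participants]  # phase 1: collect all votes
    if all(votes):
        for p in participants:                   # phase 2: commit decision
            p.commit()
        return True
    for p in participants:                       # phase 2: abort decision
        p.abort()
    return False


happy = [Participant("orders"), Participant("payments")]
happy_result = two_phase_commit(happy)

mixed = [Participant("orders"), Participant("payments", can_commit=False)]
mixed_result = two_phase_commit(mixed)
```

A single "no" vote aborts the transaction on every participant, which is the all-or-nothing guarantee an atomic commit protocol is meant to provide.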
Alternative approaches
Low-Level Details
File formats
Disk I/O optimization
Examples from real DBMSs
PostgreSQL
MySQL InnoDB
RocksDB
Cassandra
MongoDB
CockroachDB
Results and recommendations
Strengths
- Deep analysis of the internal structure of databases
- Comparison of B-Tree vs LSM-Tree with trade-offs
- Detailed analysis of consensus algorithms
- Examples from real production systems
- Physical disk storage explained
Who is it suitable for?
- Database engineers
- Storage system developers
- Engineers who want to understand the trade-offs of different DBMSs
- Candidates preparing for Staff+ positions at database companies
- Storage researchers
Verdict: Database Internals is a unique book that bridges the gap between high-level system design books and academic works. If DDIA explains what to do, Petrov explains how it is implemented internally. A must-read for anyone who wants to understand databases at a deep level.
Related chapters
- PostgreSQL from the inside (short summary) - Comparison of internals perspectives: MVCC, WAL, and index structures in PostgreSQL against the broader patterns from Petrov.
- Designing Data-Intensive Applications (short summary) - Bridge between DDIA system-level theory and low-level implementation details of storage engines, replication, and consensus.
- Why understand storage systems? - Landscape chapter showing where storage internals knowledge directly improves architecture decisions.
- Database Selection Framework - Selection framework grounded in understanding B-Tree/LSM behavior, durability mechanics, and operational trade-offs.
- Replication and sharding - Operational continuation of the book topics: read/write paths, fault tolerance, rebalancing, and data-scale growth.
