The database section matters not because it lists technologies, but because it brings the discussion back to the hard part: which data, query, and failure properties force a particular storage choice.
In day-to-day engineering work, this chapter helps split a system into distinct storage roles (transactional core, analytical projections, search layer, cache and event logs) instead of forcing one database to do everything.
For interviews and design reviews, it sets the right frame: workload profile, consistency, latency, and operating cost first, then the name of a specific engine.
Practical value of this chapter
Workload map
Break the product into OLTP/OLAP/streaming profiles and define where strict consistency is required versus where delay is acceptable.
Storage boundaries
Assign data ownership by domain so source-of-truth systems stay separate from indexes, caches, and analytical projections.
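A minimal cache-aside sketch of this boundary, assuming an in-memory SQLite table as the source of truth and a plain Python dict standing in for a cache such as Redis. `UserRepository`, `get_name` and `rename` are hypothetical names. Reads fall through to the database only on a cache miss, and writes go to the source of truth before the cached copy is invalidated.

```python
import sqlite3
from typing import Optional


class UserRepository:
    """Cache-aside read path: SQLite is the source of truth, a dict is the cache."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.cache: dict[int, str] = {}  # hypothetical stand-in for Redis or similar
        self.db_reads = 0  # counts how often a read falls through to the database

    def get_name(self, user_id: int) -> Optional[str]:
        if user_id in self.cache:
            return self.cache[user_id]
        self.db_reads += 1
        row = self.conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
        if row is None:
            return None
        self.cache[user_id] = row[0]  # populate the derived copy on a miss
        return row[0]

    def rename(self, user_id: int, name: str) -> None:
        # Write to the source of truth first, then drop the derived copy.
        with self.conn:
            self.conn.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
        self.cache.pop(user_id, None)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")
conn.commit()

repo = UserRepository(conn)
print(repo.get_name(1), repo.get_name(1), repo.db_reads)  # Ada Ada 1
repo.rename(1, "Ada L.")
print(repo.get_name(1), repo.db_reads)  # Ada L. 2
```

Invalidating instead of updating keeps the cache a disposable derived copy. With concurrent readers, a stale value can still be written back between the database write and the invalidation, which is why TTLs or versioned keys are common additions.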
Evolution roadmap
Plan migration from a single storage model to polyglot persistence without service interruption and with explicit risk control.
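One common way to move data between storage models without downtime is a phased migration: dual writes, a backfill of historical data, shadow reads that compare both stores, then a read cutover. The sketch below assumes both stores are plain dicts and that writes are single-threaded; `MigratingStore` and its phase names are hypothetical.

```python
class MigratingStore:
    """Hypothetical phased migration from an old store to a new one (both dicts here)."""

    PHASES = ("old_only", "dual_write", "read_new")

    def __init__(self, old: dict, new: dict) -> None:
        self.old, self.new = old, new
        self.phase = "old_only"
        self.mismatches: list = []

    def advance(self) -> None:
        """Move to the next phase; in practice each step is gated on observed mismatches."""
        self.phase = self.PHASES[self.PHASES.index(self.phase) + 1]

    def write(self, key, value) -> None:
        self.old[key] = value  # keep the old store current so a rollback loses nothing
        if self.phase != "old_only":
            self.new[key] = value

    def backfill(self) -> None:
        # Copy rows written before dual writes began; never overwrite dual-written values.
        for key, value in self.old.items():
            self.new.setdefault(key, value)

    def read(self, key):
        if self.phase == "read_new":
            return self.new.get(key)
        value = self.old.get(key)
        if self.phase == "dual_write" and self.new.get(key) != value:
            self.mismatches.append(key)  # shadow comparison only; the old value is served
        return value


store = MigratingStore(old={"a": 1, "b": 2}, new={})
store.advance()           # old_only -> dual_write
store.write("c", 3)
store.read("a")           # recorded as a mismatch: "a" has not been backfilled yet
store.backfill()
store.mismatches.clear()
store.read("a")
print(store.mismatches)   # []
store.advance()           # dual_write -> read_new
print(store.read("c"))    # 3
```

The old store keeps receiving writes after cutover, so rolling back means switching the read phase back; mismatches found during shadow reads signal that the backfill or the dual-write path is incomplete.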
Interview framing
Defend decisions through CAP/PACELC trade-offs, latency budgets, and operating cost, not by naming technologies.
Context
Designing Data-Intensive Applications, 2nd Edition
A core source on data models, consistency, replication and architecture trade-offs in storage systems.
The Storage Systems section helps you treat data architecture as a system-design foundation, not as a late implementation detail. In production, storage decisions define reliability, latency, cost profile and scaling limits for the whole system.
This chapter connects System Design with practical DB choices: which data model to use, where strict consistency is required and how storage should evolve as the product grows.
Why this section matters
Storage choices define system boundaries
Data model and database decisions shape API contracts, consistency guarantees, latency targets, scaling strategy and operations.
Storage trade-offs are core architecture decisions
Relational, document, key-value, wide-column and graph databases solve different classes of problems and provide different guarantees.
Data reliability requires explicit engineering
Replication, transactions, recovery and backup strategy must be designed deliberately, not patched in after the first incident.
Wrong database decisions are expensive to reverse
Migrating the data model or storage engine late usually costs far more than validating them early against domain requirements.
Storage competence is mandatory in system design
In interviews and production work, engineers are expected to justify database choices through workload, consistency and total cost of ownership.
How to work through storage design step by step
Step 1
Define workload profile and critical paths
Start with the read/write ratio, data volume, latency requirements and RPO/RTO targets (how much recent data you can afford to lose and how long recovery may take) for critical user flows.
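A back-of-the-envelope sketch of the numbers this step asks for, assuming uniform daily traffic scaled by a peak-to-average factor and recovery from periodic backups only (no continuous replication or log archiving). The `WorkloadProfile` class and its field names are hypothetical; RPO (recovery point objective) here means the maximum tolerable window of lost writes.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class WorkloadProfile:
    reads_per_day: int
    writes_per_day: int
    peak_to_average: float    # assumed ratio of peak traffic to the daily average
    backup_interval_min: int  # periodic backups only, no continuous replication

    def avg_qps(self) -> float:
        return (self.reads_per_day + self.writes_per_day) / SECONDS_PER_DAY

    def peak_qps(self) -> float:
        return self.avg_qps() * self.peak_to_average

    def read_write_ratio(self) -> float:
        return self.reads_per_day / max(self.writes_per_day, 1)

    def worst_case_data_loss_min(self) -> int:
        # A failure just before the next backup loses everything since the last one.
        return self.backup_interval_min

    def meets_rpo(self, rpo_target_min: int) -> bool:
        return self.worst_case_data_loss_min() <= rpo_target_min


checkout = WorkloadProfile(
    reads_per_day=7_776_000,  # 90 reads/s on average
    writes_per_day=864_000,   # 10 writes/s on average
    peak_to_average=5.0,
    backup_interval_min=60,
)
print(checkout.avg_qps(), checkout.peak_qps(), checkout.read_write_ratio())  # 100.0 500.0 9.0
print(checkout.meets_rpo(5))  # False: hourly backups cannot deliver a 5-minute RPO
```

A tight RPO on a flow like checkout already implies replication or write-ahead-log archiving, which is the kind of constraint that should shape the storage choice before any engine is named.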
Step 2
Choose the right data model for the domain
Map domain entities and query patterns to relational, document, key-value, columnar or graph models.
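A deliberately simplified sketch of this mapping, assuming each domain query pattern can be tagged with one dominant access shape. The pattern names and the `CANDIDATE_MODELS` table are hypothetical rules of thumb, not a selection algorithm; anything unmapped is flagged for manual review.

```python
# Hypothetical rule-of-thumb table: dominant access pattern -> candidate data model.
CANDIDATE_MODELS = {
    "multi_entity_transactions": "relational",
    "ad_hoc_joins": "relational",
    "aggregate_read_by_id": "document",
    "lookup_by_key": "key-value",
    "scan_aggregate_columns": "columnar",
    "multi_hop_relationships": "graph",
}


def candidate_models(patterns: list[str]) -> dict[str, list[str]]:
    """Group a domain's query patterns by the data model that usually fits them."""
    result: dict[str, list[str]] = {}
    for pattern in patterns:
        model = CANDIDATE_MODELS.get(pattern, "needs review")
        result.setdefault(model, []).append(pattern)
    return result


plan = candidate_models(
    ["multi_entity_transactions", "lookup_by_key", "scan_aggregate_columns", "fuzzy_text_search"]
)
print(plan)
# {'relational': ['multi_entity_transactions'], 'key-value': ['lookup_by_key'],
#  'columnar': ['scan_aggregate_columns'], 'needs review': ['fuzzy_text_search']}
```

When one domain lands on several models, that is the point to decide between polyglot persistence and accepting one engine's compromises.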
Step 3
Set consistency and transaction boundaries
Define where strict ACID semantics are mandatory and where eventual consistency with compensating mechanisms is acceptable.
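A minimal sketch of a strict ACID boundary, using Python's built-in `sqlite3` module and a hypothetical `accounts` table: a transfer either applies both updates or neither. The `CHECK (balance >= 0)` constraint stands in for a business invariant that must never be violated, even briefly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money atomically: both updates commit together or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first so a failed debit visibly rolls the credit back too.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


print(transfer(conn, "alice", "bob", 30), balances(conn))   # True {'alice': 70, 'bob': 30}
print(transfer(conn, "alice", "bob", 500), balances(conn))  # False {'alice': 70, 'bob': 30}
```

Side effects outside this boundary, such as sending a receipt or updating a search index, are the natural candidates for eventual consistency with retries or compensating actions.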
Step 4
Design scaling and availability strategy
Plan replication, partitioning, failover and caching before production traffic reaches scale.
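For the partitioning part of this step, a common technique is consistent hashing with virtual nodes: each node owns many points on a hash ring, and a key is stored on the next `replicas` distinct nodes clockwise. The sketch below uses MD5 purely as a stable, well-distributed hash (not for security); the node names and the `vnodes` and `replicas` parameters are hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable 64-bit ring position (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hashing sketch: each key is placed on `replicas` distinct nodes."""

    def __init__(self, nodes: list[str], vnodes: int = 64, replicas: int = 3) -> None:
        self.replicas = replicas
        # Each physical node appears at many ring positions to even out load.
        self._ring = sorted((_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self._positions = [pos for pos, _ in self._ring]

    def nodes_for(self, key: str) -> list[str]:
        start = bisect.bisect(self._positions, _hash(key))
        owners: list[str] = []
        for offset in range(len(self._ring)):
            _, node = self._ring[(start + offset) % len(self._ring)]  # walk clockwise, wrapping
            if node not in owners:
                owners.append(node)
                if len(owners) == self.replicas:
                    break
        return owners


ring = HashRing(["db-a", "db-b", "db-c", "db-d"], replicas=3)
print(ring.nodes_for("user:42"))  # three distinct nodes; the first is the primary owner
```

Removing a node only moves the keys it owned: every other key keeps the same first owner, which is what makes rebalancing incremental instead of a full reshuffle.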
Step 5
Plan storage evolution as a roadmap
Include schema migration policy, data lifecycle rules, cost management and observability in long-term planning.
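A minimal sketch of a versioned, forward-only migration runner, assuming SQLite's `PRAGMA user_version` header field as the schema-version record and a hypothetical `MIGRATIONS` list. The steps follow an expand-then-backfill pattern, so old application code keeps working while the new column is filled in.

```python
import sqlite3

# Hypothetical ordered migrations; list index + 1 is the schema version each produces.
MIGRATIONS = [
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL)",
    # Expand: a nullable column does not break code that does not know about it yet.
    "ALTER TABLE orders ADD COLUMN currency TEXT",
    # Backfill as its own step, before any later contract step removes old paths.
    "UPDATE orders SET currency = 'USD' WHERE currency IS NULL",
]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply pending migrations in order; return the resulting schema version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute("BEGIN")  # SQLite DDL is transactional: step and version bump commit together
        try:
            conn.execute(statement)
            conn.execute(f"PRAGMA user_version = {version}")  # PRAGMA takes no ? parameters
            conn.commit()
        except Exception:
            conn.rollback()
            raise
    return conn.execute("PRAGMA user_version").fetchone()[0]


conn = sqlite3.connect(":memory:")
print(migrate(conn), migrate(conn))  # 3 3
```

Running `migrate` twice is a no-op because the recorded version skips applied steps. The per-step atomicity is engine-specific: not every engine supports transactional DDL (MySQL, for example, implicitly commits around most DDL statements).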
Key storage trade-offs
Strong consistency vs latency and availability
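The stricter the consistency guarantee, the more replicas must acknowledge each write before it is confirmed, so distributed writes become slower and more expensive; under a network partition, a strongly consistent system has to reject some requests rather than serve stale data.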
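The cost can be made concrete with Dynamo-style quorum replication, assuming N replicas, writes acknowledged by W of them and reads served by R of them. Choosing R + W > N makes every read quorum overlap the latest acknowledged write quorum (overlap alone does not give linearizability, but it is the usual starting point), and the coordinator then waits for slower replicas. The replica round-trip times below are hypothetical.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Every read quorum intersects every write quorum when R + W > N."""
    return r + w > n


def quorum_latency_ms(replica_rtts_ms: list[float], quorum: int) -> float:
    """A coordinator waiting for `quorum` acks is as slow as the quorum-th fastest replica."""
    return sorted(replica_rtts_ms)[quorum - 1]


rtts = [2.0, 3.0, 40.0]  # hypothetical: two replicas in-region, one cross-region
for w in (1, 2, 3):
    print(w, quorum_latency_ms(rtts, w), quorums_overlap(n=3, r=2, w=w))
# 1 2.0 False
# 2 3.0 True
# 3 40.0 True
```

Raising W from 2 to 3 buys nothing for overlap with R = 2 but pulls the cross-region replica onto every write's critical path, which is the latency side of this trade-off.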
Normalization vs read performance
Normalized schemas improve integrity, while denormalization is often needed for high-throughput read-heavy workloads.
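A small sketch of the trade-off with Python's `sqlite3` and hypothetical `customers`, `orders` and `order_view` tables: the normalized schema pays for a join on every read, while the denormalized read model avoids the join but turns a single rename into an update of every copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER NOT NULL REFERENCES customers(id),
                         total INTEGER NOT NULL);
    -- Denormalized read model: the customer name is copied into every order row.
    CREATE TABLE order_view (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                             customer_name TEXT, total INTEGER);
    INSERT INTO customers VALUES (1, 'Acme');
    INSERT INTO orders VALUES (10, 1, 500), (11, 1, 700);
    INSERT INTO order_view VALUES (10, 1, 'Acme', 500), (11, 1, 'Acme', 700);
""")

# Normalized read: a join per query, but the name is stored exactly once.
joined = conn.execute(
    "SELECT o.id, c.name, o.total FROM orders o "
    "JOIN customers c ON c.id = o.customer_id ORDER BY o.id"
).fetchall()
print(joined)  # [(10, 'Acme', 500), (11, 'Acme', 700)]

# Denormalized write path: one rename fans out to every copied row.
with conn:
    conn.execute("UPDATE customers SET name = 'Acme Corp' WHERE id = 1")
    touched = conn.execute(
        "UPDATE order_view SET customer_name = 'Acme Corp' WHERE customer_id = 1"
    ).rowcount
print(touched)  # 2
```

Here both updates share one transaction; when the read model lives in another store, the fan-out is often applied asynchronously, and that delay is the read model's staleness window.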
Single DB standard vs specialized storage stack
A single engine reduces operational complexity, while polyglot persistence can fit diverse data access patterns more efficiently.
Managed DB speed vs infrastructure control
Managed services accelerate delivery but can limit low-level tuning, portability and cost control at scale.
What this section covers
Storage foundations
Data models, DB selection principles and foundations for data-intensive system architecture.
Database engines in practice
A practical OLTP/OLAP and NoSQL engine landscape with architectural constraints and operational trade-offs.
Section materials
- Introduction to data storage
- Database Selection Framework
- Database Guide (short summary)
- Designing Data-Intensive Applications, 2nd Edition (short summary)
- Database Internals (short summary)
- PostgreSQL Internals (short summary)
- PostgreSQL: architecture and practices
- MySQL: architecture and practices
- MongoDB: document model, replication, and consistency
- Cassandra: architecture and trade-offs
- ClickHouse: analytical DBMS
- Redis: in-memory database and architecture
- YDB: distributed SQL architecture
- CockroachDB: distributed SQL architecture
- DuckDB: embedded OLAP
- Time Series Databases (TSDB): types and trade-offs
Where to go next
Build a storage baseline first
Start with Introduction to data storage, Database Selection Framework and the DDIA summary to build a consistent way of reasoning about storage.
Deepen engines and operations
Then continue with PostgreSQL/MySQL/MongoDB/Cassandra/ClickHouse overviews and DB internals for production-level decisions.
Related chapters
- Database Selection Framework - provides a practical decision flow for matching database technology to workload profile and system constraints.
- Designing Data-Intensive Applications, 2nd Edition (short summary) - builds foundational reasoning around data models, replication and transactions in distributed systems.
- PostgreSQL: architecture and practices - grounds OLTP design and relational trade-offs in practical production architecture patterns.
- Cassandra: architecture and trade-offs - extends scaling and consistency decisions for write-heavy distributed workloads.
- ClickHouse: analytical DBMS and architecture - adds the OLAP angle: columnar design, partitioning and high-throughput analytical reads.
