The database section matters not because it lists technologies, but because it brings the discussion back to the hard part: which data and query properties force a particular storage choice.
In day-to-day engineering work, this chapter helps split a system into distinct storage roles (transactional core, analytical projections, search layer, cache, event logs) instead of forcing one database to do everything.
For interviews and design reviews, it sets the right frame: workload profile, consistency, latency, and operating cost first, and only then the name of a specific engine.
Practical value of this chapter
Workload map
Break the product into OLTP/OLAP/streaming profiles and define where strict consistency is required versus where staleness is acceptable.
Storage boundaries
Assign data ownership by domain so source-of-truth systems stay separate from indexes, caches, and analytical projections.
Evolution roadmap
Plan migration from a single storage model to polyglot persistence without service interruption and with explicit risk control.
Interview framing
Defend decisions through CAP/PACELC trade-offs, latency budgets, and operating cost, not by naming technologies.
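One way to make the workload map concrete is a small decision record per domain. The sketch below is illustrative, not prescriptive: the domain names, profiles, and `StorageDecision` fields are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StorageDecision:
    """One row of a workload map: a domain, its profile, and its storage role."""
    domain: str
    profile: str          # "OLTP", "OLAP", or "streaming"
    consistency: str      # "strict" or "eventual"
    storage_role: str     # transactional core, search layer, cache, ...

# Hypothetical workload map for an e-commerce product
WORKLOAD_MAP = [
    StorageDecision("orders", "OLTP", "strict", "transactional core"),
    StorageDecision("product-search", "OLTP", "eventual", "search layer"),
    StorageDecision("sales-reports", "OLAP", "eventual", "analytical projection"),
    StorageDecision("clickstream", "streaming", "eventual", "event log"),
]

# Only strict-consistency domains belong in the source-of-truth store
strict = [d.domain for d in WORKLOAD_MAP if d.consistency == "strict"]
print(strict)  # ['orders']
```

A record like this makes the "storage boundaries" discussion reviewable: each row names an owner domain and states explicitly which guarantees it trades away.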
Context
Designing Data-Intensive Applications
A core source on data models, consistency, replication and architecture trade-offs in storage systems.
The Storage Systems section helps you treat data architecture as a system-design foundation, not as a late implementation detail. In production, storage decisions directly define reliability, latency, cost profile and scaling limits.
This chapter connects System Design with practical DB choices: which data model to use, where strict consistency is required and how storage evolves with product growth.
Why this section matters
Storage choices shape system boundaries
Data model and database decisions affect API contracts, consistency semantics, latency profile and scaling strategy.
Storage trade-offs are core architecture decisions
SQL, document, key-value, wide-column and graph databases solve different problem classes with different guarantees.
Data reliability requires explicit engineering
Replication, transactions, backup and recovery design must be treated as first-class architecture concerns.
Wrong database decisions are expensive to reverse
Late migrations of the data model and storage strategy usually cost far more than early domain-level validation.
Storage competence is mandatory in system design
In interviews and production work, engineers are expected to justify DB decisions through workload profile, consistency requirements and cost of ownership.
How to go through storage systems step by step
Step 1
Define workload profile and critical paths
Start with read/write ratio, data growth, latency targets and RTO/RPO (recovery time and recovery point objectives) for critical product flows.
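The workload profile usually starts as back-of-envelope arithmetic. A minimal sketch, where every input number (users, requests per user, peak factor, row size) is an assumption to be replaced with real product data:

```python
# Back-of-envelope sizing for one critical flow (all inputs are assumptions)
daily_active_users = 1_000_000
reads_per_user_per_day = 50
writes_per_user_per_day = 5
seconds_per_day = 86_400
peak_factor = 3  # peak traffic relative to the daily average

read_qps = daily_active_users * reads_per_user_per_day / seconds_per_day
write_qps = daily_active_users * writes_per_user_per_day / seconds_per_day

print(f"read/write ratio: {read_qps / write_qps:.0f}:1")
print(f"peak read QPS: {read_qps * peak_factor:.0f}")

# Storage growth: assume 2 KB per write, retained for a year
bytes_per_write = 2_048
yearly_growth_gb = write_qps * seconds_per_day * 365 * bytes_per_write / 1e9
print(f"yearly growth: {yearly_growth_gb:.0f} GB")
```

Even rough numbers like these settle early questions: a 10:1 read/write ratio points at read replicas and caching, while multi-terabyte yearly growth forces a partitioning and lifecycle conversation before any engine is named.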
Step 2
Choose the right data model for the domain
Map domain entities and query patterns to relational, document, key-value, columnar or graph models.
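To see what this mapping means in practice, the sketch below shows the same hypothetical "order" aggregate shaped for three models; the field names and keys are invented for illustration.

```python
# The same "order" entity mapped to three data models (illustrative shapes only)

# Relational: normalized rows; joins reconstruct the aggregate at read time.
relational = {
    "orders": [{"id": 1, "user_id": 7}],
    "order_items": [{"order_id": 1, "sku": "A-1", "qty": 2}],
}

# Document: the whole aggregate stored as one denormalized document.
document = {"_id": 1, "user_id": 7, "items": [{"sku": "A-1", "qty": 2}]}

# Key-value: an opaque value behind a composite key; all access is by key.
key_value = {"order:1": document}

# Query-pattern check: "items of order 1" in each model
print(relational["order_items"][0]["sku"])      # A-1
print(document["items"][0]["sku"])              # A-1
print(key_value["order:1"]["items"][0]["sku"])  # A-1
```

The point of the exercise is the query column, not the storage column: if the dominant access pattern is "fetch the whole aggregate by id", the document or key-value shape wins; if it is ad-hoc slicing across entities, the relational shape does.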
Step 3
Set consistency and transaction boundaries
Define where strict ACID is mandatory and where eventual consistency with compensating mechanisms is acceptable.
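Inside a single transaction boundary, "strict ACID" has a concrete meaning: related writes either all commit or all roll back. A minimal sketch with stdlib `sqlite3` (table and account names are invented):

```python
import sqlite3

# Inside one ACID boundary: both legs of a transfer commit, or neither does.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

try:
    with conn:  # opens a transaction; rolls back on any exception
        conn.execute(
            "UPDATE accounts SET balance = balance - 150 WHERE id = 'alice'")
        cur = conn.execute(
            "SELECT balance FROM accounts WHERE id = 'alice'")
        if cur.fetchone()[0] < 0:
            raise ValueError("insufficient funds")  # triggers rollback
        conn.execute(
            "UPDATE accounts SET balance = balance + 150 WHERE id = 'bob'")
except ValueError:
    pass

# The failed transfer left no partial state behind
print(dict(conn.execute("SELECT id, balance FROM accounts")))
```

Across boundaries, that guarantee is gone: once a flow spans two stores or two services, the design question becomes which compensating mechanism (outbox, saga, reconciliation job) papers over the gap, and how much staleness the product tolerates meanwhile.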
Step 4
Design scaling and availability strategy
Plan replication, partitioning, failover and caching before production-scale traffic arrives.
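Partitioning is the part of this step with the sharpest math. A minimal consistent-hashing sketch (node names, virtual-node count, and key format are all assumptions) shows why it is the default for elastic scaling: adding a node remaps only a fraction of keys instead of nearly all of them, as naive `hash(key) % n` would.

```python
import hashlib
from bisect import bisect_right

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Minimal consistent-hash ring with virtual nodes."""
    def __init__(self, nodes, vnodes=100):
        self._points = sorted(
            (_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )

    def node_for(self, key: str) -> str:
        # The key lands on the first ring point clockwise from its hash
        h = _hash(key)
        idx = bisect_right(self._points, (h, "")) % len(self._points)
        return self._points[idx][1]

ring = Ring(["db-1", "db-2", "db-3"])
bigger = Ring(["db-1", "db-2", "db-3", "db-4"])  # one node added
keys = [f"user:{i}" for i in range(10_000)]
moved = sum(ring.node_for(k) != bigger.node_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 1/4, not 100%
```

The same "plan before traffic arrives" logic applies to replication and failover: replica counts and quorum sizes are cheap to choose on paper and expensive to change under load.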
Step 5
Treat storage maturity as a roadmap
Include schema migration policy, data lifecycle management, cost control and observability in long-term planning.
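A schema migration policy usually means the expand/contract pattern: add new columns alongside the old, backfill, switch readers, and only then drop the old column, so no deploy needs downtime. A sketch with stdlib `sqlite3` (table and column names are invented; `DROP COLUMN` needs SQLite ≥ 3.35, so it is guarded here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada Lovelace')")

# Expand: new columns coexist with the old one; old readers keep working.
conn.execute("ALTER TABLE users ADD COLUMN first_name TEXT")
conn.execute("ALTER TABLE users ADD COLUMN last_name TEXT")

# Backfill: done in small batches in production; one statement here.
conn.execute("""
    UPDATE users SET
        first_name = substr(full_name, 1, instr(full_name, ' ') - 1),
        last_name  = substr(full_name, instr(full_name, ' ') + 1)
""")

# Contract: only after every reader has switched to the new columns.
if sqlite3.sqlite_version_info >= (3, 35, 0):
    conn.execute("ALTER TABLE users DROP COLUMN full_name")

print(conn.execute("SELECT first_name, last_name FROM users").fetchone())
```

The same staged shape carries over to data lifecycle work: archiving and TTL policies are also "expand first, contract later" changes, which is why they belong on the roadmap rather than in an incident retro.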
Key storage trade-offs
Strong consistency vs latency and availability
The stricter the consistency guarantees, the higher the distributed-write cost and the harder low-latency operation becomes.
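Quorum arithmetic makes this trade-off explicit for N replicas: a read is guaranteed to see the latest write only when read and write quorums overlap, i.e. R + W > N, and every extra replica in a quorum is extra latency on that path. A minimal sketch:

```python
# Quorum rule for N replicas: any read quorum overlaps any write quorum
# iff R + W > N, so reads are guaranteed to observe the latest write.
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    return r + w > n

n = 3
# Fast and available (wait for 1 replica), but stale reads are possible:
print(is_strongly_consistent(n, r=1, w=1))  # False
# Strong: every read and write waits on 2 of 3 replicas, paying latency:
print(is_strongly_consistent(n, r=2, w=2))  # True
```

Tuning R and W per operation (e.g. strong reads only on the checkout path) is how systems spend the latency budget where strictness actually matters.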
Normalization vs read performance
Normalized schemas improve integrity, while denormalization is often needed for high-throughput read-heavy workloads.
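The trade is easiest to see side by side: a normalized source of truth pays a join on every read, while a denormalized read model pays on every update instead. A sketch with stdlib `sqlite3` (schema and data are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Normalized source of truth: each fact stored exactly once.
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY,
                        author_id INTEGER REFERENCES authors(id),
                        title TEXT);
    INSERT INTO authors VALUES (1, 'Ada');
    INSERT INTO posts VALUES (10, 1, 'On engines');
""")

# The normalized read path needs a join every time:
row = conn.execute("""
    SELECT p.title, a.name
    FROM posts p JOIN authors a ON a.id = p.author_id
""").fetchone()
print(row)  # ('On engines', 'Ada')

# Denormalized read model: the author name is copied into each row,
# so reads skip the join, but a rename must now touch every copy.
conn.execute("""CREATE TABLE posts_view (
    id INTEGER PRIMARY KEY, title TEXT, author_name TEXT)""")
conn.execute("""
    INSERT INTO posts_view
    SELECT p.id, p.title, a.name
    FROM posts p JOIN authors a ON a.id = p.author_id
""")
print(conn.execute("SELECT title, author_name FROM posts_view").fetchone())
```

In larger systems the `posts_view` role is typically played by a cache, a search index, or an analytical projection, refreshed asynchronously from the normalized core.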
Single DB standard vs specialized storage stack
A single engine reduces operations overhead, but polyglot persistence can better fit diverse data access patterns.
Managed DB speed vs infrastructure control
Managed services accelerate delivery but can limit low-level tuning, portability and cost optimization at scale.
What this section covers
Storage foundations
Data models, DB selection principles and core foundations for data-intensive system architecture.
Database engines in practice
Practical OLTP/OLAP and NoSQL engine landscape with architectural constraints and operational trade-offs.
Section materials
- Introduction to data storage
- Database Selection Framework
- Database Guide (short summary)
- Designing Data-Intensive Applications (short summary)
- Database Internals (short summary)
- PostgreSQL Internals (short summary)
- PostgreSQL: architecture and practices
- MySQL: architecture and practices
- MongoDB: architecture and practices
- Cassandra: architecture and practices
- ClickHouse: analytical DBMS
- Redis: in-memory architecture
- YDB: distributed SQL architecture
- CockroachDB: distributed SQL architecture
- DuckDB: embedded OLAP
- Time Series Databases (TSDB): types and trade-offs
Where to go next
Build a storage baseline first
Start with Introduction to data storage, the Database Selection Framework and DDIA to build consistent storage reasoning.
Deepen engines and operations
Then continue with PostgreSQL/MySQL/MongoDB/Cassandra/ClickHouse overviews and DB internals for production-level decisions.
Related chapters
- Database Selection Framework - a practical decision flow for matching DB technology to workload profile and system constraints.
- Designing Data-Intensive Applications (short summary) - foundational reasoning about data models, replication and transactions in distributed systems.
- PostgreSQL: architecture and practices - grounds OLTP design and relational trade-offs in practical production architecture patterns.
- Cassandra: architecture and practices - extends scaling and consistency decisions to high-write distributed workloads.
- ClickHouse: analytical DBMS - adds the OLAP angle: columnar design, partitioning and high-throughput analytical reads.
