System Design Space

Updated: May 4, 2026 at 8:57 PM

Why understand storage systems?

Difficulty: easy

Introductory chapter on data models, workload profile, consistency, replication, recovery, ownership cost, and choosing the right storage class.

The database section matters not because it lists technologies, but because it brings the discussion back to the hard part: which data, query, and failure properties force a particular storage choice.

In day-to-day engineering work, this chapter helps split a system into distinct storage roles (transactional core, analytical projections, search layer, cache, and event logs) instead of forcing one database to do everything.

For interviews and design reviews, it sets the right frame: workload profile, consistency, latency, and operating cost first, then the name of a specific engine.

Practical value of this chapter

Workload map

Break the product into OLTP, OLAP, and streaming profiles, and define where strict consistency is required versus where stale or delayed data is acceptable.

Storage boundaries

Assign data ownership by domain so source-of-truth systems stay separate from indexes, caches, and analytical projections.

Evolution roadmap

Plan migration from a single storage model to polyglot persistence without service interruption and with explicit risk control.

Interview framing

Defend decisions through CAP/PACELC trade-offs, latency budgets, and operating cost, not by naming technologies.

Context

Designing Data-Intensive Applications, 2nd Edition

A core source on data models, consistency, replication and architecture trade-offs in storage systems.

Read the review

The Storage Systems section helps you treat data architecture as a system-design foundation, not as a late implementation detail. In production, storage decisions define reliability, latency, cost profile and scaling limits for the whole system.

This chapter connects System Design with practical DB choices: which data model to use, where strict consistency is required and how storage should evolve as the product grows.

Why this section matters

Storage choices define system boundaries

Data model and database decisions shape API contracts, consistency guarantees, latency targets, scaling strategy and operations.

Storage trade-offs are core architecture decisions

SQL, document, key-value, wide-column and graph databases solve different problem classes and provide different guarantees.

Data reliability requires explicit engineering

Replication, transactions, recovery and backup strategy must be designed deliberately, not patched in after the first incident.

Wrong database decisions are expensive to reverse

Migrating a data model or storage strategy late, with production data and live traffic, usually costs more than validating the choice against the domain early.

Storage competence is mandatory in system design

In interviews and production work, engineers are expected to justify database choices through workload, consistency and total cost of ownership.

How to go through storage systems step by step

Step 1

Define workload profile and critical paths

Start with read/write ratio, data volume, latency requirements, and recovery point and recovery time objectives (RPO/RTO) for critical user flows.
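One way to make the profile concrete is to write it down as data before any engine is named. The sketch below is illustrative only: the `WorkloadProfile` class, its field names, the read-heavy threshold of 10, and the sample numbers are all assumptions, not part of the chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadProfile:
    """Hypothetical record of one critical user flow's storage requirements."""
    name: str
    reads_per_sec: float
    writes_per_sec: float
    p99_latency_ms: float   # latency budget for the flow
    rpo_seconds: float      # how much recent data may be lost after a failure
    rto_seconds: float      # how long recovery may take

    @property
    def read_write_ratio(self) -> float:
        # A flow with no writes is treated as infinitely read-heavy.
        if self.writes_per_sec == 0:
            return float("inf")
        return self.reads_per_sec / self.writes_per_sec

    def is_read_heavy(self, threshold: float = 10.0) -> bool:
        # The threshold is an arbitrary illustration, not a standard value.
        return self.read_write_ratio >= threshold


# Made-up numbers for two flows of the same product.
checkout = WorkloadProfile("checkout", reads_per_sec=200, writes_per_sec=50,
                           p99_latency_ms=150, rpo_seconds=0, rto_seconds=60)
catalog = WorkloadProfile("catalog_browse", reads_per_sec=5000, writes_per_sec=5,
                          p99_latency_ms=80, rpo_seconds=3600, rto_seconds=900)
```

Writing the two flows side by side makes it visible that they pull in different directions: the checkout flow tolerates no data loss, while the catalog flow is dominated by reads and can accept a much looser recovery target.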

Step 2

Choose the right data model for the domain

Map domain entities and query patterns to relational, document, key-value, columnar or graph models.

Step 3

Set consistency and transaction boundaries

Define where strict ACID semantics are mandatory and where eventual consistency with compensating mechanisms is acceptable.
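As a minimal sketch of a flow where ACID semantics are mandatory, the example below uses Python's standard `sqlite3` module to move money between two rows atomically. The `accounts` table, the `transfer` and `balances` helpers, and the amounts are hypothetical and chosen only for illustration.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> None:
    """Move `amount` from src to dst; both updates commit together or not at all."""
    # Using the connection as a context manager commits on success
    # and rolls back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?",
                                  (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

transfer(conn, 1, 2, 30)          # succeeds: both rows change
try:
    transfer(conn, 1, 2, 500)     # fails: the debit is rolled back
except ValueError:
    pass
```

The failed transfer leaves no partial debit behind, which is the property that eventual consistency with compensating actions has to rebuild by hand.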

Step 4

Design scaling and availability strategy

Plan replication, partitioning, failover and caching before production traffic reaches scale.
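To illustrate why partitioning is worth planning early, here is a small sketch of hash-based partitioning. The function names, key format, and partition counts are assumptions for the example. Modulo hashing is only one of several schemes and is shown here because its resharding cost is easy to see.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash (not Python's salted hash())."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def moved_keys(keys, old_n: int, new_n: int) -> int:
    """Count keys whose partition changes when the partition count changes."""
    return sum(1 for k in keys if partition_for(k, old_n) != partition_for(k, new_n))


keys = [f"user:{i}" for i in range(1000)]
moved = moved_keys(keys, 8, 9)   # growing from 8 to 9 partitions
```

With plain modulo hashing, adding a single partition typically relocates most keys, so the partition count effectively becomes a long-lived decision. Consistent hashing or a fixed number of logical partitions mapped onto nodes are common ways to reduce that movement.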

Step 5

Plan storage evolution as a roadmap

Include schema migration policy, data lifecycle rules, cost management and observability in long-term planning.

Key storage trade-offs

Strong consistency vs latency and availability

The stricter the consistency guarantees, the more replicas a distributed write must coordinate with before it is acknowledged, so writes become more expensive and low latency is harder to keep.
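One common way to reason about this trade-off is quorum replication: with N replicas, W write acknowledgements, and R read responses, every read set overlaps every write set when W + R > N. The sketch below assumes that model. The function names and latency figures are made up for illustration, and the overlap condition is a necessary ingredient rather than a complete consistency guarantee.

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True if every read quorum shares at least one replica with every write quorum."""
    return w + r > n


def write_ack_latency_ms(replica_latencies_ms, w: int) -> float:
    """A write is acknowledged once the w fastest replicas have responded."""
    if not 1 <= w <= len(replica_latencies_ms):
        raise ValueError("w must be between 1 and the number of replicas")
    return sorted(replica_latencies_ms)[w - 1]


# Hypothetical round-trip times to three replicas, one of them in a remote region.
latencies = [5, 40, 120]

fast_but_weak = write_ack_latency_ms(latencies, w=1)   # waits for the nearest replica only
quorum_write = write_ack_latency_ms(latencies, w=2)    # waits for a majority
all_replicas = write_ack_latency_ms(latencies, w=3)    # waits for the slowest replica
```

Raising W (or R) buys overlap at the price of waiting for slower replicas, which is the latency cost of the stricter guarantee.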

Normalization vs read performance

Normalized schemas improve integrity, while denormalization is often needed for high-throughput read-heavy workloads.
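The following sketch shows the trade-off with Python's standard `sqlite3` module. The table names, columns, and rows are hypothetical. The normalized tables act as the source of truth, and `order_summaries` is a denormalized read model that avoids the join but can go stale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total_cents INTEGER NOT NULL
);
-- Denormalized read model: the customer name is copied into every row.
CREATE TABLE order_summaries (
    order_id INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    total_cents INTEGER NOT NULL
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Linus');
INSERT INTO orders VALUES (10, 1, 2500), (11, 2, 990), (12, 1, 4000);
""")

JOIN_QUERY = """
SELECT o.id, c.name, o.total_cents
FROM orders o JOIN customers c ON c.id = o.customer_id
ORDER BY o.id
"""

# Build the projection once from the normalized source of truth.
conn.execute("INSERT INTO order_summaries " + JOIN_QUERY)

normalized_rows = conn.execute(JOIN_QUERY).fetchall()
denormalized_rows = conn.execute(
    "SELECT order_id, customer_name, total_cents FROM order_summaries ORDER BY order_id"
).fetchall()

# A rename touches one row in the normalized schema...
conn.execute("UPDATE customers SET name = 'Ada L.' WHERE id = 1")
# ...but the copied name in the read model stays stale until it is refreshed.
stale_name = conn.execute(
    "SELECT customer_name FROM order_summaries WHERE order_id = 10"
).fetchone()[0]
```

The read model answers the query without a join, and the price is that every change to the copied field needs an explicit refresh path.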

Single DB standard vs specialized storage stack

A single engine reduces operational complexity, while polyglot persistence can fit diverse data access patterns more efficiently.

Managed DB delivery speed vs infrastructure control

Managed services accelerate delivery but can limit low-level tuning, portability and cost control at scale.

What this section covers

Storage foundations

Data models, DB selection principles and foundations for data-intensive system architecture.

Database engines in practice

A practical OLTP/OLAP and NoSQL engine landscape with architectural constraints and operational trade-offs.

How to apply this in practice

Common pitfalls

Choosing a database by popularity instead of workload and consistency requirements.
Deferring replication, backup and recovery strategy until after incidents happen.
Adopting polyglot persistence without operational readiness and ownership boundaries.
Ignoring schema evolution and migration compatibility while product complexity grows.

Recommendations

Start DB selection from explicit latency, consistency, RPO/RTO and expected growth constraints.
Validate the data model against real query patterns and failure scenarios before production rollout.
Capture storage trade-offs in ADRs: guarantees, limitations and reassessment triggers.
Make DB observability part of platform operations: query-class latency, resource saturation, replication lag and error budgets.
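As a rough sketch of the "query-class latency" idea in the last recommendation, the snippet below groups latency samples by query class and reports a nearest-rank p99 for each. The class names, the sample data, and the helper names are assumptions. A real platform would normally take these numbers from its metrics system rather than compute them ad hoc.

```python
import math
from collections import defaultdict


def percentile(samples, pct: int) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p99_by_query_class(events):
    """events: iterable of (query_class, latency_ms) pairs."""
    grouped = defaultdict(list)
    for query_class, latency_ms in events:
        grouped[query_class].append(latency_ms)
    return {qc: percentile(values, 99) for qc, values in grouped.items()}


# Made-up samples: many fast point reads and a few slow reporting queries.
events = [("point_read", ms) for ms in range(1, 101)]
events += [("report", 900), ("report", 1200)]

p99 = p99_by_query_class(events)
```

Keeping the classes separate matters because a single global percentile would let the fast point reads hide the slow reporting queries.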

Section materials

Where to go next

Build a storage baseline first

Start with Data Storage Intro, Database Selection Framework and DDIA to build consistent storage reasoning.

Deepen engines and operations

Then continue with PostgreSQL/MySQL/MongoDB/Cassandra/ClickHouse overviews and DB internals for production-level decisions.
