
Updated: March 25, 2026 at 2:00 AM

Why understand storage systems?


Introductory chapter: Database types, data models, and storage tradeoffs.

The database section matters not because it lists technologies, but because it brings the discussion back to the hard part: which data and query properties force a particular storage choice.

In day-to-day engineering work, this chapter helps you split a system into distinct storage roles (transactional core, analytical projections, search layer, cache, and event logs) instead of forcing one database to do everything.

For interviews and design reviews, it sets the right frame: workload profile, consistency, latency, and operating cost first, and only then the name of a specific engine.

Practical value of this chapter

Workload map

Break the product into OLTP/OLAP/streaming profiles and define where strict consistency is required versus where delay is acceptable.

Storage boundaries

Assign data ownership by domain so source-of-truth systems stay separate from indexes, caches, and analytical projections.

Evolution roadmap

Plan migration from a single storage model to polyglot persistence without service interruption and with explicit risk control.
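One common way to migrate storage without service interruption is a dual-write phase: writes go to both the old and the new store while reads stay on the old one until backfill and verification complete. A minimal sketch under those assumptions; all class and variable names here are hypothetical:

```python
# Hypothetical dual-write phase of a zero-downtime migration:
# the old store stays the source of truth until the read path is flipped.

class DualWriteRepository:
    """Writes to both stores during migration; reads from the old one."""

    def __init__(self, old_store, new_store, read_from_new=False):
        self.old_store = old_store
        self.new_store = new_store
        self.read_from_new = read_from_new  # flipped only after verification

    def save(self, key, value):
        self.old_store[key] = value      # source of truth during migration
        try:
            self.new_store[key] = value  # best-effort shadow write
        except Exception:
            pass  # in practice: log and reconcile later, never fail the user write

    def load(self, key):
        store = self.new_store if self.read_from_new else self.old_store
        return store.get(key)

old, new = {}, {}
repo = DualWriteRepository(old, new)
repo.save("user:1", {"name": "Ada"})
```

The explicit risk control the section mentions lives in the flags and the reconciliation step: each phase (shadow writes, backfill, read flip, old-store retirement) is independently reversible.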

Interview framing

Defend decisions through CAP/PACELC trade-offs, latency budgets, and operating cost, not by naming technologies.

Context

Designing Data-Intensive Applications

A core source on data models, consistency, replication and architecture trade-offs in storage systems.

Read the overview

The Storage Systems section helps you treat data architecture as a system-design foundation, not as a late implementation detail. In production, storage decisions directly define reliability, latency, cost profile and scaling limits.

This chapter connects System Design with practical DB choices: which data model to use, where strict consistency is required and how storage evolves with product growth.

Why this section matters

Storage choices shape system boundaries

Data model and database decisions affect API contracts, consistency semantics, latency profile and scaling strategy.

Storage trade-offs are core architecture decisions

SQL, document, key-value, wide-column and graph databases solve different problem classes with different guarantees.

Data reliability requires explicit engineering

Replication, transactions, backup and recovery design must be treated as first-class architecture concerns.

Wrong database decisions are expensive to reverse

Late migrations of data model and storage strategy usually cost more than early domain-level validation.

Storage competence is mandatory in system design

In interviews and production work, engineers are expected to justify DB decisions through workload, consistency and ownership cost.

How to go through storage systems step by step

Step 1

Define workload profile and critical paths

Start with read/write ratio, data growth, latency targets and RTO/RPO requirements for critical product flows.
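Step 1 can be captured as a small, explicit profile per critical flow. A sketch with illustrative numbers and thresholds (the `> 10` read-heavy cutoff is an assumption, not a standard):

```python
# Hypothetical workload profile for one critical flow; all thresholds
# and figures are illustrative.
from dataclasses import dataclass

@dataclass
class WorkloadProfile:
    reads_per_sec: float
    writes_per_sec: float
    p99_latency_ms: float   # latency target on the critical path
    rpo_seconds: int        # max tolerable data loss
    rto_seconds: int        # max tolerable downtime

    @property
    def read_write_ratio(self) -> float:
        return self.reads_per_sec / max(self.writes_per_sec, 1e-9)

    def is_read_heavy(self) -> bool:
        return self.read_write_ratio > 10  # assumed cutoff

checkout = WorkloadProfile(reads_per_sec=500, writes_per_sec=200,
                           p99_latency_ms=50, rpo_seconds=0, rto_seconds=60)
print(checkout.read_write_ratio)  # 2.5
```

A profile like this already constrains the choice: `rpo_seconds=0` implies synchronous replication on the write path, which in turn eats into the 50 ms latency budget.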

Step 2

Choose the right data model for the domain

Map domain entities and query patterns to relational, document, key-value, columnar or graph models.
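The mapping in Step 2 can be made explicit as a lookup from dominant query pattern to model family. The pattern descriptions below are illustrative shorthand, and real selection weighs several patterns at once:

```python
# Illustrative query-pattern -> data-model mapping; a decision aid,
# not a rule engine.
MODEL_BY_PATTERN = {
    "multi-entity joins, ad-hoc filters": "relational",
    "whole-aggregate reads by id": "document",
    "single-key lookups at high throughput": "key-value",
    "wide scans over few columns": "columnar",
    "traversals over relationships": "graph",
}

def suggest_model(pattern: str) -> str:
    # Default reflects a common heuristic: start relational,
    # revisit once real access patterns are measured.
    return MODEL_BY_PATTERN.get(pattern, "start relational, revisit with data")

print(suggest_model("whole-aggregate reads by id"))  # document
```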

Step 3

Set consistency and transaction boundaries

Define where strict ACID is mandatory and where eventual consistency with compensating mechanisms is acceptable.
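Outside the strict-ACID boundary, "compensating mechanisms" often means saga-style logic: each completed step has an undo action that runs if a later step fails. A minimal sketch with hypothetical step names:

```python
# Saga-style compensation sketch: steps are (action, compensation) pairs;
# on failure, compensations for completed steps run in reverse order.

def run_saga(steps):
    done = []
    try:
        for action, compensation in steps:
            action()
            done.append(compensation)
    except Exception:
        for compensation in reversed(done):
            compensation()  # undo already-completed steps
        raise

log = []

def fail_payment():
    raise RuntimeError("payment failed")

steps = [
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (fail_payment, lambda: log.append("refund payment")),
]
try:
    run_saga(steps)
except RuntimeError:
    pass
print(log)  # ['reserve stock', 'release stock']
```

Note the asymmetry with ACID: compensation restores business invariants eventually, but intermediate states are visible, which is exactly why the boundary in this step must be chosen per flow, not per system.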

Step 4

Design scaling and availability strategy

Plan replication, partitioning, failover and caching before production-scale traffic arrives.
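The partitioning part of Step 4 reduces to a deterministic key-to-shard routing function. A toy sketch; production systems usually prefer consistent hashing or range partitioning so that resharding moves less data than this naive modulo scheme:

```python
# Toy hash partitioning: deterministic routing of keys to shards.
# With modulo hashing, changing num_shards remaps most keys, which is
# why consistent hashing exists; this only shows the routing idea.
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

shard = shard_for("user:42", 4)
print(shard)
```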

Step 5

Treat storage maturity as a roadmap

Include schema migration policy, data lifecycle management, cost control and observability in long-term planning.

Key storage trade-offs

Strong consistency vs latency and availability

The stricter the consistency guarantees, the higher the cost of distributed writes and the harder it becomes to sustain low latency.
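One concrete instance of this trade-off is quorum sizing in replicated stores: with N replicas, a read quorum R and write quorum W are guaranteed to overlap on at least one up-to-date replica only when R + W > N, and larger quorums mean waiting on more replicas per operation:

```python
# Quorum overlap rule for N replicas: reads intersect the latest write
# iff R + W > N; smaller quorums are faster but can serve stale reads.

def quorum_overlaps(n: int, r: int, w: int) -> bool:
    return r + w > n

print(quorum_overlaps(3, 2, 2))  # True: R=W=2 of N=3 always overlap
print(quorum_overlaps(3, 1, 1))  # False: lowest latency, reads may be stale
```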

Normalization vs read performance

Normalized schemas improve integrity, while denormalization is often needed for high-throughput read-heavy workloads.
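The same trade-off in miniature, as a sketch: one normalized source of truth joined at read time versus a denormalized projection that duplicates a field for fast reads and must be rebuilt when the source changes. The data is illustrative:

```python
# Normalized data: one source of truth per fact, join at query time.
authors = {1: {"name": "Ada"}}
posts = [{"id": 10, "author_id": 1, "title": "CRDTs in practice"}]

# Normalized read: integrity preserved, extra lookup per read.
joined = [{**p, "author_name": authors[p["author_id"]]["name"]} for p in posts]

# Denormalized projection: precomputed for read-heavy paths; author_name
# is duplicated, so an author rename requires rebuilding the projection.
projection = {p["id"]: {"title": p["title"],
                        "author_name": authors[p["author_id"]]["name"]}
              for p in posts}

print(projection[10]["author_name"])  # Ada
```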

Single DB standard vs specialized storage stack

A single engine reduces operational overhead, but polyglot persistence can better fit diverse data access patterns.

Managed DB speed vs infrastructure control

Managed services accelerate delivery but can limit low-level tuning, portability and cost optimization at scale.

What this section covers

Storage foundations

Data models, DB selection principles and core foundations for data-intensive system architecture.

Database engines in practice

Practical OLTP/OLAP and NoSQL engine landscape with architectural constraints and operational trade-offs.

How to apply this in practice

Common pitfalls

Choosing a database by popularity instead of workload and consistency requirements.
Deferring replication, backup and recovery strategy until after incidents happen.
Adopting polyglot persistence without operational readiness and ownership boundaries.
Ignoring schema evolution and migration compatibility while product complexity grows.

Recommendations

Start DB selection from explicit latency, consistency, RPO/RTO and expected growth constraints.
Validate the data model against real query patterns and failure scenarios before production rollout.
Capture storage trade-offs in ADRs: guarantees, limitations and reassessment triggers.
Make DB observability part of platform operations: query-class latency, saturation, replication lag and error budgets.
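The observability recommendation above can be reduced to a check of current metrics against explicit budgets. A sketch; the metric names and thresholds here are illustrative and not tied to any specific database:

```python
# Hypothetical budget check for DB health signals; names and limits
# are examples, not recommendations.
BUDGETS = {"p99_read_ms": 50, "replication_lag_s": 5, "error_rate": 0.001}

def breached(metrics: dict) -> list:
    """Return the budget names exceeded by the given metric sample."""
    return [name for name, limit in BUDGETS.items()
            if metrics.get(name, 0) > limit]

sample = {"p99_read_ms": 72, "replication_lag_s": 1.2, "error_rate": 0.0004}
print(breached(sample))  # ['p99_read_ms']
```

Wiring such checks to alerting turns the ADR's "reassessment triggers" from prose into something that actually fires.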


Where to go next

Build a storage baseline first

Start with Data Storage Intro, Database Selection Framework and DDIA to build consistent storage reasoning.

Deepen engines and operations

Then continue with PostgreSQL/MySQL/MongoDB/Cassandra/ClickHouse overviews and DB internals for production-level decisions.
