System Design Space

Updated: February 21, 2026 at 11:59 PM

Introduction to Data Storage

easy

Essential Architecture #Data lecture notes: where to store state, the evolution of storage, and how data shapes APIs.

Source

Essential Architecture - Data

Transcript of the lecture (4 Oct 2021) about data storage and its impact on APIs.

Data directly shapes how usable an API is: latency, consistency guarantees, retries, idempotency, and the boundaries of responsibility between teams all depend on it. Start with the sixth principle of the twelve-factor app: run the application as stateless processes and keep no state inside them. An application designed this way is much easier to make fault-tolerant and to scale than a stateful one. But that raises a question: where should the state be stored?

Why Data Drives APIs

Related topic

The Twelve-Factor App

Stateless as a foundation for scaling and sustainability.

Architectural decisions about data turn into properties of interfaces.

  • Response speed and latency
  • Consistency (strong vs eventual)
  • Error and retry model
  • Limitations on filtering/search/pagination
  • Idempotency, retries and deduplication
  • Boundaries of responsibility between teams

Stateless as a foundation

12-factor principle: applications do not store state in the process. Scaling becomes easier, but you have to decide deliberately where the data lives.
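
As a minimal sketch of this principle, the handler below keeps nothing in its own memory and reads and writes all state through an external store. The names `StateStore`, `InMemoryStore` and `handle_visit` are hypothetical, and the in-memory class only stands in for a real database or cache.

```python
from typing import Protocol


class StateStore(Protocol):
    """Hypothetical interface for an external store (database, cache, ...)."""

    def get(self, key: str) -> int: ...

    def put(self, key: str, value: int) -> None: ...


class InMemoryStore:
    """Stand-in for an external store; a real one would live outside the process."""

    def __init__(self) -> None:
        self._data = {}  # key -> counter value

    def get(self, key: str) -> int:
        return self._data.get(key, 0)

    def put(self, key: str, value: int) -> None:
        self._data[key] = value


def handle_visit(store: StateStore, user_id: str) -> int:
    """Stateless handler: everything it needs to remember goes through the store."""
    visits = store.get(user_id) + 1
    store.put(user_id, visits)
    return visits


# Two calls model two separate application instances that share only the store.
shared_store = InMemoryStore()
first_instance_result = handle_visit(shared_store, "alice")
second_instance_result = handle_visit(shared_store, "alice")
```

Because the handler holds no state, any instance can serve any request. The read-then-write here is not atomic, so a real store would need an atomic increment or a transaction.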

Related topic: The Twelve-Factor App.

The Evolution of State Storage

File systems

Storage formats and parsing logic easily leak into business code.

Relational databases (OLTP)

SQL and transactions provide strong guarantees and an expressive API.

OLAP and analytics

Cubes, star/snowflake models and aggregates for BI.

Big Data / Hadoop

MapReduce and the batch data processing ecosystem.

Object Storage

Objects without hierarchies, S3 API as a de facto standard.

NoSQL

Horizontal scaling at the cost of compromises.

Relational databases: key concepts

Related topic

Database Internals

B-Trees, LSM and transactions within the DBMS.

Normalization

The shape of the data influences how queries are designed and how they behave.

SQL

Declarative language separates the “what” from the “how.”

Indexes

They speed up reading, but slow down writing and updating.
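
A small sketch of the read side of this trade-off, using SQLite from the Python standard library. The table, column and index names are made up for illustration, and the exact wording of the query plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def query_plan(connection, sql, params=()):
    """Return SQLite's plan for a query as one string (last column is the detail)."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT * FROM users WHERE email = ?"

# Without an index the engine has to scan the whole table.
plan_before = query_plan(conn, lookup, ("user42@example.com",))

# With an index the same declarative query becomes an index search.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, ("user42@example.com",))

print(plan_before)
print(plan_after)
```

The query text does not change; only the plan does. The cost is on the write side: every insert or update of `email` now also has to maintain `idx_users_email`.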

Transactions and ACID

Atomicity, isolation, and durability shape contracts.
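
To illustrate atomicity, here is a sketch of a money transfer in SQLite. The `accounts` schema and the `transfer` helper are hypothetical; the point is that a failed step undoes the earlier step in the same transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(connection, sender, receiver, amount):
    """Move money atomically; return False if the transfer was rolled back."""
    try:
        # The connection context manager commits on success and rolls back on error.
        with connection:
            connection.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            # This debit violates the CHECK constraint when funds are insufficient.
            connection.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(connection):
    return dict(connection.execute("SELECT name, balance FROM accounts"))


succeeded = transfer(conn, "alice", "bob", 30)
rejected = transfer(conn, "alice", "bob", 500)
final_balances = balances(conn)
```

In the rejected transfer the credit to `bob` has already run when the debit fails, yet the rollback leaves both balances as they were. An API built on this can promise "all or nothing" to its clients.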

Replication

Failover and read scaling, with trade-offs in consistency.

Sharding

Routing by shard key and load distribution.
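
One simple way to route by shard key is to hash the key and take the result modulo the number of shards. The sketch below assumes that scheme; the shard names and the `shard_for` function are hypothetical.

```python
import hashlib
from collections import Counter

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]  # hypothetical shard names


def shard_for(shard_key: str, shards=SHARDS) -> str:
    """Map a shard key to a shard using a stable hash (same key, same shard)."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# Every request for the same user lands on the same shard.
home_shard = shard_for("user-42")

# Load distribution across shards for a batch of synthetic keys.
distribution = Counter(shard_for(f"user-{i}") for i in range(1000))
```

A cryptographic hash is used here only because it is stable across processes, unlike Python's built-in `hash()` for strings. Modulo routing remaps most keys when the number of shards changes, which is why consistent hashing or a lookup table is often used instead.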

Integration between systems

Related topic

Enterprise Integration Patterns

Files, RPC and messaging as integration patterns.

File transfer

A simple and transparent way to exchange data, but with weak encapsulation.

Shared database

High coupling and slow development because every team depends on the shared schema.

RPC

Strong contracts, but requires versioning discipline.

Messaging

Asynchronous scenarios and integration flexibility.

A shared database creates high coupling and breaks contracts between teams. Modern systems strive for a shared-nothing architecture.

Data Lake vs Data Mesh

Related topic

Big Data

The evolution of analytics and architectural layers.

Data Lake

Centralized collection of data from OLTP systems through ETL processes. As the lake grows, keeping the data connected and its quality high becomes harder.

Data Mesh

  • Domain-centric decentralization
  • Data as a product
  • Self-service platform
  • Federated computational governance

DDD and domain boundaries

Related topic

Learning Domain-Driven Design

Bounded contexts and domain contracts.

Domain boundaries and contracts between bounded contexts make APIs resilient. DDD approaches help to separate the data models of different teams.

How data is turned into a convenient API

Bridge Data -> API

  • Predictable guarantees (ACID vs BASE)
  • Clear Sources of Truth
  • A clear model of errors and retries
  • Domain and contract boundaries
  • Idempotency and deduplication
  • Isolation from shared database
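
The idempotency and deduplication item can be sketched with an idempotency key: the client sends the same key on every retry, and the service returns the stored response instead of repeating the side effect. `PaymentService` and its fields are hypothetical, and a real service would keep the keys in durable storage rather than in a dictionary.

```python
class PaymentService:
    """Hypothetical service that deduplicates requests by idempotency key."""

    def __init__(self):
        self._responses = {}  # idempotency key -> response already returned
        self.charges = []     # side effects that actually happened

    def charge(self, idempotency_key, account, amount):
        # A repeated key means a retry: return the earlier response unchanged.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]

        self.charges.append((account, amount))
        response = {
            "status": "charged",
            "account": account,
            "amount": amount,
            "charge_number": len(self.charges),
        }
        self._responses[idempotency_key] = response
        return response


service = PaymentService()
first_response = service.charge("key-123", "alice", 30)
retry_response = service.charge("key-123", "alice", 30)  # e.g. after a timeout
other_response = service.charge("key-456", "alice", 30)  # a genuinely new request
```

The retry with `key-123` produces no second charge, while `key-456` is treated as a new operation even though the payload is identical.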

NoSQL through the lens of CAP/BASE

Understanding CAP and BASE helps explain eventual consistency to clients and build correct retries.
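
As a rough sketch of a client-side retry under eventual consistency, the helper below retries an operation with exponential backoff. `TransientError`, `call_with_retries` and `flaky_read` are hypothetical names; which errors are actually safe to retry depends on the specific system.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are safe to retry."""


def call_with_retries(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            sleep(base_delay * 2 ** (attempt - 1))


# Simulated read from a replica that catches up on the third attempt.
calls = {"count": 0}


def flaky_read():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("replica has not caught up yet")
    return "value"


delays = []  # record the delays instead of sleeping, to keep the demo instant
result = call_with_retries(flaky_read, sleep=delays.append)
```

Retrying writes this way is only safe when the operation is idempotent. Production clients usually also add jitter to the delays so that many clients do not retry in lockstep.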

Related topic: CAP theorem.

Mini-checklist of a convenient API

  • It is clear what consistency guarantees the system provides.
  • The client understands where eventual consistency is possible.
  • Idempotency for operations that can be repeated.
  • Errors, retries and timeouts are described deterministically.
  • There is no shared database as a hidden integration channel.
  • Domain boundaries are reflected in the API contract.

Materials from the lecture

Recommended sources and books for deepening:

  • Designing Data-Intensive Applications
  • Enterprise Integration Patterns
  • NoSQL Distilled
  • Database Internals
  • Big Data
  • Data Mesh Principles and Logical Architecture


© 2026 Alexander Polomodov