System Design Space

Updated: May 2, 2026 at 3:20 PM

Introduction to Data Storage

Difficulty: easy

Introduction to state storage, external storage for stateless applications, consistency guarantees, API contracts, integration patterns, and the evolution from OLTP to NoSQL, NewSQL, and HTAP.

This chapter shows storage evolution without mythology: teams move from files and simple OLTP models to NoSQL, NewSQL, and HTAP because real limits start piling up, not because the technology is fashionable.

In practice, it helps you describe where state actually lives, how it moves between queues, object storage, and databases, and why that immediately creates requirements around idempotency, retries, and event ordering.

In interviews and architecture discussions, it helps you explain why the data architecture grew more complex in that order, rather than jumping to a heavyweight stack too early.

Practical value of this chapter

State model

Make state location explicit: in app memory, queues, databases, or object storage, with clear guarantees per step.

API from data shape

Design API contracts around storage behavior: idempotency, retries, event ordering, and deduplication.

NewSQL and HTAP fit

Know when NewSQL/HTAP simplify architecture and when it is safer to separate transactional and analytical paths.

Interview narrative

Explain data evolution from simple persistence to distributed architecture without introducing unnecessary complexity.

Source

Essential Architecture - Data

Transcript of the 4 October 2021 lecture on data storage and how storage choices shape APIs.


This chapter connects stateless applications, stateful components, data models, API contracts, and consistency guarantees. Storage choices directly affect latency, retries, idempotency, deduplication, and team ownership of data.

We then move from files and classic OLTP to NoSQL, NewSQL, HTAP, object storage, data lakes, and Data Mesh. The core idea stays simple: move state outside the process, but be explicit about where it lives and what guarantees the API receives.

Why Data Drives APIs

Related topic

The Twelve-Factor App

Stateless applications as a foundation for scaling and resilience.


Architectural decisions about data turn into properties of interfaces.

  • Response latency and throughput
  • Boundaries between strong and eventual consistency
  • Error, retry, and deduplication model
  • Limitations on filtering/search/pagination
  • Idempotency for repeatable operations
  • Team ownership of data and contracts

Stateless as a foundation

Twelve-Factor principle: applications do not store state in the process. Scaling becomes easier, but the storage location must be chosen explicitly.
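A minimal sketch of the stateless idea, assuming a hypothetical `KeyValueStore` interface. `InMemoryStore` is only a stand-in for what would be an external database or cache in a real deployment, and `handle_visit` is an illustrative handler name, not something from the lecture.

```python
from typing import Dict, Protocol


class KeyValueStore(Protocol):
    """Hypothetical interface for an external state store."""

    def increment(self, key: str) -> int: ...


class InMemoryStore:
    """Stand-in for an external store; a real one lives outside the process."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def increment(self, key: str) -> int:
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


def handle_visit(store: KeyValueStore, user_id: str) -> int:
    """Stateless handler: it keeps nothing between calls except via the store."""
    return store.increment(f"visits:{user_id}")


# Any number of handler instances can serve the same user,
# because the only shared state is in the store.
shared_store = InMemoryStore()
first = handle_visit(shared_store, "alice")
second = handle_visit(shared_store, "alice")
```

Because the handler receives the store as an argument, swapping the in-memory stand-in for a networked store changes the guarantees (latency, durability, consistency) without changing the handler code.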


The Evolution of State Storage

File systems

File formats and read logic quickly leak into application code.

Relational databases (OLTP)

SQL and transactions provide strong guarantees and an expressive interface.

OLAP and analytics

Cubes, star and snowflake schemas, and aggregates support BI workloads.

Big Data / Hadoop

MapReduce and Hadoop ecosystem tools support large-scale batch processing.

Object storage

Objects addressed by key rather than through a rigid directory hierarchy, with the S3 API as the de facto standard.

NoSQL

Horizontal scaling in exchange for explicit consistency and query trade-offs.

NewSQL

SQL and ACID guarantees in distributed architecture for transactional workloads.

HTAP

Convergence of OLTP and OLAP: near-real-time analytics next to operational data.

NewSQL and HTAP in architecture decisions

When NewSQL is the right fit

When you need SQL semantics, strong transactions, and horizontal growth without manual shard management.

When HTAP is the right fit

When product workflows need operational transactions and near-real-time analytics over the same data.

Key risks

Higher operational complexity, expensive cross-region queries, and limits on heavy analytical workloads.

How to frame this in interviews

Explain the pain being solved, the trade-offs being accepted, and the constraints that reduce risk.

Practical rule of thumb: use NewSQL for stateful core workflows where correctness is expensive to get wrong, and HTAP for product domains that need analytics almost in sync with operational traffic.

Relational databases: key concepts

Related topic

Database Internals

B-trees, LSM trees, and transactions inside the DBMS.


Normalization

Normalization removes redundancy by splitting data into related tables, so the shape of the data drives both schema design and query behavior.

SQL

Declarative language separates the “what” from the “how.”
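To illustrate the "what" versus the "how", here is a small sketch using Python's built-in `sqlite3` module. The `orders` table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30), ("bob", 20), ("alice", 50)],
)

# Declarative: the query states the desired result (totals per customer).
# How rows are scanned, grouped, and sorted is left to the engine.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
```

The same result could be computed with an explicit loop in application code, but then the access path is fixed by the programmer instead of chosen by the query planner.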

Indexes

Indexes speed up reads but slow down inserts and updates, because every write must also maintain the index.
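A hedged sketch of the read-side effect, using SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` module. The `users` table, the index name `idx_users_email`, and the helper `plan` are assumptions for illustration. The exact wording of the plan text varies between SQLite versions, so the example only looks for keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def plan(sql: str, params: tuple) -> str:
    """Return the human-readable plan text (last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[3] for row in rows)


query = "SELECT name FROM users WHERE email = ?"

# Without an index on email, the engine has to scan the table.
before = plan(query, ("user42@example.com",))

# With the index, the same query becomes an index lookup.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query, ("user42@example.com",))
```

The write-side cost is not measured here: after `CREATE INDEX`, every insert or update of `email` also has to update `idx_users_email`.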

Transactions and ACID

Atomicity, consistency, isolation, and durability shape system contracts.
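As a sketch of how atomicity becomes part of a contract, the example below uses Python's `sqlite3` module. The `accounts` table, the `CHECK` constraint, and the `transfer` helper are illustrative assumptions. The point is that a failed transfer leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money between accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)
rejected = transfer(conn, "alice", "bob", 500)
```

In the rejected transfer the credit to `bob` runs first and succeeds, yet it is undone when the debit fails. That is the guarantee an API built on top of this storage can pass on to its clients.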

Replication

Failover and read scaling with explicit consistency trade-offs.

Sharding

Routing by shard key and distributing load across partitions.
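A minimal sketch of routing by shard key, assuming simple hash-modulo placement. The shard names and the `shard_for` helper are hypothetical. A stable hash from `hashlib` is used because Python's built-in `hash()` for strings is randomized per process.

```python
import hashlib
from typing import Sequence

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]  # hypothetical shard names


def shard_for(key: str, shards: Sequence[str] = SHARDS) -> str:
    """Map a shard key to a shard using a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# The same key always routes to the same shard.
target = shard_for("user:42")
same_target = shard_for("user:42")
```

Hash-modulo placement is easy to reason about, but changing the number of shards remaps most keys. Schemes such as consistent hashing or a lookup directory are common ways to reduce that movement.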

Integration between systems

Related topic

Enterprise Integration Patterns

Files, RPC, and messaging as integration patterns.


File transfer

A simple exchange pattern, but with weak encapsulation.

Shared database

High coupling and slower delivery because teams share the same schema.

RPC

Strong contracts, but requires versioning discipline.

Messaging

Asynchronous workflows and flexible integration boundaries.
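Messaging usually comes with at-least-once delivery, so consumers have to tolerate duplicates. The sketch below assumes each message carries a unique `id` field. The `DedupConsumer` class and the message shape are illustrative, and the `deque` only stands in for a real broker.

```python
from collections import deque


class DedupConsumer:
    """Applies each message at most once, keyed by message id."""

    def __init__(self) -> None:
        # In a real system the set of seen ids lives in durable storage.
        self.seen_ids = set()
        self.total = 0

    def handle(self, message: dict) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        if message["id"] in self.seen_ids:
            return False
        self.seen_ids.add(message["id"])
        self.total += message["amount"]
        return True


queue = deque(
    [
        {"id": "m1", "amount": 10},
        {"id": "m2", "amount": 5},
        {"id": "m1", "amount": 10},  # redelivery of the first message
    ]
)

consumer = DedupConsumer()
applied = [consumer.handle(queue.popleft()) for _ in range(len(queue))]
```

The redelivered message is acknowledged but not applied again, which keeps the total correct even when the broker repeats itself.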

A shared database couples teams tightly and lets them bypass each other's contracts. Modern systems prefer explicit ownership boundaries and interfaces.

Data Lake vs Data Mesh

Related topic

Big Data

The evolution of analytics and architectural layers.


Data Lake

Centralized data collection from OLTP systems through ETL processes. As scale grows, ownership and data quality become harder to manage.

Data Mesh

  • Domain-centric decentralization
  • Data as a product
  • Self-service platform
  • Federated computational governance

DDD and domain boundaries

Related topic

Learning Domain-Driven Design

Bounded contexts and domain contracts.


Domain boundaries and contracts between bounded contexts make APIs resilient. DDD helps separate data models owned by different teams.

How data is turned into a convenient API

Bridge data -> API

  • Predictable guarantees (ACID vs BASE)
  • Clear sources of truth
  • Clear error and retry model
  • Domain and contract boundaries
  • Idempotency and deduplication
  • Isolation from shared database
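One common way to expose idempotency in an API is a client-supplied idempotency key. The sketch below is an assumption-laden illustration: `PaymentService`, `create_payment`, and the in-memory dictionaries are hypothetical, and a production version would store keys and results durably, in the same transaction as the side effect.

```python
import uuid


class PaymentService:
    """Toy service where a repeated request with the same key has no new effect."""

    def __init__(self) -> None:
        self._results = {}  # idempotency key -> stored response
        self.charges = []   # side effects that actually happened

    def create_payment(self, idempotency_key: str, amount: int) -> dict:
        # A retry with a known key returns the original response unchanged.
        if idempotency_key in self._results:
            return self._results[idempotency_key]

        payment = {"payment_id": str(uuid.uuid4()), "amount": amount}
        self.charges.append(payment)
        self._results[idempotency_key] = payment
        return payment


service = PaymentService()
key = str(uuid.uuid4())  # generated once by the client, reused on every retry

first = service.create_payment(key, 100)
retry = service.create_payment(key, 100)
```

The client can now retry after a timeout without knowing whether the first attempt succeeded: both calls return the same payment, and only one charge is recorded.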

NoSQL through the lens of CAP/BASE

Understanding CAP and BASE helps explain eventual consistency to clients and build correct retries.
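A rough sketch of a client-side retry loop with exponential backoff and jitter. `TransientError`, `retry`, and `flaky_write` are hypothetical names, and the sketch assumes the retried operation is idempotent. Without that, retries can duplicate effects.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures that are safe to retry."""


def retry(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.01,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or the attempts are exhausted."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff with full jitter to avoid synchronized retries.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise ValueError("attempts must be at least 1")


calls = {"count": 0}


def flaky_write() -> str:
    """Simulated write that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("replica not reachable")
    return "ok"


# The sleep function is injected so the example runs without waiting.
result = retry(flaky_write, sleep=lambda _seconds: None)
```

Only errors explicitly marked as transient are retried. Anything else propagates immediately, which keeps the error model visible to the caller.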

Related topic: CAP theorem.

Mini-checklist of a convenient API

  • It is clear what consistency guarantees the system provides.
  • The client understands where eventual consistency is possible.
  • Operations that can be repeated are idempotent.
  • Errors, retries, and timeouts have documented, deterministic behavior.
  • There is no shared database as a hidden integration channel.
  • Domain boundaries are reflected in the API contract.

Practical storage-selection scenarios

FinTech ledger / billing

Relational DB or NewSQL

Strong consistency, strict transactions, and deterministic handling of retries, idempotency, and audit trails.

Real-time product reporting

HTAP or OLTP + streaming + OLAP

Fast analytics with minimal ETL lag while keeping operational workflows responsive.

Telemetry and monitoring

TSDB + object storage

High-ingest writes, retention controls, and cost-efficient long-term historical storage.

Content + search + recommendations

Polyglot persistence

One database is rarely optimal for transactions, full-text search, and vector retrieval at the same time.
