System Design Space

Updated: March 25, 2026 at 2:00 AM

Introduction to Data Storage

easy

A concise guide to the evolution of state storage approaches: from files and OLTP to NoSQL, NewSQL, and HTAP, and how these choices shape API contracts.

This chapter is valuable because it shows storage evolution without mythology: teams move from files and simple OLTP models to NoSQL, NewSQL, and HTAP not because of fashion, but because real limits start piling up.

In practice, it helps you describe where state actually lives, how it moves between queues, object storage, and databases, and why that immediately creates requirements around idempotency, retries, and event ordering.

In interviews and design reviews, it is especially useful when you need to explain why a data architecture grew more complex step by step, in a particular order, rather than because the team jumped to a heavyweight stack too early.

Practical value of this chapter

State model

Make state location explicit: in app memory, queues, databases, or object storage, with clear guarantees per step.

API from data shape

Design API contracts around storage behavior: idempotency, retries, event ordering, and deduplication.

NewSQL and HTAP fit

Know when NewSQL/HTAP simplify architecture and when it is safer to separate transactional and analytical paths.

Interview narrative

Explain data evolution from simple persistence to distributed architecture without introducing unnecessary complexity.

Source

Essential Architecture - Data

Transcript of the lecture (4 Oct 2021) on data storage and its impact on the API.


This chapter is a concise guide to the evolution of state storage approaches: from files and classical OLTP to NoSQL, NewSQL, and HTAP. Data architecture directly shapes API quality: latency, consistency guarantees, retry semantics, idempotency, and team ownership boundaries.

We start from Twelve-Factor principle #6: keep applications stateless and move state outside the process boundary. This makes scaling and resilience easier, but it creates the central design question: where should state live, and how does that choice impact API contracts and integration behavior?

Why Data Drives APIs

Related topic

The Twelve-Factor App

Stateless as a foundation for scaling and resilience.


Architectural decisions about data become properties of the interfaces built on top of it:

  • Response speed and latency
  • Consistency (strong vs eventual)
  • Error and retry model
  • Limitations on filtering/search/pagination
  • Idempotency, retries and deduplication
  • Boundaries of responsibility between teams

Stateless as a foundation

Twelve-Factor principle: applications do not keep state inside the process. Scaling becomes easier, but you have to be deliberate about where the data lives.

Related topic: The Twelve-Factor App.
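
To make the stateless principle concrete, here is a minimal Python sketch, assuming a hypothetical KeyValueStore interface that stands in for an external database or cache (InMemoryStore and VisitCounter are illustrative names as well). The handler keeps no data between calls, so any instance can serve any request.

```python
from __future__ import annotations

from typing import Optional, Protocol


class KeyValueStore(Protocol):
    """Hypothetical interface for state kept outside the process (database, cache, etc.)."""

    def get(self, key: str) -> Optional[int]: ...

    def put(self, key: str, value: int) -> None: ...


class InMemoryStore:
    """Stand-in for an external store; in production it would sit behind the network."""

    def __init__(self) -> None:
        self._data: dict[str, int] = {}

    def get(self, key: str) -> Optional[int]:
        return self._data.get(key)

    def put(self, key: str, value: int) -> None:
        self._data[key] = value


class VisitCounter:
    """Stateless handler: it keeps no per-user data between calls."""

    def __init__(self, store: KeyValueStore) -> None:
        self._store = store  # injected dependency, not request state

    def record_visit(self, user_id: str) -> int:
        key = f"visits:{user_id}"
        count = (self._store.get(key) or 0) + 1
        self._store.put(key, count)  # NOTE: get-then-put is not atomic
        return count


# Two "instances" behind a load balancer share one external store,
# so either one can serve the next request for the same user.
store = InMemoryStore()
instance_a = VisitCounter(store)
instance_b = VisitCounter(store)
instance_a.record_visit("u1")
print(instance_b.record_visit("u1"))  # 2
```

The get-then-put pair in record_visit is not atomic: with many instances running concurrently, a real store would need an atomic increment or a transaction. This is exactly where the storage choice starts to leak into the API contract.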

The Evolution of State Storage

File systems

Storage formats and file-reading logic easily leak into business code.

Relational databases (OLTP)

SQL plus transactions provide strong guarantees and an expressive API.

OLAP and analytics

Cubes, star/snowflake models and aggregates for BI.
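
A toy star schema makes the shape of analytical data easier to see. This sketch uses Python's built-in sqlite3 module with hypothetical fact_sales and dim_product tables: facts reference a dimension and are rolled up into an aggregate, the kind of query a BI tool issues. SQLite is a transactional engine, so this illustrates only the data model, not OLAP performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension table: descriptive attributes used for grouping and filtering.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT NOT NULL);
    -- Fact table: one row per sale, pointing into the dimension.
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
        amount_cents INTEGER NOT NULL
    );
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO fact_sales (product_id, amount_cents) VALUES (1, 1500), (1, 2500), (2, 6000);
    """
)

# Aggregate facts by a dimension attribute: revenue per category.
rows = conn.execute(
    """
    SELECT p.category, SUM(f.amount_cents) AS revenue_cents
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
print(rows)  # [('books', 4000), ('games', 6000)]
```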

Big Data / Hadoop

MapReduce and its ecosystem for batch processing of large data volumes.

Object Storage

Objects in a flat namespace without a directory hierarchy; the S3 API is the de facto standard.

NoSQL

Horizontal scaling at the cost of trade-offs in consistency, transactions, and query flexibility.

NewSQL

SQL + ACID on distributed architecture for transactional workloads at scale.

HTAP

Convergence of OLTP and OLAP: near real-time analytics on top of operational data.

NewSQL and HTAP in architecture decisions

When NewSQL is the right fit

When you need SQL semantics, strong transactions, and horizontal growth without manual shard management.

When HTAP is the right fit

When product workflows require both operational transactions and near real-time analytics on the same domain.

Key risks

Higher operational complexity, expensive cross-region traffic, and limits for heavy analytical patterns.

How to frame this in interviews

Explain the pain solved, the trade-offs accepted, and the guardrails used to control delivery risk.

Practical rule of thumb: use NewSQL for stateful core workflows where correctness failures are expensive, and HTAP for product domains that need analytics almost in sync with operational traffic.

Relational databases: key concepts

Related topic

Database Internals

B-trees, LSM trees, and transactions inside the DBMS.


Normalization

The shape of the data affects how queries are designed and how they behave.

SQL

Declarative language separates the “what” from the “how.”

Indexes

They speed up reads but slow down inserts and updates.
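
The read/write trade-off of an index can be sketched in a few lines of Python, assuming a hypothetical in-memory UserTable with a single secondary index on email (this is not how any particular DBMS stores indexes). Every write also maintains the index, while lookups by email skip the full scan.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    user_id: int
    email: str  # assumed unique, so the index maps one email to one row


class UserTable:
    """Hypothetical table with a primary key and one secondary index on email."""

    def __init__(self) -> None:
        self.rows: dict[int, User] = {}     # primary storage, keyed by user_id
        self.by_email: dict[str, int] = {}  # secondary index: email -> user_id
        self.index_writes = 0               # extra write work caused by the index

    def upsert(self, user: User) -> None:
        old = self.rows.get(user.user_id)
        if old is not None and old.email != user.email:
            del self.by_email[old.email]  # an update must also drop the stale entry
            self.index_writes += 1
        self.rows[user.user_id] = user
        self.by_email[user.email] = user.user_id
        self.index_writes += 1

    def find_by_email_scan(self, email: str) -> Optional[User]:
        # No index: O(n) scan over every row.
        return next((u for u in self.rows.values() if u.email == email), None)

    def find_by_email(self, email: str) -> Optional[User]:
        # Index lookup: one hash probe instead of a scan.
        user_id = self.by_email.get(email)
        return self.rows[user_id] if user_id is not None else None


table = UserTable()
table.upsert(User(1, "a@example.com"))
table.upsert(User(1, "b@example.com"))  # changing the email touches the index twice
print(table.find_by_email("b@example.com") == table.find_by_email_scan("b@example.com"))  # True
print(table.index_writes)  # 3
```

Real engines use B-trees or LSM trees rather than a hash map, which also enables range queries, but the cost model is the same: each additional index is one more structure to update on every write.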

Transactions and ACID

Atomicity, consistency, isolation, and durability shape API contracts.
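
Atomicity is what lets an API promise that a failed operation left nothing half-done. A minimal sketch with Python's built-in sqlite3 module, assuming a hypothetical accounts table whose CHECK constraint forbids negative balances: a transfer either applies both updates or neither.

```python
from __future__ import annotations

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Move money atomically; on failure nothing changes and False is returned."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            # Overdrawing src violates the CHECK constraint and raises IntegrityError,
            # which also undoes the credit to dst made just above.
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances() -> dict[str, int]:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


print(transfer("alice", "bob", 30))   # True
print(transfer("alice", "bob", 500))  # False
print(balances())                     # {'alice': 70, 'bob': 30}
```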

Replication

Failover and read scaling, at the cost of consistency trade-offs (replicas can lag behind the primary).
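
Replica lag is easiest to see in a small simulation. This Python sketch assumes hypothetical Primary and Replica classes and log sequence numbers (LSNs); it is not any particular database's replication protocol. The replica applies the primary's log with a delay, so a client that has just written can read stale data unless its reads go to the primary until the replica has applied that client's last write (a read-your-writes guarantee).

```python
from __future__ import annotations

from typing import Optional


class Primary:
    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.log: list[tuple[str, str]] = []  # replication log; LSN = index + 1

    def write(self, key: str, value: str) -> int:
        self.data[key] = value
        self.log.append((key, value))
        return len(self.log)  # LSN of this write, remembered by the client session


class Replica:
    def __init__(self, primary: Primary) -> None:
        self.primary = primary
        self.data: dict[str, str] = {}
        self.applied_lsn = 0

    def catch_up(self, max_entries: int) -> None:
        """Apply up to max_entries pending log records (simulates asynchronous lag)."""
        pending = self.primary.log[self.applied_lsn : self.applied_lsn + max_entries]
        for key, value in pending:
            self.data[key] = value
        self.applied_lsn += len(pending)


def read(key: str, primary: Primary, replica: Replica, session_lsn: int) -> Optional[str]:
    """Read-your-writes routing: use the replica only if it has the session's last write."""
    if replica.applied_lsn >= session_lsn:
        return replica.data.get(key)
    return primary.data.get(key)


primary = Primary()
replica = Replica(primary)
lsn = primary.write("profile:42", "new-name")
print(replica.data.get("profile:42"))             # None: the replica is stale
print(read("profile:42", primary, replica, lsn))  # new-name (served by the primary)
replica.catch_up(max_entries=10)
print(read("profile:42", primary, replica, lsn))  # new-name (served by the replica)
```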

Sharding

Routing by shard key and load distribution.
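
A minimal sketch of routing by shard key, assuming four hypothetical in-memory shards and hash-mod placement (one common scheme; range partitioning and consistent hashing are alternatives). Every request for the same customer_id lands on the same shard, so single-customer queries stay local, while queries without the shard key must fan out to every shard.

```python
from __future__ import annotations

import hashlib

NUM_SHARDS = 4  # hypothetical fixed shard count
shards: list[dict[str, dict]] = [{} for _ in range(NUM_SHARDS)]


def shard_for(shard_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard index with a hash that is stable across processes.

    Python's built-in hash() for str is randomized per process, so a SHA-256
    digest is used instead; every router node computes the same placement.
    """
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def save_order(customer_id: str, order_id: str, amount: int) -> None:
    shards[shard_for(customer_id)][order_id] = {"customer_id": customer_id, "amount": amount}


def orders_for_customer(customer_id: str) -> list[dict]:
    # The shard key is known: touch exactly one shard.
    shard = shards[shard_for(customer_id)]
    return [o for o in shard.values() if o["customer_id"] == customer_id]


def total_orders() -> int:
    # No shard key in the query: fan out to every shard.
    return sum(len(s) for s in shards)


save_order("c1", "o1", 10)
save_order("c1", "o2", 20)
save_order("c2", "o3", 5)
print(sorted(o["amount"] for o in orders_for_customer("c1")))  # [10, 20]
print(total_orders())  # 3
```

With plain modulo placement, changing the number of shards remaps most keys, which is one reason production systems often prefer consistent hashing or a directory of key ranges.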

Integration between systems

Related topic

Enterprise Integration Patterns

Files, RPC and messaging as integration patterns.


File transfer

A simple, understandable way to exchange data, but with weak encapsulation.

Shared database

High coupling and slower development, because every change goes through a schema shared by all teams.

RPC

Strong contracts, but requires versioning discipline.

Messaging

Asynchronous workflows and flexible integration.
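
Asynchronous messaging shifts idempotency and ordering concerns to the consumer. The sketch below assumes at-least-once delivery and a hypothetical per-entity seq number assigned by the producer; real brokers differ in their delivery and ordering guarantees. The consumer remembers the last applied sequence per entity and skips duplicates and stale, out-of-order events.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    entity_id: str
    seq: int     # hypothetical version number, increasing per entity
    status: str  # the full current status, not a delta


class OrderStatusConsumer:
    def __init__(self) -> None:
        self.status: dict[str, str] = {}
        self.last_seq: dict[str, int] = {}

    def handle(self, event: Event) -> bool:
        """Apply the event only if it is newer than what we hold; return True if applied."""
        if event.seq <= self.last_seq.get(event.entity_id, 0):
            return False  # duplicate redelivery or stale out-of-order event
        self.status[event.entity_id] = event.status
        self.last_seq[event.entity_id] = event.seq
        return True


consumer = OrderStatusConsumer()
deliveries = [
    Event("order-1", 1, "created"),
    Event("order-1", 3, "shipped"),
    Event("order-1", 2, "paid"),     # arrives late: ignored
    Event("order-1", 3, "shipped"),  # redelivered: ignored
]
print([consumer.handle(e) for e in deliveries])  # [True, True, False, False]
print(consumer.status["order-1"])                # shipped
```

Dropping the late event works here because each event carries the full current status (last writer wins by version). If every event must be applied, for example balance deltas, the consumer would instead need to buffer events until the gap is filled.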

Shared database creates high coupling and breaks contracts between teams. Modern systems strive for shared-nothing.

Data Lake vs Data Mesh

Related topic

Big Data

The evolution of analytics and architectural layers.


Data Lake

Centralized collection of data from OLTP systems through ETL processes. As the lake grows, keeping data connected and of good quality becomes harder.

Data Mesh

  • Domain-centric decentralization
  • Data as a product
  • Self-service platform
  • Federated computational governance

DDD and domain boundaries

Related topic

Learning Domain-Driven Design

Bounded contexts and domain contracts.


Domain boundaries and contracts between bounded contexts make APIs resilient. DDD approaches help to separate the data models of different teams.

How data is turned into a convenient API

Bridge Data -> API

  • Predictable guarantees (ACID vs BASE)
  • Clear sources of truth
  • A clear model of errors and retries
  • Domain and contract boundaries
  • Idempotency and deduplication
  • Isolation from a shared database
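
Idempotency and deduplication can be sketched at the API layer as follows, assuming a hypothetical client-supplied idempotency key and an in-memory dict standing in for a durable store (PaymentsApi and create_payment are illustrative names). A retried request with the same key gets the stored response instead of a second charge, and reusing a key with a different body is rejected.

```python
from __future__ import annotations

import hashlib
import json


class IdempotencyConflict(Exception):
    """The same key was reused with a different request body."""


class PaymentsApi:
    def __init__(self) -> None:
        # idempotency key -> (request fingerprint, stored response)
        self._responses: dict[str, tuple[str, dict]] = {}
        self.charges: list[int] = []  # side effects actually performed

    @staticmethod
    def _fingerprint(body: dict) -> str:
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def create_payment(self, idempotency_key: str, body: dict) -> dict:
        fingerprint = self._fingerprint(body)
        saved = self._responses.get(idempotency_key)
        if saved is not None:
            if saved[0] != fingerprint:
                raise IdempotencyConflict(idempotency_key)
            return saved[1]  # replay: same response, no second charge
        self.charges.append(body["amount"])
        response = {"payment_id": f"pay_{len(self.charges)}", "amount": body["amount"]}
        self._responses[idempotency_key] = (fingerprint, response)
        return response


api = PaymentsApi()
first = api.create_payment("key-123", {"amount": 500})
retry = api.create_payment("key-123", {"amount": 500})  # e.g. the client timed out and retried
print(first == retry, api.charges)  # True [500]
```

In production, the lookup and the insert of the key must be atomic (for example, a unique constraint in the same transaction as the side effect); otherwise two concurrent retries can both pass the check.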

NoSQL through the lens of CAP/BASE

Understanding CAP and BASE helps explain eventual consistency to clients and design correct retries.

Related topic: CAP theorem.
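
On the client side, correct retries combine backoff with idempotency. This Python sketch assumes a hypothetical send callable that raises TimeoutError on transient failures; it uses capped exponential backoff with full jitter and sends the same idempotency key on every attempt, so the server can deduplicate. Without that key, a timeout is ambiguous: the write may or may not have been applied.

```python
from __future__ import annotations

import random
import time
import uuid
from typing import Callable


def call_with_retries(
    send: Callable[[str], dict],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Retry transient failures with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    idempotency_key = str(uuid.uuid4())  # one key for every attempt of this logical request
    for attempt in range(max_attempts):
        try:
            return send(idempotency_key)
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            backoff = min(max_delay, base_delay * 2**attempt)
            sleep(random.uniform(0, backoff))  # jitter spreads out synchronized retries
    raise RuntimeError("unreachable")  # the loop always returns or re-raises


# Usage with a hypothetical transport that times out twice, then succeeds.
seen_keys: list[str] = []


def flaky_send(key: str) -> dict:
    seen_keys.append(key)
    if len(seen_keys) < 3:
        raise TimeoutError("upstream timed out")
    return {"status": "ok"}


result = call_with_retries(flaky_send, sleep=lambda _: None)  # skip real sleeps in the demo
print(result, len(seen_keys), len(set(seen_keys)))  # {'status': 'ok'} 3 1
```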

Mini-checklist of a convenient API

  • It is clear what consistency guarantees the system provides.
  • The client understands where eventual consistency is possible.
  • Operations that can be repeated are idempotent.
  • Errors, retries and timeouts are described deterministically.
  • There is no shared database as a hidden integration channel.
  • Domain boundaries are reflected in the API contract.

Practical storage-selection scenarios

FinTech ledger / billing

Relational DB or NewSQL

Strong consistency, strict transactions, and deterministic handling of retries, idempotency, and audit trails.

Real-time product reporting

HTAP or OLTP + streaming + OLAP

Fast analytics with minimal ETL lag while keeping operational workflows responsive.

Telemetry and monitoring

TSDB + object storage

High-ingest writes, retention controls, and cost-efficient long-term historical storage.

Content + search + recommendations

Polyglot persistence

One database is rarely optimal for transactional writes, full-text search, and vector retrieval at once.
