This chapter is valuable because it shows storage evolution without mythology: teams move from files and simple OLTP models to NoSQL, NewSQL, and HTAP not because of fashion, but because real limits start piling up.
In practice, it helps you describe where state actually lives, how it moves between queues, object storage, and databases, and why that immediately creates requirements around idempotency, retries, and event ordering.
In interviews and design reviews, it is especially useful when you need to explain why a data architecture became more complex in that specific sequence, rather than because the team jumped to a heavyweight stack too early.
Practical value of this chapter
State model
Make state location explicit: in app memory, queues, databases, or object storage, with clear guarantees per step.
API from data shape
Design API contracts around storage behavior: idempotency, retries, event ordering, and deduplication.
NewSQL and HTAP fit
Know when NewSQL/HTAP simplify architecture and when it is safer to separate transactional and analytical paths.
Interview narrative
Explain data evolution from simple persistence to distributed architecture without introducing unnecessary complexity.
Source
Essential Architecture - Data
Transcript of the lecture (4 Oct 2021) about data storage and its impact on the API.
This chapter is a concise guide to the evolution of state storage approaches: from files and classical OLTP to NoSQL, NewSQL, and HTAP. Data architecture directly shapes API quality: latency, consistency guarantees, retry semantics, idempotency, and team ownership boundaries.
We start from Twelve-Factor principle #6: keep applications stateless and move state outside the process boundary. This makes scaling and resilience easier, but it creates the central design question: where should state live, and how does that choice impact API contracts and integration behavior?
Why Data Drives APIs
Related topic
The Twelve-Factor App
Stateless as a foundation for scaling and resilience.
Architectural decisions about data become properties of the interface:
- Response speed and latency
- Consistency (strong vs eventual)
- Error and retry model
- Limitations on filtering/search/pagination
- Idempotency, retries and deduplication
- Boundaries of responsibility between teams
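To make the idempotency and deduplication item concrete, here is a minimal sketch of an idempotency key on a write endpoint. The `PaymentService` class, the `charge` method, and the key format are hypothetical names chosen for illustration, and the in-memory dicts stand in for an external durable store.

```python
from dataclasses import dataclass, field


@dataclass
class PaymentService:
    """Toy service: the dicts stand in for an external durable store."""
    balances: dict = field(default_factory=dict)
    # Hypothetical dedup table: idempotency key -> stored response.
    responses: dict = field(default_factory=dict)

    def charge(self, idempotency_key: str, account: str, amount: int) -> dict:
        # A repeated request with the same key returns the saved response
        # instead of applying the side effect a second time.
        if idempotency_key in self.responses:
            return self.responses[idempotency_key]
        self.balances[account] = self.balances.get(account, 0) - amount
        response = {"account": account, "balance": self.balances[account]}
        self.responses[idempotency_key] = response
        return response


service = PaymentService(balances={"acc-1": 100})
first = service.charge("key-42", "acc-1", 30)
retry = service.charge("key-42", "acc-1", 30)  # client retry after a timeout
```

In a real system the key record and the balance change would need to be written in the same transaction; otherwise a crash between the two steps could still produce a double charge.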
Stateless as a foundation
Twelve-Factor principle: applications do not keep state in the process. Scaling becomes easier, but you have to decide deliberately where the data lives.
The Evolution of State Storage
File systems
Storage formats and parsing logic leak easily into business code.
Relational databases (OLTP)
SQL plus transactions provide strong guarantees and an expressive query interface.
OLAP and analytics
Cubes, star/snowflake models and aggregates for BI.
Big Data / Hadoop
MapReduce and the batch data processing ecosystem.
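As a rough illustration of the MapReduce model, the sketch below runs the three phases (map, shuffle, reduce) in a single process on a word count. The function names and sample documents are made up for this example; a real Hadoop job distributes the same phases across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    # Emit (key, value) pairs: one (word, 1) per word.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Collapse each group into a single aggregated value.
    return {key: sum(values) for key, values in groups.items()}


documents = ["files to OLTP", "OLTP to NoSQL", "NoSQL to NewSQL"]
pairs = chain.from_iterable(map_phase(doc) for doc in documents)
counts = reduce_phase(shuffle(pairs))
```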
Object Storage
A flat namespace of objects without directory hierarchies; the S3 API is the de facto standard.
NoSQL
Horizontal scaling at the cost of compromises.
NewSQL
SQL + ACID on distributed architecture for transactional workloads at scale.
HTAP
Convergence of OLTP and OLAP: near real-time analytics on top of operational data.
NewSQL and HTAP in architecture decisions
When NewSQL is the right fit
When you need SQL semantics, strong transactions, and horizontal growth without manual shard management.
When HTAP is the right fit
When product workflows require both operational transactions and near real-time analytics on the same domain.
Key risks
Higher operational complexity, expensive cross-region traffic, and limits for heavy analytical patterns.
How to frame this in interviews
Explain the pain solved, the trade-offs accepted, and the guardrails used to control delivery risk.
Practical rule of thumb: use NewSQL for stateful core workflows where correctness is expensive to fail, and HTAP for product domains that need analytics almost in sync with operational traffic.
Relational databases: key concepts
Related topic
Database Internals
B-Trees, LSM and transactions within the DBMS.
Normalization
Data shapes influence the design and behavior of queries.
SQL
Declarative language separates the “what” from the “how.”
Indexes
They speed up reads but slow down inserts and updates.
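The effect of an index on a read can be seen with a small sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and the index name are hypothetical; the exact wording of the query plan differs between SQLite versions, so the code only captures the plan text without assuming a specific output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i) for i in range(1000)],
)


def query_plan(sql: str) -> str:
    # The last column of each EXPLAIN QUERY PLAN row is a readable detail string.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT total FROM orders WHERE customer_id = 7"
plan_before = query_plan(lookup)  # expected to be a full table scan

# The index speeds up this lookup, but every later INSERT or UPDATE on
# orders must also maintain it.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(lookup)   # expected to search via the index
```

Printing `plan_before` and `plan_after` shows the planner switching from a scan to an index search for the same query.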
Transactions and ACID
Atomicity, consistency, isolation, and durability shape API contracts.
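A minimal sketch of atomicity, assuming a hypothetical `accounts` table in an in-memory SQLite database: a transfer either applies both updates or neither. The `CHECK` constraint plays the role of a business invariant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    try:
        # "with conn" commits on success and rolls back on an exception.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer("alice", "bob", 60)        # succeeds
rejected = transfer("alice", "bob", 60)  # would overdraw alice, so it is rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

The credit to `bob` in the second call is undone together with the failed debit, which is the property an API can expose as "the operation either happened or it did not".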
Replication
Failover and read scaling, with consistency trade-offs.
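The consistency trade-off of read replicas can be shown with a toy model of asynchronous replication. The `Primary` and `Replica` classes are illustrative only and do not correspond to any particular database: the replica serves reads from its own copy and only sees a write after it has applied the replication log.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered replication log of (key, value) entries

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    def __init__(self, primary):
        self.primary = primary
        self.data = {}
        self.applied = 0  # number of log entries already applied

    def catch_up(self):
        # Apply the entries that arrived since the last sync, in order.
        for key, value in self.primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.primary.log)

    def read(self, key):
        return self.data.get(key)


primary = Primary()
replica = Replica(primary)
primary.write("order-1", "paid")
stale = replica.read("order-1")  # replica has not applied the write yet
replica.catch_up()
fresh = replica.read("order-1")  # now visible on the replica
```

A client that writes through the primary and immediately reads from the replica can observe the stale value, which is why the API has to state where eventual consistency applies.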
Sharding
Routing by shard key and load distribution.
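Routing by shard key can be sketched as a stable hash of the key modulo the number of shards. The shard names and the `shard_for` helper are hypothetical; this is one simple scheme, not the method of any specific database.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]  # hypothetical shard names


def shard_for(shard_key: str, shards=SHARDS) -> str:
    # Use a stable hash: Python's built-in hash() for strings is randomized
    # per process, so it would route the same key differently after a restart.
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


placement = {}
for user_id in (f"user-{n}" for n in range(1000)):
    placement.setdefault(shard_for(user_id), []).append(user_id)
```

With plain modulo routing, changing the number of shards remaps most keys; consistent hashing is a common alternative when shards are added or removed often.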
Go deeper: Designing Data-Intensive Applications and Database Internals.
Integration between systems
Related topic
Enterprise Integration Patterns
Files, RPC and messaging as integration patterns.
File transfer
A simple way to exchange data, but with weak encapsulation.
Shared database
High coupling and slow development, because every team depends on the shared schema.
RPC
Strong contracts, but requires versioning discipline.
Messaging
Asynchronous scenarios and flexible integration.
A shared database creates high coupling and breaks contracts between teams. Modern systems aim for a shared-nothing design.
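With messaging, a consumer usually has to cope with redelivered and out-of-order events. The sketch below assumes each event carries a per-entity `version` number; the `OrderProjection` class and the event fields are hypothetical. It keeps only the newest state per order and does not try to detect gaps between versions.

```python
class OrderProjection:
    """Consumer that tolerates duplicates and out-of-order delivery."""

    def __init__(self):
        self.status = {}        # order_id -> latest status
        self.last_version = {}  # order_id -> highest version applied

    def handle(self, event: dict) -> bool:
        order_id, version = event["order_id"], event["version"]
        # Skip duplicates and events older than what was already applied.
        if version <= self.last_version.get(order_id, 0):
            return False
        self.status[order_id] = event["status"]
        self.last_version[order_id] = version
        return True


projection = OrderProjection()
delivered = [
    {"order_id": "o-1", "version": 1, "status": "created"},
    {"order_id": "o-1", "version": 2, "status": "paid"},
    {"order_id": "o-1", "version": 2, "status": "paid"},     # redelivery
    {"order_id": "o-1", "version": 1, "status": "created"},  # late duplicate
]
applied = [projection.handle(event) for event in delivered]
```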
Data Lake vs Data Mesh
Related topic
Big Data
The evolution of analytics and architectural layers.
Data Lake
Centralized collection of data from OLTP systems through ETL processes. As the lake grows, keeping datasets linked and maintaining data quality gets harder.
Data Mesh
- Domain-centric decentralization
- Data as a product
- Self-service platform
- Federated computational governance
DDD and domain boundaries
Related topic
Learning Domain-Driven Design
Bounded contexts and domain contracts.
Domain boundaries and contracts between bounded contexts make APIs resilient. DDD approaches help to separate the data models of different teams.
How data is turned into a convenient API
Bridge Data -> API
- Predictable guarantees (ACID vs BASE)
- Clear sources of truth
- A clear model of errors and retries
- Domain and contract boundaries
- Idempotency and deduplication
- Isolation from a shared database
NoSQL through the lens of CAP/BASE
Understanding CAP and BASE helps you explain eventual consistency to clients and build correct retries.
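One possible shape for client-side retries is exponential backoff with jitter, sketched below. `TransientError`, `call_with_retries`, and the default parameters are assumptions made for this example, and the wrapped operation is assumed to be idempotent so that repeating it is safe.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are safe to retry."""


def call_with_retries(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise
            # Exponential backoff with full jitter to avoid retry storms.
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))


attempts = []


def flaky_read():
    # Fails twice, then succeeds, to imitate a temporarily unavailable node.
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("replica unavailable")
    return "ok"


# The sleep function is injected so the example runs without real delays.
result = call_with_retries(flaky_read, sleep=lambda seconds: None)
```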
Mini-checklist of a convenient API
- It is clear what consistency guarantees the system provides.
- The client understands where eventual consistency is possible.
- Operations that can be repeated are idempotent.
- Errors, retries, and timeouts are documented explicitly and behave predictably.
- There is no shared database as a hidden integration channel.
- Domain boundaries are reflected in the API contract.
Practical storage-selection scenarios
FinTech ledger / billing
Relational DB or NewSQL
Strong consistency, strict transactions, and deterministic handling of retries, idempotency, and audit trails.
Real-time product reporting
HTAP or OLTP + streaming + OLAP
Fast analytics with minimal ETL lag while keeping operational workflows responsive.
Telemetry and monitoring
TSDB + object storage
High-ingest writes, retention controls, and cost-efficient long-term historical storage.
Content + search + recommendations
Polyglot persistence
One database is rarely optimal for transactional writes, full-text search, and vector retrieval at once.
Related chapters
- DB Guide - Practical playbook for selecting and operating data stores across different workload profiles.
- Database selection framework: how to make architecture decisions - Decision model for OLTP/OLAP/NoSQL choices under specific non-functional requirements.
- Designing Data-Intensive Applications (short summary) - Core concepts on data models, replication, and consistency that shape API behavior.
- Database Internals (short summary) - Storage engine internals (B-Tree, LSM, WAL) and their impact on latency and throughput.
- Enterprise Integration Patterns (short summary) - Integration patterns for choosing between file exchange, RPC, and messaging.
- CAP theorem - Baseline consistency-versus-availability trade-offs under network partition scenarios.
- Data Mesh in Action (short summary) - How data platforms evolve from centralized lake models toward domain ownership.
- The Twelve-Factor App: cloud-native principles - Stateless app principle as the starting point for external state storage architecture.
