Source
Essential Architecture - Data
Transcript of the lecture (4 Oct 2021) about data storage and its impact on APIs.
Data directly shapes the usability of APIs: from latency and consistency guarantees to retries, idempotency, and the boundaries of responsibility between teams. Let's start with the sixth principle of the twelve-factor app. Its essence is to build stateless applications that keep no state inside the process. If you can design an application this way, fault tolerance and scaling are much easier to solve than with stateful applications. But then the question arises: where should the state be stored?
Why Data Drives APIs
Related topic
The Twelve-Factor App
Stateless as a foundation for scaling and resilience.
Architectural decisions about data turn into properties of interfaces.
- Response speed and latency
- Consistency (strong vs eventual)
- Error and retry model
- Limitations on filtering/search/pagination
- Idempotency, retries and deduplication
- Boundaries of responsibility between teams
Stateless as a foundation
12-factor principle: applications do not store state in the process. Scaling becomes easier, but you have to decide deliberately where the data is stored.
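A minimal sketch of what "no state in the process" can look like. The `SessionStore` interface, `InMemoryStore` and `handle_visit` are hypothetical names invented for illustration; the in-memory dictionary only stands in for an external store such as a database or cache.

```python
from typing import Optional, Protocol


class SessionStore(Protocol):
    """Hypothetical interface to an external state store."""

    def get(self, key: str) -> Optional[int]: ...
    def set(self, key: str, value: int) -> None: ...


class InMemoryStore:
    """Stand-in for an external store, used only so the sketch runs."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str) -> Optional[int]:
        return self._data.get(key)

    def set(self, key: str, value: int) -> None:
        self._data[key] = value


def handle_visit(store: SessionStore, user_id: str) -> int:
    """Stateless handler: all state comes from the request or the store."""
    visits = (store.get(user_id) or 0) + 1
    store.set(user_id, visits)
    return visits


# Two application "instances" share nothing except the store,
# so either one can serve any request.
store = InMemoryStore()
first = handle_visit(store, "alice")   # served by instance A
second = handle_visit(store, "alice")  # served by instance B
```

The read-modify-write in `handle_visit` is not atomic; with a real shared store an atomic increment or a transaction would probably be needed.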
The Evolution of State Storage
File systems
Storage formats and parsing logic leak easily into business code.
Relational databases (OLTP)
SQL + transactions provide strong guarantees and an expressive API.
OLAP and analytics
Cubes, star/snowflake models and aggregates for BI.
Big Data / Hadoop
MapReduce and the batch data processing ecosystem.
Object Storage
Objects without hierarchies, S3 API as a de facto standard.
NoSQL
Horizontal scaling at the cost of compromises.
Relational databases: key concepts
Related topic
Database Internals
B-Trees, LSM and transactions within the DBMS.
Normalization
Data shapes influence the design and behavior of queries.
SQL
Declarative language separates the “what” from the “how.”
Indexes
They speed up reading, but slow down writing and updating.
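The read side of this trade-off can be observed with SQLite from the Python standard library. This is only a sketch: the `orders` table, its columns and the index name are made up for illustration, and the write cost is described in a comment rather than measured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)


def plan(sql: str) -> str:
    """Return the human-readable query plan SQLite chooses for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


query = "SELECT total FROM orders WHERE customer_id = 42"

plan_before = plan(query)  # without an index: a full table scan

# From now on every INSERT/UPDATE/DELETE must also maintain this index,
# which is the write-side cost of faster reads.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

plan_after = plan(query)   # with the index: a search using idx_orders_customer
print(plan_before)
print(plan_after)
```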
Transactions and ACID
Atomicity, isolation, and durability shape contracts.
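A small sketch of atomicity as a contract, again using SQLite from the standard library. The `accounts` table and the `transfer` function are illustrative assumptions: a transfer that would violate the `CHECK` constraint is rolled back as a whole, so the caller never sees a half-applied change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated CHECK (balance >= 0); the credit was rolled back too.
        return False


def balances() -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer("alice", "bob", 30)        # fits within alice's balance
rejected = transfer("alice", "bob", 500)  # would overdraw alice
```

For an API built on top of such a store, this is what allows a failed request to be documented as "nothing happened" rather than "something may have happened".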
Replication
Failover and read scaling, with trade-offs in consistency.
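The consistency trade-off of reading from replicas can be illustrated with a toy model of asynchronous replication. The `Primary` and `Replica` classes and the `read_your_writes` helper are hypothetical and exist only for this sketch; real systems track replication positions in their own ways.

```python
class Primary:
    """Accepts writes and keeps an ordered replication log."""

    def __init__(self) -> None:
        self.data: dict = {}
        self.log: list = []

    def write(self, key: str, value: str) -> int:
        self.data[key] = value
        self.log.append((key, value))
        return len(self.log)  # log position the client can remember


class Replica:
    """Applies the primary's log later, so its reads may be stale."""

    def __init__(self, primary: Primary) -> None:
        self.primary = primary
        self.data: dict = {}
        self.applied = 0

    def catch_up(self) -> None:
        for key, value in self.primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.primary.log)

    def read(self, key: str):
        return self.data.get(key)


def read_your_writes(primary: Primary, replica: Replica, key: str, min_position: int):
    """Use the replica only if it has applied the client's last write."""
    if replica.applied >= min_position:
        return replica.read(key)
    return primary.data.get(key)


primary = Primary()
replica = Replica(primary)
position = primary.write("profile:1", "new name")

stale = replica.read("profile:1")  # replica has not caught up yet
fresh = read_your_writes(primary, replica, "profile:1", position)
replica.catch_up()
caught_up = replica.read("profile:1")
```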
Sharding
Routing by shard key and load distribution.
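Routing by shard key can be sketched as a hash of the key taken modulo the number of shards. The shard names and the `shard_for` function are assumptions for illustration, not a description of any particular database.

```python
import hashlib
from collections import Counter

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]  # hypothetical shard names


def shard_for(shard_key: str, shards=SHARDS) -> str:
    """Map a shard key to a shard using a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and would route differently after a restart.
    """
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# The same key always lands on the same shard, and many keys spread out.
distribution = Counter(shard_for(f"user-{i}") for i in range(1000))
print(distribution)
```

One known limitation of plain modulo routing is that changing the number of shards remaps most keys; consistent hashing is a common way to reduce that.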
Go deeper: Designing Data-Intensive Applications and Database Internals.
Integration between systems
Related topic
Enterprise Integration Patterns
Files, RPC and messaging as integration patterns.
File transfer
A simple, transparent way to exchange data, but with weak encapsulation.
Shared database
High coupling and slow development because of the shared schema.
RPC
Strong contracts, but requires versioning discipline.
Messaging
Asynchronous scenarios and flexible integration.
A shared database creates high coupling and breaks contracts between teams. Modern systems strive for a shared-nothing approach.
Data Lake vs Data Mesh
Related topic
Big Data
The evolution of analytics and architectural layers.
Data Lake
Centralized data collection from OLTP systems via ETL processes. As the lake grows, keeping the data coherent and of good quality becomes harder.
Data Mesh
- Domain-centric decentralization
- Data as a product
- Self-service platform
- Federated computational governance
DDD and domain boundaries
Related topic
Learning Domain-Driven Design
Bounded contexts and domain contracts.
Domain boundaries and contracts between bounded contexts make APIs resilient. DDD approaches help to separate the data models of different teams.
How data is turned into a convenient API
Bridge Data -> API
- Predictable guarantees (ACID vs BASE)
- Clear Sources of Truth
- A clear model of errors and retries
- Domain and contract boundaries
- Idempotency and deduplication
- Isolation from a shared database
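Idempotency and deduplication from the list above can be sketched with an idempotency key supplied by the client. `PaymentService` and its fields are hypothetical; the dictionary stands in for a durable store of processed keys.

```python
import itertools


class PaymentService:
    """Toy service that deduplicates requests by idempotency key."""

    def __init__(self) -> None:
        self._results: dict = {}          # idempotency key -> stored response
        self._ids = itertools.count(1)    # stand-in for generated payment ids
        self.charges_made = 0             # how many times money actually moved

    def charge(self, idempotency_key: str, amount: int) -> dict:
        # A repeated key means the client is retrying: return the stored
        # response instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {"payment_id": next(self._ids), "amount": amount}
        self._results[idempotency_key] = result
        return result


service = PaymentService()
first = service.charge("order-17", 100)
retry = service.charge("order-17", 100)   # e.g. the first response was lost
other = service.charge("order-18", 100)   # a genuinely new operation
```

In a real system the key lookup and the side effect would need to be stored atomically, which is one place where the guarantees of the underlying database show up in the API contract.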
NoSQL through the lens of CAP/BASE
Understanding CAP and BASE helps explain eventual consistency to clients and build correct retries.
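On the client side, "correct retries" usually means a bounded number of attempts with growing, randomized delays, applied only to operations that are safe to repeat. The sketch below assumes a hypothetical `TransientError` type and a `call_with_retries` helper; neither comes from the lecture.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def call_with_retries(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call `operation` until it succeeds or attempts run out.

    Delays use exponential backoff with full jitter: a random wait
    between 0 and base_delay * 2 ** (attempt - 1).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            upper_bound = base_delay * 2 ** (attempt - 1)
            sleep(random.uniform(0, upper_bound))


# Usage: an operation that fails twice before succeeding.
attempts = []


def flaky_read():
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("replica not ready")
    return "ok"


waits = []
result = call_with_retries(flaky_read, sleep=waits.append)  # record instead of sleeping
```

Retrying writes this way is only safe when the operation is idempotent, for example through an idempotency key.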
Mini-checklist of a convenient API
- It is clear what consistency guarantees the system provides.
- The client understands where eventual consistency is possible.
- Operations that may be repeated are idempotent.
- Errors, retries and timeouts are described deterministically.
- There is no shared database as a hidden integration channel.
- Domain boundaries are reflected in the API contract.
Materials from the lecture
Recommended sources and books for further reading:
