Foundation
Database Internals
Understanding the internal mechanisms of a DBMS helps you make meaningful architectural decisions.
The Database Selection Framework is a practical way to choose a DBMS based on a specific product and its constraints, rather than on the team's personal preferences. The purpose of this chapter is to walk structurally through the choice between OLTP and OLAP, pin down the read/write profile and consistency requirements, and then make replication and sharding decisions with operational complexity in mind.
OLTP vs OLAP
OLTP
Many short transactions, high concurrency, and strict requirements for latency and write correctness.
- p95/p99 latency is critical for user operations.
- We need ACID guarantees and predictable transactional semantics.
- Frequent point reads/writes with key-based access patterns.
PostgreSQL / MySQL / distributed SQL depending on scale and consistency requirements.
OLAP
Heavy analytical queries, aggregations over large ranges, columnar scans, and read-path savings.
- The main load is read-heavy analytics and reports.
- We need materialized views (data marts) and high-volume batch/stream ingestion.
- A trade-off on the write path in exchange for fast analytics.
ClickHouse, or a DWH/lakehouse approach with a columnar engine.
Decision Framework (5 axes)
1. Read/Write profile
Evaluate the R/W ratio, operation sizes, burst patterns, and the required latency SLA.
- How many writes/sec and reads/sec are expected at the start and after a year?
- How much data is in the hot-set and what is the nature of the access keys?
- Are there traffic spikes, and how often are backfills/bulk loads needed?
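The questions above reduce to simple arithmetic that is worth writing down explicitly. Below is a minimal back-of-the-envelope sketch; all numbers are hypothetical placeholders, not recommendations.

```python
def workload_profile(reads_per_sec: float, writes_per_sec: float,
                     growth_factor: float = 1.0) -> dict:
    """Project the R/W ratio and total ops/sec after applying growth."""
    r = reads_per_sec * growth_factor
    w = writes_per_sec * growth_factor
    return {
        "reads_per_sec": r,
        "writes_per_sec": w,
        "rw_ratio": r / w if w else float("inf"),
        "total_ops_per_sec": r + w,
    }

# Example: 900 reads/s and 100 writes/s today, 3x growth expected in a year.
today = workload_profile(900, 100)
next_year = workload_profile(900, 100, growth_factor=3.0)
print(today["rw_ratio"])               # 9.0 -- read-heavy; read replicas may fit
print(next_year["total_ops_per_sec"])  # 3000.0
```

A ratio like 9:1 suggests an OLTP store plus read replicas; a write-heavy profile shifts the conversation toward write scaling and ingestion design instead.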
2. Consistency and transactions
Decide where strict consistency is needed and where eventual consistency is sufficient.
- Are stale reads acceptable, and in which user flows are they safe?
- Are multi-row/multi-entity transactions required?
- What level of read/write concern does the business require?
3. Replication
Determine the purpose of replication: HA, geodistribution, read scaling, disaster recovery.
- Is sync replication necessary for critical data?
- What are the RPO/RTO target metrics for disaster recovery?
- What lag on async replicas is acceptable for the product?
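One way to connect the lag and RPO questions is to treat the worst observed async-replica lag as an approximation of worst-case data loss on failover. A hedged sketch, with made-up lag samples and RPO targets:

```python
def max_potential_data_loss_seconds(lag_samples: list[float]) -> float:
    """Worst observed replication lag approximates worst-case loss (RPO)."""
    return max(lag_samples)

def meets_rpo(lag_samples: list[float], rpo_seconds: float) -> bool:
    """Does async replication stay within the RPO target?"""
    return max_potential_data_loss_seconds(lag_samples) <= rpo_seconds

lag = [0.4, 1.2, 0.8, 2.5]  # seconds, e.g. sampled from replica metrics
print(meets_rpo(lag, rpo_seconds=5.0))  # True  -- async replication may suffice
print(meets_rpo(lag, rpo_seconds=1.0))  # False -- consider sync replication
```

If the business RPO is tighter than the observed lag ever gets, async replication alone cannot meet it, which is exactly when sync replication for critical data enters the discussion.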
4. Sharding
Consider a sharding strategy only when vertical scaling no longer solves the problem.
- Which shard key minimizes hotspots and cross-shard queries?
- How painful are re-sharding and key migrations?
- Which operations will become distributed transactions?
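The shard-key and cross-shard questions can be made concrete with a small routing sketch. Hash routing and the `user:` key format here are illustrative assumptions:

```python
import hashlib

def shard_for(key: str, n_shards: int) -> int:
    """Stable hash routing: the same key always lands on the same shard."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

def is_cross_shard(keys: list[str], n_shards: int) -> bool:
    """A query touching keys on different shards becomes a distributed one."""
    return len({shard_for(k, n_shards) for k in keys}) > 1

# Routing by user_id keeps all of one user's rows on a single shard, so
# "all orders of user 42" stays single-shard; a query spanning several
# users may fan out and turn into a distributed operation.
print(shard_for("user:42", 8) == shard_for("user:42", 8))  # True, deterministic
```

Checking your top queries against `is_cross_shard` before committing to a key is a cheap way to surface which operations will become distributed transactions.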
5. Operational complexity
Compare not only technical fit but also cost of ownership: teams, tooling, runbooks.
- Does the team have expertise in the selected DBMS?
- How do backup/restore, observability, and on-call practices work?
- How much does a cluster cost at the target data volume and SLA?
Quick selection matrix
Transactional backend (orders, payments, accounts)
OLTP-first
Strict consistency, transactions, and low-latency reads/writes.
Product analytics, BI and ad-hoc queries
OLAP-first
Large scans and aggregations are more efficient in columnar stores.
Mixed workload (operations + analytics)
Polyglot persistence
Separate the OLTP write path from OLAP serving via a CDC/ETL/ELT pipeline.
Global scale with regional SLAs
Replication + selective sharding
A combination of read locality, HA, and controlled operational cost.
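The "polyglot persistence" row above can be sketched in miniature: writes hit the transactional store and are also captured in a change log, which a batch job later folds into an analytics-friendly aggregate. Everything here (`oltp_rows`, `change_log`, the functions) is an illustrative stand-in, not a specific tool's API.

```python
from collections import defaultdict

oltp_rows: dict[str, float] = {}  # stands in for the transactional table
change_log: list[tuple] = []      # stands in for a CDC stream / outbox

def write_order(order_id: str, amount: float) -> None:
    """Low-latency OLTP write; the change is also captured for analytics."""
    oltp_rows[order_id] = amount
    change_log.append(("insert", order_id, amount))

def run_etl_batch() -> dict:
    """Folds captured changes into an OLAP-style aggregate."""
    agg = defaultdict(float)
    for _op, _order_id, amount in change_log:
        agg["revenue"] += amount
    return dict(agg)

write_order("order:1", 10.0)
write_order("order:2", 5.5)
print(run_etl_batch())  # {'revenue': 15.5}
```

The point of the split is that heavy aggregation runs against the change log (or a columnar copy built from it), never against the hot OLTP tables.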
Practice
Replication and sharding
Practical models and visualizations: primary-replica, multi-leader, shard keys, and rebalancing.
Replication and sharding: minimum rules
Replication
- Define a sync/async strategy based on data criticality.
- Measure replication lag and impact on read consistency.
- Test failover not on paper but in regular game days.
Sharding
- Choose a shard key based on real query patterns, not abstractions.
- Evaluate cross-shard joins and distributed transaction cost.
- Plan a migration path for re-sharding in advance.
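To see why planning for re-sharding matters, compare naive modulo routing with a consistent-hash ring when a shard is added. This is a hedged sketch; shard names, vnode counts, and key formats are arbitrary.

```python
import hashlib
from bisect import bisect

def h64(s: str) -> int:
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

def modulo_route(key: str, n: int) -> int:
    return h64(key) % n

class Ring:
    """Consistent-hash ring with virtual nodes per shard."""
    def __init__(self, shards: list[str], vnodes: int = 64):
        self.points = sorted((h64(f"{s}#{v}"), s)
                             for s in shards for v in range(vnodes))
        self.hashes = [p for p, _ in self.points]

    def route(self, key: str) -> str:
        i = bisect(self.hashes, h64(key)) % len(self.points)
        return self.points[i][1]

keys = [f"user:{i}" for i in range(10_000)]

# Modulo routing: going from 4 to 5 shards remaps most keys (~80%).
moved_mod = sum(modulo_route(k, 4) != modulo_route(k, 5) for k in keys)

# Consistent hashing: the same change moves only roughly 1/5 of keys.
r4 = Ring([f"s{i}" for i in range(4)])
r5 = Ring([f"s{i}" for i in range(5)])
moved_ring = sum(r4.route(k) != r5.route(k) for k in keys)

print(moved_mod > moved_ring)  # True -- the ring moves far fewer keys
```

This is why the migration path deserves planning up front: with modulo routing, adding one shard forces a near-total data shuffle, while a ring-based scheme keeps re-sharding incremental.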
Common mistakes in choosing a DBMS
Choosing a database based on hype alone, without a workload profile or SLA.
Sharding prematurely, before vertical scaling and indexes are exhausted.
Ignoring operational costs: backups, migrations, on-call, observability.
Mixing OLTP and heavy analytics in one cluster without resource isolation.
Treating CAP/PACELC as purely theoretical and leaving consistency assumptions implicit.
