System Design Space
Knowledge graphSettings

Updated: May 3, 2026 at 10:20 AM

Database Selection Framework

medium

A practical framework for choosing a DBMS by workload profile: OLTP and OLAP, read/write ratio, consistency, replication, sharding, ownership cost, and operational risk.

A strong database choice rarely looks like love at first sight with one technology. It is usually a process of ruling out the wrong options against real system constraints.

For real projects, this chapter helps compare candidates through read/write profile, SLA, query shape, consistency model, and operational cost instead of brand popularity.

For interviews and engineering reviews, it gives you a stronger position: do not just name the winner, show why the alternatives were rejected and what would trigger a future re-evaluation.

Practical value of this chapter

Criteria-based shortlist

Narrow to 2-3 database candidates using read/write profile, service-level needs, and query shape instead of technology popularity.

Decision score matrix

Compare options by consistency guarantees, p95/p99 latency, operational complexity, and total cost of ownership.

Risk register

Capture constraints early: vendor lock-in, multi-region latency, migration windows, and team expertise.

Interview defense

Explain why alternatives were rejected and which future signals would trigger a re-evaluation.

Foundation

Database Internals

Understanding DBMS internals explains why one engine fits transactional writes while another fits scans and aggregations.

Open chapter

Database Selection Framework is a practical way to choose a DBMS from product constraints instead of team preference. The chapter walks through the decision step by step: define the read/write profile, choose between transactional and analytical paths, set consistency requirements, and then make replication and sharding decisions with operational complexity in mind.

OLTP vs OLAP

OLTP

Many short transactions, high write concurrency, and strict requirements for latency and correctness.

  • p95/p99 latency is critical for user operations.
  • We need ACID guarantees and predictable transactional semantics.
  • Frequent point reads and writes by key.

PostgreSQL / MySQL / distributed SQL depending on scale and consistency requirements.

OLAP

Heavy analytical queries, aggregations over large ranges, columnar scans, and lower cost per read.

  • The main load is read-heavy analytics and reports.
  • Materialized data marts and bulk batch or streaming ingestion are expected.
  • A more expensive write path can be acceptable when it makes analysis fast.

ClickHouse / DWH / Lakehouse approach with a columnar engine.

Decision Framework (5 axes)

1. Read/Write profile

Evaluate the read/write ratio, operation size, traffic bursts, and latency requirements.

  • How many writes/sec and reads/sec are expected at the start and after a year?
  • How large is the hot set and which keys are used to access it?
  • Are there traffic spikes, and how often do you need backfills or bulk loads?

2. Consistency and transactions

Decide where strict consistency is needed and where eventual consistency is sufficient.

  • Are stale reads acceptable, and in which user flows is that safe?
  • Are multi-row/multi-entity transactions required?
  • Which read and write guarantees does the business require?

3. Replication

Determine the purpose of replication: high availability, geo-distribution, read scaling, or disaster recovery.

  • Is sync replication necessary for critical data?
  • What are the RPO/RTO target metrics for disaster recovery?
  • How much replica lag is acceptable for the product?

4. Sharding

Consider a sharding strategy only after vertical scaling and indexing stop solving the problem.

  • Which shard key minimizes hotspots and cross-shard queries?
  • How painful are re-sharding and key migrations?
  • Which operations will become distributed transactions?

5. Operational complexity

Compare not only technical fit, but also total cost of ownership: team expertise, tooling, and runbooks.

  • Does the team have expertise in the selected DBMS?
  • How do backup/restore, observability, and on-call practices work?
  • How much does a cluster cost with target volume and SLA?

Quick selection matrix

Transactional backend (orders, payments, accounts)

OLTP first

Strict consistency, transactions and low-latency writes/reads.

Product analytics, BI and ad-hoc queries

OLAP first

Large scans and aggregations are more efficient in columnar stores.

Mixed workload (operations + analytics)

Polyglot persistence

Separate the OLTP write path from OLAP serving through a CDC/ETL/ELT pipeline.

Global scale with regional SLAs

Replication + selective sharding

Combination of read locality, HA and controlled cost of operations.

Practice

Replication and sharding

Practical models and visualizations: primary-replica, multi-leader, shard key, and rebalancing.

Open chapter

Replication and sharding: minimum rules

Replication

  • Define a sync/async strategy based on data criticality.
  • Measure replication lag and impact on read consistency.
  • Practice failover regularly instead of treating it as a paper design.

Sharding

  • Choose a shard key for real query patterns, not for an abstract model.
  • Evaluate cross-shard joins and distributed transaction cost.
  • Plan a migration path for re-sharding in advance.

Common mistakes in choosing a DBMS

Choose a database based on hype, without a workload profile and service-level requirements.

Premature sharding before vertical scaling and indexes are exhausted.

Ignore operational cost: backups, migrations, on-call ownership, and observability.

Mix OLTP and heavy analytics in one cluster without resource isolation.

Treat CAP/PACELC as theoretical and do not make explicit consistency assumptions.

References

Related chapters

Enable tracking in Settings