Primary source: Prometheus TSDB — storage fundamentals for metrics: local blocks, retention, compaction, and remote write.
Time-series databases (TSDBs) are easiest to evaluate across four axes: data model, storage engine, workload profile, and operating model. This chapter extends the Database Selection Framework and helps you decide when a purpose-built TSDB is the right choice versus an SQL or columnar engine used as a time-series platform.
TSDB selection map: 4 axes
1. Data model and query language
Native TSDB, SQL extension on top of RDBMS, TSDB layer over distributed storage, or columnar OLAP used as TSDB.
This axis defines how easy ad-hoc analytics, joins, and BI integration will be.
2. Storage design
Append-only/LSM, time partitioning, row vs column layout, built-in compression and downsampling.
It directly affects write throughput, storage cost, and speed of long-window aggregations.
3. Primary workload
Monitoring, IoT telemetry, financial time series, logs/metrics for product analytics.
Different workloads require different trade-offs across latency, retention, cardinality, and query flexibility.
4. Operating model
Self-hosted, managed cloud, or hybrid with multiple storage tiers.
It affects TCO, SRE requirements, and delivery speed for observability and analytics capabilities.
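The four axes above can be captured as a simple requirements checklist before comparing engines. A minimal sketch; all field names and example values here are illustrative, not taken from any specific tool:

```python
from dataclasses import dataclass

@dataclass
class TsdbRequirements:
    # Axis 1: data model and query language
    query_language: str          # e.g. "PromQL", "SQL", "custom"
    needs_joins: bool
    # Axis 2: storage design
    ingest_points_per_sec: int
    needs_downsampling: bool
    # Axis 3: primary workload
    workload: str                # "monitoring", "iot", "finance", "analytics"
    # Axis 4: operating model
    managed_cloud: bool

# Example: an IoT platform with SQL analytics and self-hosted operations.
reqs = TsdbRequirements(
    query_language="SQL", needs_joins=True,
    ingest_points_per_sec=200_000, needs_downsampling=True,
    workload="iot", managed_cloud=False,
)
print(reqs)
```

Filling this in per candidate system makes the trade-offs in the families below explicit rather than implicit.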
Main TSDB families
1. Purpose-built TSDB systems
Engines designed from day one for time-series workloads and high write ingestion.
Typical characteristics
- Append-only write path optimized for high ingest throughput.
- Time/value compression, retention/TTL, and downsampling out of the box.
- Time-bucket aggregations and window functions as a first-class use case.
Typical products
InfluxDB, Prometheus, VictoriaMetrics, M3, Thanos, Graphite/Whisper
When to use
- Infrastructure monitoring and application metrics.
- IoT telemetry with relatively simple label schemas.
- Workloads where write path and retention matter more than complex joins.
Trade-offs
- Support for complex relational analytics is often limited.
- High-cardinality labels can increase storage and query cost quickly.
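The cardinality risk is multiplicative: the worst-case active series count is the product of per-label cardinalities, so one stray high-cardinality label can blow up storage. A minimal sketch with illustrative numbers:

```python
from math import prod

def estimate_series(label_cardinalities: dict[str, int]) -> int:
    """Worst-case active series count: the product of per-label cardinalities."""
    return prod(label_cardinalities.values())

# A modest-looking label schema...
modest = {"host": 500, "region": 5, "service": 40}
# ...versus the same schema with one unbounded label (e.g. a user ID).
risky = dict(modest, user_id=10_000)

print(estimate_series(modest))  # 100_000 series
print(estimate_series(risky))   # 1_000_000_000 series
```

This is why most native TSDBs document label-design guidelines: the write path scales with points/sec, but index and query cost scale with series count.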
2. TSDB as an extension of relational DB
A path where time-series capabilities are added to an SQL engine (for example, PostgreSQL).
Typical characteristics
- SQL as the main query language and easy BI integration.
- Time partitioning, hypertables/continuous aggregates, and space partitioning.
- Joins with OLTP tables and one data model for metrics plus domain entities.
Typical products
TimescaleDB, PipelineDB-like patterns, kdb+
When to use
- You need complex SQL and frequent joins with transactional data.
- Your team is already strong in the PostgreSQL ecosystem.
- You want fewer technologies in the platform stack.
Trade-offs
- Peak ingest is often lower than in highly specialized TSDB engines.
- At very large scale, tuning complexity grows quickly.
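The core query pattern this family optimizes is time-bucketed aggregation (e.g. TimescaleDB's time_bucket plus an aggregate, often precomputed as a continuous aggregate). A pure-Python analogue of what that computes, as a sketch:

```python
from collections import defaultdict

def time_bucket_avg(points, bucket_seconds):
    """Average (ts, value) points into fixed-width time buckets,
    mirroring what SQL time_bucket(...) + AVG() computes."""
    sums, counts = defaultdict(float), defaultdict(int)
    for ts, value in points:
        bucket = ts - ts % bucket_seconds  # align timestamp down to bucket start
        sums[bucket] += value
        counts[bucket] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}

points = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0)]
print(time_bucket_avg(points, 60))  # {0: 2.0, 60: 6.0}
```

In the SQL-extension path this runs inside the engine over partitioned tables, which is what makes joins with OLTP data cheap relative to exporting metrics elsewhere.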
3. TSDB on top of distributed storage systems
A time-series layer built over HBase/Cassandra/Bigtable-like storage for extreme scale.
Typical characteristics
- Horizontal scale to very large volumes with long retention.
- Fault tolerance and replication inherited from the underlying KV/column-store layer.
- Usually a multi-component architecture with explicit capacity planning.
Typical products
OpenTSDB (HBase), KairosDB/Cassandra patterns, custom TSDB schemas
When to use
- Telecom/cloud/SaaS environments with trillions of points and long retention.
- You need linear scale-out as cluster size grows.
- The team is ready for distributed-system operational complexity.
Trade-offs
- Higher administration and tuning complexity.
- Longer path from idea to production due to many moving parts.
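These layers typically work by packing time series into wide rows keyed by metric, a coarse base timestamp, and tags, so one storage row holds a window of samples. A readable sketch of an OpenTSDB-style row key; real OpenTSDB uses fixed-width binary UIDs and an hourly base timestamp, strings are used here only for clarity:

```python
def row_key(metric: str, ts: int, tags: dict[str, str],
            base_seconds: int = 3600) -> str:
    """OpenTSDB-style row key sketch: metric, hour-aligned base timestamp,
    then sorted tag pairs so identical series always share a row prefix."""
    base_ts = ts - ts % base_seconds
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric}:{base_ts}:{tag_part}"

print(row_key("sys.cpu.user", 1_700_003_000, {"host": "web1", "dc": "eu"}))
```

The prefix ordering is what makes range scans over one series efficient on HBase/Cassandra-like storage, and it is also why schema design here is explicit capacity-planning work.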
4. Columnar analytical DBs used as TSDB
General-purpose columnar systems that frequently act as metrics/log/time-series storage in practice.
Typical characteristics
- Strong columnar compression and fast aggregations on long time windows.
- Flexible ad-hoc SQL and BI-friendly querying over event data.
- A practical balance between ingest throughput and analytical flexibility.
Typical products
ClickHouse, Apache Druid, Apache Pinot, MPP DWH (Vertica, etc.)
When to use
- Unified logs + metrics + BI analytics workflows.
- You need advanced slicing, retention/cohort analysis, and product reporting.
- Data and product teams need flexible ad-hoc analytics.
Trade-offs
- Not always a full replacement for monitoring-native TSDB alerting.
- Low-latency operational monitoring often still needs a dedicated layer.
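The columnar advantage on long windows comes from scanning only the columns a query touches instead of whole rows. A toy illustration of the layout difference (not any engine's actual storage format):

```python
# Row layout: each record carries every field.
rows = [{"ts": i, "value": float(i), "host": "web1"} for i in range(1_000)]

# Column layout: one contiguous array per field.
cols = {
    "ts": list(range(1_000)),
    "value": [float(i) for i in range(1_000)],
    "host": ["web1"] * 1_000,
}

# A long-window aggregate over the row layout reads every field of every row...
row_sum = sum(r["value"] for r in rows)
# ...while the column layout reads a single array (and compresses it well).
col_sum = sum(cols["value"])

assert row_sum == col_sum
print(col_sum)  # 499500.0
```

Add run-length or delta compression per column and the scan-cost gap widens further, which is why these engines dominate long-window analytical queries.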
Fast scenario-based selection
Infrastructure monitoring
Native TSDB (Prometheus/VictoriaMetrics) + long-term storage layer
Strong ecosystem for alerting, SLO tracking, and dashboard-driven operations.
IoT telemetry and device events
Native TSDB or SQL extension (TimescaleDB), depending on analytics depth
Key factors: ingest rate, tag cardinality, and retention/downsampling policies.
Financial and market time series
SQL/columnar path (kdb+, ClickHouse, TimescaleDB)
You typically need precise aggregations, window functions, and complex analytics.
Logs/metrics with BI and ad-hoc analytics
Columnar analytics (ClickHouse/Druid/Pinot)
This path usually wins on SQL flexibility and large-scale analytical scans.
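The scenario table above can be condensed into a rough first-pass routing function. The workload keys and return strings are illustrative shorthand for the recommendations in this chapter, not a substitute for evaluating against your own SLOs:

```python
def first_pass_family(workload: str) -> str:
    """Map a workload type to the TSDB family worth evaluating first."""
    routing = {
        "infrastructure_monitoring":
            "native TSDB (Prometheus/VictoriaMetrics) + long-term storage",
        "iot_telemetry":
            "native TSDB or SQL extension (TimescaleDB)",
        "financial_series":
            "SQL/columnar (kdb+, ClickHouse, TimescaleDB)",
        "logs_bi_analytics":
            "columnar analytics (ClickHouse/Druid/Pinot)",
    }
    return routing.get(workload, "run the four-axis evaluation first")

print(first_pass_family("iot_telemetry"))
```

Treat the result as a starting shortlist; the recommendations below still apply before committing.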
Self-hosted vs managed cloud
Self-hosted
- Maximum control over storage layout, indexing, and upgrade strategy.
- Good fit when you already have mature SRE/DBA practices and custom operations requirements.
- TCO strongly depends on team expertise and automation quality.
Managed cloud
- Faster time-to-value with lower operations overhead and a predictable provider SLA.
- Good fit when delivery speed matters more than deep platform customization.
- Model egress, retention tiers, and lock-in risk early.
Recommendations for an initial selection
Start from the workload profile: ingest/sec, cardinality, retention horizon, and query patterns.
Separate operational monitoring (real-time alerting) from product analytics (deep exploratory queries).
Define write/read SLOs and validate them on realistic datasets, not synthetic benchmarks only.
If you need flexible SQL and joins with business data, evaluate PostgreSQL/columnar TSDB paths first.
Plan downsampling and tiered storage early; this is the main lever for long-term TSDB cost control.
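To see why downsampling is the main cost lever, compare stored points per series with and without rollup tiers. A back-of-the-envelope model; the retention windows and resolutions below are illustrative, not defaults of any engine:

```python
def points_per_series(tiers: list[tuple[int, int]]) -> float:
    """Points retained per series under tiered downsampling.

    `tiers` is a list of (retention_days, resolution_seconds) pairs,
    ordered from raw to coarsest; each tier covers only its own window.
    """
    return sum(days * 86_400 / resolution_s for days, resolution_s in tiers)

raw_only = points_per_series([(365, 10)])           # one year of raw 10s samples
tiered = points_per_series([(30, 10), (335, 300)])  # 30d raw, then 5-minute rollups

print(f"{raw_only / tiered:.1f}x")  # ≈8.9x fewer stored points
```

Multiply by series count to get fleet-wide figures; at high cardinality this ratio is the difference between a viable retention policy and a cost spike.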
Common TSDB selection mistakes
Trying to use one engine for both operational monitoring and deep BI analytics without clear workload separation.
Choosing by peak ingest numbers only while ignoring query and storage cost behavior.
Ignoring high-cardinality label risk until production incidents happen.
Postponing retention/downsampling strategy until infrastructure cost spikes.
Evaluating engines without considering operating model (self-hosted vs managed cloud).
