System Design Space

Updated: March 2, 2026 at 7:56 AM

ClickHouse: analytical DBMS and architecture


Column-oriented OLAP DBMS: MergeTree, partitioning, replication, materialized views and high-throughput analytics scenarios.

Official documentation

ClickHouse Docs cover the basic concepts: table engines, architecture, SQL, and operational practices.

ClickHouse is a column-oriented OLAP DBMS for analytics over large data volumes. Its strengths are fast aggregate queries over event and log data, high-throughput ingestion, and effective data compression.

History: key milestones

2009

Internal launch in Yandex

ClickHouse starts inside Yandex as a columnar analytical database for high-load reporting.

2016

Open source release

The project becomes public and begins to develop in the open-source ecosystem.

2021

Spin-off of ClickHouse, Inc.

The team launches a separate company around the project; the focus shifts to the commercial ecosystem and the cloud offering.

2021+

Cloud and ecosystem

Development of managed offerings, object storage integrations and an ecosystem of tools.

Key Architecture Principles

Column-oriented storage

Reading only the necessary columns dramatically reduces the I/O for analytical queries.

MergeTree family

Partitions, sorting, background merge processes and replication are the core of most production installations.

Vectorized execution

Queries process data in batches of column values, which increases throughput and CPU cache efficiency.

Compression + skipping

Columnar compression and data skipping indexes speed up scan-heavy workloads.
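The I/O benefit of column-oriented storage can be illustrated with a toy model. The sketch below (a hypothetical schema and sizes, not ClickHouse internals) compares how many bytes a row store and a column store must read to aggregate a single column:

```python
# Toy model of row-oriented vs column-oriented reads (illustrative only,
# not actual ClickHouse storage internals).

ROWS = 1_000_000
COLUMNS = {            # column name -> bytes per value (uncompressed)
    "event_time": 8,
    "user_id": 8,
    "url": 120,
    "duration_ms": 4,
}

def row_store_bytes_scanned() -> int:
    """A row store reads whole rows even if the query needs one column."""
    row_width = sum(COLUMNS.values())
    return ROWS * row_width

def column_store_bytes_scanned(needed: list[str]) -> int:
    """A column store reads only the columns the query references."""
    return ROWS * sum(COLUMNS[name] for name in needed)

# SELECT avg(duration_ms) FROM events -- touches a single 4-byte column.
row_bytes = row_store_bytes_scanned()                    # 140 MB of rows
col_bytes = column_store_bytes_scanned(["duration_ms"])  # 4 MB of one column
print(f"row store: {row_bytes:,} B, column store: {col_bytes:,} B, "
      f"ratio: {row_bytes / col_bytes:.0f}x")
```

On this schema the column store reads 35x less data, before compression and data-skipping indexes add further savings.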

High-Level Architecture

At a high level, ClickHouse separates the client layer, the coordination layer, and the MergeTree storage/background layer. Writes and reads go through coordinators, while actual storage and computation happen on shard replicas.

Sharded + Replicated

Canonical production profile: shard-key distribution, Keeper-based replication, and parallel scans.

Pros

  • Horizontal scaling as data volume grows.
  • Higher read throughput and better fault tolerance.
  • Flexible balancing of write/read paths across nodes.

Limitations

  • Higher operational complexity.
  • Requires monitoring of replication lag, merge backlog, and shard skew.
Best for: Large BI and product analytics with high throughput.
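Shard-key distribution can be sketched in a few lines. The example below is a minimal model of stable hash routing (shard names and the sharding key are hypothetical; in ClickHouse this role is played by the Distributed engine's sharding expression):

```python
# Minimal sketch of shard-key routing: rows with the same sharding key
# always land on the same shard (illustrative; not the Distributed engine).
import hashlib

SHARDS = ["shard_a", "shard_b"]

def shard_for(key: str) -> str:
    """Route a row to a shard by a stable hash of its sharding key."""
    digest = hashlib.md5(key.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[bucket]

# Route an INSERT batch: group rows by target shard.
batch = [{"tenant": "t1"}, {"tenant": "t2"}, {"tenant": "t1"}]
routed: dict[str, list[dict]] = {}
for row in batch:
    routed.setdefault(shard_for(row["tenant"]), []).append(row)

print(routed)
```

Stable hashing keeps tenant data co-located on one shard, which lets per-tenant queries hit a single shard instead of fanning out to the whole cluster.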

Workload Queue

  • CH-REQ-201 | read | BI | dashboard_last_15m
  • CH-REQ-202 | write | Ingest | insert_events_batch
  • CH-REQ-203 | read | ML | feature_slice_by_tenant
  • CH-REQ-204 | write | Ingest | insert_clickstream_batch

Control Plane

Requests are distributed across shards, and replicas provide HA and read scalability.


Example cluster state:

  • Shard A / R1 | primary replica | lag: 1 | parts: 36
  • Shard A / R2 | secondary replica | lag: 2 | parts: 35
  • Shard B / R1 | primary replica | lag: 1 | parts: 34
  • Shard B / R2 | secondary replica | lag: 2 | parts: 33

Replication & Merges


Replication and merge processes keep parts and lag within a controlled range.

Cluster Counters

parts: 138 | avg lag: 1.5

Monitor the balance between ingestion throughput, merge backlog, and query latency.
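A replication-lag SLO check for a cluster like the one above can be sketched as follows (replica names, lag units, and the threshold are illustrative):

```python
# Sketch of a replication-lag SLO check over a small cluster state
# (names and threshold are illustrative, not a real monitoring API).

replicas = {
    "shard_a/r1": 1, "shard_a/r2": 2,
    "shard_b/r1": 1, "shard_b/r2": 2,
}
LAG_SLO = 5  # maximum acceptable lag, arbitrary units

avg_lag = sum(replicas.values()) / len(replicas)
violations = [name for name, lag in replicas.items() if lag > LAG_SLO]
print(f"avg lag: {avg_lag}, SLO violations: {violations}")
# With lags 1/2/1/2 the average is 1.5 and no replica breaches the SLO.
```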

Architecture Checklist

Partition + ORDER BY design
Replication lag SLO
Merge / TTL observability

Write/Read Path via Components

The steps below show how a request moves through the key components: coordinator, shard replicas, Keeper, and background processes.

  1. Client: INSERT batch
  2. Insert Coordinator: route by shard key
  3. Replicated MergeTree: create new parts
  4. ClickHouse Keeper: replication metadata
  5. Background Merges: compact + optimize
Write path: INSERT is routed through coordinator to shard replicas, creates new parts, and is optimized by background merge processes.

Write path

  1. Client sends INSERT batches (usually via the HTTP or native protocol).
  2. Coordinator routes each batch by shard key to the target replicas.
  3. Data is written as new parts inside MergeTree tables and synchronized via Keeper metadata.
  4. Background merges compact parts, apply TTL/mutations, and optimize storage layout.
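The steps above can be modeled with a toy MergeTree: every INSERT batch becomes a new sorted part, and a background merge compacts parts into one. This is a deliberately simplified sketch (real ClickHouse selects merge candidates incrementally rather than merging everything at once):

```python
# Toy model of the MergeTree write path: each INSERT batch creates a new
# immutable part, and a background merge compacts them (illustrative only).
import heapq

parts: list[list[dict]] = []

def insert_batch(rows: list[dict]) -> None:
    """Every INSERT batch becomes its own part, sorted by the ORDER BY key."""
    parts.append(sorted(rows, key=lambda r: r["ts"]))

def background_merge() -> None:
    """Merge all current parts into one larger sorted part."""
    merged = list(heapq.merge(*parts, key=lambda r: r["ts"]))
    parts.clear()
    parts.append(merged)

insert_batch([{"ts": 3}, {"ts": 1}])
insert_batch([{"ts": 2}, {"ts": 4}])
print(len(parts))                    # 2 parts after two inserts
background_merge()
print(len(parts))                    # 1 part after the merge
print([r["ts"] for r in parts[0]])  # [1, 2, 3, 4]
```

This is why batching inserts matters in practice: many tiny INSERTs create many small parts, inflating the merge backlog that the background processes must work off.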

Data Modeling Practice

  • Design for read and aggregation patterns first, not OLTP normalization.
  • Choose the ORDER BY key so that the most common filters prune as much data as possible.
  • Partition by time or domain for manageable retention and efficient scans.
  • Use materialized views to pre-aggregate hot analytical reports.
  • Plan TTL/retention in advance: analytical data volumes grow quickly.
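The ORDER BY advice can be made concrete with a toy model: when rows are sorted by the filter column, a range filter touches only a narrow slice instead of the whole table. The sketch below uses per-row binary search for simplicity (ClickHouse actually uses a sparse primary index over granules, but the pruning effect is the same in spirit):

```python
# Sketch of why a good ORDER BY key prunes data: with rows sorted by the
# filter column, a range predicate bounds the scan via binary search
# (illustrative; ClickHouse uses a sparse index over granules).
import bisect

timestamps = list(range(0, 1_000_000, 10))  # sorted ORDER BY column

def rows_scanned_sorted(lo: int, hi: int) -> int:
    """Binary-search the sorted key to bound the scan to the matching range."""
    left = bisect.bisect_left(timestamps, lo)
    right = bisect.bisect_right(timestamps, hi)
    return right - left

full_scan = len(timestamps)                    # 100,000 rows without the key
pruned = rows_scanned_sorted(500_000, 500_990) # 100 rows with the key
print(f"full scan: {full_scan} rows, pruned: {pruned} rows")
```

If the same query had to filter on a column that is not a prefix of the ORDER BY key, it would degrade back to the full scan, which is why the key is chosen from the dominant filter patterns.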

When to choose ClickHouse

Good fit

  • Product analytics, BI dashboards, observability/log analytics.
  • Event and time-series data with very high write throughput.
  • Complex aggregating queries on large historical volumes.
  • Near-real-time data marts for product and analytics teams.

Worth avoiding

  • OLTP scenarios with frequent point updates and short transactions.
  • Workloads where row-level locking and strict transactional semantics are critical.
  • Systems with many small real-time UPDATE/DELETE operations.
  • Use cases where the key task is online serving of single records.


© 2026 Alexander Polomodov