System Design Space
Knowledge graphSettings

Updated: May 3, 2026 at 9:25 PM

YDB: distributed SQL database and architecture

medium

Distributed SQL DBMS with automatic sharding, YDB tablets, ACID transactions, row and column tables, shared-nothing horizontal scaling, and the operational cost of cross-shard coordination.

YDB matters not only because of its origin inside Yandex, but because it combines distributed SQL with a platform view of multi-tenancy, scaling, and operations.

In real projects, this chapter helps you reason about tenant isolation, quotas, transaction scope, and batched writes as properties of the whole platform rather than local optimizations for one team.

In interviews and architecture discussions, it is most useful when you need to justify YDB through fault tolerance, noisy-neighbor control, and predictable behavior at scale.

Practical value of this chapter

Multi-tenant boundaries

Design tenant isolation and quota strategy so noisy neighbors do not violate platform-level SLA.

Transactional contours

Define transaction scope and batching model for distributed SQL execution behavior.

Storage/compute elasticity

Tie scaling strategy to workload profile and expected traffic spikes.

Interview rationale

Justify YDB through fault tolerance, multi-tenant governance, and operational predictability requirements.

Source

Wikipedia: YDB

YDB history from KiWi and KiKiMR to the public DBMS, release milestones, and general architecture context.

Open article

Official docs

YDB Docs: Architecture

Architecture overview: shared-nothing design, YDB tablets, automatic sharding, distributed transactions, and the storage layer.

Open docs

YDB (Yet another DataBase) is a distributed SQL database with ACID transactions, automatic sharding, and shared-nothing architecture. In system design, YDB is often chosen for high-load backend systems where strong consistency, key-based horizontal scale, and a unified platform for transactional plus analytical workloads are needed.

History and context

2010KiWi

Development starts in Yandex internal infrastructure

Yandex starts the KiWi distributed key-value layer as a foundation for services that need horizontal scale.

2012KiKiMR

Transition toward distributed SQL

KiKiMR evolves around tablets and the actor model, which later become part of YDB's core architecture.

2016internal rollout

Broad production usage inside Yandex

YDB-style architecture is adopted by high-load services across the Yandex ecosystem.

April 19, 2022v22.1.5

Public open-source release

YDB is released as an open-source distributed SQL DBMS with ACID guarantees and automatic sharding.

February 6, 2025v24.3.15.5

24.3 server branch stabilization

The 24.3 branch receives fixes and operational improvements for production clusters.

September 15, 2025v25.1.4.7

New 25.x minor line

25.x stream continues SQL capabilities, performance tuning, and operational feature evolution.

Core architecture elements

Tablets and auto-sharding

Table data is distributed across YDB tablets that can split and move automatically as load grows.

Serializable transactions

YDB provides ACID transactions with serializable isolation and optimistic concurrency control.

Row + column tables

The same platform supports row tables for OLTP and column tables for nearby analytical workloads.

Disaggregated compute/storage

Architecture supports separated compute and storage layers plus multi-AZ fault-tolerant deployments.

Data model and transaction contour

The interactive section below shows how YDB combines row and column tables, automatic sharding, indexes, and distributed transactions in one architecture.

YDB data model: tables, shards, and transactions

YDB combines a relational model with automatic sharding and distributed transactions for high-load systems.

Why YDB is more than a typical SQL database

  • Every table requires a primary key, and data is physically distributed across shard tablets.
  • Both row-oriented and column-oriented tables are available for OLTP and OLAP profiles.
  • Distributed transactions with serializable isolation and OCC are built in.
  • Topics, CDC, and asynchronous replication let teams build integrated data pipelines.

Row-oriented tables

Core table type for transactional workloads: primary key is mandatory and rows are sorted by key.

Key elements

PRIMARY KEYDataShardPoint readsRange scans by key

Typical use cases

  • User/account state
  • Orders/payments
  • Transactional APIs

Example

CREATE TABLE orders (
  tenant_id Uint64,
  order_id Uint64,
  status Utf8,
  amount Uint64,
  PRIMARY KEY (tenant_id, order_id)
);

YDB architecture by layer

The diagram shows client access, SQL and transaction processing, tablets and shards, distributed storage, and fault-tolerance mechanics.

Clients and access
gRPC + SDKNode discoveryYQL / SQLCLI + API
layer transition
Query and transaction layer
Parse + optimizeSerializable txOCCTransaction coordinator
layer transition
Tablets and shards
DataShard / ColumnShardSchemeShardHiveSplit and merge
layer transition
Distributed storage
PDisk / VDiskDSProxySynchronous replicationBLOB storage
layer transition
Fault tolerance and scale
Shared-nothing3-AZ topologyAutomatic balancingSeparated compute/storage
layer transition
Workload profiles
Row tablesColumn tablesTopics + CDCVector indexes

System view

Distributed SQLStrong consistencyTransactions + analytics

Data model

Mandatory primary keyRow and column enginesHierarchical namespace

Operational trade-offs

Cross-shard transaction costKey design affects localityCluster operations need discipline

Read and write paths through components

This unified diagram combines write and read paths and shows how requests move through discovery, transaction coordinator, shards, and replicated storage.

Read/Write Path Explorer

Interactive walkthrough of transaction and query flow through core YDB cluster components.

1
Client Tx
UPSERT UPDATE REPLACE
2
Discovery + Route
tablet map
3
Tx Coordination
serializable/OCC
4
Shard Apply
DataShard/ColumnShard
5
Replica Commit
DSProxy + storage
Write path: request is routed to target shards, coordinated as a transaction, and acknowledged after replicated storage commit.

Write path

  1. Primary key design determines whether a write stays single-shard or becomes distributed.
  2. YDB uses serializable isolation with optimistic concurrency control; conflicting transactions may fail and require retry.
  3. Cross-shard writes usually cost more latency/resources than single-shard writes.
  4. DDL and DML are not combined in one transaction; schema changes are separate idempotent operations.

When to choose YDB

Good fit

  • High-load transactional services that need strong consistency and automatic sharding.
  • Systems with continuous data and traffic growth where shared-nothing horizontal scale matters.
  • Use cases that need OLTP tables plus nearby analytics on column tables.
  • Teams ready to invest in key design and distributed SQL operational discipline.

Avoid when

  • Small single-node projects where a simple local DB is enough.
  • Workloads dominated by frequent cross-shard transactions without partition-aware key design.
  • Organizations that cannot support distributed cluster operations.
  • Cases where full-text search or pure OLAP dominates without a transactional core.

Practice: DDL and DML

Below are practical YDB DDL/DML examples: schema/index design, transactional upserts, and key-range query patterns.

DDL and DML examples in YDB

DDL defines schema/partitioning, while DML covers transactional writes and analytical reads.

In YDB, DDL operations (tables, indexes, partitioning) are handled separately from DML transactions and should be idempotent.

Create row table with auto partitioning

CREATE TABLE

Primary key is mandatory; auto partitioning helps scale with data and load growth.

CREATE TABLE orders (
  tenant_id Uint64,
  order_id Uint64,
  status Utf8,
  amount Uint64,
  created_at Timestamp,
  PRIMARY KEY (tenant_id, order_id)
)
WITH (
  AUTO_PARTITIONING_BY_SIZE = ENABLED,
  AUTO_PARTITIONING_MIN_PARTITIONS_COUNT = 8
);

Add global secondary index

ALTER TABLE ... ADD INDEX

Secondary index improves non-key access paths but increases write-side overhead.

ALTER TABLE orders
ADD INDEX idx_status GLOBAL ASYNC ON (status);

Create column table for analytics

CREATE TABLE ... STORE=COLUMN

For OLAP workloads, use column store and hash-based partitioning.

CREATE TABLE events_olap (
  ts Timestamp NOT NULL,
  tenant_id Uint64 NOT NULL,
  event_type Utf8,
  payload Json,
  PRIMARY KEY (ts, tenant_id)
)
PARTITION BY HASH(tenant_id)
WITH (STORE = COLUMN);

Related chapters

Enable tracking in Settings