System Design Space

Updated: March 1, 2026 at 11:34 PM

Qdrant: vector database and architecture


Vector database for semantic and hybrid retrieval: collections/points, payload filters, HNSW indexing, distributed mode, and consistency controls.

Source

Qdrant

Official Qdrant website: vector database positioning, core capabilities, and deployment options.


Documentation

Qdrant Docs: Overview

Core concepts: collections, points, payload filters, distributed mode, consistency, and performance tuning.


Qdrant is a vector database and similarity search engine for AI/ML systems. In system design, it is typically deployed as a dedicated retrieval layer next to an OLTP source of truth: embedding storage, payload filtering, hybrid search, and low-latency top-k response for RAG/search workflows.

History and context

June 8, 2022 (v0.8.0)

Distributed mode arrives

Qdrant introduces distributed cluster mode with shard/replica topology for production workloads.

February 8, 2023 (v1.0.0)

API stabilization and production adoption

The API and SDK ecosystem matures, and Qdrant is increasingly used as a retrieval layer in AI/ML systems.

December 8, 2023 (v1.7.0)

Sparse vectors and user-defined sharding

Sparse vectors and more flexible shard distribution controls expand hybrid retrieval and tenancy patterns.

July 1, 2024 (v1.10.0)

Multivectors and advanced retrieval

Multivector support enables richer retrieval setups with multiple vectors per point.

November 17, 2025 (v1.16.0)

ACORN and strict filtering improvements

Strict filtering in HNSW traversal is improved for payload-heavy retrieval pipelines.

Core architecture elements

Collections, points, and payload

The core model is collection-based: each point stores vectors plus payload attributes for filtering and business context.

Filterable ANN

Search combines ANN traversal (HNSW) with payload-aware filtering to keep semantic retrieval grounded in business constraints.
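As an illustration, a filtered query combines an ANN query vector with payload conditions in one request. A minimal sketch, assuming a hypothetical `products` collection with `category` and `price` payload fields (the field names and the short query vector are illustrative):

```json
POST /collections/products/points/query
{
  "query": [0.05, 0.61, 0.76, 0.74],
  "filter": {
    "must": [
      { "key": "category", "match": { "value": "laptops" } },
      { "key": "price", "range": { "lte": 1500 } }
    ]
  },
  "limit": 10,
  "with_payload": true
}
```

The filter is evaluated during HNSW traversal rather than as a post-filter, which is what keeps recall stable under selective constraints.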

Durability and storage layout

WAL + segments provide durability, while on-disk/memmap/quantization controls help tune cost and memory footprint.

Cluster mode

Shards and replicas scale throughput; read/write consistency and ordering settings control trade-offs.
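For reads, the consistency trade-off is expressed per request via the `consistency` query parameter. A sketch, assuming the same illustrative `products` collection:

```json
POST /collections/products/points/query?consistency=majority
{
  "query": [0.05, 0.61, 0.76, 0.74],
  "limit": 5
}
```

Here `majority` asks Qdrant to confirm the result against a majority of replicas, trading latency for stronger read guarantees; the default reads from a single replica.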

Vector data model and payload filtering

The overview below summarizes Qdrant data modeling: dense/sparse/multivector schemas, named vectors, payload filters, and storage controls affecting latency and cost.

Qdrant Data Model: more than "an embedding store"

Qdrant stores points with vectors and payload, supports dense/sparse/multivector schemas, and enables filter-aware retrieval.

Why Qdrant is not only ANN over a single vector

  • A point can carry vectors plus structured payload for filters and business attributes.
  • Dense and sparse representations can be combined in hybrid retrieval pipelines.
  • Named vectors and multivectors allow multiple embedding spaces per object.
  • Index and storage controls (on-disk, quantization) let teams tune latency vs cost.
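The points above can be sketched as one collection schema carrying named dense vectors plus a sparse vector; the names `text`, `image`, and `bm25` and the dimensions are illustrative, not part of the source:

```json
PUT /collections/catalog
{
  "vectors": {
    "text": { "size": 768, "distance": "Cosine" },
    "image": { "size": 512, "distance": "Dot" }
  },
  "sparse_vectors": {
    "bm25": {}
  }
}
```

Each point in such a collection can then carry any subset of these vectors alongside its payload, so one object can live in several embedding spaces at once.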

Dense vectors

Standard ANN retrieval over fixed-size embeddings (for example, 768/1024 dimensions).

Key elements

  • Collection vectors config
  • Upsert points
  • Query top-k
  • Distance: Cosine/Dot/Euclid

Typical use cases

  • Semantic search
  • RAG retrieval
  • Recommendations

Example

"vectors": { "size": 768, "distance": "Cosine" }

High-Level Architecture

The layered view below shows a high-level Qdrant setup in a product system: API and ingestion layer, collections/shards, ANN + payload indexing, durability path, and cluster-level behavior.

  • Clients and API: HTTP + gRPC, Python/JS/Rust SDKs, OpenAPI, batch upsert
  • Collections and sharding: collections, points, shard routing, replication factor
  • Vector and payload indexing: HNSW (ANN), sparse index (exact), payload index, filterable search
  • Storage internals: WAL, segments, memmap / on-disk, versioned updates
  • Distributed consistency: Raft (topology), read consistency, write_consistency_factor, write ordering
  • Operations: snapshots, quantization, optimizer, monitoring

System view

Qdrant is typically used as a dedicated vector retrieval layer for semantic search and RAG, while transactional source-of-truth data remains in OLTP storage.

Retrieval capabilities

Dense vectors, sparse vectors, multivectors + named vectors

RAG and filtering

Metadata filters, hybrid query patterns, payload-aware ranking

Operational trade-offs

Recall vs latency tuning, replica write overhead, storage/memory balancing

Read / Write Path through components

The walkthrough below combines the write and read paths: how Qdrant processes upsert/query requests, updates index structures, and returns top-k points in single-node and distributed deployments.

Read/write path walkthrough

How vector operations move through Qdrant components, step by step.

  1. Client upsert: points, wait=true
  2. WAL: durability log
  3. Segment update: points + payload
  4. Index refresh: HNSW / payload index
  5. Replica ack: consistency
Write path: upsert goes through WAL and segments, updates indexes, and is acknowledged according to replica/consistency settings.

Write path

  1. Client submits `upsert`/`set-payload` operations into a Qdrant collection.
  2. Write goes through WAL, then is materialized in segments and index structures.
  3. In distributed mode, mutations propagate to shard replicas according to consistency policy.
  4. `wait=true` and write consistency settings determine when the client receives ack.
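Sketched as an API call, step 1 together with the ack semantics looks like this (collection name, id, and payload fields are illustrative, and the vector is truncated for readability):

```json
PUT /collections/products/points?wait=true&ordering=strong
{
  "points": [
    {
      "id": 42,
      "vector": [0.05, 0.61, 0.76, 0.74],
      "payload": { "category": "laptops", "tenant": "acme" }
    }
  ]
}
```

With `wait=true` the response returns only after the change is applied rather than merely enqueued, and `ordering=strong` serializes the write through the shard leader at the cost of extra latency.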

When to choose Qdrant

Good fit

  • Semantic search and RAG where embeddings must be retrieved with metadata filtering.
  • Hybrid retrieval (dense + sparse) when lexical and semantic signals must be combined.
  • Catalog/content systems with tenant/category/date constraints and low-latency retrieval.
  • Production vector layer with replication, snapshot workflows, and explicit latency/recall tuning.
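For the hybrid case, the Query API (v1.10+) can fuse a sparse and a dense branch with reciprocal rank fusion. A sketch against the `products` collection defined in the DDL examples of this chapter, whose sparse vector is named `text`; the sparse indices/values and the truncated dense vector are illustrative:

```json
POST /collections/products/points/query
{
  "prefetch": [
    {
      "query": { "indices": [12, 4097], "values": [0.61, 0.38] },
      "using": "text",
      "limit": 50
    },
    { "query": [0.05, 0.61, 0.76, 0.74], "limit": 50 }
  ],
  "query": { "fusion": "rrf" },
  "limit": 10
}
```

Each prefetch branch produces its own candidate list; the top-level `fusion` step merges them by rank, so lexical and semantic signals contribute without manual score normalization.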

Avoid when

  • Workloads dominated by relational joins and transactional OLTP logic.
  • Pure OLAP analytics with large-scale aggregates over columnar datasets.
  • Teams not ready to tune ANN parameters and validate retrieval quality (recall/precision).
  • Use cases that require a general-purpose SQL engine instead of a specialized vector retrieval layer.

Practice: DDL and DML

Below are practical Qdrant API examples: DDL operations for collection/index lifecycle and DML operations for point upserts, vector queries, and payload updates.

DDL and DML examples in Qdrant

DDL controls collection/index structure, while DML manages points, payload, and vector queries.

DDL in Qdrant is about collection structure: vector schema, sharding/replication settings, and payload indexes.

Create collection for dense + sparse retrieval

PUT /collections/products

Define vector schema, distributed parameters, and payload storage mode.

PUT /collections/products
{
  "vectors": {
    "size": 768,
    "distance": "Cosine"
  },
  "sparse_vectors": {
    "text": {}
  },
  "shard_number": 3,
  "replication_factor": 2,
  "write_consistency_factor": 1,
  "on_disk_payload": true
}

Create payload index for filtered search

PUT /collections/products/index

Index category field to stabilize filter latency at scale.

PUT /collections/products/index
{
  "field_name": "category",
  "field_schema": "keyword"
}

Tune HNSW and quantization

PATCH /collections/products

Adjust index and compression for the recall/latency/cost profile; the empty-string key below addresses the default (unnamed) dense vector.

PATCH /collections/products
{
  "vectors": {
    "": {
      "hnsw_config": {
        "m": 32,
        "ef_construct": 256
      },
      "quantization_config": {
        "scalar": {
          "type": "int8",
          "always_ram": true
        }
      }
    }
  }
}
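The build-time settings above pair with search-time parameters: `hnsw_ef` widens the candidate beam during traversal, and quantization rescoring re-ranks candidates with the original vectors. A sketch with illustrative starting values:

```json
POST /collections/products/points/query
{
  "query": [0.05, 0.61, 0.76, 0.74],
  "params": {
    "hnsw_ef": 256,
    "quantization": { "rescore": true, "oversampling": 2.0 }
  },
  "limit": 10
}
```

Raising `hnsw_ef` improves recall at the cost of latency, while `oversampling` fetches extra quantized candidates before the exact rescore pass.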

Related materials

Related chapters



© 2026 Alexander Polomodov