System Design Space

Updated: May 4, 2026 at 3:52 PM

Qdrant: vector database and architecture

medium

Vector database for semantic and hybrid search: collections and points, HNSW, payload filters, shards, replicas, and RAG-oriented retrieval.

Qdrant is best understood not as a magic vector database but as one layer of a retrieval stack, where search quality depends on the embeddings, HNSW parameters, payload and filters, and the index update strategy.

In real AI and search scenarios, this chapter helps you design vector storage as part of a full retrieval stack with model versioning, reindexing, filtering, and explicit trade-offs between quality, latency, and cost.

In interviews and engineering discussions, it is especially useful when you need to explain why a vector database does not solve semantic retrieval on its own and how it fits into a broader RAG or search system.

Practical value of this chapter

Retrieval design

Treat vector retrieval as a dedicated layer where embeddings, filters, and reranking work coherently.

Embedding lifecycle

Plan embedding model versioning and reindexing flows without downtime for search users.

Hybrid quality controls

Combine vector and keyword signals with quality metrics to keep precision/recall aligned to product goals.

Interview risk framing

Discuss quality, cost, and latency trade-offs explicitly in semantic search and RAG architecture decisions.

Source

Qdrant

Official Qdrant website: vector database positioning, core capabilities, and deployment options.


Documentation

Qdrant Docs: Overview

Core concepts: collections, points, payload filters, distributed mode, consistency, and performance tuning.


Qdrant is a vector database and similarity search engine for AI/ML systems. In system design, it is typically deployed as a dedicated retrieval layer next to an OLTP source of truth: embedding storage, payload filtering, hybrid search, and low-latency nearest-neighbor results for RAG/search workflows.

History and context

2021 · Project

Qdrant emerges

The idea grew from the need to search similar unstructured objects; after evaluating existing libraries, the team started its own Rust-based vector search engine.

April 6, 2021 · v0.2.0

First database capabilities

An early release adds payload indexing for numeric and keyword fields, the first practical building block for filterable vector search.

June 8, 2022 · v0.8.0

Distributed mode arrives

After the early single-node phase, Qdrant introduces distributed cluster mode with shard/replica topology for production workloads.

February 8, 2023 · v1.0.0

API stabilization and production adoption

The API and SDK ecosystem matures, and Qdrant is increasingly used as a retrieval layer in AI/ML systems.

December 8, 2023 · v1.7.0

Sparse vectors and user-defined sharding

Sparse vectors and more flexible shard distribution controls expand hybrid retrieval and tenancy patterns.

July 1, 2024 · v1.10.0

Multivectors and advanced retrieval

Multivector support enables richer retrieval setups where one point can carry several vector representations.

November 17, 2025 · v1.16.0

ACORN and strict filtering improvements

Strict filtering during HNSW traversal improves retrieval quality for payload-heavy search workloads.

Core architecture elements

Collections, points, and payload

The core model is collection-based: each point stores vectors plus payload attributes used for filtering and business context.

Filterable ANN

Search combines HNSW-based nearest-neighbor lookup with payload-aware filtering to keep semantic retrieval grounded in business constraints.
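A minimal Python sketch of payload-aware filtering, assuming a simplified filter format: the `must`/`should`/`must_not` clause names and the `match`/`range` conditions mirror Qdrant's filter JSON, but the evaluator (`condition_holds`, `matches`, `filtered_top_k`) is hypothetical and supports only those two condition types. It applies the filter and then ranks the survivors by exact cosine similarity; real Qdrant search combines filters with HNSW traversal and payload indexes instead of scanning every point.

```python
import math


def condition_holds(payload, cond):
    """Evaluate one field condition (simplified): exact `match` or numeric `range`."""
    value = payload.get(cond["key"])
    if "match" in cond:
        return value == cond["match"]["value"]
    if "range" in cond:
        if not isinstance(value, (int, float)):
            return False
        r = cond["range"]
        return (
            ("gt" not in r or value > r["gt"])
            and ("gte" not in r or value >= r["gte"])
            and ("lt" not in r or value < r["lt"])
            and ("lte" not in r or value <= r["lte"])
        )
    raise ValueError(f"unsupported condition: {cond}")


def matches(payload, flt):
    """must: all hold; should: at least one holds (if given); must_not: none hold."""
    must = flt.get("must", [])
    should = flt.get("should", [])
    must_not = flt.get("must_not", [])
    return (
        all(condition_holds(payload, c) for c in must)
        and (not should or any(condition_holds(payload, c) for c in should))
        and not any(condition_holds(payload, c) for c in must_not)
    )


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_top_k(points, query, flt, k):
    """Keep points whose payload passes the filter, then rank them by cosine similarity."""
    candidates = [p for p in points if matches(p["payload"], flt)]
    candidates.sort(key=lambda p: cosine(query, p["vector"]), reverse=True)
    return [p["id"] for p in candidates[:k]]


if __name__ == "__main__":
    points = [
        {"id": 1, "vector": [1.0, 0.0], "payload": {"category": "laptop", "price": 900}},
        {"id": 2, "vector": [0.9, 0.1], "payload": {"category": "laptop", "price": 1500}},
        {"id": 3, "vector": [1.0, 0.05], "payload": {"category": "phone", "price": 700}},
    ]
    laptops_under_1000 = {
        "must": [
            {"key": "category", "match": {"value": "laptop"}},
            {"key": "price", "range": {"lte": 1000}},
        ]
    }
    print(filtered_top_k(points, [1.0, 0.0], laptops_under_1000, k=2))
```

Without the filter, point 3 (a phone) would rank second; the business constraint removes it before similarity ever matters.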

Durability and storage layout

The write-ahead log (WAL) records each update durably before it is applied to segments, while on-disk storage, memory mapping, and quantization help tune cost and memory footprint.
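The durability idea behind a write-ahead log can be sketched in a few lines of Python: append the operation, fsync it, and only then apply it to the in-memory data, so a restart can rebuild state by replaying the log. `TinyWal` and its JSON-lines format are hypothetical illustrations of the technique, not Qdrant's on-disk WAL or segment format.

```python
import json
import os
import tempfile


class TinyWal:
    """Hypothetical write-ahead log: append, fsync, then apply."""

    def __init__(self, path):
        self.path = path
        self.points = {}  # in-memory stand-in for segment data: id -> payload
        self._replay()

    def _replay(self):
        # On startup, rebuild in-memory state from every logged operation.
        if not os.path.exists(self.path):
            return
        with open(self.path, "r", encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    self._apply(json.loads(line))

    def _apply(self, op):
        if op["type"] == "upsert":
            self.points[op["id"]] = op["payload"]
        elif op["type"] == "delete":
            self.points.pop(op["id"], None)
        else:
            raise ValueError(f"unknown operation type: {op['type']}")

    def write(self, op):
        # 1) Append to the log and force it to disk before touching the data.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(op) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2) Only now apply the change; a crash at this point is repaired by replay.
        self._apply(op)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.jsonl")
        wal = TinyWal(path)
        wal.write({"type": "upsert", "id": 1, "payload": {"category": "laptop"}})
        wal.write({"type": "upsert", "id": 2, "payload": {"category": "phone"}})
        wal.write({"type": "delete", "id": 1})
        recovered = TinyWal(path)  # simulates a restart: state comes back from the log
        print(recovered.points)
```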

Cluster mode

Shards spread data and load across nodes, and replicas add read capacity and fault tolerance; write consistency and write ordering settings control the speed-versus-safety trade-off.
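To illustrate how a fixed `shard_number` and `replication_factor` turn into data placement, here is a sketch that routes a point id to a shard with a stable hash and places each shard's replicas on consecutive peers. Both helpers (`shard_for_point`, `replica_peers`) and the SHA-256-modulo scheme are assumptions for illustration; Qdrant's own routing and replica placement are managed by the cluster and may work differently.

```python
import hashlib


def shard_for_point(point_id, shard_number):
    """Hypothetical routing: stable hash of the id, modulo the shard count."""
    digest = hashlib.sha256(str(point_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_number


def replica_peers(shard_id, replication_factor, peers):
    """Hypothetical placement: replicas of a shard go to consecutive peers."""
    if replication_factor > len(peers):
        raise ValueError("replication_factor cannot exceed the number of peers")
    return [peers[(shard_id + i) % len(peers)] for i in range(replication_factor)]


if __name__ == "__main__":
    peers = ["node-a", "node-b", "node-c"]
    for point_id in [1, 2, 3, 42]:
        shard = shard_for_point(point_id, shard_number=3)
        print(point_id, "-> shard", shard, "on", replica_peers(shard, 2, peers))
```

Because the hash is deterministic, every node computes the same route for a given id without coordination. With plain modulo hashing, changing the shard count remaps most ids, which is one reason resharding is treated as an operational event rather than a config tweak.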

Vector data model and filtering

The summary below covers Qdrant data modeling: dense vectors, sparse vectors, multivectors, named vectors, payload filters, and storage controls that affect latency and cost.

Qdrant Data Model: more than "an embedding store"

Qdrant stores points with vectors and payload, supports dense/sparse/multivector schemas, and enables filter-aware retrieval.

Why Qdrant is not only ANN over a single vector

  • A point can carry vectors plus structured payload for filters and business attributes.
  • Dense and sparse representations can be combined in hybrid retrieval pipelines.
  • Named vectors and multivectors allow multiple embedding spaces per object.
  • Index and storage controls (on-disk, quantization) let teams tune latency vs cost.
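One common way to combine dense and sparse result lists is Reciprocal Rank Fusion (RRF), where each list adds `1 / (k + rank)` to a point's score. The sketch below assumes the two ranked id lists have already been retrieved; `k = 60` is the constant conventionally used for RRF and is an assumption here, as is the id-based tie-breaking. Qdrant's Query API can perform fusion server-side, so treat this as an explanation of the technique rather than its implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists; each list contributes 1 / (k + rank) per id."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, point_id in enumerate(ranking, start=1):
            scores[point_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by the id's string form for determinism.
    return sorted(scores, key=lambda pid: (-scores[pid], str(pid)))


if __name__ == "__main__":
    dense_hits = ["a", "b", "c"]   # e.g. ids from a dense-vector query
    sparse_hits = ["c", "a", "d"]  # e.g. ids from a sparse (lexical) query
    print(reciprocal_rank_fusion([dense_hits, sparse_hits]))  # ['a', 'c', 'b', 'd']
```

Because RRF uses only ranks, it sidesteps the fact that dense similarity scores and sparse lexical scores live on different scales.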

Dense vectors

Standard ANN retrieval over fixed-size embeddings (for example, 768/1024 dimensions).

Key elements

  • Collection vectors config
  • Upsert points
  • Query top-k
  • Distance: Cosine / Dot / Euclid

Typical use cases

  • Semantic search
  • RAG retrieval
  • Recommendations

Example

"vectors": { "size": 768, "distance": "Cosine" }

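HNSW approximates exact nearest-neighbor search, so a brute-force version is useful both as a mental model and as ground truth when measuring recall. This Python sketch scores every vector with one of the three distances listed above; the helper names are hypothetical, and it assumes Euclid is a distance where smaller is better while Cosine and Dot are similarities where larger is better.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# distance name -> (scoring function, whether a larger score is better)
SCORERS = {"Cosine": (cosine, True), "Dot": (dot, True), "Euclid": (euclid, False)}


def exact_top_k(vectors, query, k, distance="Cosine"):
    """Brute-force k-NN: score every stored vector, return the k best (id, score)."""
    score_fn, larger_is_better = SCORERS[distance]
    scored = [(point_id, score_fn(query, vec)) for point_id, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=larger_is_better)
    return scored[:k]


if __name__ == "__main__":
    vectors = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [3.0, 3.0]}
    query = [1.0, 0.2]
    for name in SCORERS:
        print(name, [pid for pid, _ in exact_top_k(vectors, query, k=2, distance=name)])
```

With this toy data the three metrics return different top-2 ids (Cosine [1, 3], Dot [3, 1], Euclid [1, 2]), which is why the distance in the collection config should match what the embedding model expects.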
Qdrant architecture by layer

The layers below show a typical Qdrant setup in a product system: API, ingestion, collections and shards, vector indexes, filters, WAL, segments, and cluster operations.

  • Clients and API: HTTP + gRPC, Python/JS/Rust SDK, OpenAPI, batch upsert
  • Collections and sharding: collections, points, shard routing, replication factor
  • Vector indexes: HNSW, sparse index, payload index, filterable search
  • Storage: WAL, segments, disk / memmap, versioned updates
  • Cluster consistency: Raft topology, read consistency, write factor, write ordering
  • Operations: snapshots, quantization, optimizer, monitoring

System view

Qdrant is typically used as a dedicated vector search layer next to a transactional source of truth.

Search capabilities

dense vectors, sparse vectors, multivectors and named vectors

RAG and filtering

attribute filters, hybrid query patterns, payload-aware ranking

Operational trade-offs

recall versus latency, replica write overhead, memory and disk balance
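The recall side of the recall-versus-latency trade-off is usually measured as recall@k: the share of the exact top-k that the approximate search also returned. In this sketch, the approximate ids would come from normal HNSW search and the exact ids from a brute-force pass (Qdrant search requests accept an `exact` search parameter, or the ground truth can be computed offline); the function names and sample ids are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Share of the exact top-k ids that also appear in the approximate top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact_ids must contain at least one id")
    return len(truth & set(approx_ids[:k])) / len(truth)


def mean_recall_at_k(query_results, k):
    """Average recall@k over a list of (approx_ids, exact_ids) pairs, one per query."""
    if not query_results:
        raise ValueError("need at least one query")
    return sum(recall_at_k(a, e, k) for a, e in query_results) / len(query_results)


if __name__ == "__main__":
    # Hypothetical ids for two evaluation queries.
    runs = [
        ([1, 2, 5], [1, 2, 3]),  # ANN missed id 3
        ([4, 5, 6], [4, 5, 6]),  # ANN matched the exact top-3
    ]
    print(mean_recall_at_k(runs, k=3))
```

Tracking this number while varying HNSW settings such as `hnsw_ef` at query time or `m`/`ef_construct` at build time shows where extra latency stops buying recall.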

Write and read paths through components

This unified diagram combines the write path and read path: how Qdrant processes updates and search requests, refreshes indexes, and returns the nearest results in single-node and distributed deployments.

Read/Write Path Explorer

Walkthrough of how vector operations move through Qdrant components.

  1. Client Upsert: points, wait=true
  2. WAL: durability log
  3. Segment Update: points + payload
  4. Index Refresh: HNSW / payload index
  5. Replica Ack: consistency

Write path: upsert goes through WAL and segments, updates indexes, and is acknowledged according to replica/consistency settings.

Write path

  1. Client submits `upsert`/`set-payload` operations into a Qdrant collection.
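  2. The write is appended to the WAL, then applied to segments; payload indexes are updated with the data, while HNSW graphs for newly written vectors are typically built in the background by the optimizer.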
  3. In distributed mode, mutations propagate to shard replicas according to consistency policy.
  4. `wait=true` and write consistency settings determine when the client receives ack.
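The acknowledgment rule in step 4 can be modeled as a simple quorum check: with `replication_factor` replicas, the client is acknowledged once at least `write_consistency_factor` of them have applied the write. This is a deliberately simplified sketch; `ReplicaResult` and `write_acknowledged` are hypothetical, and it ignores `wait=false`, write ordering, retries, and replica recovery.

```python
from dataclasses import dataclass


@dataclass
class ReplicaResult:
    peer: str
    applied: bool  # did this replica apply the write in time?


def write_acknowledged(results, replication_factor, write_consistency_factor):
    """Hypothetical quorum rule: ack once enough replicas have applied the write."""
    if not 1 <= write_consistency_factor <= replication_factor:
        raise ValueError("need 1 <= write_consistency_factor <= replication_factor")
    if len(results) != replication_factor:
        raise ValueError("expected exactly one result per replica")
    applied = sum(1 for r in results if r.applied)
    return applied >= write_consistency_factor


if __name__ == "__main__":
    # replication_factor=2: one replica applied the write, the other lagged.
    results = [ReplicaResult("node-a", True), ReplicaResult("node-b", False)]
    print(write_acknowledged(results, 2, 1))  # one replica is enough here
    print(write_acknowledged(results, 2, 2))  # must wait for both replicas
```

Raising `write_consistency_factor` toward `replication_factor` trades write latency and availability for stronger durability guarantees at ack time.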

When to choose Qdrant

Good fit

  • Semantic search and RAG where embeddings must be retrieved with metadata filtering.
  • Hybrid retrieval (dense + sparse) when lexical and semantic signals must be combined.
  • Catalog/content systems with tenant/category/date constraints and low-latency retrieval.
  • Production vector retrieval layer with replication, snapshots, and explicit latency/recall tuning.

Avoid when

  • Workloads dominated by relational joins and transactional OLTP logic.
  • Pure OLAP analytics with large-scale aggregates over columnar datasets.
  • Teams not ready to tune HNSW parameters and validate retrieval quality with recall and precision metrics.
  • Use cases that require a general-purpose SQL engine instead of a specialized vector retrieval layer.

Practice: DDL and DML

Below are practical Qdrant API examples of structure-level operations: creating a collection, adding a payload index, and tuning HNSW and quantization.

DDL and DML examples in Qdrant

DDL controls collection/index structure, while DML manages points, payload, and vector queries.

DDL in Qdrant is about collection structure: vector schema, sharding/replication settings, and payload indexes.

Create collection for dense + sparse retrieval


Define vector schema, distributed parameters, and payload storage mode.

PUT /collections/products
{
  "vectors": {
    "size": 768,
    "distance": "Cosine"
  },
  "sparse_vectors": {
    "text": {}
  },
  "shard_number": 3,
  "replication_factor": 2,
  "write_consistency_factor": 1,
  "on_disk_payload": true
}
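The same collection definition can be sent from Python with only the standard library. This sketch assumes a Qdrant node reachable at `http://localhost:6333` (the default REST port) with no API key; `create_collection_request` is a hypothetical helper that only builds the request, and the network call happens only when the script is run directly.

```python
import json
import urllib.request


def create_collection_request(base_url, name):
    """Build (but do not send) the PUT request for the collection definition above."""
    body = {
        "vectors": {"size": 768, "distance": "Cosine"},
        "sparse_vectors": {"text": {}},
        "shard_number": 3,
        "replication_factor": 2,
        "write_consistency_factor": 1,
        "on_disk_payload": True,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/collections/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = create_collection_request("http://localhost:6333", "products")
    with urllib.request.urlopen(req) as resp:  # requires a running Qdrant node
        print(resp.status, resp.read().decode("utf-8"))
```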

Create payload index for filtered search


Index the `category` field so that filter latency stays predictable at scale.

PUT /collections/products/index
{
  "field_name": "category",
  "field_schema": "keyword"
}
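Why a payload index stabilizes filter latency can be seen with a toy inverted index: instead of scanning every payload for `category == "laptop"`, a lookup returns the matching point ids directly, and that candidate set can then constrain vector search. `KeywordIndex` is a hypothetical illustration, not Qdrant's internal index structure; treating each element of an array payload as a separate keyword is an assumption of the sketch.

```python
from collections import defaultdict


class KeywordIndex:
    """Toy inverted index: keyword value -> ids of points whose payload holds it."""

    def __init__(self, field):
        self.field = field
        self.postings = defaultdict(set)

    def add(self, point_id, payload):
        value = payload.get(self.field)
        # Assumption: an array payload is indexed under each of its string values.
        values = value if isinstance(value, list) else [value]
        for v in values:
            if isinstance(v, str):
                self.postings[v].add(point_id)

    def lookup(self, value):
        # Dictionary lookup instead of scanning every payload; returns a copy.
        return set(self.postings.get(value, ()))


if __name__ == "__main__":
    index = KeywordIndex("category")
    index.add(1, {"category": "laptop"})
    index.add(2, {"category": "phone"})
    index.add(3, {"category": ["laptop", "tablet"]})
    print(sorted(index.lookup("laptop")))
```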

Tune HNSW and quantization


Adjust HNSW parameters and quantization to fit the target recall/latency/cost profile.

PATCH /collections/products
{
  "vectors": {
    "": {
      "hnsw_config": {
        "m": 32,
        "ef_construct": 256
      },
      "quantization_config": {
        "scalar": {
          "type": "int8",
          "always_ram": true
        }
      }
    }
  }
}
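A rough sketch of what `int8` scalar quantization trades away: each float is mapped linearly onto 256 levels, cutting raw vector memory roughly 4x versus float32 at the cost of a bounded rounding error. This version uses per-vector min/max bounds for simplicity, which is an assumption; Qdrant computes its own bounds (its scalar config accepts a `quantile` setting), so treat the numbers as illustrative.

```python
def quantize_int8(vec):
    """Map floats linearly onto int8 codes in [-128, 127] using the vector's min/max."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # avoid division by zero for constant vectors
    codes = [round((x - lo) / scale) - 128 for x in vec]
    return codes, lo, scale


def dequantize_int8(codes, lo, scale):
    """Approximate reconstruction; error per component is at most scale / 2."""
    return [(c + 128) * scale + lo for c in codes]


def vector_bytes(n_vectors, dims, bytes_per_dim):
    """Raw vector storage only (excludes the HNSW graph, payload, and overhead)."""
    return n_vectors * dims * bytes_per_dim


if __name__ == "__main__":
    vec = [0.12, -0.5, 0.33, 0.9, -0.07]
    codes, lo, scale = quantize_int8(vec)
    restored = dequantize_int8(codes, lo, scale)
    print(codes)
    print(max(abs(a - b) for a, b in zip(vec, restored)), "<=", scale / 2)
    print(vector_bytes(1_000_000, 768, 4), "bytes as float32")
    print(vector_bytes(1_000_000, 768, 1), "bytes as int8")
```

For 1,000,000 vectors of 768 dimensions that is 3,072,000,000 bytes of raw float32 vectors versus 768,000,000 bytes as int8; `always_ram: true` in the PATCH above keeps the quantized copy in memory.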
