Sources
- Qdrant (official website): vector database positioning, core capabilities, and deployment options.
- Qdrant Docs: Overview: core concepts — collections, points, payload filters, distributed mode, consistency, and performance tuning.
Qdrant is a vector database and similarity-search engine for AI/ML systems. In system design it is typically deployed as a dedicated retrieval layer alongside an OLTP source of truth, providing embedding storage, payload filtering, hybrid search, and low-latency top-k retrieval for RAG and search workflows.
History and context
Distributed mode arrives
Qdrant introduces distributed cluster mode with shard/replica topology for production workloads.
API stabilization and production adoption
The API and SDK ecosystem matures, and Qdrant is increasingly used as a retrieval layer in AI/ML systems.
Sparse vectors and user-defined sharding
Sparse vectors and more flexible shard distribution controls expand hybrid retrieval and tenancy patterns.
Multivectors and advanced retrieval
Multivector support enables richer retrieval setups with multiple vectors per point.
ACORN and strict filtering improvements
ACORN-style, filter-aware HNSW traversal improves strict filtering for payload-heavy retrieval pipelines.
Core architecture elements
Collections, points, and payload
The core model is collection-based: each point stores vectors plus payload attributes for filtering and business context.
Filterable ANN
Search combines ANN traversal (HNSW) with payload-aware filtering to keep semantic retrieval grounded in business constraints.
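As a sketch of filter-aware search (collection name, field names, and the toy 4-dimensional vector are illustrative), a top-k query can combine a dense vector with payload conditions:

```json
POST /collections/products/points/search
{
  "vector": [0.05, 0.61, 0.76, 0.74],
  "filter": {
    "must": [
      { "key": "category", "match": { "value": "books" } },
      { "key": "price", "range": { "lte": 20.0 } }
    ]
  },
  "limit": 5,
  "with_payload": true
}
```

The filter conditions are evaluated during HNSW traversal rather than as a post-filter over an unconstrained result set, which keeps recall more stable under selective filters.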
Durability and storage layout
WAL + segments provide durability, while on-disk/memmap/quantization controls help tune cost and memory footprint.
Cluster mode
Shards and replicas scale throughput; read/write consistency and ordering settings control trade-offs.
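A hedged sketch of how these settings appear per request (collection name and toy vectors are illustrative): read consistency is set via a query parameter, and write ordering via an update parameter.

```
# Read: require agreement from a majority of replicas for this query
POST /collections/products/points/search?consistency=majority
{ "vector": [0.05, 0.61, 0.76, 0.74], "limit": 5 }

# Write: route through the shard leader and ack only after the operation is applied
PUT /collections/products/points?wait=true&ordering=strong
{ "points": [ { "id": 1, "vector": [0.05, 0.61, 0.76, 0.74], "payload": { "category": "books" } } ] }
```

Stronger settings trade latency and availability for fresher, more consistent reads and stricter write ordering.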
Vector data model and payload filtering
The section below summarizes Qdrant data modeling: dense/sparse/multivector schemas, named vectors, payload filters, and storage controls affecting latency and cost.
Qdrant Data Model: more than "an embedding store"
Qdrant stores points with vectors and payload, supports dense/sparse/multivector schemas, and enables filter-aware retrieval.
Why Qdrant is not only ANN over a single vector
- A point can carry vectors plus structured payload for filters and business attributes.
- Dense and sparse representations can be combined in hybrid retrieval pipelines.
- Named vectors and multivectors allow multiple embedding spaces per object.
- Index and storage controls (on-disk, quantization) let teams tune latency vs cost.
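A sketch of a collection schema combining these options (collection and vector names and sizes are illustrative; `multivector_config` requires a recent Qdrant version with multivector support):

```json
PUT /collections/docs
{
  "vectors": {
    "title": { "size": 384, "distance": "Cosine" },
    "body": { "size": 768, "distance": "Cosine" },
    "late_interaction": {
      "size": 128,
      "distance": "Cosine",
      "multivector_config": { "comparator": "max_sim" }
    }
  },
  "sparse_vectors": {
    "text": {}
  }
}
```

Each point in this collection can then carry independent embeddings per named space, plus a sparse vector for lexical-style matching.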
Dense vectors
Standard ANN retrieval over fixed-size embeddings (for example, 768/1024 dimensions).
Typical use cases
- Semantic search
- RAG retrieval
- Recommendations
Example
"vectors": { "size": 768, "distance": "Cosine" }High-Level Architecture
A high-level Qdrant setup in a product system consists of an API and ingestion layer, collections/shards, ANN + payload indexing, a durability path, and cluster-level behavior.
System view
Qdrant is typically used as a dedicated vector retrieval layer for semantic search and RAG, while transactional source-of-truth data remains in OLTP storage.
Read / Write Path through components
This section combines the write and read paths: how Qdrant processes upsert/query requests, updates index structures, and returns top-k points in single-node and distributed deployments.
Write path
- Client submits `upsert`/`set-payload` operations against a Qdrant collection.
- The write is appended to the WAL, then materialized into segments and index structures.
- In distributed mode, mutations propagate to shard replicas according to the consistency policy.
- `wait=true` and write-consistency settings determine when the client receives an ack.
Read path
- Client submits a query (vector, filter, limit) against the collection.
- The request is routed to the relevant shards; each shard searches its segments via HNSW with payload-aware filtering.
- Partial results from shards are merged, and the global top-k is returned to the client.
- In distributed mode, the read `consistency` setting controls how many replicas must be consulted.
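The write path above can be sketched with a single upsert request (assuming a dense-only demo collection; the toy 4-dimensional vector and payload fields are illustrative):

```json
PUT /collections/products/points?wait=true
{
  "points": [
    {
      "id": 42,
      "vector": [0.05, 0.61, 0.76, 0.74],
      "payload": { "category": "books", "price": 14.99 }
    }
  ]
}
```

With `wait=true` the server acknowledges only after the operation is applied to segments and indexes; without it, the ack arrives once the operation is persisted to the WAL.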
When to choose Qdrant
Good fit
- Semantic search and RAG where embeddings must be retrieved with metadata filtering.
- Hybrid retrieval (dense + sparse) when lexical and semantic signals must be combined.
- Catalog/content systems with tenant/category/date constraints and low-latency retrieval.
- Production vector layer with replication, snapshot workflows, and explicit latency/recall tuning.
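As a sketch of the hybrid dense + sparse case via the Query API (available in recent Qdrant versions; the sparse vector name "text" and the toy values are illustrative):

```json
POST /collections/products/points/query
{
  "prefetch": [
    { "query": [0.05, 0.61, 0.76, 0.74], "limit": 20 },
    {
      "query": { "indices": [17, 42], "values": [0.8, 0.4] },
      "using": "text",
      "limit": 20
    }
  ],
  "query": { "fusion": "rrf" },
  "limit": 10
}
```

Each prefetch branch retrieves candidates in its own vector space, and reciprocal rank fusion (RRF) merges them into a single ranked top-k list.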
Avoid when
- Workloads dominated by relational joins and transactional OLTP logic.
- Pure OLAP analytics with large-scale aggregates over columnar datasets.
- Teams not ready to tune ANN parameters and validate retrieval quality (recall/precision).
- Use cases that require a general-purpose SQL engine instead of a specialized vector retrieval layer.
Practice: DDL and DML
Below are practical Qdrant API examples: DDL operations for collection/index lifecycle and DML operations for point upserts, vector queries, and payload updates.
DDL and DML examples in Qdrant
DDL controls collection/index structure, while DML manages points, payload, and vector queries.
DDL in Qdrant is about collection structure: vector schema, sharding/replication settings, and payload indexes.
Create collection for dense + sparse retrieval
Define the vector schema, distributed parameters, and payload storage mode.
PUT /collections/products
{
"vectors": {
"size": 768,
"distance": "Cosine"
},
"sparse_vectors": {
"text": {}
},
"shard_number": 3,
"replication_factor": 2,
"write_consistency_factor": 1,
"on_disk_payload": true
}

Create payload index for filtered search
Index the category field to stabilize filter latency at scale.
PUT /collections/products/index
{
"field_name": "category",
"field_schema": "keyword"
}

Tune HNSW and quantization
Adjust index and compression settings for the target recall/latency/cost profile.
PATCH /collections/products
{
"vectors": {
"": {
"hnsw_config": {
"m": 32,
"ef_construct": 256
},
"quantization_config": {
"scalar": {
"type": "int8",
"always_ram": true
}
}
}
}
}