Source
Prometheus docs
Official overview of Prometheus architecture: pull scraping, TSDB, PromQL, and rules.
Prometheus is a purpose-built time-series monitoring system that combines pull-based metric collection, its own TSDB engine, and PromQL query semantics. Within the TSDB landscape, it is the canonical choice for infrastructure monitoring and SLO-driven operations.
History: key milestones
Born at SoundCloud
Prometheus started as an internal monitoring engine for cloud-native workloads.
Open source and early ecosystem
The project became open source and quickly gained adoption in Kubernetes environments.
CNCF incubating stage
Prometheus joined CNCF and established a neutral governance model.
CNCF graduated
It reached graduated status and became an industry standard for infrastructure monitoring.
Prometheus 2.x as the production baseline
Remote write/read patterns, operator workflows, and cardinality tuning practices matured.
Evolution toward scalable TSDB profiles
Long-term storage, federation, and hybrid Prometheus topologies became standard patterns.
Prometheus specifics
Pull-based collection
Prometheus scrapes targets itself, which simplifies topology control and endpoint health management.
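What a scrape actually receives is plain text in the Prometheus exposition format. A toy parser makes the shape concrete (simplified sketch: it ignores escape sequences, timestamps, and `# HELP`/`# TYPE` comments that the real format allows):

```python
import re

def parse_metric_line(line: str):
    """Parse one line of the Prometheus text exposition format
    (simplified: no escapes, timestamps, or comment lines)."""
    m = re.match(r'^(\w+)(?:\{(.*)\})?\s+(\S+)$', line)
    if not m:
        raise ValueError(f"unparseable line: {line!r}")
    name, raw_labels, value = m.groups()
    labels = {}
    if raw_labels:
        for pair in raw_labels.split(","):
            k, v = pair.split("=", 1)
            labels[k] = v.strip('"')
    return name, labels, float(value)

# A scrape of a target's /metrics endpoint returns lines like this:
sample = 'http_requests_total{method="GET",code="200"} 1027'
print(parse_metric_line(sample))
```

Each parsed line becomes one sample of one series, identified by the metric name plus its label set.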
TSDB with WAL + blocks
Metrics follow WAL -> head -> compacted blocks, giving a predictable storage lifecycle.
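The WAL -> head -> blocks lifecycle can be sketched as a toy model (illustrative only; real Prometheus compacts by time range, not sample count, and replays the WAL on restart):

```python
class ToyTSDB:
    """Toy model of the storage lifecycle: append-only WAL,
    in-memory head, and compaction into immutable blocks."""
    def __init__(self, head_limit=3):
        self.wal = []        # durability log of raw samples
        self.head = []       # recent, queryable in-memory samples
        self.blocks = []     # compacted, immutable historical chunks
        self.head_limit = head_limit

    def append(self, series, ts, value):
        self.wal.append((series, ts, value))  # WAL first, for crash recovery
        self.head.append((series, ts, value))
        if len(self.head) >= self.head_limit:
            self.compact()

    def compact(self):
        # Flush the head into an immutable block, then truncate the WAL:
        # data persisted in a block no longer needs WAL protection.
        self.blocks.append(tuple(self.head))
        self.head.clear()
        self.wal.clear()

db = ToyTSDB()
for ts, v in enumerate([1.0, 2.0, 3.0, 4.0]):
    db.append("up", ts, v)
print(len(db.blocks), db.head)
```

The key invariant the sketch preserves: every sample is durable (in the WAL or a block) before it is dropped from memory.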
PromQL as the query language
PromQL is optimized for time-series vectors, label-based aggregation, and time-window analysis.
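The time-window analysis style is easiest to see in `rate()`: the per-second increase of a counter over a range. A minimal sketch of the idea (real PromQL additionally handles counter resets and extrapolates to the window boundaries):

```python
def rate(samples):
    """Approximate PromQL rate(): per-second increase of a counter
    over the sampled window. samples: list of (unix_ts, counter_value)."""
    (t0, v0), (tn, vn) = samples[0], samples[-1]
    if tn == t0:
        return 0.0
    return (vn - v0) / (tn - t0)

# Counter grew from 0 to 60 over 30 seconds:
series = [(0, 0.0), (15, 30.0), (30, 60.0)]
print(rate(series))  # 2.0 requests per second
```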
Rule-driven alerting
Recording/alerting rules plus Alertmanager integration create a controlled incident response loop.
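The core of an alerting rule is a condition that must hold for a sustained period (the `for:` clause) before the alert fires, which filters out transient spikes. A toy evaluation over discrete rule-evaluation ticks (the threshold and series here are illustrative):

```python
def alert_firing(values, threshold, for_samples):
    """Toy rule evaluation: fire only when the expression has exceeded
    the threshold for `for_samples` consecutive evaluations, mirroring
    an alerting rule's `for:` clause (simplified)."""
    streak = 0
    for v in values:
        streak = streak + 1 if v > threshold else 0
        if streak >= for_samples:
            return True
    return False

# One brief spike does not fire; three consecutive breaches do.
cpu = [0.4, 0.95, 0.97, 0.5, 0.96, 0.98, 0.99]
print(alert_firing(cpu, threshold=0.9, for_samples=3))  # True
```

In production, a firing alert is then handed to Alertmanager for grouping, routing, and silencing.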
Prometheus architecture by layers
At a high level, Prometheus can be read as a pipeline: ingest -> TSDB head/WAL -> block storage -> PromQL/query engine -> rules/alerts -> external integrations.
Key features
Prometheus is optimized for monitoring workloads: pull ingest, WAL/block-based TSDB, PromQL, and rule-driven alerting.
Pull model
Prometheus scrapes HTTP endpoints on a schedule, so target topology stays under server control.
Label model
Each series is identified by a metric name plus key/value labels, which drive aggregation and cardinality.
Data lifecycle
Samples move from the WAL to the in-memory head and then into compacted blocks, aging out via retention.
DDL vs DML: Prometheus model
Prometheus does not implement SQL DDL/DML literally. For system-design analysis, it is useful to separate DDL-like operations (scrape/rule topology updates) from DML-like operations (metric sample flow and PromQL read execution).
How the DDL/DML model works in Prometheus
DDL-like: scrape/rule topology updates. DML-like: sample flow and PromQL reads.
1. Scrape / ingest
Scraper or remote-write ingest receives new metric samples.
2. WAL append
Samples are appended to the WAL for durability before deeper processing.
3. Head update
The TSDB head updates series state and the label index for fresh data.
4. PromQL execution
The query engine reads the head and historical blocks, then aggregates results.
5. Compaction + retention
Background compaction merges blocks and retention removes expired data.
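The five steps can be sketched end to end as a toy pipeline (function names are illustrative, not Prometheus internals):

```python
wal, head, blocks = [], [], []

def ingest(sample):          # 1. scrape / ingest
    wal.append(sample)       # 2. WAL append, for durability
    head.append(sample)      # 3. head update

def query(series):           # 4. PromQL-style read over blocks + head
    return [v for block in blocks for (s, v) in block if s == series] + \
           [v for (s, v) in head if s == series]

def compact():               # 5. compaction: head -> immutable block
    blocks.append(tuple(head))
    head.clear(); wal.clear()

ingest(("up", 1.0)); ingest(("up", 1.0))
compact()
ingest(("up", 0.0))
print(query("up"))  # [1.0, 1.0, 0.0]
```

Note that a read spans both compacted blocks and the fresh head, which is why the query engine must merge the two.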
Data and query path
- The DML-like path covers ingest, storage, and PromQL read execution.
- Fresh data lives in head, historical data in compacted blocks.
- Label cardinality has a direct impact on cost and query latency.
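The cardinality point can be made concrete: in the worst case, one metric produces a series for every combination of its label values, so the upper bound is the product of per-label value counts. A small sketch (the label sets here are hypothetical examples):

```python
from math import prod

def max_series(label_values: dict) -> int:
    """Upper bound on series count for one metric: the product of the
    number of observed values per label (worst case: all combinations)."""
    return prod(len(values) for values in label_values.values())

labels = {
    "method":   ["GET", "POST", "PUT", "DELETE"],
    "code":     ["200", "301", "404", "500"],
    "instance": [f"pod-{i}" for i in range(50)],
}
print(max_series(labels))  # 4 * 4 * 50 = 800 series for one metric
```

This is why high-cardinality labels (user IDs, request IDs) are avoided: each new value multiplies the series count, and with it memory, index size, and query latency.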
Source
InfluxDB docs
Reference context for an alternative TSDB profile.
Prometheus vs InfluxDB
Core approach
Prometheus: Pull-scrape model with tight integration into monitoring workflows.
InfluxDB: Strong focus on ingest APIs and time-series storage for broad telemetry scenarios.
Query language
Prometheus: PromQL focused on metrics, labels, and alert-oriented analysis.
InfluxDB: InfluxQL or Flux, depending on the version and data-processing profile.
Typical production profile
Prometheus: Infrastructure monitoring, SLOs, alerting, and Kubernetes observability.
InfluxDB: Monitoring + IoT + telemetry where flexible ingest and retention policies are key.
Operating model
Prometheus: Often combined with federation/remote storage for long-term retention.
InfluxDB: Often deployed as a standalone TSDB layer or as a managed service.
Why Prometheus is often chosen for monitoring
Practical interpretation for system-design workloads:
- Prometheus became the cloud-native monitoring standard due to its simple pull model and strong Kubernetes ecosystem fit.
- PromQL and rules create one unified path for observability and alerting without a separate DSL.
- The WAL -> head -> blocks lifecycle makes write/read behavior predictable in production.
- Integration with Alertmanager, Grafana, and remote storage supports growth from single-node to scalable topologies.
