Prometheus matters not only as the de facto standard for cloud-native metrics, but also as an unusually legible monitoring architecture in which the strengths and limits of the pull model are easy to see.
In real operations, this chapter helps you think about jobs, targets, service discovery, recording rules, and alerting as one connected system instead of a pile of unrelated YAML fragments.
In interviews and engineering discussions, it is especially useful when you need to explain why Prometheus is great for baseline cloud-native monitoring yet does not always solve long retention or global scale on its own.
Practical value of this chapter
Scrape topology
Design jobs, targets, and service discovery to prevent monitoring blind spots and duplicate time series.
Rules and alert pipeline
Separate recording and alerting rules to stabilize dashboard latency and improve alert quality.
Remote write boundary
Define local Prometheus versus long-term storage responsibilities by SLA and cost constraints.
Interview articulation
Explain pull-model trade-offs and when an additional aggregation layer becomes necessary.
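The scrape-topology concerns above can be sketched as a minimal configuration. Job names, ports, and the annotation convention below are illustrative assumptions, not values from this chapter:

```yaml
# prometheus.yml (sketch) -- job/target layout with service discovery.
# Job names, ports, and the scrape annotation are illustrative.
global:
  scrape_interval: 30s

scrape_configs:
  # Static job for a fixed infrastructure exporter.
  - job_name: node
    static_configs:
      - targets: ["node-exporter:9100"]

  # Kubernetes pod discovery; the relabel rule keeps only pods that
  # opt in via annotation, which prevents both blind spots (forgotten
  # targets) and duplicate series from accidentally scraped pods.
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```

Discovery-driven jobs like the second one are where most blind spots originate, so relabeling rules deserve the same review rigor as application code.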
Source
Prometheus docs
Official overview of Prometheus architecture: pull scraping, TSDB, PromQL, and rules.
Prometheus is a purpose-built monitoring time-series system that combines pull-based metric collection, its own TSDB engine, and PromQL query semantics. In the TSDB map it represents the canonical choice for infrastructure monitoring and SLO-driven operations.
History: key milestones
Born at SoundCloud
Prometheus started as an internal monitoring engine for cloud-native workloads.
Open source and early ecosystem
The project became open source and quickly gained adoption in Kubernetes environments.
CNCF incubating stage
Prometheus joined CNCF and established a neutral governance model.
CNCF graduated
It reached graduated status and became an industry standard for infrastructure monitoring.
Prometheus 2.x as the production baseline
Remote write/read patterns, operator workflows, and cardinality tuning practices matured.
Evolution toward scalable TSDB profiles
Long-term storage, federation, and hybrid Prometheus topologies became standard patterns.
Prometheus specifics
Pull-based collection
Prometheus scrapes targets itself, which simplifies topology control and endpoint health management.
TSDB with WAL + blocks
Metrics follow WAL -> head -> compacted blocks, giving a predictable storage lifecycle.
PromQL as the query language
PromQL is optimized for time-series vectors, label-based aggregation, and time-window analysis.
Rule-driven alerting
Recording/alerting rules plus Alertmanager integration create a controlled incident response loop.
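The WAL -> head -> blocks lifecycle is visible directly on disk. A typical data directory looks roughly like this (the block directory name is an abbreviated placeholder, not a real ULID):

```
data/
├── wal/              # write-ahead log segments for the most recent samples
├── chunks_head/      # memory-mapped chunks backing the in-memory head
└── 01HXXX.../        # compacted block (ULID-named): chunks/, index, meta.json, tombstones
```

This layout is why storage behavior is predictable: recent data lives in the WAL and head, while historical data sits in immutable, compacted block directories.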
Prometheus architecture by layers
At a high level, Prometheus can be read as a pipeline: ingest -> TSDB head/WAL -> block storage -> PromQL/query engine -> rules/alerts -> external integrations.
Key features
Prometheus is optimized for monitoring workloads: pull ingest, WAL/block-based TSDB, PromQL, and rule-driven alerting.
Pull model
The server scrapes targets on its own schedule, keeping topology control and endpoint health checks in one place.
Label model
Every series is identified by a metric name plus a set of key-value labels, which drive aggregation, filtering, and cardinality cost.
Data lifecycle
Samples flow WAL -> head -> compacted blocks, and retention prunes expired blocks.
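To make the label model concrete, here is one sample in the Prometheus text exposition format (metric name and label values are illustrative):

```
http_requests_total{service="checkout-api",method="POST",status="200"} 1027
```

Each distinct combination of label values is a separate time series, which is why label choices, not metric count, dominate cardinality.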
DDL vs DML: Prometheus model
Prometheus does not implement SQL DDL/DML literally. For system-design analysis, it is useful to separate DDL-like operations (scrape/rule topology updates) from DML-like operations (metric sample flow and PromQL read execution).
How the DDL/DML model works in Prometheus
DDL-like: scrape/rule topology updates. DML-like: sample flow and PromQL reads.
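A DDL-like change in this model means editing scrape or rule files and reloading the server. A minimal rules file might look like this (metric names, thresholds, and labels are illustrative assumptions):

```yaml
# rules.yml (sketch) -- recording and alerting rules as "schema-like" objects.
groups:
  - name: checkout-api
    rules:
      # Recording rule: precompute a dashboard-friendly aggregate series.
      - record: job:http_requests:rate5m
        expr: sum(rate(http_requests_total{service="checkout-api"}[5m]))
      # Alerting rule: fire when the 5xx share exceeds 5% for 10 minutes.
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{service="checkout-api",status=~"5.."}[5m]))
            / sum(rate(http_requests_total{service="checkout-api"}[5m])) > 0.05
        for: 10m
        labels:
          severity: page
```

Everything below this point, by contrast, is the DML-like path: samples flowing in and PromQL reads flowing out.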
1. Scrape / ingest
The scraper or remote-write ingest path receives new metric samples.
2. WAL append
Samples are appended to the WAL for durability before deeper processing.
3. Head update
The TSDB head updates series state and the label index for fresh data.
4. PromQL execution
The query engine reads the head and historical blocks, then aggregates results.
5. Compaction + retention
Background compaction merges blocks, and retention removes expired data.
Data and query path
- The DML-like path covers ingest, storage, and PromQL read execution.
- Fresh data lives in head, historical data in compacted blocks.
- Label cardinality has a direct impact on cost and query latency.
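Because cardinality drives both storage cost and query latency, a quick diagnostic is to count active series per metric name. This is a common troubleshooting pattern, not something prescribed by this chapter:

```promql
# Top 10 metric names by number of active series in the head.
topk(10, count by (__name__)({__name__=~".+"}))
```

Note that the matcher `{__name__=~".+"}` touches every series, so run this sparingly on large instances.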
Source
InfluxDB docs
Reference context for an alternative TSDB profile.
Prometheus vs InfluxDB
Core approach
Prometheus: Pull-scrape model with tight integration into monitoring workflows.
InfluxDB: Strong focus on ingest APIs and time-series storage for broad telemetry scenarios.
Query language
Prometheus: PromQL focused on metrics, labels, and alert-oriented analysis.
InfluxDB: InfluxQL/Flux depending on version and data-processing profile.
Typical production profile
Prometheus: Infrastructure monitoring, SLOs, alerting, and Kubernetes observability.
InfluxDB: Monitoring + IoT + telemetry where flexible ingest and retention policies are key.
Operating model
Prometheus: Often combined with federation/remote storage for long-term retention.
InfluxDB: Often deployed as a standalone TSDB layer or as a managed service.
Why Prometheus is often chosen for monitoring
Practical interpretation for system-design workloads:
- Prometheus became the cloud-native monitoring standard due to its simple pull model and strong Kubernetes ecosystem fit.
- PromQL and rules create one unified path for observability and alerting without a separate DSL.
- The WAL -> head -> blocks lifecycle makes write/read behavior predictable in production.
- Integration with Alertmanager, Grafana, and remote storage supports growth from single-node to scalable topologies.
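The remote-storage boundary mentioned above is usually a small configuration fragment. The endpoint URL and queue settings below are illustrative assumptions:

```yaml
# prometheus.yml fragment (sketch) -- remote write boundary.
# Local Prometheus keeps short-term data; long retention lives downstream.
remote_write:
  - url: "https://long-term-store.example.com/api/v1/write"
    queue_config:
      max_samples_per_send: 5000
      capacity: 10000
```

The design choice here is deliberate: the local server stays authoritative for alerting latency, while the remote store absorbs retention and global-query requirements.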
PromQL query examples
A compact cheat sheet for common production tasks: load, latency, error control, and SLO monitoring.
Throughput (RPS) by service
Estimate incoming traffic for `checkout-api` over the last 5 minutes.
```promql
sum(rate(http_requests_total{service="checkout-api"}[5m]))
```
A baseline signal for incoming traffic rate, usually one of the first RED dashboard panels.
P95 latency
Track latency degradation in the user-facing request path.
```promql
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{service="checkout-api"}[5m])) by (le))
```
For p95/p99, use histogram `*_bucket` data instead of averages to capture tail behavior.
Error rate (%)
Measure the share of 5xx responses relative to total service traffic.
```promql
100 * (sum(rate(http_requests_total{service="checkout-api",status=~"5.."}[5m])) / sum(rate(http_requests_total{service="checkout-api"}[5m])))
```
A common base metric for SLO-aligned alerting and burn-rate rules.
CPU saturation by pod
Find pods that are approaching CPU limits.
```promql
sum(rate(container_cpu_usage_seconds_total{namespace="prod",pod=~"checkout-api-.*"}[5m])) by (pod)
```
Useful during autoscaling tuning and hotspot diagnosis per pod replica.
Top-k by memory usage
Quickly isolate the heaviest pods by working set memory.
```promql
topk(5, container_memory_working_set_bytes{namespace="prod",pod=~"checkout-api-.*"})
```
Helpful for OOMKill investigations and requests/limits right-sizing.
Error budget burn rate
Approximate error-budget spending speed for a 99.9% SLO.
```promql
(sum(rate(http_requests_total{service="checkout-api",status=~"5.."}[5m])) / sum(rate(http_requests_total{service="checkout-api"}[5m]))) / (1 - 0.999)
```
Values significantly above `1` indicate the service is burning budget faster than allowed.
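A burn-rate expression like the one above usually graduates from ad-hoc dashboards into a rule file. A hedged sketch of a fast-burn alert follows; the `14x` multiplier, window, and labels are conventional assumptions, not values from this chapter:

```yaml
# Fast-burn alert (sketch): fire when the error budget for a 99.9% SLO
# is being spent at more than 14x the sustainable rate.
groups:
  - name: slo-burn
    rules:
      - alert: ErrorBudgetFastBurn
        expr: |
          (
            sum(rate(http_requests_total{service="checkout-api",status=~"5.."}[5m]))
              / sum(rate(http_requests_total{service="checkout-api"}[5m]))
          ) / (1 - 0.999) > 14
        for: 5m
        labels:
          severity: page
```

In practice, multi-window variants (e.g. pairing a short and a long window) reduce false positives from brief spikes.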
References
Related chapters
- Time Series Databases (TSDB): types, trade-offs, and selection - How to position Prometheus against alternative TSDB families across latency, retention, and operating profile.
- Database Selection Framework - Selection framework that helps justify Prometheus as a monitoring-focused TSDB rather than a universal analytical datastore.
- Observability & Monitoring Design - How Prometheus metrics connect with logs, traces, and SLO/SLI loops in production observability architecture.
- Prometheus: The Documentary - Historical context for Prometheus evolution and why its ecosystem became central in cloud-native monitoring.
- Kubernetes Fundamentals - Foundational context for service discovery, pull scraping, and operator-based rollout patterns in Kubernetes.
- Service Discovery - Why target discovery quality is critical for stable and complete metrics collection pipelines.
- VictoriaMetrics: history and architecture - Comparison of Prometheus-compatible storage and scaling strategies for long-term retention workloads.
