Click aggregation is a streaming system where volume is enormous, events arrive out of order, and accounting mistakes turn directly into financial and analytical risk.
The chapter breaks down ingest, deduplication, window aggregation, late-event handling, historical recomputation, and serving near-realtime metrics.
For interviews and engineering discussions, this case is useful because it quickly brings the conversation to the trade-off between throughput, correctness, and trust in billing outputs.
Deduplication
If one click is counted twice, the mistake leaks into both the dashboard and the customer bill, so deduplication has to live on the main processing path.
Window Aggregation
Minute and hourly windows matter not only for speed, but also for a clear model of delay, window completeness, and later recomputation.
Historical Recompute
Late events, schema corrections, and incidents are inevitable, which is why safe historical recomputation should be designed up front rather than bolted on later.
Metric Freshness
Users care not only about the numbers themselves, but also about how current they are and where the boundary sits between the realtime view and billing truth.
Ad Click Event Aggregator is a streaming-system case where you have to balance metric freshness, billing correctness, and stable ingest under campaign bursts. In interviews, it quickly reveals whether you can explain where realtime numbers end and finance-grade truth begins.
Source
Acing the System Design Interview
Chapter 11 focuses on deduplication, window aggregation, and safe historical recomputation.
Where this pattern shows up
- Google Ads / Meta Ads: near-realtime dashboards plus a separate financial reconciliation loop.
- DSP / RTB platforms: click, impression, and auction-event ingest under bursty campaign traffic.
- Affiliate networks: conversion deduplication and fraud-control checks.
- Internal growth platforms: one event backbone for product and marketing analytics.
Functional requirements
At the functional level, the system needs reliable accounting for clicks, impressions, and conversions without double counting. That immediately brings deduplication, idempotent ingest, and clear rules for turning raw events into product and billing metrics.
APIs and contracts
- POST /events - ingest click, impression, and conversion events
- GET /metrics - aggregates by window, campaign, and filters
- POST /reconcile - trigger historical recompute and reconciliation
- GET /quality - lag, duplicate rate, completeness, and error budget
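To make the `POST /events` contract concrete, here is a minimal Python sketch of ingest-time validation. The field names (`event_id`, `event_type`, `campaign_id`, `ad_id`, `ts_ms`, `schema_version`) and the set of supported schema versions are hypothetical assumptions, not the chapter's actual API.

```python
from dataclasses import dataclass

# Hypothetical contract for POST /events; field names are illustrative assumptions.
EVENT_TYPES = {"click", "impression", "conversion"}
SUPPORTED_SCHEMA_VERSIONS = {1, 2}


@dataclass(frozen=True)
class AdEvent:
    event_id: str        # client-generated unique id, later used for deduplication
    event_type: str      # one of EVENT_TYPES
    campaign_id: str
    ad_id: str
    ts_ms: int           # event time in epoch milliseconds (not arrival time)
    schema_version: int = 1


def validate(payload: dict) -> AdEvent:
    """Reject malformed payloads at ingest instead of letting them skew aggregates."""
    required = ("event_id", "event_type", "campaign_id", "ad_id", "ts_ms")
    missing = [name for name in required if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if payload["event_type"] not in EVENT_TYPES:
        raise ValueError(f"unknown event_type: {payload['event_type']}")
    version = payload.get("schema_version", 1)
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise ValueError(f"unsupported schema_version: {version}")
    if not isinstance(payload["ts_ms"], int) or payload["ts_ms"] < 0:
        raise ValueError("ts_ms must be a non-negative integer")
    return AdEvent(
        event_id=str(payload["event_id"]),
        event_type=payload["event_type"],
        campaign_id=str(payload["campaign_id"]),
        ad_id=str(payload["ad_id"]),
        ts_ms=payload["ts_ms"],
        schema_version=version,
    )
```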
Platform capabilities
- Reliable event accounting without double charging in billing
- Minute, hourly, and daily aggregates for different read patterns
- Separation between the realtime view and the batch reconciliation path
- Safe historical recomputation from immutable raw storage
Non-functional requirements
This system has to survive uneven advertising traffic, keep dashboards fresh enough to be useful, and maintain almost no tolerance for billing errors. That means the design has to make target throughput, freshness, accuracy, and recovery time explicit rather than leaving them implied.
| Requirement | Target | Why it matters |
|---|---|---|
| Peak ingest throughput | Up to 1M events/s | Bursts during large campaign launches |
| Data freshness (p95) | < 30 sec | Operational dashboards for product and ad-ops teams |
| Billing drift | < 0.1% | Financial correctness and auditability |
| Availability | 99.95% | Critical analytics path for the advertising business |
| Historical recompute RTO | < 30 min | Fast recovery after failure or data correction |
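A quick back-of-envelope calculation shows what the peak target implies. The sketch assumes an average serialized event of 200 bytes and roughly 10,000 events/s per stream partition; both numbers are illustrative assumptions, not figures from the chapter.

```python
# Back-of-envelope sizing for the peak ingest target.
# EVENT_BYTES and PARTITION_EVENTS_PER_SEC are illustrative assumptions.
PEAK_EVENTS_PER_SEC = 1_000_000
EVENT_BYTES = 200
PARTITION_EVENTS_PER_SEC = 10_000
SECONDS_PER_DAY = 86_400

# Upper bound: assumes the peak rate is sustained for a whole day.
events_per_day = PEAK_EVENTS_PER_SEC * SECONDS_PER_DAY
raw_tb_per_day = events_per_day * EVENT_BYTES / 1e12
ingest_mb_per_sec = PEAK_EVENTS_PER_SEC * EVENT_BYTES / 1e6
# Ceiling division so a partial partition still counts as one.
min_partitions = -(-PEAK_EVENTS_PER_SEC // PARTITION_EVENTS_PER_SEC)


def drift_budget(billable_events: int, max_drift_bp: int = 10) -> int:
    """Largest absolute mismatch allowed by a drift target given in basis points (10 bp = 0.1%)."""
    return billable_events * max_drift_bp // 10_000
```

Under these assumptions, a full day at peak would produce about 86.4 billion events and roughly 17 TB of raw data, and 100 partitions is a floor before any headroom for key skew is added.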
High-Level Architecture
Related theory: Streaming Data - windows, watermarks, late events, historical recomputation, and the realtime versus batch boundary.
Topology: event ingest + window aggregation + reconciliation. It combines ingest flow, window aggregation, and a reconciliation/backfill control loop for billing correctness.
The architecture separates ingest, realtime aggregation, and a dedicated correctness path. That keeps dashboard latency predictable while leaving billing-critical outputs on a path that can be verified and recomputed independently.
Write and Read Paths
On the write path, the system accepts an event, removes duplicates, and updates aggregate views. On the read path, it serves from pre-aggregated tables and cache instead of touching raw events unless absolutely necessary.
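The read path can be sketched as a cache-aside lookup in front of pre-aggregated rows. In this minimal Python sketch, `load_aggregate` is a hypothetical callable standing in for a query against the serving store, and the `(campaign_id, window_start_ms)` key shape and 5-second TTL are assumptions.

```python
import time
from typing import Callable, Dict, Tuple

Key = Tuple[str, int]  # (campaign_id, window_start_ms), a hypothetical key shape


class MetricsReader:
    """Cache-aside reads over pre-aggregated rows; raw events are never scanned here."""

    def __init__(self, load_aggregate: Callable[[Key], int], ttl_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load_aggregate   # e.g. a query against the serving store
        self._ttl_s = ttl_s           # short TTL so the cache does not dominate staleness
        self._clock = clock
        self._cache: Dict[Key, Tuple[float, int]] = {}
        self.store_reads = 0          # how often the serving store was actually hit

    def get(self, key: Key) -> int:
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl_s:
            return hit[1]             # fresh enough: serve from cache
        self.store_reads += 1
        value = self._load(key)
        self._cache[key] = (now, value)
        return value
```

Keeping the TTL well under the 30-second freshness target lets the cache absorb dashboard fan-out without becoming the main source of staleness.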
How events become aggregates and how dashboards read those metrics under load.
Write path: the system accepts an event, removes duplicates, updates windowed aggregates, and writes them into the serving view.
Event Sources
Step 1: SDKs, trackers, and pixels
Clicks, impressions, and conversions are sent to ingest endpoints.
Ingest API
Step 2: validate and enrich
Schema validation, enrichment, and idempotency key generation.
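One way to generate idempotency keys at the ingest API is to prefer a client-generated event id and, when it is missing, hash only the fields that identify a single user action, so a retried request maps to the same key. This Python sketch is an illustration under assumptions: the identifying fields (`event_type`, `ad_id`, `user_id`, `ts_ms`) are hypothetical, and which fields truly identify a click is a product decision.

```python
import hashlib
import json
from typing import Optional


def idempotency_key(event: dict, client_event_id: Optional[str] = None) -> str:
    """Derive a deterministic dedupe key for an incoming event.

    Prefers the client-generated id (stable across SDK retries); otherwise hashes
    hypothetical identifying fields. Arrival time and other server-side
    enrichment are excluded so that a retried request yields the same key.
    """
    if client_event_id:
        return f"cid:{client_event_id}"
    identity = {
        "type": event["event_type"],
        "ad_id": event["ad_id"],
        "user": event.get("user_id", ""),
        "ts_ms": event["ts_ms"],
    }
    # Canonical JSON (sorted keys) so the same field values always hash identically.
    canonical = json.dumps(identity, sort_keys=True).encode("utf-8")
    return "h:" + hashlib.sha256(canonical).hexdigest()
```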
Stream and Deduplication
Step 3: Kafka / Pub/Sub + state
The stream processor removes duplicates and handles ordering and late events.
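Deduplication state has to be bounded: a key only needs to be remembered for as long as a duplicate can realistically arrive. The Python sketch below keeps seen keys for a hypothetical `ttl_ms` of event time and evicts older ones; real stream processors would keep this in checkpointed, partitioned state rather than a local dict.

```python
from collections import OrderedDict


class Deduplicator:
    """Drops keys already seen within ttl_ms of event time, with bounded memory."""

    def __init__(self, ttl_ms: int):
        self.ttl_ms = ttl_ms
        self._seen = OrderedDict()  # key -> ts_ms of first sighting, in insertion order
        self._max_ts = 0            # highest event time observed so far
        self.duplicates = 0

    def _evict(self) -> None:
        # Evicts from the front (oldest insertion). This assumes events arrive
        # roughly in event-time order; heavily reordered input would need a
        # time-indexed structure instead.
        while self._seen:
            _, oldest_ts = next(iter(self._seen.items()))
            if self._max_ts - oldest_ts <= self.ttl_ms:
                break
            self._seen.popitem(last=False)

    def accept(self, key: str, ts_ms: int) -> bool:
        """Return True if the event is new and should be counted."""
        self._max_ts = max(self._max_ts, ts_ms)
        self._evict()
        if key in self._seen:
            self.duplicates += 1
            return False
        self._seen[key] = ts_ms
        return True

    def __len__(self) -> int:
        return len(self._seen)
```

A duplicate that arrives after `ttl_ms` is counted again, which is one reason the batch reconciliation path still deduplicates over the full raw data.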
Window Aggregator
Step 4: minute / hour / day
Windowed aggregates are computed and written into the serving layer.
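Window aggregation can be illustrated with tumbling windows: each event lands in the window that contains its event time, not its arrival time, and hour and day views are rolled up from minute windows. This Python sketch assumes epoch-aligned UTC windows and a hypothetical `(campaign_id, event_type, window_start)` key for the serving table.

```python
from collections import Counter
from typing import Iterable, Tuple

MINUTE_MS = 60_000
HOUR_MS = 60 * MINUTE_MS
DAY_MS = 24 * HOUR_MS


def window_start(ts_ms: int, size_ms: int) -> int:
    """Start of the epoch-aligned tumbling window of length size_ms containing ts_ms."""
    return ts_ms - ts_ms % size_ms


def aggregate(events: Iterable[Tuple[str, str, int]], size_ms: int) -> Counter:
    """Count (campaign_id, event_type, ts_ms) tuples per tumbling window."""
    counts: Counter = Counter()
    for campaign_id, event_type, ts_ms in events:
        counts[(campaign_id, event_type, window_start(ts_ms, size_ms))] += 1
    return counts


def rollup(minute_counts: Counter, size_ms: int) -> Counter:
    """Derive coarser windows (hour/day) from minute windows instead of raw events."""
    out: Counter = Counter()
    for (campaign_id, event_type, start), n in minute_counts.items():
        out[(campaign_id, event_type, window_start(start, size_ms))] += n
    return out
```

Rolling up from minute windows is valid here because epoch-aligned hour and day boundaries are also minute boundaries; per-timezone daily reports would need a different alignment.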
Serving Store
Step 5: ClickHouse / Pinot
Aggregate storage optimized for fast analytical reads.
Write-path checkpoints
- Ingress idempotency protects billing from double counting.
- Window aggregation builds minute/hour/day views while handling late events.
- Immutable raw storage remains the source of truth for replay and reconciliation.
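Replay from immutable raw storage is safe when a recompute job overwrites a whole output partition instead of incrementing counters, so running it twice gives the same answer. In this Python sketch, `raw_log` (raw events grouped by hour) and `serving` (hourly results) are hypothetical in-memory stand-ins for the raw layer and the serving store.

```python
from collections import Counter
from typing import Dict, List, Tuple

HOUR_MS = 3_600_000
RawEvent = Tuple[str, str, int]  # (event_id, campaign_id, ts_ms); hypothetical layout


def recompute_hour(raw_log: Dict[int, List[RawEvent]],
                   serving: Dict[int, Counter], hour_start: int) -> Counter:
    """Rebuild one hourly partition from raw events and replace it in one step.

    Deduplicates by event_id within the partition, then overwrites the stored
    result, so retries and repeated backfills are idempotent.
    """
    seen = set()
    counts: Counter = Counter()
    for event_id, campaign_id, ts_ms in raw_log.get(hour_start, []):
        if event_id in seen:
            continue
        if not hour_start <= ts_ms < hour_start + HOUR_MS:
            continue  # defensive: ignore events filed under the wrong partition
        seen.add(event_id)
        counts[campaign_id] += 1
    # Overwrite, never "+=": stands in for a partition swap in the serving store.
    serving[hour_start] = counts
    return counts
```

Deduplication here is per partition only; a duplicate whose copies fall into different hours would need a wider dedupe scope.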
Data Correctness and Reliability
Deeper dive: Event-Driven Architecture - event contracts, replay-safe processing, and controlled evolution of an analytics pipeline.
A fast dashboard does not make the system correct on its own. You still need window aggregation, watermarks, late-event handling, safe historical backfill, and explicit reconciliation between the realtime view and the billing path.
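Watermarks and a grace window together decide when a window is final. The class below is a simplified Python model, not the API of any particular stream engine: the watermark trails the highest event time seen by a hypothetical `max_delay_ms`, a window is finalized once the watermark passes its end plus `grace_ms`, and events for already-finalized windows go to a side list for the correctness path instead of being dropped silently.

```python
from collections import defaultdict
from typing import Dict, List


class WindowedCounter:
    """Tumbling-window counts with a watermark, a grace period, and a late-event side output."""

    def __init__(self, size_ms: int, max_delay_ms: int, grace_ms: int):
        self.size_ms = size_ms
        self.max_delay_ms = max_delay_ms  # expected out-of-orderness
        self.grace_ms = grace_ms          # extra time during which late events still count
        self.max_ts = 0
        self.pending: Dict[int, int] = defaultdict(int)  # window_start -> count
        self.closed: Dict[int, int] = {}                  # finalized windows
        self.too_late: List[int] = []                     # event times handed to reconciliation

    @property
    def watermark(self) -> int:
        return self.max_ts - self.max_delay_ms

    def _is_final(self, start: int) -> bool:
        return self.watermark >= start + self.size_ms + self.grace_ms

    def add(self, ts_ms: int) -> None:
        start = ts_ms - ts_ms % self.size_ms
        if self._is_final(start):
            self.too_late.append(ts_ms)  # window already emitted: route to the batch path
            return
        self.pending[start] += 1
        self.max_ts = max(self.max_ts, ts_ms)
        for s in sorted(self.pending):   # sorted() copies keys, so popping is safe
            if self._is_final(s):
                self.closed[s] = self.pending.pop(s)
```

Raising `grace_ms` improves completeness at the cost of freshness, which is the same trade-off behind the watermark-setting risk in this chapter.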
Dual truth model
In practice, resilient analytics usually keeps two independent outputs: one optimized for fast product visibility and one optimized for financial accuracy and historical correction.
online_metrics = stream_aggregate(raw_events)
billing_metrics = batch_reconcile(raw_events)
- Realtime path minimizes delay for dashboards
- Batch path maximizes accuracy for finance-facing reports
- Reconciliation records mismatches and corrective adjustments explicitly
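The two-line model above can be made concrete with a small comparison step. In this Python sketch, the realtime counts are passed in as a `Counter`, `batch_reconcile` recounts from raw events with exact deduplication by `event_id`, and a hypothetical `compare` helper records per-campaign adjustments plus a drift ratio to check against the 0.1% target. The data shapes are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

RawEvent = Tuple[str, str]  # (event_id, campaign_id); hypothetical layout


def batch_reconcile(raw_events: Iterable[RawEvent]) -> Counter:
    """Finance-grade counts: exact deduplication by event_id over the full raw set."""
    unique = {event_id: campaign_id for event_id, campaign_id in raw_events}
    return Counter(unique.values())


def compare(online: Counter, billing: Counter) -> Tuple[List[Tuple[str, int]], float]:
    """Return explicit per-campaign adjustments and overall drift against billing truth."""
    adjustments = []
    for campaign in sorted(set(online) | set(billing)):
        delta = billing[campaign] - online[campaign]
        if delta:
            adjustments.append((campaign, delta))  # recorded explicitly, not applied silently
    total = sum(billing.values())
    drift = sum(abs(d) for _, d in adjustments) / total if total else 0.0
    return adjustments, drift
```

For example, if the realtime view counted a retried event twice, the adjustment list shows a `-1` for that campaign and the drift ratio tells whether the mismatch still fits the budget.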
Data-quality controls
Without clear data contracts and lineage, the team cannot explain where a number came from after replay or historical recomputation.
- Event key: event_id plus the dedupe layer protect against double counting.
- Grace window: the system makes an explicit choice about how long late events are accepted.
- Schema versioning: format changes go through strict compatibility rules.
- Raw layer: immutable storage makes historical recompute safe and auditable.
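Strict schema versioning often goes together with upcasting: old raw events are converted to the current shape when they are read, so a replay over older data still works without rewriting the immutable layer. The v1 to v2 change in this Python sketch (event time moving from seconds in `ts` to milliseconds in `ts_ms`) is entirely hypothetical.

```python
from typing import Callable, Dict

CURRENT_VERSION = 2


def _v1_to_v2(event: dict) -> dict:
    # Hypothetical change: v1 carried event time in seconds ("ts"),
    # v2 carries milliseconds ("ts_ms"). Builds a new dict; the raw record is untouched.
    upgraded = {k: v for k, v in event.items() if k != "ts"}
    upgraded["ts_ms"] = event["ts"] * 1000
    upgraded["schema_version"] = 2
    return upgraded


UPCASTERS: Dict[int, Callable[[dict], dict]] = {1: _v1_to_v2}


def upcast(event: dict) -> dict:
    """Apply upcasters step by step until the event reaches CURRENT_VERSION."""
    version = event.get("schema_version", 1)
    if version > CURRENT_VERSION:
        raise ValueError(f"event from a newer schema: v{version}")
    while version < CURRENT_VERSION:
        event = UPCASTERS[version](event)
        version = event["schema_version"]
    return event
```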
Risks and Common Mistakes
In practice, these systems are rarely broken by a total outage first. They are more often damaged by slow drift between raw data, aggregate views, and billing numbers, plus skew that turns one campaign or key into a hot partition.
- Double counting: weak deduplication breaks both billing and trust in the numbers.
- Bad watermark settings: windows close too early or too late, which distorts aggregates.
- Key skew: one ad_id or campaign_id can overload a single partition long before the whole cluster is saturated.
- Poor traceability: without lineage, it becomes hard to explain metric mismatches during audit.
- Silent quality degradation: late events, schema issues, and dropped events stay hidden without explicit guardrails.
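For key skew, one common mitigation is salting: events for a hot key are spread over several sub-keys for partial aggregation and merged back in a second stage. In this Python sketch, `SALT_BUCKETS = 8` and the use of `zlib.crc32` on `event_id` are assumptions; a stable hash is used because Python's built-in `hash()` of strings changes between processes.

```python
import zlib
from collections import Counter
from typing import Iterable, Tuple

SALT_BUCKETS = 8  # hypothetical fan-out for hot keys


def salted_key(campaign_id: str, event_id: str) -> str:
    """Spread one campaign over SALT_BUCKETS sub-keys, deterministically per event."""
    salt = zlib.crc32(event_id.encode("utf-8")) % SALT_BUCKETS
    return f"{campaign_id}#{salt}"


def partial_counts(events: Iterable[Tuple[str, str]]) -> Counter:
    """Stage 1: counts per salted key; each sub-key can live on a different partition."""
    return Counter(salted_key(campaign_id, event_id) for event_id, campaign_id in events)


def merge(partials: Counter) -> Counter:
    """Stage 2: strip the salt and sum sub-keys back into per-campaign totals."""
    totals: Counter = Counter()
    for key, n in partials.items():
        campaign_id, _, _ = key.rpartition("#")
        totals[campaign_id] += n
    return totals
```

Salting is usually applied only to keys known to be hot, since every salted key pays for an extra merge stage.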
What to emphasize in an interview
- How deduplication works and why that particular event key was chosen.
- Where the boundary sits between near-realtime analytics and finance-grade reporting.
- How late events, large historical recomputes, and recovery after failure are handled.
- Which SLOs and operational metrics matter: lag, freshness, duplicate rate, window completeness, and billing drift.
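Several of these metrics can be derived from counters the pipeline already keeps. The Python sketch below uses hypothetical counter names and assumes window completeness means the share of eventually counted events that were already present when a window was finalized.

```python
from dataclasses import dataclass


@dataclass
class PipelineCounters:
    # Hypothetical per-period counters exported by the pipeline.
    received: int               # events accepted by the ingest API
    duplicates_dropped: int     # events rejected by the dedupe layer
    counted_at_close: int       # events inside windows when they were finalized
    counted_final: int          # events counted after late arrivals and reconciliation
    realtime_billable: int      # billable events in the realtime view
    reconciled_billable: int    # billable events from the batch path
    serving_event_time_ms: int  # newest event time reflected in the serving store


def quality_report(c: PipelineCounters, now_ms: int) -> dict:
    """Turn raw counters into quality signals such as those served by GET /quality."""
    return {
        "duplicate_rate": c.duplicates_dropped / c.received if c.received else 0.0,
        "window_completeness": (c.counted_at_close / c.counted_final
                                if c.counted_final else 1.0),
        "billing_drift": (abs(c.realtime_billable - c.reconciled_billable) / c.reconciled_billable
                          if c.reconciled_billable else 0.0),
        "freshness_ms": now_ms - c.serving_event_time_ms,
    }
```

For example, 980 realtime against 979 reconciled billable events gives a drift of about 0.102%, just over the 0.1% budget.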
Related chapters
- Event-Driven Architecture - Event contracts, stream choreography, and integration patterns for analytics pipelines.
- Streaming Data - Foundations of windows, watermarks, late events, and safe historical recomputation.
- Kafka - Practical breakdown of the partitioned-log and consumer-group model behind high-ingest systems.
- ClickHouse overview - OLAP serving layer for aggregate views, ad-hoc analysis, and storage-cost control.
- A/B Testing platform - Adjacent experimentation case where event quality directly affects statistical validity.
- Payment System - Useful comparison for idempotency, correctness boundaries, and regular reconciliation loops.
