Ad Click Event Aggregator — System Design Space

Click aggregation is a streaming system where volume is enormous, events arrive out of order, and accounting mistakes turn directly into financial and analytical risk.

The chapter breaks down ingest, deduplication, window aggregation, late-event handling, historical recomputation, and serving near-realtime metrics.

For interviews and engineering discussions, this case is useful because it quickly brings the conversation to the trade-off between throughput, correctness, and trust in billing outputs.

Deduplication

If one click is counted twice, the mistake leaks into both the dashboard and the customer bill, so deduplication has to live on the main processing path.

Window Aggregation

Minute and hourly windows matter not only for speed, but also for a clear model of delay, window completeness, and later recomputation.

Historical Recompute

Late events, schema corrections, and incidents are inevitable, which is why safe historical recomputation should be designed up front rather than bolted on later.

Metric Freshness

Users care not only about the numbers themselves, but also about how current they are and where the boundary sits between the realtime view and billing truth.

Ad Click Event Aggregator is a streaming-system case where you have to balance metric freshness, billing correctness, and stable ingest under campaign bursts. In interviews, it quickly reveals whether you can explain where realtime numbers end and finance-grade truth begins.

Source

Acing the System Design Interview

Chapter 11 focuses on deduplication, window aggregation, and safe historical recomputation.


Where this pattern shows up

Google Ads / Meta Ads: near-realtime dashboards plus a separate financial reconciliation loop.
DSP / RTB platforms: click, impression, and auction-event ingest under bursty campaign traffic.
Affiliate networks: conversion deduplication and fraud-control checks.
Internal growth platforms: one event backbone for product and marketing analytics.

Functional requirements

At the functional level, the system needs reliable accounting for clicks, impressions, and conversions without double counting. That immediately brings deduplication, idempotent ingest, and clear rules for turning raw events into product and billing metrics.

APIs and contracts

POST /events - ingest click, impression, and conversion events
GET /metrics - aggregates by window, campaign, and filters
POST /reconcile - trigger historical recompute and reconciliation
GET /quality - lag, duplicate rate, completeness, and error budget
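To make the ingest contract concrete, here is a minimal sketch of a `POST /events` payload and the idempotency key the Ingest API could derive from it. It assumes each SDK attaches a client-generated `event_id` that is reused on retries. The `AdEvent` fields, `validate`, and `idempotency_key` are hypothetical names, not a prescribed schema.

```python
import hashlib
from dataclasses import dataclass

VALID_EVENT_TYPES = {"click", "impression", "conversion"}

@dataclass(frozen=True)
class AdEvent:
    """Hypothetical POST /events payload; field names are illustrative only."""
    event_id: str        # generated once by the SDK and reused on client retries
    source_id: str       # SDK or tracker instance that emitted the event
    event_type: str      # "click", "impression", or "conversion"
    ad_id: str
    campaign_id: str
    event_time_ms: int   # event time set on the client, not ingest time
    schema_version: int = 1

def validate(event: AdEvent) -> None:
    if event.event_type not in VALID_EVENT_TYPES:
        raise ValueError(f"unknown event_type: {event.event_type!r}")
    if not event.event_id:
        raise ValueError("event_id is required")

def idempotency_key(event: AdEvent) -> str:
    """Deterministic key: a retried delivery of the same event maps to the same key."""
    # Only identity fields are hashed, so enrichment added at ingest
    # (geo lookup, ingest timestamp, etc.) never changes the key.
    identity = f"{event.source_id}|{event.event_type}|{event.event_id}"
    return hashlib.sha256(identity.encode("utf-8")).hexdigest()

click = AdEvent("e-1", "sdk-42", "click", "ad-7", "camp-3", 1_700_000_000_000)
retry = AdEvent("e-1", "sdk-42", "click", "ad-7", "camp-3", 1_700_000_000_000)
validate(click)
print(idempotency_key(click) == idempotency_key(retry))  # True
```

Keeping the key a pure function of identity fields is the design choice that matters: any layer (ingest, stream dedup, batch replay) can recompute it without shared state.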

Platform capabilities

Reliable event accounting without double charging in billing
Minute, hourly, and daily aggregates for different read patterns
Separation between the realtime view and the batch reconciliation path
Safe historical recomputation from immutable raw storage

Non-functional requirements

This system has to survive uneven advertising traffic, hold high ingest throughput, and keep almost no tolerance for billing errors. Peak numbers are only half the problem. The other half is an explicit contract: how fresh the dashboard has to be, how much billing drift is still acceptable, and how fast the team can recompute history after a failure. Without that contract, the first mismatch turns into an argument with no reference point.

Requirement	Target	Why it matters
Peak ingest throughput	Up to 1M events/s	Bursts during large campaign launches
Data freshness (p95)	< 30 sec	Operational dashboards for product and ad-ops teams
Billing drift	< 0.1%	Financial correctness and auditability
Availability	99.95%	Critical analytics path for the advertising business
Historical recompute RTO	< 30 min	Fast recovery after failure or data correction
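A back-of-the-envelope sketch helps turn these targets into sizing decisions. The peak rate and drift budget come from the table; the per-partition throughput (10,000 events/s), the 2x headroom factor, and the 10M-click campaign are assumptions chosen purely for illustration.

```python
import math

# Targets from the requirements table.
PEAK_EVENTS_PER_SEC = 1_000_000
BILLING_DRIFT_BPS = 10  # 0.1% expressed in basis points

# Illustrative assumptions, not figures from the source.
PER_PARTITION_EVENTS_PER_SEC = 10_000  # sustained rate one consumer task can process
HEADROOM = 2                           # spare capacity for bursts and rebalances

def partitions_needed(peak_eps: int, per_partition_eps: int, headroom: int) -> int:
    return math.ceil(peak_eps * headroom / per_partition_eps)

def drift_allowance(billed_events: int, drift_bps: int) -> int:
    """Largest absolute mismatch that still fits inside the drift budget."""
    return billed_events * drift_bps // 10_000

print(partitions_needed(PEAK_EVENTS_PER_SEC, PER_PARTITION_EVENTS_PER_SEC, HEADROOM))  # 200
print(drift_allowance(10_000_000, BILLING_DRIFT_BPS))  # 10000
print(PEAK_EVENTS_PER_SEC * 86_400)  # 86400000000: events/day if the peak were sustained
```

Under these assumptions, a campaign billed for 10M clicks tolerates at most 10,000 clicks of discrepancy between the realtime and billing paths. The partition count scales linearly with the assumed per-partition rate, so measuring that rate on real hardware matters more than the formula.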

High-Level Architecture

Theory

Streaming Data

Windows, watermarks, late events, historical recomputation, and the realtime versus batch boundary.



event ingest + window aggregation + reconciliation

This topology combines ingest flow, window aggregation, and a reconciliation/backfill control loop for billing correctness.

Ingress plane: Event Sources (SDKs and trackers), Ingest API (validate and enrich), Event Bus (Kafka / PubSub).
Realtime processing plane: Deduplication Layer (idempotency key), Window Aggregator (minute / hour / day), Fast Aggregate Store (ClickHouse / Pinot), Dashboard API (queries and filters).
Reliability plane: Raw Event Lake (immutable log), Historical Backfill Job (replay over history), Reconciliation Job (online vs billing).

The architecture separates ingest, realtime aggregation, and a dedicated correctness path. That keeps dashboard latency predictable while leaving billing-critical outputs on a path that can be verified and recomputed independently.
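The recompute half of the correctness path can be sketched as follows, assuming a Python list stands in for the raw event lake and a dict keyed by hour stands in for the aggregate store. `RawEvent`, `recompute_hour`, and `backfill_hour` are hypothetical names. The property being illustrated is that a backfill replaces a partition wholesale instead of adding deltas, so a rerun cannot double count.

```python
from collections import Counter
from typing import Iterable, NamedTuple

HOUR_MS = 3_600_000

class RawEvent(NamedTuple):
    event_id: str
    campaign_id: str
    event_type: str
    event_time_ms: int

def recompute_hour(raw_log: Iterable[RawEvent], hour_start_ms: int) -> Counter:
    """Rebuild one hourly partition of aggregates from the immutable raw log."""
    seen = set()
    counts = Counter()
    for e in raw_log:
        if not hour_start_ms <= e.event_time_ms < hour_start_ms + HOUR_MS:
            continue
        if e.event_id in seen:  # deduplicate again during replay
            continue
        seen.add(e.event_id)
        counts[(e.campaign_id, e.event_type)] += 1
    return counts

def backfill_hour(store: dict, raw_log: list, hour_start_ms: int) -> None:
    # Replace the partition wholesale instead of adding deltas, so rerunning
    # the same backfill (for example after a crash) cannot double count.
    store[hour_start_ms] = recompute_hour(raw_log, hour_start_ms)

raw = [
    RawEvent("e1", "c1", "click", 10),
    RawEvent("e1", "c1", "click", 10),          # duplicate delivery
    RawEvent("e2", "c1", "click", 20),
    RawEvent("e3", "c1", "click", HOUR_MS + 5),  # belongs to the next hour
]
store = {}
backfill_hour(store, raw, 0)
backfill_hour(store, raw, 0)  # rerun is harmless
print(store[0])  # Counter({('c1', 'click'): 2})
```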

Write and Read Paths

On the write path, the system accepts an event, removes duplicates, and updates aggregate views. On the read path, it serves from pre-aggregated tables and cache instead of touching raw events unless absolutely necessary.
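A minimal sketch of the read side, assuming minute-level rows are already materialized. In ClickHouse or Pinot this would be a rollup table or a GROUP BY over pre-aggregated data; here plain dicts stand in, and `rollup_to_hours` and `clicks_in_range` are hypothetical helpers.

```python
from collections import defaultdict

MINUTE_MS = 60_000
HOUR_MS = 60 * MINUTE_MS

def rollup_to_hours(minute_rows: dict) -> dict:
    """Collapse {(campaign_id, minute_start_ms): clicks} into hourly buckets."""
    hourly = defaultdict(int)
    for (campaign_id, minute_start_ms), clicks in minute_rows.items():
        hour_start_ms = minute_start_ms - minute_start_ms % HOUR_MS
        hourly[(campaign_id, hour_start_ms)] += clicks
    return dict(hourly)

def clicks_in_range(hourly: dict, campaign_id: str, start_ms: int, end_ms: int) -> int:
    """Dashboard query over pre-aggregated rows only; raw events are never scanned."""
    return sum(clicks for (cid, hour_ms), clicks in hourly.items()
               if cid == campaign_id and start_ms <= hour_ms < end_ms)

minute_rows = {("c1", 0): 5, ("c1", MINUTE_MS): 7, ("c1", HOUR_MS): 3, ("c2", 0): 1}
hourly = rollup_to_hours(minute_rows)
print(hourly)  # {('c1', 0): 12, ('c1', 3600000): 3, ('c2', 0): 1}
print(clicks_in_range(hourly, "c1", 0, 2 * HOUR_MS))  # 15
```

The read cost depends on the number of hourly rows in the requested range, not on raw event volume, which is what keeps dashboard latency stable during campaign bursts.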


How events become aggregates and how dashboards read those metrics under load.

Write path: the system accepts an event, removes duplicates, updates windowed aggregates, and writes them into the serving view.

Step 1. Event Sources (SDKs, trackers, and pixels): clicks, impressions, and conversions are sent to ingest endpoints.
Step 2. Ingest API (validate and enrich): schema validation, enrichment, and idempotency key generation.
Step 3. Stream and Deduplication (Kafka / PubSub + state): the stream processor removes duplicates and handles ordering and late events.
Step 4. Window Aggregator (minute / hour / day): windowed aggregates are computed and written into the serving layer.
Step 5. Serving Store (ClickHouse / Pinot): aggregate storage optimized for fast analytical reads.

Write-path checkpoints

Ingress idempotency protects billing from double counting.
Window aggregation builds minute/hour/day views while handling late events.
Immutable raw storage remains the source of truth for replay and reconciliation.
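The window-aggregation checkpoint can be illustrated with a simplified event-time aggregator: tumbling one-minute windows per campaign, a watermark that trails the largest event time seen by a fixed bound, and an allowed-lateness period. `MinuteAggregator` and its parameters are hypothetical. Engines such as Apache Flink typically emit a first result when the watermark passes the window end and update it for late events within the allowed lateness; this sketch simplifies that to a single emission per window.

```python
from collections import defaultdict

WINDOW_MS = 60_000  # tumbling one-minute windows

class MinuteAggregator:
    """Counts events per (campaign_id, window_start_ms) using event time.

    The watermark trails the largest event time seen by max_out_of_orderness_ms.
    A window is finalized once the watermark passes its end plus allowed_lateness_ms;
    events for finalized windows go to a side output for later recompute.
    """

    def __init__(self, max_out_of_orderness_ms: int, allowed_lateness_ms: int) -> None:
        self.max_out_of_orderness_ms = max_out_of_orderness_ms
        self.allowed_lateness_ms = allowed_lateness_ms
        self.max_event_time_ms = None
        self.open_windows = defaultdict(int)
        self.emitted = {}           # finalized windows -> counts
        self.late_side_output = []  # events too late for the realtime view

    def watermark(self) -> float:
        if self.max_event_time_ms is None:
            return float("-inf")
        return self.max_event_time_ms - self.max_out_of_orderness_ms

    def _is_final(self, window_start_ms: int) -> bool:
        return window_start_ms + WINDOW_MS + self.allowed_lateness_ms <= self.watermark()

    def on_event(self, campaign_id: str, event_time_ms: int) -> None:
        window_start_ms = event_time_ms - event_time_ms % WINDOW_MS
        if self._is_final(window_start_ms):
            # Never mutate an emitted window; leave the event for backfill.
            self.late_side_output.append((campaign_id, event_time_ms))
            return
        self.open_windows[(campaign_id, window_start_ms)] += 1
        if self.max_event_time_ms is None or event_time_ms > self.max_event_time_ms:
            self.max_event_time_ms = event_time_ms
        for key in [k for k in self.open_windows if self._is_final(k[1])]:
            self.emitted[key] = self.open_windows.pop(key)

agg = MinuteAggregator(max_out_of_orderness_ms=5_000, allowed_lateness_ms=10_000)
for t in [1_000, 30_000, 80_000, 65_000, 2_000]:  # 65_000 is out of order but still accepted
    agg.on_event("c1", t)
print(agg.emitted)           # {('c1', 0): 2}
print(agg.late_side_output)  # [('c1', 2000)]
```

The two knobs map directly to the "bad watermark settings" risk: too small a bound closes windows early and pushes valid events to the side output, too large a bound delays every dashboard number.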

Data Correctness and Reliability

Deeper

Event-Driven Architecture

Event contracts, replay-safe processing, and controlled evolution of an analytics pipeline.


A fast dashboard does not make the system correct on its own. You still need window aggregation, watermarks, late-event handling, safe historical backfill, and explicit reconciliation between the realtime view and the billing path.

Dual truth model

One output cannot be both fast and financially accurate at once: the dashboard needs speed, while the billing path needs the freedom to recompute history. So the two are split: one answers the user immediately, and the other reconciles and corrects mismatches on a schedule.

online_metrics  = stream_aggregate(raw_events)
billing_metrics = batch_reconcile(raw_events)

Realtime path minimizes delay for dashboards
Batch path maximizes accuracy for finance-facing reports
Reconciliation records mismatches and corrective adjustments explicitly
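A minimal sketch of the reconciliation step, assuming both paths produce counts keyed by `(campaign_id, window_start_ms)` and billing is treated as the source of truth. `Adjustment` and `reconcile` are hypothetical names, and measuring drift as the sum of absolute per-window deltas over total billed events is one possible definition, not the only one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Adjustment:
    """Explicit correction record: the billing value wins, the difference is logged."""
    campaign_id: str
    window_start_ms: int
    online: int
    billing: int

    @property
    def delta(self) -> int:
        return self.billing - self.online

def reconcile(online: dict, billing: dict, drift_budget_bps: int = 10):
    """Compare {(campaign_id, window_start_ms): count} maps from both paths.

    Returns the adjustments plus whether total drift fits the budget
    (10 bps = 0.1%), using integer math to avoid float rounding.
    """
    adjustments = [
        Adjustment(key[0], key[1], online.get(key, 0), billing.get(key, 0))
        for key in sorted(online.keys() | billing.keys())
        if online.get(key, 0) != billing.get(key, 0)
    ]
    total_billing = sum(billing.values())
    total_abs_delta = sum(abs(a.delta) for a in adjustments)
    within_budget = total_abs_delta * 10_000 <= drift_budget_bps * total_billing
    return adjustments, within_budget

online = {("c1", 0): 1000, ("c1", 60_000): 500}
billing = {("c1", 0): 1000, ("c1", 60_000): 501}
adjustments, ok = reconcile(online, billing)
print([a.delta for a in adjustments], ok)  # [1] True
```

Emitting adjustment records rather than silently overwriting the online numbers is what keeps the mismatch auditable later.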

Data-quality controls

Without clear data contracts and lineage, the team cannot explain where a number came from after replay or historical recomputation.

Event key: event_id plus the dedupe layer protect against double counting.
Grace window: the system makes an explicit choice about how long late events are accepted.
Schema versioning: format changes go through strict compatibility rules.
Raw layer: immutable storage makes historical recompute safe and auditable.
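The interplay between the event key and the grace window can be sketched with a TTL-bounded deduplicator: the stream layer cannot remember every key forever, so it keeps keys only long enough to cover the grace window. `TtlDeduplicator` is a hypothetical class, and `now_ms` is assumed to be a non-decreasing clock (processing time or the watermark).

```python
from collections import OrderedDict

class TtlDeduplicator:
    """Remembers event keys for ttl_ms, then forgets them.

    The TTL should cover at least the grace window. A duplicate arriving after
    its key has expired is no longer caught here, which is one reason the batch
    path deduplicates again over the raw log.
    """

    def __init__(self, ttl_ms: int) -> None:
        self.ttl_ms = ttl_ms
        self._seen = OrderedDict()  # key -> first-seen time, oldest first

    def is_duplicate(self, key: str, now_ms: int) -> bool:
        self._evict(now_ms)
        if key in self._seen:
            return True
        self._seen[key] = now_ms
        return False

    def _evict(self, now_ms: int) -> None:
        # Insertion order equals first-seen order because now_ms never decreases.
        while self._seen:
            _, first_seen = next(iter(self._seen.items()))
            if now_ms - first_seen < self.ttl_ms:
                break
            self._seen.popitem(last=False)

dedup = TtlDeduplicator(ttl_ms=1_000)
print(dedup.is_duplicate("a", 0))      # False: first sighting
print(dedup.is_duplicate("a", 500))    # True: retry inside the TTL
print(dedup.is_duplicate("a", 1_000))  # False: key expired, treated as new
```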

Risks and Common Mistakes

In practice, these systems are rarely broken by a total outage first. They are more often damaged by slow drift between raw data, aggregate views, and billing numbers, plus skew that turns one campaign or key into a hot partition.

Double counting: weak deduplication breaks both billing and trust in the numbers.
Bad watermark settings: windows close too early or too late, which distorts aggregates.
Key skew: one ad_id or campaign_id can overload a single partition long before the whole cluster is saturated.
Poor traceability: without lineage, it becomes hard to explain metric mismatches during audit.
Silent quality degradation: late events, schema issues, and dropped events stay hidden without explicit guardrails.
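For the key-skew risk, one common mitigation is two-stage aggregation with salted keys: a known-hot campaign is spread over several sub-keys, and a second stage strips the salt and merges partial counts. The sketch below assumes the set of hot campaigns is known in advance and that `campaign_id` never contains `#`; `stable_bucket`, `routing_key`, and `merge_salted` are hypothetical helpers, and SHA-256 is only a stand-in for whatever stable hash the real partitioner uses.

```python
import hashlib
from collections import Counter

def stable_bucket(value: str, buckets: int) -> int:
    """Process-independent hash (built-in hash() on str is randomized per process)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets

def routing_key(campaign_id: str, event_id: str, hot_campaigns: set, salt_buckets: int) -> str:
    """Split known-hot campaigns into salt_buckets sub-keys; leave other keys unchanged."""
    if campaign_id in hot_campaigns:
        # Salting by event_id keeps every retry of one event on the same sub-key,
        # so per-key deduplication still sees the duplicates.
        return f"{campaign_id}#{stable_bucket(event_id, salt_buckets)}"
    return campaign_id

def merge_salted(partial_counts: Counter) -> Counter:
    """Second stage: strip the salt and sum partial counts per campaign."""
    merged = Counter()
    for key, count in partial_counts.items():
        merged[key.split("#", 1)[0]] += count
    return merged

hot = {"camp-hot"}
partials = Counter(routing_key("camp-hot", f"e{i}", hot, 8) for i in range(1_000))
partials["camp-cold"] += 3
print(sorted(partials)[:3])     # a few of the salted sub-keys
print(merge_salted(partials))   # Counter({'camp-hot': 1000, 'camp-cold': 3})
```

The trade-off is an extra aggregation stage and slightly higher latency for hot keys, in exchange for keeping any single partition from becoming the bottleneck.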

What to emphasize in an interview

How deduplication works and why that particular event key was chosen.
Where the boundary sits between near-realtime analytics and finance-grade reporting.
How late events, large historical recomputes, and recovery after failure are handled.
Which SLOs and operational metrics matter: lag, freshness, duplicate rate, window completeness, and billing drift.


Related chapters

Event-Driven Architecture - Event contracts, stream choreography, and integration patterns for analytics pipelines.
Streaming Data - Foundations of windows, watermarks, late events, and safe historical recomputation.
Kafka - Practical breakdown of the partitioned-log and consumer-group model behind high-ingest systems.
ClickHouse overview - Where to keep aggregates so the dashboard stays fast and ad-hoc queries do not blow up the storage bill.
A/B Testing platform - Adjacent experimentation case where event quality directly affects statistical validity.
Payment System - Useful comparison for idempotency, correctness boundaries, and regular reconciliation loops.