System Design Space

Updated: March 2, 2026 at 6:20 PM

Ad Click Event Aggregator

Difficulty: mid

Classic task: stream ingestion, dedupe, windowed aggregations, freshness SLA and billing accuracy.

Acing SDI

Practice task from chapter 11

Ad click event aggregator: dedupe, windowing, and consistent analytics outputs.


Ad Click Event Aggregator tests your ability to design a streaming system where speed, correctness, and metric explainability all matter at once. It is a common interview case at the boundary of data platforms and product analytics.

Functional requirements

  • Ingest ad click/impression/conversion events.
  • Deduplicate events for billing correctness.
  • Build minute/hour/day window aggregates.
  • Serve realtime dashboards and batch reports.
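The ingest and dedupe requirements above can be sketched with a minimal event model. This is a sketch under assumptions: the field names (`event_id`, `ad_id`, `ts_ms`) and the minute-bucketed idempotency key are illustrative, not a fixed schema.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class ClickEvent:
    # Field names are illustrative; a real schema is versioned and richer.
    event_id: str      # client-generated id, useful for exact-duplicate detection
    ad_id: str
    user_id: str
    event_type: str    # "click" | "impression" | "conversion"
    ts_ms: int         # client event time, epoch milliseconds

    def idempotency_key(self) -> str:
        """Stable key for dedupe: same ad/user/type within the same
        minute bucket counts as the same logical event."""
        raw = f"{self.ad_id}:{self.user_id}:{self.event_type}:{self.ts_ms // 60_000}"
        return hashlib.sha256(raw.encode()).hexdigest()

e = ClickEvent("evt-1", "ad-42", "u-7", "click", 1_700_000_123_456)
key = e.idempotency_key()
```

Note the key deliberately excludes `event_id`, so a retried send with a fresh client id still collapses into one billable event.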

Non-functional requirements

  • Stable operation under campaign traffic bursts.
  • Bounded latency for near-realtime analytics.
  • Clear data freshness and lineage visibility.
  • Controlled storage and recomputation costs.

High-Level Architecture

Theory

Streaming Data

Windowing, watermarks, late events, reprocessing, and realtime/batch trade-offs.


stream ingest + window aggregation + reconciliation

This topology combines ingest flow, window aggregation, and a reconciliation/backfill control loop for billing correctness.

  • Event Sources: SDK/trackers
  • Collector API: validate + enrich
  • Event Bus: Kafka/PubSub
  • Dedupe/Normalize: idempotency key
  • Window Aggregator: minute/hour/day
  • Hot Aggregate Store: ClickHouse/Pinot
  • Dashboard API: query/filters
  • Raw Event Lake: immutable log
  • Batch Backfill Job: historical replay
  • Reconciliation Job: online vs billing

The architecture separates ingest, realtime serving, and reliability control loops with batch reconciliation. This keeps dashboard latency predictable while preserving billing correctness.

Write/Read Paths


How events are written into aggregates and how dashboards read metrics under load.

Write path: ingest accepts events, runs deduplication/windowing, and updates serving aggregates for near-realtime analytics.

  • Event Sources (SDK / trackers / pixels): clicks, impressions, and conversions are sent to ingest endpoints.
  • Collector API (validate + enrich): schema validation, enrichment, and idempotency key generation.
  • Stream + Dedupe (Kafka/PubSub + state): the stream processor applies dedupe, ordering, and late-event handling.
  • Window Aggregator (minute / hour / day): windowed aggregates are computed and written into serving storage.
  • Serving Store (ClickHouse/Pinot): aggregate storage optimized for fast analytical reads.

Write path checkpoints

  • Ingress idempotency protects billing from double counting.
  • Window aggregation builds minute/hour/day views while handling late events.
  • Immutable raw storage remains the source of truth for replay and reconciliation.
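The first two checkpoints can be sketched together: dedupe on an idempotency key, then increment the per-(ad, window) counter. This is a minimal in-memory sketch; a real stream processor keeps both the seen-key set and the counters in a checkpointed state store, and the key format is assumed, not prescribed.

```python
from collections import defaultdict

WINDOW_MS = 60_000  # one-minute tumbling windows

class MinuteAggregator:
    """Write-path sketch: drop duplicates, then update the
    (ad_id, minute-window) click counter."""

    def __init__(self) -> None:
        self.seen: set[str] = set()              # idempotency keys already counted
        self.counts: dict = defaultdict(int)     # (ad_id, window_start_ms) -> clicks

    def process(self, key: str, ad_id: str, ts_ms: int) -> bool:
        if key in self.seen:                     # duplicate: protect billing
            return False
        self.seen.add(key)
        window_start = ts_ms - ts_ms % WINDOW_MS
        self.counts[(ad_id, window_start)] += 1
        return True

agg = MinuteAggregator()
agg.process("k1", "ad-42", 1_700_000_123_456)
agg.process("k1", "ad-42", 1_700_000_123_456)  # retried delivery, ignored
```

In production the seen-key set is bounded (for example, TTL-expired after the window's grace period) so state does not grow without limit.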

Data and deduplication

  • Idempotency key such as ad_id + user_id + ts_bucket.
  • Late events handled via watermarks and grace periods.
  • Schema evolution with strict versioning and backward compatibility.
  • Aggregate correction through reprocessing over immutable raw data.
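The watermark and grace-period points above can be made concrete with a small sketch. The two-minute grace period and the side-output routing are assumed policies for illustration; real engines (Flink, Kafka Streams) manage watermarks per partition.

```python
WINDOW_MS = 60_000
GRACE_MS = 120_000  # assumed policy: accept events up to 2 minutes late

class WatermarkTracker:
    """Late-event sketch: the watermark trails the max observed event time;
    a window is closed once watermark passes window_end + grace, and events
    for closed windows go to a side output replayed by the backfill job."""

    def __init__(self) -> None:
        self.max_event_ts = 0
        self.late_events: list[int] = []  # side output for batch backfill

    def accept(self, ts_ms: int) -> bool:
        self.max_event_ts = max(self.max_event_ts, ts_ms)
        watermark = self.max_event_ts
        window_end = ts_ms - ts_ms % WINDOW_MS + WINDOW_MS
        if window_end + GRACE_MS < watermark:  # window already closed
            self.late_events.append(ts_ms)
            return False
        return True

wt = WatermarkTracker()
wt.accept(1_700_000_600_000)              # on-time event advances the watermark
wt.accept(1_700_000_600_000 - 600_000)    # ten minutes late: side output
```

Routing late events to a side output (rather than silently dropping them) is what lets reprocessing over immutable raw data correct the aggregates later.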

SLO and operational metrics

  • Data freshness (p95 end-to-end lag).
  • Duplicate rate and window completeness.
  • Reprocessing duration and backfill cost.
  • Mismatch between online dashboard and billing reports.
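The freshness metric in the first bullet can be computed with a nearest-rank percentile over sampled per-event lags (serving-visible time minus client event time). The sampling approach and helper name are illustrative assumptions.

```python
def p95_lag_ms(lags_ms: list[int]) -> int:
    """p95 end-to-end freshness lag, nearest-rank percentile
    over a sample window of per-event lags (milliseconds)."""
    ordered = sorted(lags_ms)
    idx = max(0, int(len(ordered) * 0.95 + 0.5) - 1)
    return ordered[idx]

# lag = serving_visible_ts - client_event_ts, sampled per event;
# the 30_000 ms outlier is what p95 is meant to surface
sample = [800, 1200, 950, 30_000, 1100, 1000, 900, 1050, 980, 1020]
p95 = p95_lag_ms(sample)
```

A p95 (rather than mean) lag is the usual SLO shape here: a few straggler partitions dominate user-perceived dashboard staleness, and the mean hides them.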

Questions to clarify in interview

  • Required billing precision: near-exact or acceptable tolerance.
  • Dashboard freshness SLA and what lag is considered critical.
  • Need for drill-down into raw events and retention duration.
  • Auditability and legal/compliance constraints for event history.



© 2026 Alexander Polomodov