Statement of the problem
A/B testing is one of the most important decision-making tools in product companies. The task is to design a platform that can run experiments on millions of users with minimal impact on the performance of the main product.
Functional Requirements
Experiment management
- Creating experiments with variations (Control/Treatment)
- Determining the target audience (targeting)
- Setting the experiment duration
- Defining the metrics to measure
Variant assignment
- Random assignment of users to variants
- Consistent assignment (one user = one variant)
- Support for traffic splitting (1%, 5%, 50%...)
- Progressive rollout (gradual increase in traffic)
Data collection
- Logging user actions
- Linking events to an experiment variant
- Aggregation of metrics (CTR, conversion, retention)
Analysis of results
- Calculation of statistical significance (p-value)
- Confidence intervals
- Visualization of results in real time
Non-functional requirements
Low latency
Variant assignment should take <10 ms to avoid impacting UX
Consistency
The user should see the same variant throughout the experiment
Scalability
Process billions of events per day without performance degradation
High-level architecture
Main Components
Experiment Management Service
CRUD operations for experiments, targeting rules, variant configuration
Variant Assignment Service
Quickly determines the user's variant (critical path)
Event Ingestion Pipeline
Collection and processing of events linked to experiments
Analysis Engine
Statistical analysis, p-value calculation, confidence intervals
The architecture is built around two paths:
Hot Path — variant assignment
Cold Path — data collection and analysis
Visualization via C4 Model
Below, the A/B platform is decomposed into C4 levels: first the external context, then the platform containers, and finally the details of the critical Variant Assignment container. For more on the approach itself, see the C4 Model.
L1 — System Context
Who interacts with the platform and what external systems are involved in the loop.
Randomization algorithms
A high-quality randomization algorithm should ensure three properties: no bias, consistency, and independence between experiments.
Hash and Partition (HP)
```
variant = Hash(UserID + ExperimentID) % 100
if variant < 50:
    return "Control"
else:
    return "Treatment"
```
- Does not require state storage
- Deterministic: one input = one output
- Independence between experiments thanks to ExperimentID
- Easily scalable
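A runnable sketch of the scheme above; `md5` here stands in for whatever stable hash function the platform standardizes on (it must produce the same value across languages and machines, unlike Python's built-in `hash()`):

```python
import hashlib

def assign_variant(user_id: str, experiment_id: str, treatment_pct: int = 50) -> str:
    """Deterministically map a (user, experiment) pair to a bucket 0-99."""
    key = f"{user_id}:{experiment_id}".encode()
    # A stable hash: same inputs give the same bucket on every machine
    bucket = int(hashlib.md5(key).hexdigest(), 16) % 100
    return "Treatment" if bucket < treatment_pct else "Control"

# Consistency: the same inputs always yield the same variant
assert assign_variant("user_42", "exp_search") == assign_variant("user_42", "exp_search")
```

Because the ExperimentID is part of the hash input, the same user lands in uncorrelated buckets across different experiments, which is exactly the independence property required above.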
Pseudorandom with Caching (PwC)
Generating a random number and then caching the result for the user.
- Server-side: database storage
- Client-side: storage in cookies
- Requires additional storage
- Potential consistency issues when clearing cookies
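A minimal server-side sketch of this approach; the dict is a stand-in for the database (or, client-side, a cookie):

```python
import random

_assignments: dict[tuple[str, str], str] = {}  # stand-in for DB/cookie storage

def assign_cached(user_id: str, experiment_id: str) -> str:
    """Roll the dice once per (user, experiment) pair, then return the cached result."""
    key = (user_id, experiment_id)
    if key not in _assignments:
        _assignments[key] = random.choice(["Control", "Treatment"])
    return _assignments[key]
```

If the store is wiped (e.g., the user clears cookies), the next call re-rolls and the user can silently switch variants — the consistency risk noted above.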
Variant assignment methods
Server-side Assignment
- ✓ More secure (logic hidden)
- ✓ Ability to test backend logic
- △ Requires fast service or embedded library
- ✗ Additional network hop
Client-side Assignment
- ✓ Faster for UI changes
- ✓ Lightweight SDK
- △ The configuration is loaded at startup
- ✗ Logic is visible to users
Optimization: Configuration Push
Experiment configurations are pushed to edge nodes or to the Redis cache to minimize latency. The SDK on the client receives the configuration via CDN and executes the hash-based assignment locally.
Data Pipeline
Client Events → Kafka → Flink/Spark → ClickHouse → Dashboard
Ingestion
Events are sent to Kafka for high-throughput ingestion. Each event contains user_id, experiment_id, variant, timestamp, and payload.
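A sketch of such an event as a typed record; the schema follows the field list above, and the Kafka producer call itself is elided:

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class ExperimentEvent:
    user_id: str
    experiment_id: str
    variant: str
    timestamp: float
    payload: dict

event = ExperimentEvent(
    user_id="user_42",
    experiment_id="checkout_button",
    variant="Treatment",
    timestamp=time.time(),
    payload={"action": "click", "page": "/checkout"},
)
# Serialize to JSON bytes before handing the message to the producer
message = json.dumps(asdict(event)).encode()
```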
Processing
Stream processing (Flink) for real-time metrics or batch (Spark) for complex aggregations and statistical analysis.
Storage & Reporting
OLAP database (ClickHouse, Pinot) for fast analytical queries. Dashboard with real-time update of results.
Parallel experiments (Layers)
How can several experiments run simultaneously without influencing each other? The solution is the concept of layers.
Problem
Experiment A tests the search algorithm, Experiment B tests the button color. If a user lands in both Treatment groups, how do we tell which change affected conversion?
Solution: Domains/Layers
Experiments are grouped by domain (UI, Backend, Algorithm). Experiments within a domain are mutually exclusive; experiments in different domains are independent.
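A sketch of layered assignment under these rules: each layer hashes users with its own salt, and experiments inside a layer split the bucket range between them. The layer and experiment names are made up for illustration:

```python
import hashlib

def _bucket(user_id: str, salt: str, n: int = 100) -> int:
    """Stable bucket in [0, n) for a (user, salt) pair."""
    return int(hashlib.md5(f"{user_id}:{salt}".encode()).hexdigest(), 16) % n

# Experiments inside a layer split the bucket range (mutually exclusive);
# each layer hashes with its own salt (independent across layers).
LAYERS = {
    "ui":      [("exp_button_color", 0, 50), ("exp_new_checkout", 50, 100)],
    "backend": [("exp_search_ranking", 0, 100)],
}

def experiments_for(user_id: str) -> list[str]:
    active = []
    for layer, experiments in LAYERS.items():
        b = _bucket(user_id, layer)
        for exp_id, lo, hi in experiments:
            if lo <= b < hi:
                active.append(exp_id)
                break  # at most one experiment per layer
    return active
```

Because each layer uses a different salt, a user's bucket in "ui" is uncorrelated with their bucket in "backend", so cross-layer effects average out.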
Layer Architecture Example
Common mistakes
Sample Ratio Mismatch (SRM)
A 50/50 split comes out as 52/48 in reality. Causes: bot traffic, redirect issues, client-side bugs. Always check for SRM before analyzing the results.
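An SRM check is a chi-square goodness-of-fit test on the observed counts. For one degree of freedom the p-value reduces to erfc(sqrt(chi2/2)), so a stdlib-only sketch is:

```python
import math

def srm_pvalue(control_n: int, treatment_n: int, expected_ratio: float = 0.5) -> float:
    """Chi-square goodness-of-fit test (1 df) for a two-variant split."""
    total = control_n + treatment_n
    exp_c = total * expected_ratio
    exp_t = total * (1 - expected_ratio)
    chi2 = (control_n - exp_c) ** 2 / exp_c + (treatment_n - exp_t) ** 2 / exp_t
    # Survival function of the chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))

# 52/48 on 100k users is wildly significant: investigate before analyzing
if srm_pvalue(52_000, 48_000) < 0.001:
    print("SRM detected: do not trust the results")
```

On large samples even a 50.1/49.9 split is unremarkable, while 52/48 is essentially impossible under a fair split, which is why the check is cheap insurance.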
Peeking Problem
Repeatedly checking results before the planned sample size is reached inflates the false-positive rate. Solution: sequential testing or a fixed sample size.
Network Effects
In social products, users influence each other. Cluster-based randomization instead of user-based can help.
Multiple Testing
Analyzing multiple metrics increases the likelihood of false positives. Use the Bonferroni correction or designate a single primary metric.
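The Bonferroni correction simply tightens the per-metric threshold from alpha to alpha/m; a minimal sketch:

```python
def bonferroni_significant(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag each metric as significant at family-wise error rate alpha."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

# Five metrics: only p-values below 0.05 / 5 = 0.01 survive the correction
flags = bonferroni_significant([0.04, 0.008, 0.2, 0.012, 0.001])
# -> [False, True, False, False, True]
```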
Key Findings
Hash-based assignment — the preferred method for consistent, stateless variant assignment
Configuration push — configurations on edge nodes or in Redis for minimal latency
Layer architecture — isolation of experiments across domains for parallel running
Event streaming — Kafka + Flink/Spark for processing billions of events
Statistical rigor — SRM checks, proper sample size, sequential testing
OLAP for analytics — ClickHouse/Pinot for real-time dashboards and complex queries
This material is based on the public mock interview “System Design Interview: A/B Testing Platform” and Ron Kohavi's book “Trustworthy Online Controlled Experiments”.
