This Theme 3 chapter focuses on AI/ML workflows, feature pipelines, and rollout control. The goal is not only to propose a working design, but also to explain behavior under scale and failure pressure.
Use a stable structure: requirements -> architecture -> critical deep dive -> evolution. This makes the solution clear, defensible, and interview-ready.
Offline/Online Parity
Keep feature semantics consistent across training and serving paths.
Rollout Safety
Canary, shadow, rollback, and drift alerting are baseline architecture requirements.
Data Quality
Use guardrails for freshness, lineage, and training-serving skew prevention.
Platform Efficiency
Balance pipeline cost, feature-store footprint, and inference latency.
Case-Solving Playbook
Define feature contract
Phase 1: Align feature schema and semantics across offline and online paths.
Build rollout policy
Phase 2: Specify canary/shadow/rollback and model-quality observability.
Cover data quality risks
Phase 3: Add freshness, lineage, and drift-skew guardrails end-to-end.
Optimize cost envelope
Phase 4: Balance inference SLA with feature/model pipeline operating cost.
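The Phase 1 feature contract can be sketched as a small versioned schema shared by offline and online paths. This is a minimal illustration, not a standard API: the field names and the `freshness_sla_s` budget are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureContract:
    """Single source of truth for a feature's semantics across
    training (offline) and serving (online) paths."""
    name: str
    version: int
    dtype: str              # logical type, e.g. "float64"
    default: float          # value served when the feature is missing
    freshness_sla_s: int    # max acceptable staleness online, in seconds

    def qualified_name(self) -> str:
        # Versioned key, so offline joins and online lookups
        # can never silently read different definitions.
        return f"{self.name}:v{self.version}"

ctr_7d = FeatureContract(
    name="user_ctr_7d", version=2, dtype="float64",
    default=0.0, freshness_sla_s=300,
)
print(ctr_7d.qualified_name())  # -> user_ctr_7d:v2
```

Freezing the dataclass and versioning the key makes contract changes explicit release events rather than silent drift between notebooks and serving code.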
Related chapter: Machine Learning System Design - a framework for ML case studies covering requirements, metrics, data, and operational risks.
ML Ops Pipeline is a system-design case about moving models from experimentation into stable production operations. Interviewers expect you to design the end-to-end lifecycle: data, training, release, serving, monitoring, and safe degradation under failures.
Chapter scope boundaries
Covered in this chapter
- End-to-end lifecycle: ingest -> training/eval -> registry/release -> serving -> monitoring -> retraining.
- Release governance: quality gates, rollout policy (canary/shadow/A-B), and rollback readiness.
- Operating model: SLOs, ownership, runbooks, and response to drift/quality incidents.
Not covered here
- Low-level feature-registry schema design and API-level schema evolution mechanics.
- Detailed online/offline retrieval contracts, key design, TTL strategy, and hot-key mitigation in the online store.
- Deep internals of feature materialization jobs and batch/stream conflict resolution.
Detailed runtime design of Feature Store and serving contracts is covered in Feature Store & Model Serving.
Functional requirements
- Build one end-to-end pipeline from raw events/data to production inference.
- Support reproducible training with versioned datasets, feature definitions, and model artifacts.
- Enable controlled model rollout (canary/shadow/A-B) with safe rollback.
- Implement a feedback loop with online metrics, drift signals, and retraining triggers.
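The controlled-rollout requirement is usually implemented with deterministic, hash-based traffic splitting, so a given user consistently hits the same model during a canary. The sketch below is illustrative: `route_request`, the bucket count, and the shadow flag are assumed names, not a fixed API.

```python
import hashlib

def bucket(user_id: str, buckets: int = 100) -> int:
    """Stable bucket in [0, buckets): the same user always lands in the
    same bucket, so canary exposure is consistent across requests."""
    h = hashlib.sha256(user_id.encode()).hexdigest()
    return int(h, 16) % buckets

def route_request(user_id: str, canary_pct: int = 5) -> dict:
    """Serve the canary model to a small, sticky slice of traffic;
    shadow mode additionally scores requests with the candidate model
    and logs results without serving them to users."""
    b = bucket(user_id)
    return {
        "serve": "canary" if b < canary_pct else "prod",
        "shadow": True,  # log-only scoring by the candidate model
    }

print(route_request("user-42"))
```

Sticky bucketing matters for rollback analysis: if canary users churn between models on every request, per-cohort quality metrics become uninterpretable.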
Non-functional requirements
- p95 online inference latency below 150 ms for user-facing paths.
- Feature freshness SLA of 1-5 minutes for critical behavioral signals.
- 99.95% inference availability with graceful degradation to a fallback baseline.
- Full auditability: lineage for data, features, models, and rollout decisions.
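The 99.95% availability target translates into a concrete monthly error budget, a number worth quoting in an interview:

```python
def monthly_error_budget_minutes(availability: float, days: int = 30) -> float:
    """Allowed downtime per month for a given availability target."""
    total_minutes = days * 24 * 60          # 43,200 minutes in a 30-day month
    return total_minutes * (1 - availability)

print(round(monthly_error_budget_minutes(0.9995), 1))  # -> 21.6 minutes/month
```

Roughly 22 minutes of downtime per month is the entire budget for deploys, incidents, and dependency failures combined, which is why rollback readiness and fallback serving are listed as hard requirements.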
Scale and assumptions
| Parameter | Assumption | Why it matters |
|---|---|---|
| DAU | 8M | Large product with continuous user events and realtime personalization. |
| Peak inference QPS | 120k | Traffic is spread across multiple user surfaces: feed, search, and recommendations. |
| Feature updates | 1.5B/day | Event streams require near real-time materialization into online feature stores. |
| Model retraining cadence | daily + emergency retrains | Models must adapt to seasonality, campaigns, and distribution shifts. |
| Peak artifact size | 2-8 GB/model | Needs robust storage, delivery, and rollback policies for model artifacts. |
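The table's assumptions support a quick back-of-envelope check on online feature-store write load. The peak-to-average factor of 3 below is an assumption, not from the table:

```python
FEATURE_UPDATES_PER_DAY = 1.5e9
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3  # assumed diurnal peak-to-average ratio

avg_writes_per_s = FEATURE_UPDATES_PER_DAY / SECONDS_PER_DAY
peak_writes_per_s = avg_writes_per_s * PEAK_FACTOR

print(f"avg:  ~{avg_writes_per_s:,.0f} writes/s")   # ~17,361
print(f"peak: ~{peak_writes_per_s:,.0f} writes/s")  # ~52,083
```

Tens of thousands of writes per second is well beyond comfortable single-node territory, which motivates a partitioned online store and stream materialization rather than batch-only publication.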
High-Level Architecture
Stage 1: Data & Feature Pipelines
Batch + streaming ingestion, quality checks, point-in-time joins, and feature publication to offline/online stores.
Stage 2: Training & Validation
Train/eval orchestration, experiment tracking, reproducible datasets, and model-quality guardrails.
Stage 3: Registry & Release Management
Model registry with stage transitions (staging -> canary -> prod), approval policy, and rollback-ready packages.
Stage 4: Serving & Monitoring
Online inference API, fallback policy, latency/error/freshness SLOs, and drift monitoring with auto-alerting.
Typical flow: events and source data enter ingestion, features are published to offline/online stores, orchestrators run train/eval jobs, registry controls versions and rollout, and serving closes the feedback loop with online metrics and drift signals.
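Stage 4's drift monitoring typically compares a serving-time feature distribution against its training baseline; one common statistic is the Population Stability Index (PSI) over fixed bins. A minimal sketch follows: the bins and the 0.2 alert threshold are widely used conventions, not fixed rules.

```python
import math

def psi(expected: list[float], actual: list[float]) -> float:
    """Population Stability Index between two binned distributions.
    Inputs are per-bin proportions, each summing to 1. Rule of thumb:
    PSI > 0.2 suggests significant drift worth an alert."""
    eps = 1e-6  # guard against log(0) on empty bins
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total

baseline = [0.25, 0.25, 0.25, 0.25]   # feature distribution at training time
shifted  = [0.10, 0.20, 0.30, 0.40]   # feature distribution in serving logs

print(round(psi(baseline, baseline), 4))  # -> 0.0 (no drift)
print(round(psi(baseline, shifted), 4))   # > 0.2, would trigger a drift alert
```

Running this per feature, and not just on model outputs, catches the upstream breakages that output-only monitoring misses.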
Deep Dives and trade-offs
Freshness vs reproducibility
Faster feature/model refresh improves adaptation to new signals, but increases reproducibility risk and makes regression analysis harder.
Batch simplicity vs streaming responsiveness
Batch pipelines are cheaper and easier to operate but lose on freshness. Streaming lowers lag at the cost of significantly higher operational complexity.
Single model vs multi-model routing
A single general model is easier to manage but often lower quality. Segment routing improves quality but increases versioning and rollout complexity.
Strict guardrails vs release speed
Hard quality gates reduce incident risk but slow down delivery. A practical balance is achieved with risk-tier policies and automated checks.
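The risk-tier balance can be expressed as a small policy table mapping model tiers to required release gates. Tier names, metric thresholds, and check names below are assumptions for illustration only:

```python
# Hypothetical risk-tier policy: higher tiers require stricter gates.
GATE_POLICY = {
    "tier1_revenue_critical": {
        "min_auc_delta": 0.0,   # no offline regression tolerated
        "required": ["shadow_7d", "canary_5pct", "manual_approval"],
    },
    "tier2_user_facing": {
        "min_auc_delta": -0.002,
        "required": ["canary_5pct"],
    },
    "tier3_internal": {
        "min_auc_delta": -0.01,
        "required": [],          # automated checks only
    },
}

def can_release(tier: str, auc_delta: float, passed: set[str]) -> bool:
    """A release proceeds only if both the metric gate and all
    required process gates for the model's tier have passed."""
    policy = GATE_POLICY[tier]
    return (auc_delta >= policy["min_auc_delta"]
            and set(policy["required"]) <= passed)

print(can_release("tier1_revenue_critical", 0.001,
                  {"shadow_7d", "canary_5pct", "manual_approval"}))  # True
print(can_release("tier3_internal", -0.005, set()))                  # True
```

Encoding the policy as data keeps release speed for low-risk models while making tier-1 gates non-negotiable and auditable.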
Common anti-patterns
- Feature logic is duplicated between training notebooks and production code without shared registry/versioning.
- Model rollout happens without canary/shadow checks and without fallback, making incidents immediately user-visible.
- No point-in-time controls: training leaks future signals and production quality drops sharply.
- Drift monitoring is applied only to model output, without monitoring input feature distributions.
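The point-in-time anti-pattern is avoided by joining each training label only with the latest feature value observed at or before the label's timestamp. A minimal stdlib sketch of the idea (real pipelines push this into the feature store or SQL layer):

```python
import bisect

def point_in_time_join(feature_events, labels):
    """For each (entity, label_ts) pick the latest feature value with
    event_ts <= label_ts, never a future value (no leakage).
    feature_events: dict entity -> time-sorted list of (event_ts, value).
    labels: iterable of (entity, label_ts, label)."""
    rows = []
    for entity, label_ts, label in labels:
        events = feature_events.get(entity, [])
        # Index of the rightmost event with event_ts <= label_ts.
        i = bisect.bisect_right(events, (label_ts, float("inf"))) - 1
        value = events[i][1] if i >= 0 else None  # None -> contract default
        rows.append((entity, label_ts, value, label))
    return rows

events = {"u1": [(100, 0.2), (200, 0.5), (300, 0.9)]}
labels = [("u1", 250, 1), ("u1", 50, 0)]
print(point_in_time_join(events, labels))
# -> [('u1', 250, 0.5, 1), ('u1', 50, None, 0)]
```

Note the second row: the label at ts=50 predates any feature event, so the join yields no value rather than leaking the future 0.2.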
Recommendations
- Define pipeline contracts explicitly: schema, ownership, SLO, rollback procedure, and runbook for each stage.
- Maintain a single lineage graph: source data -> features -> model version -> release decision.
- Prepare at least two degradation modes: fallback model and rule-based baseline.
- Enforce budget-aware inference with latency/cost constraints and critical-surface prioritization.
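The two degradation modes recommended above can be wired as an ordered fallback chain. This is an illustrative sketch under assumed names: the scorer signatures and the rule-based floor are inventions for the example.

```python
def primary_model(features: dict) -> float:
    raise TimeoutError("inference backend unavailable")  # simulate an outage

def fallback_model(features: dict) -> float:
    return 0.3  # smaller cached baseline model, always loadable locally

def rule_baseline(features: dict) -> float:
    return 0.1  # last-resort heuristic (e.g. popularity prior), no model

def predict_with_degradation(features: dict) -> tuple[float, str]:
    """Try each scorer in order and record which mode served the
    request, so the fallback rate can itself be monitored as an SLO."""
    for mode, scorer in [("primary", primary_model),
                         ("fallback", fallback_model),
                         ("rules", rule_baseline)]:
        try:
            return scorer(features), mode
        except Exception:
            continue
    return 0.0, "static_default"

score, mode = predict_with_degradation({"user_ctr_7d": 0.4})
print(score, mode)  # -> 0.3 fallback
```

Returning the serving mode alongside the score is the key design choice: a rising fallback rate is often the earliest incident signal, well before user-facing quality metrics move.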
Interview prompts to cover
- How does your design prevent training-serving skew and data leakage?
- Which quality gates can block rollout, and which canary signals are acceptable?
- How does the architecture evolve under 10x inference QPS growth?
- Which end-to-end SLOs do you monitor: data lag, feature freshness, model quality, latency, fallback rate?
Related chapters
- Feature Store & Model Serving - Deep dive on offline/online parity, serving contracts, and operational guardrails.
- Recommendation System - Applied ML case with candidate generation, ranking, and production quality constraints.
- Data Pipeline / ETL / ELT Architecture - Foundation for ingestion, backfills, orchestration, and data quality controls.
- Observability & Monitoring Design - SLO monitoring, alerting, and incident response patterns for production systems.
- ML Platform at T-Bank - Real platform-engineering case about ML workflow evolution in a large company.
- Precision and recall basics - Metrics foundation for rollout decisions and post-release quality interpretation.
