The hard part of an MLOps pipeline is rarely a single training job. It is the point where data versions, model releases, and runtime operations start moving at different speeds.
The chapter connects ingest, training, model registry, live inference, and drift signals into one engineering system that can be released and rolled back safely.
For interviews and engineering discussions, this case is especially useful for talking about reproducibility, release discipline, and team responsibility after launch.
Offline/Online Parity
Keep feature semantics consistent across training and serving paths.
Rollout Safety
Canary, shadow, rollback, and drift alerting are baseline architecture requirements.
Data Quality
Use guardrails for freshness, lineage, and training-serving skew prevention.
Platform Efficiency
Balance pipeline cost, feature-store footprint, and inference latency.
Related chapter
- Machine Learning System Design - A framing toolkit for ML case studies: requirements, metrics, data, and operating risks.
The MLOps pipeline case is not about one training job. It is about an engineering system that has to move a model from data to a stable release and survive the next change cycle. In interviews, the important part is showing that you can assemble the full lifecycle: data, training, model registry, serving, release control, and post-release response.
Scope boundaries
Covered in this chapter
- From data intake and training to release, live inference, monitoring, and the next retraining cycle.
- Release control: quality gates, limited-rollout scenarios, and rollback readiness.
- Operating model: SLOs, stage ownership, incident runbooks, and response to drift or quality issues.
Left out on purpose
- Low-level feature-registry API design and schema evolution mechanics.
- Detailed online feature-layer internals: key design, TTL strategy, hot-key mitigation, and cache implementation.
- Deep mechanics of materialization jobs and conflict resolution between batch and stream updates.
Detailed runtime design of the feature store and serving contracts is covered in Feature Store & Model Serving.
Functional requirements
- Build one continuous path from raw events and source data to live model inference.
- Preserve reproducibility across training runs: dataset versions, feature definitions, and model artifacts must stay explicit.
- Support controlled rollout with safe rollback rather than a one-shot deployment.
- Close the feedback loop with online metrics, quality signals, drift monitoring, and retraining triggers.
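The reproducibility requirement can be made concrete with a small run manifest. The sketch below is illustrative only: the `RunManifest` class, its field names, and the `make_manifest` helper are hypothetical and not part of any particular registry. It shows one way to derive a stable identifier from the dataset version, feature definitions, code revision, and hyperparameters.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunManifest:
    """Inputs that name one training run unambiguously (hypothetical fields)."""
    dataset_version: str      # snapshot ID of the training slice
    feature_set_version: str  # version of the feature definitions
    code_revision: str        # commit of the training code
    hyperparams: tuple        # sorted (name, value) pairs

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) so identical inputs
        # always produce the same hash.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def make_manifest(dataset_version, feature_set_version, code_revision, hyperparams):
    # Sorting the hyperparameters makes the fingerprint independent of dict order.
    return RunManifest(dataset_version, feature_set_version, code_revision,
                       tuple(sorted(hyperparams.items())))


run = make_manifest("ds-2024-05-01", "fs-v12", "abc123", {"lr": 0.1, "depth": 6})
print(run.fingerprint())
```

Storing the fingerprint next to the model artifact gives a cheap check that two runs used the same inputs. Any change to the data slice, feature definitions, or code yields a different ID.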
Non-functional requirements
- p95 latency of live inference below 150 ms for user-facing scenarios.
- Feature freshness SLA of 1-5 minutes for critical behavioral signals.
- 99.95% availability of the inference path, with graceful degradation through a fallback model and a rule-based baseline.
- Full traceability across dataset lineage, feature versions, model versions, and release decisions.
Scale and assumptions
| Parameter | Assumption | Why it matters |
|---|---|---|
| DAU | 8M | The product sees a constant flow of user events and real-time personalization requests. |
| Peak inference QPS | 120k | Traffic is spread across several surfaces, including feed, search, and recommendations. |
| Feature updates | 1.5B/day | Signals have to be materialized quickly in the online layer or the model starts lagging behind product reality. |
| Retraining cadence | daily + emergency runs | The model must keep up with seasonality, campaigns, and shifting user behavior. |
| Model artifact size | 2-8 GB | Artifact storage, delivery, and rollback between environments need explicit operating rules. |
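A quick back-of-envelope pass turns these assumptions into rates that can be sized against. The sketch below uses only the numbers from the table and the 150 ms p95 target. Treating p95 as the time in system is a deliberately pessimistic assumption, and the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

feature_updates_per_day = 1_500_000_000
peak_inference_qps = 120_000
p95_latency_ms = 150  # latency target, used as a pessimistic stand-in for mean latency

# Average write rate into the online feature layer; real peaks will be higher.
avg_feature_updates_per_s = feature_updates_per_day / SECONDS_PER_DAY

# Little's law: in-flight requests = arrival rate * time in system.
max_in_flight_requests = peak_inference_qps * p95_latency_ms / 1000

print(round(avg_feature_updates_per_s))  # 17361
print(int(max_in_flight_requests))       # 18000
```

Roughly 17k feature writes per second on average and up to about 18k concurrent inference requests at peak are the figures that drive online-store and serving-fleet sizing.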
Reference MLOps architecture
System overview
The first diagram shows the overall MLOps loop: where data enters, where features are assembled, where training meets quality guardrails, and where release decisions connect to runtime feedback.
MLOps loop layers
Raw events, batch exports, and streaming signals converge into one intake path with initial checks.
Transforms, snapshots, and online lookups must preserve one meaning of features across training and runtime.
The model trains on a reproducible data slice, goes through train/eval, and clears quality checks before registration.
Model versions, configuration, and approval rules are gathered into one release and rollback control loop.
Serving, degradation, quality signals, and drift monitoring close the loop and trigger the next cycle.
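The last layer depends on some numeric drift signal. One common choice, swapped in here for illustration rather than taken from the chapter, is the Population Stability Index (PSI) between a reference sample and live values of one feature. The function name, bin count, and `eps` smoothing below are assumptions to tune.

```python
import math
from bisect import bisect_right


def psi(reference, live, n_bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    ref_sorted = sorted(reference)
    # Interior bin edges taken at quantiles of the reference sample.
    edges = [ref_sorted[len(ref_sorted) * i // n_bins] for i in range(1, n_bins)]

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # eps keeps log() finite when a bin is empty.
        return [max(c / len(values), eps) for c in counts]

    ref_p, live_p = bin_shares(reference), bin_shares(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_p, live_p))


reference = list(range(1000))
shifted = [x + 500 for x in reference]
print(psi(reference, reference))  # 0.0
print(psi(reference, shifted) > psi(reference, reference))  # True
```

A retraining trigger would compare this value against a threshold chosen per feature. The threshold and the windowing of live traffic are left out on purpose because they depend on the product.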
What to keep under control
Read the diagram as a stack of layers: each level passes along not only artifacts, but also constraints around quality, release safety, and runtime resilience.
The three areas to control are data and feature consistency, release control, and runtime feedback.
Artifact and release path
The second diagram follows one artifact version: after packaging it moves through shadow, canary, or experiment stages before it is promoted further, held, or rolled back.
Artifact path and release decision
The team fixes source versions, time windows, and sampling rules so there is no ambiguity about the training slice.
The pipeline builds features so future information does not leak into training and offline-online meaning stays aligned.
The model goes through training, evaluation, sanity checks, and quality thresholds before it can even enter release.
Artifact, configuration, dependencies, and version metadata are published together so the release is reproducible and rollback-ready.
The new version first earns evidence on limited traffic before it is considered for full rollout.
The system watches latency, errors, quality, fallback rate, and critical segments on live traffic.
After release evidence is collected, the team either promotes the version, rolls it back, or starts the next improvement cycle.
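The final step, promote, hold, or roll back, can be expressed as a small decision function over the evidence collected on limited traffic. This is a minimal sketch: `CanaryEvidence`, `decide`, and every threshold except the 150 ms p95 target are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROMOTE = "promote"
    HOLD = "hold"
    ROLLBACK = "rollback"


@dataclass(frozen=True)
class CanaryEvidence:
    p95_latency_ms: float
    error_rate: float      # share of failed requests, 0..1
    fallback_rate: float   # share of requests served by a fallback, 0..1
    quality_delta: float   # candidate minus baseline on the chosen online metric
    sample_size: int       # requests observed on the canary


def decide(e: CanaryEvidence,
           max_p95_ms: float = 150.0,        # from the latency requirement
           max_error_rate: float = 0.001,    # hypothetical threshold
           max_fallback_rate: float = 0.02,  # hypothetical threshold
           min_sample_size: int = 100_000) -> Decision:  # hypothetical threshold
    # Hard guardrails first: any breach rolls back regardless of quality.
    if (e.p95_latency_ms > max_p95_ms
            or e.error_rate > max_error_rate
            or e.fallback_rate > max_fallback_rate):
        return Decision.ROLLBACK
    # Too little traffic so far: keep collecting evidence.
    if e.sample_size < min_sample_size:
        return Decision.HOLD
    # Enough evidence: promote only if quality did not regress.
    return Decision.PROMOTE if e.quality_delta >= 0 else Decision.ROLLBACK


print(decide(CanaryEvidence(120.0, 0.0005, 0.01, 0.02, 250_000)))  # Decision.PROMOTE
```

Checking guardrails before sample size is a deliberate choice in this sketch: a latency or error breach should not wait for statistical confidence in the quality metric.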
How to read the path
The left side shows one version moving from the data slice to the release decision. The right side highlights what evidence the team collects along the way.
The evidence falls into three checkpoints: before registration, before full rollout, and after entering runtime.
Read together, the two diagrams show that an MLOps pipeline rarely fails inside the model itself. It usually fails at the seams between data, release control, and runtime, when model versions, feature logic, and degradation rules stop moving in sync.
Key trade-offs
Signal freshness vs reproducibility
Refreshing features and models faster tracks new signals better, but it makes past outcomes harder to reproduce and regressions harder to investigate.
Batch simplicity vs stream responsiveness
Batch paths are cheaper and easier to operate, but they lose on freshness. Stream paths cut lag at the cost of much higher operational complexity.
Single model vs segment routing
One general model is easier to manage, but segment routing often wins on quality at the price of more complicated release control.
Strict guardrails vs release speed
Strong quality gates lower incident risk, but they slow releases down. In practice the balance comes from automation and tiered risk policies.
Common anti-patterns
Duplicating feature logic between research code and runtime without a shared source of truth.
Shipping a model without a shadow stage, an early limited rollout, or a fallback, so any failure becomes a product incident.
Ignoring observation time and accidentally leaking future information into training.
Watching only model outputs while missing shifts in input features, data quality, and update lag.
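The leakage anti-pattern is usually avoided with a point-in-time lookup: for each training example, take the latest feature value written at or before the observation time, never a later one. The sketch below assumes a per-entity history of `(timestamp, value)` pairs sorted by timestamp. The function name and data layout are illustrative.

```python
from bisect import bisect_right


def feature_as_of(history, observed_at):
    """Return the latest feature value written at or before `observed_at`.

    `history` is a list of (timestamp, value) pairs sorted by timestamp.
    Returns None when no value existed yet at that time.
    """
    timestamps = [ts for ts, _ in history]
    # Index of the first entry strictly after observed_at.
    idx = bisect_right(timestamps, observed_at)
    return history[idx - 1][1] if idx > 0 else None


history = [(100, 3), (200, 5), (300, 9)]
print(feature_as_of(history, 250))  # 5
print(feature_as_of(history, 50))   # None
```

Joining on the most recent value regardless of time would return 9 for the label observed at 250, which is exactly the future information the model will not have at serving time.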
Recommendations
Make the pipeline contract explicit: schema, stage owner, SLO, rollback procedure, and incident runbooks should all be visible.
Keep one dependency graph from source data to features, model version, and release decision.
Plan at least two degradation modes: a fallback model and a rule-based baseline.
Treat latency and cost constraints as part of runtime design rather than a post-release optimization pass.
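The two degradation modes can be wired as an ordered chain that is tried until one tier answers. This is a minimal sketch under stated assumptions: the `score_with_degradation` function, the tier names, and the stand-in scorers are hypothetical, and a real serving path would also enforce per-tier timeouts.

```python
import logging
from typing import Callable, Sequence, Tuple

logger = logging.getLogger("inference")
Scorer = Callable[[dict], float]


def score_with_degradation(features: dict,
                           chain: Sequence[Tuple[str, Scorer]]) -> Tuple[str, float]:
    """Try each scorer in order; return (tier_name, score) from the first that works."""
    for name, scorer in chain:
        try:
            return name, scorer(features)
        except Exception:
            logger.warning("scorer %s failed, degrading to next tier", name)
    raise RuntimeError("all scoring tiers failed")


def primary_model(features: dict) -> float:
    raise TimeoutError("model server did not answer")  # simulated outage


def fallback_model(features: dict) -> float:
    return 0.4  # stand-in for a smaller, cheaper model


def rule_baseline(features: dict) -> float:
    return 1.0 if features.get("clicks_7d", 0) > 10 else 0.0


chain = [("primary", primary_model),
         ("fallback_model", fallback_model),
         ("rule_baseline", rule_baseline)]
print(score_with_degradation({"clicks_7d": 3}, chain))  # ('fallback_model', 0.4)
```

Returning the tier name alongside the score makes the fallback share directly measurable, so degradation shows up in monitoring instead of hiding behind a successful response.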
Interview prompts to cover
- How does your design prevent training-runtime skew and data leakage?
- Which checks can actually stop a release, and what signals are acceptable in an early-stage rollout?
- How does the architecture change if inference QPS grows by 10x?
- Which end-to-end metrics do you monitor, for example data lag, feature freshness, model quality, latency, and fallback share?
Related chapters
- Feature Store & Model Serving - A deeper dive into offline-online consistency, feature contracts, and the live model path.
- Recommendation System - An applied ML case where model quality is tied to candidate generation, ranking, and product outcomes.
- Data Pipeline / ETL / ELT Architecture - Foundation for data intake, backfills, orchestration, and quality checks.
- Observability & Monitoring Design - How to design SLOs, alerting, and incident handling for production systems.
- ML Platform at T-Bank - A real platform case about how ML processes evolve inside a large company.
- Precision and recall basics - Metrics intuition for reading model quality before release and after launch.
