System Design Space

Updated: March 13, 2026 at 3:30 PM

ML Ops Pipeline

Difficulty: medium

Classic task: feature pipelines, model registry, offline/online parity, rollout safety, and drift monitoring.

This Theme 3 chapter focuses on AI/ML workflows, feature pipelines, and rollout control. The goal is not only to propose a working design, but also to explain behavior under scale and failure pressure.

Use a stable structure: requirements -> architecture -> critical deep dive -> evolution. This makes the solution clear, defensible, and interview-ready.

Offline/Online Parity

Keep feature semantics consistent across training and serving paths.

Rollout Safety

Canary, shadow, rollback, and drift alerting are baseline architecture requirements.

Data Quality

Use guardrails for freshness, lineage, and training-serving skew prevention.

Platform Efficiency

Balance pipeline cost, feature-store footprint, and inference latency.

Case-Solving Playbook

Phase 1: Define the feature contract

Align feature schema and semantics across offline and online paths.

Phase 2: Build the rollout policy

Specify canary/shadow/rollback and model-quality observability.

Phase 3: Cover data-quality risks

Add freshness, lineage, and drift/skew guardrails end to end.

Phase 4: Optimize the cost envelope

Balance inference SLA with feature/model pipeline operating cost.
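Phase 1 above can be made concrete with a shared feature contract that both the offline (training) and online (serving) pipelines import. This is a minimal sketch; `FeatureSpec` and the `user_ctr_7d` feature are illustrative assumptions, not part of any specific feature-store API.

```python
from dataclasses import dataclass

# Hypothetical contract object shared by training and serving code,
# so both paths compute identical feature semantics.
@dataclass(frozen=True)
class FeatureSpec:
    name: str          # stable identifier used in both stores
    dtype: str         # logical type, e.g. "float32"
    entity: str        # join key, e.g. "user_id"
    window: str        # aggregation window, e.g. "7d"
    ttl_seconds: int   # max staleness tolerated at serving time
    version: int       # bump on any semantic change

USER_CTR_7D = FeatureSpec(
    name="user_ctr_7d", dtype="float32", entity="user_id",
    window="7d", ttl_seconds=300, version=2,
)

def fq_name(spec: FeatureSpec) -> str:
    """Fully qualified name written to both offline and online stores."""
    return f"{spec.entity}:{spec.name}:v{spec.version}"
```

Versioning the spec (rather than mutating it in place) is what lets a rollback restore both the model and the exact feature semantics it was trained against.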

Related chapter

Machine Learning System Design

Framework for ML case studies: requirements, metrics, data, and operational risks.

Read the overview

ML Ops Pipeline is a system-design case about moving models from experimentation into stable production operations. Interviewers expect you to design the end-to-end lifecycle: data, training, release, serving, monitoring, and safe degradation under failures.

Chapter scope boundaries

Covered in this chapter

  • End-to-end lifecycle: ingest -> training/eval -> registry/release -> serving -> monitoring -> retraining.
  • Release governance: quality gates, rollout policy (canary, shadow, A/B), and rollback readiness.
  • Operating model: SLOs, ownership, runbooks, and response to drift/quality incidents.

Not covered here

  • Low-level feature-registry schema design and API-level schema evolution mechanics.
  • Detailed online/offline retrieval contracts, key design, TTL strategy, and hot-key mitigation in online store.
  • Deep internals of feature materialization jobs and batch/stream conflict resolution.

Detailed runtime design of Feature Store and serving contracts is covered in Feature Store & Model Serving.

Functional requirements

  • Build one end-to-end pipeline from raw events/data to production inference.
  • Support reproducible training with versioned datasets, feature definitions, and model artifacts.
  • Enable controlled model rollout (canary, shadow, A/B) with safe rollback.
  • Implement a feedback loop with online metrics, drift signals, and retraining triggers.
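The reproducibility requirement above can be sketched as a content-addressed training manifest: hash everything a run depends on and store the digest next to the model artifact. The field names here are illustrative assumptions.

```python
import hashlib
import json

def manifest_digest(dataset_version: str, feature_versions: dict,
                    hyperparams: dict, code_rev: str) -> str:
    """Deterministic fingerprint of everything a training run depends on.

    Two runs with the same digest consumed identical inputs; the digest
    is stored in the registry alongside the model artifact so any model
    can be traced back to its exact data, features, params, and code.
    """
    manifest = {
        "dataset": dataset_version,
        "features": feature_versions,   # e.g. {"user_ctr_7d": 2}
        "hyperparams": hyperparams,
        "code": code_rev,
    }
    blob = json.dumps(manifest, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()
```

`sort_keys=True` makes the digest independent of dict insertion order, which is what makes it safe to compare across runs.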

Non-functional requirements

  • p95 online inference latency below 150 ms for user-facing paths.
  • Feature freshness SLA of 1-5 minutes for critical behavioral signals.
  • 99.95% inference availability with graceful degradation via fallback baseline.
  • Full auditability: lineage for data, features, models, and rollout decisions.

Scale and assumptions

Parameter                | Assumption                  | Why it matters
DAU                      | 8M                          | Large product with continuous user events and realtime personalization.
Peak inference QPS       | 120k                        | Traffic is spread across multiple user surfaces: feed, search, and recommendations.
Feature updates          | 1.5B/day                    | Event streams require near real-time materialization into online feature stores.
Model retraining cadence | daily + emergency retrains  | Models must adapt to seasonality, campaigns, and distribution shifts.
Peak artifact size       | 2-8 GB/model                | Needs robust storage, delivery, and rollback policies for model artifacts.
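A quick back-of-envelope check on the table's assumptions helps in an interview; the features-per-request fan-out below is an assumed figure, not from the table.

```python
# Sanity-check the load implied by the scale assumptions above.
feature_updates_per_day = 1.5e9
avg_update_rate = feature_updates_per_day / 86_400   # seconds per day
print(f"avg feature updates/s = {avg_update_rate:,.0f}")

peak_qps = 120_000
features_per_request = 50          # assumed fan-out per inference call
online_store_reads = peak_qps * features_per_request
print(f"peak online-store reads/s = {online_store_reads:,}")
```

Average write load lands around 17k updates/s, so peak write load is likely several times that, and read load on the online store dwarfs raw QPS once per-request feature fan-out is counted.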

High-Level Architecture

Stage 1: Data & Feature Pipelines

Batch + streaming ingestion, quality checks, point-in-time joins, and feature publication to offline/online stores.
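The point-in-time join mentioned in Stage 1 is the core leakage defense: for each training label, only feature values observed at or before the label's timestamp may be used. A minimal pure-Python sketch (real pipelines do this in the offline store engine):

```python
import bisect

def point_in_time_join(label_events, feature_history):
    """For each label event, pick the latest feature value whose
    timestamp is <= the label timestamp (never a future value).

    label_events:    list of (label_ts, entity_id)
    feature_history: {entity_id: list of (feature_ts, value), sorted by ts}
    """
    rows = []
    for label_ts, entity in label_events:
        history = feature_history.get(entity, [])
        ts_list = [ts for ts, _ in history]
        # Rightmost feature observation not after the label timestamp.
        i = bisect.bisect_right(ts_list, label_ts) - 1
        value = history[i][1] if i >= 0 else None   # None => no leakage
        rows.append((entity, label_ts, value))
    return rows
```

Returning `None` when no past observation exists is deliberate: imputing a later value here is exactly the future-signal leak the anti-patterns section warns about.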

Stage 2: Training & Validation

Train/eval orchestration, experiment tracking, reproducible datasets, and model-quality guardrails.

Stage 3: Registry & Release Management

Model registry with stage transitions (staging -> canary -> prod), approval policy, and rollback-ready packages.
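The stage transitions in Stage 3 can be enforced as a small state machine; the transition table below is an illustrative policy, not a specific registry product's API.

```python
# Hypothetical stage machine mirroring staging -> canary -> prod,
# with rollback paths always available.
ALLOWED = {
    "staging": {"canary", "archived"},
    "canary": {"prod", "staging"},      # back to staging = rollback
    "prod": {"canary", "archived"},     # re-canary for a hotfix rollout
}

def transition(current: str, target: str) -> str:
    """Validate and perform a registry stage transition."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```

Encoding the policy as data keeps approval rules auditable and makes "staging -> prod with no canary" impossible by construction rather than by convention.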

Stage 4: Serving & Monitoring

Online inference API, fallback policy, latency/error/freshness SLOs, and drift monitoring with auto-alerting.
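Stage 4's fallback policy can be sketched as a wrapper that degrades to a cheap baseline and reports which path served the request, so the fallback rate itself becomes an SLO signal. This is a simplification: it checks the budget after the primary call returns, whereas a production system would enforce the timeout asynchronously.

```python
import time

def predict_with_fallback(features, primary, baseline, budget_ms=150):
    """Serve from the primary model; fall back to a baseline when the
    primary fails or exceeds the latency budget.

    Returns (score, source) so the fallback rate can be monitored.
    """
    start = time.monotonic()
    try:
        score = primary(features)
        if (time.monotonic() - start) * 1000 > budget_ms:
            raise TimeoutError("primary exceeded latency budget")
        return score, "primary"
    except Exception:
        # Baseline must be simple enough to be near-certain to succeed:
        # a smaller model or a rule-based heuristic.
        return baseline(features), "fallback"
```

A rising fallback rate is often the first visible symptom of a bad rollout, which is why it belongs on the same dashboard as latency and error SLOs.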

Typical flow: events and source data enter ingestion, features are published to offline/online stores, orchestrators run train/eval jobs, registry controls versions and rollout, and serving closes the feedback loop with online metrics and drift signals.

Deep Dives and trade-offs

Freshness vs reproducibility

Faster feature/model refresh improves adaptation to new signals, but increases reproducibility risk and makes regression analysis harder.

Batch simplicity vs streaming responsiveness

Batch pipelines are cheaper and easier to operate but sacrifice freshness. Streaming reduces lag at the cost of significantly higher operational complexity.

Single model vs multi-model routing

A single general model is easier to manage but often lower quality. Segment routing improves quality but increases versioning and rollout complexity.

Strict guardrails vs release speed

Hard quality gates reduce incident risk but slow down delivery. A practical balance is achieved with risk-tier policies and automated checks.
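The risk-tier balance described above can be expressed as a gate policy per tier; the tier names and gate set here are assumed for illustration.

```python
# Hypothetical risk-tier policy: higher-risk surfaces require stricter
# gates before release, trading delivery speed for incident safety.
GATES = {
    "low":    {"offline_eval"},
    "medium": {"offline_eval", "shadow"},
    "high":   {"offline_eval", "shadow", "canary", "manual_approval"},
}

def can_release(tier: str, passed_gates: set) -> bool:
    """A release proceeds only if every gate required for its tier passed."""
    return GATES[tier] <= passed_gates   # subset check
```

Low-risk internal models ship on automated checks alone, while user-facing revenue surfaces keep the full canary-plus-approval path.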

Common anti-patterns

Feature logic is duplicated between training notebooks and production code without shared registry/versioning.

Model rollout happens without canary/shadow checks and without fallback, making incidents immediately user-visible.

No point-in-time controls: training leaks future signals and production quality drops sharply.

Drift monitoring is applied only to model output, without monitoring input feature distributions.
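To monitor input feature distributions and not just model output, a common drift signal is the Population Stability Index (PSI) over binned feature values; a minimal sketch:

```python
import math

def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two binned distributions (bin fractions summing to ~1).

    Rule of thumb: PSI < 0.1 stable, 0.1-0.25 moderate drift,
    > 0.25 significant drift worth an alert.
    """
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)   # guard empty bins
        psi += (a - e) * math.log(a / e)
    return psi
```

Running this per feature against a training-time reference catches input drift before it shows up in model-quality metrics, which typically lag by one feedback-loop cycle.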

Recommendations

Define pipeline contracts explicitly: schema, ownership, SLO, rollback procedure, and runbook for each stage.

Maintain a single lineage graph: source data -> features -> model version -> release decision.

Prepare at least two degradation modes: fallback model and rule-based baseline.

Enforce budget-aware inference with latency/cost constraints and critical-surface prioritization.

Interview prompts to cover

  • How does your design prevent training-serving skew and data leakage?
  • Which quality gates can block rollout, and which canary signals are acceptable?
  • How does the architecture evolve under 10x inference QPS growth?
  • Which end-to-end SLOs do you monitor: data lag, feature freshness, model quality, latency, fallback rate?


© 2026 Alexander Polomodov