System Design Space

Updated: June 2, 2026 at 7:30 PM

Feature Store & Model Serving

hard

Case study on feature stores and model serving: preserving one meaning of features across training and runtime, keeping point-in-time correctness, and controlling training-serving skew.

Feature stores and model serving become hard at the point where one feature definition has to behave identically in training and on the live request path.

The chapter ties point-in-time correctness, online feature access, materialization, freshness, and degradation planning into one workable platform architecture.

For interviews and architecture discussions, this case quickly shows whether you understand training-serving skew and the cost of every extra network dependency on the prediction path.

Offline/Online Parity

Keep feature semantics consistent across training and serving paths.

Rollout Safety

Canary, shadow, rollback, and drift alerting are baseline architecture requirements.

Data Quality

Use guardrails for freshness, lineage, and training-serving skew prevention.

Platform Efficiency

Balance pipeline cost, feature-store footprint, and inference latency.

Context

Machine Learning System Design

A foundational overview of ML architecture that makes feature-layer decisions easier to explain in interviews.


"Design a Feature Store" is a classic ML System Design case about keeping one meaning of features across training and live serving. Interviewers expect you to explain how point-in-time correctness is preserved, how data freshness is controlled, and what happens when the feature path for live inference starts to degrade.

Scope boundaries

Covered in this chapter

  • Feature contracts: registry, offline and online storage, materialization, and feature-retrieval APIs.
  • Consistency between training and runtime, point-in-time correctness, and freshness control.
  • Reliability of the feature path in live inference: latency SLOs, fallback modes, and degradation rules.

Not covered here

  • Training orchestration, model selection, and experiment-lifecycle management.
  • Release control at the model-registry layer: approval rules, shadow mode, canary rollout, and rollback decisions.
  • The end-to-end retraining loop, drift-driven release cadence, and ownership of the full ML delivery chain.

The end-to-end ML lifecycle and release loop are covered in ML Ops Pipeline.

Problem and context

The product runs multiple ML use cases, such as personalization, fraud, and risk scoring, but teams compute features in separate pipelines and end up with inconsistent data between training and runtime. The goal of this chapter is to design the feature layer as a platform service with shared contracts and explicit SLAs.

Functional requirements

  • One feature catalog with explicit ownership, schema, entity keys, source of truth, and transformation version.
  • Offline feature retrieval for training and validation with strict point-in-time correctness.
  • Online feature retrieval for live inference with low-latency access and stable API contracts.
  • A materialization path that moves computed features from batch and streaming pipelines into the online store while preserving freshness.
  • History replay and dataset rebuilds when feature logic changes, without one-off scripts.

Non-functional requirements

  • Online read latency: p95 < 30 ms. Above that, the feature layer starts eating into the latency budget of the user-facing prediction path.
  • Availability: 99.95%+. Online-store outages directly block models in critical product flows.
  • Freshness SLA: <= 5 minutes for hot features. Stale features quickly degrade ranking, personalization, and fraud quality.
  • Skew control: zero critical skews may go unalerted. Training-serving mismatches must be detected before they cause user-visible degradation.
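One way to check the latency target against collected samples is sketched below. It assumes read latencies in milliseconds and uses the nearest-rank percentile definition; both function names are hypothetical, and production systems usually read percentiles from a metrics backend instead.

```python
def p95_nearest_rank(samples_ms: list[float]) -> float:
    """95th percentile by the nearest-rank method (no interpolation)."""
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]                 # rank is 1-based


def meets_latency_slo(samples_ms: list[float], budget_ms: float = 30.0) -> bool:
    """True when the p95 online-read latency stays strictly under the budget."""
    return p95_nearest_rank(samples_ms) < budget_ms


# 100 reads: 95 fast ones and 5 slow outliers.
samples = [12.0] * 95 + [80.0] * 5
print(p95_nearest_rank(samples))   # 12.0
print(meets_latency_slo(samples))  # True
```

The five slow reads sit above the 95th percentile, so they pass a p95 target even though they would fail a p99 one; the choice of percentile is part of the SLO, not a detail.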

Load and scale

Inference traffic

40k-120k RPS

Peak load on the online store in recommendation and fraud scenarios.

Feature vector size

50-300 features per request

Requires batched entity reads and efficient response serialization.

Entity cardinality

100M+ users / devices / objects

High cardinality affects sharding strategy and online-index size.

Streaming ingress

1M-3M events/s

Needs backpressure protection and idempotent materialization logic.

Daily offline snapshots

2-8 TB/day

History replay and time-aware joins require deliberate storage and partitioning strategy.
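The upper bounds above translate into rough totals with simple arithmetic. This sketch assumes the 90-day upper end of the replay window stated in the architecture SLA and ignores compression and replication, so treat the numbers as order-of-magnitude guides only.

```python
# Back-of-envelope sizing from the upper bounds of the load figures.
PEAK_RPS = 120_000            # peak inference requests per second
FEATURES_PER_REQUEST = 300    # largest feature vector per request
DAILY_SNAPSHOT_TB = 8         # largest daily offline snapshot
REPLAY_WINDOW_DAYS = 90       # longest replay window (assumed retention)

# Every request reads a full feature vector from the online store.
feature_values_per_s = PEAK_RPS * FEATURES_PER_REQUEST
# Offline history that time-aware joins may have to scan, before compression.
offline_history_tb = DAILY_SNAPSHOT_TB * REPLAY_WINDOW_DAYS

print(f"{feature_values_per_s:,} feature values/s")   # 36,000,000 feature values/s
print(f"{offline_history_tb} TB of offline history")  # 720 TB of offline history
```

At tens of millions of values per second, one round trip per feature is not viable, which is why reads are batched per entity; hundreds of terabytes of history are why the offline store needs partitioning (for example by date) that time-aware joins can prune.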

Related chapter

ETL/ELT Architecture

The feature layer depends on mature batch and stream pipelines plus reliable orchestration.


Architecture

This architecture should clearly separate data intake, the offline layer for training, and the online feature-access path. The separation makes retrieval easier to reason about, isolates materialization issues, preserves reproducibility of training slices, keeps training-serving skew visible, and protects the latency budget of the live request path.

Feature Store Architecture


  • Event Sources: product events, CRM, billing, clicks, logs.
  • Batch ETL/ELT: daily and hourly pipelines plus backfills.
  • Stream Processing: near-real-time transforms with watermarking.
  • Offline Store: historical snapshots for training and validation.
  • Feature Registry: schemas, owners, versions, SLA, lineage.
  • Materialization Service: online-store upserts, deduplication, conflict policy.
  • Online Store: low-latency key-value reads for inference.
  • Serving SDK / Gateway: stable feature API contract for models.
  • Observability: skew checks, freshness SLA, null-rate alerts.

SLA

  • Latency budget: p95 < 30 ms
  • Freshness budget: <= 5 min (hot features)
  • Replay window: 30-90 days

Layer responsibilities

Feature registry

Catalog of feature definitions with schema, ownership, SLA, source references, and readiness status for live use.
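A minimal in-memory sketch of two of the registry's gatekeeping jobs: refusing features without an owner or with a silently changed published version, and refusing to serve features that are not marked ready for live use. `FeatureDefinition`, `FeatureRegistry`, and the example feature are hypothetical; a real registry would also persist definitions and track lineage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    version: int
    owner: str
    dtype: str
    entity_key: str
    source: str


class FeatureRegistry:
    """Hypothetical in-memory registry keyed by (name, version)."""

    def __init__(self) -> None:
        self._defs: dict[tuple[str, int], FeatureDefinition] = {}
        self._ready: set[tuple[str, int]] = set()

    def register(self, d: FeatureDefinition) -> None:
        if not d.owner:
            raise ValueError(f"{d.name}: every feature needs an owner")
        key = (d.name, d.version)
        existing = self._defs.get(key)
        if existing is not None and existing != d:
            # A published version is immutable; changes require a new version.
            raise ValueError(f"{d.name} v{d.version} is already registered differently")
        self._defs[key] = d

    def mark_ready(self, name: str, version: int) -> None:
        if (name, version) not in self._defs:
            raise KeyError((name, version))
        self._ready.add((name, version))

    def for_serving(self, name: str, version: int) -> FeatureDefinition:
        """Only features marked ready for live use may be served online."""
        key = (name, version)
        if key not in self._ready:
            raise PermissionError(f"{name} v{version} is not ready for live use")
        return self._defs[key]


reg = FeatureRegistry()
reg.register(FeatureDefinition("user_clicks_7d", 1, "growth-team", "int64",
                               "user_id", "events.clicks"))
reg.mark_ready("user_clicks_7d", 1)
print(reg.for_serving("user_clicks_7d", 1).owner)  # growth-team
```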

Data intake and transformations

Batch ETL/ELT plus stream processing. Feature logic is packaged as reusable transformation contracts.
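Packaging feature logic as one function that both paths call is the simplest form of a reusable transformation contract. In this sketch, `Purchase`, `amount_usd`, and the fixed `FX_TO_USD` rates are hypothetical; the point is that the batch and streaming paths cannot drift because neither reimplements the logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Purchase:
    user_id: str
    amount: float
    currency: str


FX_TO_USD = {"USD": 1.0, "EUR": 1.1}  # illustrative fixed rates, not real data


def amount_usd(p: Purchase) -> float:
    """The single transformation that both paths share."""
    return round(p.amount * FX_TO_USD[p.currency], 2)


def batch_transform(history: list[Purchase]) -> list[float]:
    # Offline path: apply the shared function over historical rows.
    return [amount_usd(p) for p in history]


def stream_transform(event: Purchase) -> float:
    # Online path: apply the same function to one event as it arrives.
    return amount_usd(event)


event = Purchase("u1", 10.0, "EUR")
print(batch_transform([event])[0] == stream_transform(event))  # True
```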

Offline store

Historical feature values for training, replay, and reproducible dataset generation.

Online store

Low-latency key-value reads for live inference, with TTL, selective invalidation, and hot-key protection.
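The sketch below shows two of these mechanisms, stable hash sharding by entity key and TTL-based expiry, plus explicit invalidation and a batched multi-key read. `ShardedOnlineStore` is hypothetical, hot-key protection is omitted, and a real deployment would use a distributed key-value store rather than Python dicts; the injectable clock only makes expiry testable.

```python
from __future__ import annotations

import hashlib
import time
from typing import Callable


class ShardedOnlineStore:
    """In-memory stand-in for an online store: hash sharding plus per-key TTL."""

    def __init__(self, num_shards: int, ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._shards: list[dict[str, tuple[dict, float]]] = [{} for _ in range(num_shards)]
        self._ttl_s = ttl_s
        self._clock = clock

    def _shard(self, entity_key: str) -> dict[str, tuple[dict, float]]:
        # Stable digest rather than Python's per-process salted hash(),
        # so the same key routes to the same shard across restarts.
        digest = hashlib.sha1(entity_key.encode()).digest()
        return self._shards[int.from_bytes(digest[:8], "big") % len(self._shards)]

    def put(self, entity_key: str, features: dict) -> None:
        self._shard(entity_key)[entity_key] = (features, self._clock() + self._ttl_s)

    def get(self, entity_key: str) -> dict | None:
        shard = self._shard(entity_key)
        entry = shard.get(entity_key)
        if entry is None:
            return None
        features, expires_at = entry
        if self._clock() >= expires_at:
            del shard[entity_key]  # expired: report a miss instead of serving stale data
            return None
        return features

    def invalidate(self, entity_key: str) -> None:
        self._shard(entity_key).pop(entity_key, None)

    def multi_get(self, entity_keys: list[str]) -> dict[str, dict | None]:
        # Batched interface: a caller fetches every key for a request in one call.
        return {key: self.get(key) for key in entity_keys}


now = [0.0]
store = ShardedOnlineStore(num_shards=4, ttl_s=300, clock=lambda: now[0])
store.put("user:42", {"clicks_7d": 17})
print(store.multi_get(["user:42", "user:7"]))  # {'user:42': {'clicks_7d': 17}, 'user:7': None}
now[0] = 301.0
print(store.get("user:42"))                    # None
```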

Materialization service

Moves computed features into the online store and controls watermarks, late events, and conflict resolution rules.
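Watermark handling can be sketched as follows, assuming the common definition of a watermark as the newest event time seen minus an allowed lateness. `WatermarkRouter` and the choice to divert too-late events to a backfill queue (rather than drop them) are illustrative assumptions, not a prescribed design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    entity_key: str
    event_time: float  # seconds since epoch
    value: float


class WatermarkRouter:
    """Accepts on-time events for the online path; diverts too-late ones to backfill."""

    def __init__(self, allowed_lateness_s: float) -> None:
        self.allowed_lateness_s = allowed_lateness_s
        self.max_event_time = float("-inf")
        self.online: list[Event] = []
        self.backfill: list[Event] = []

    @property
    def watermark(self) -> float:
        # The watermark trails the newest event time seen by the allowed lateness.
        return self.max_event_time - self.allowed_lateness_s

    def accept(self, e: Event) -> None:
        if e.event_time < self.watermark:
            self.backfill.append(e)  # too late for the online path; batch repairs it
        else:
            self.online.append(e)
            self.max_event_time = max(self.max_event_time, e.event_time)


router = WatermarkRouter(allowed_lateness_s=60.0)
for t in (1000.0, 1030.0, 900.0, 980.0):
    router.accept(Event("user:1", t, 1.0))
print([e.event_time for e in router.online])    # [1000.0, 1030.0, 980.0]
print([e.event_time for e in router.backfill])  # [900.0]
```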

Feature access SDK / gateway

A stable API for feature reads that pins the request schema and shields clients from storage-level changes.
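A minimal sketch of schema pinning in a client SDK, assuming the pinned schema is the ordered feature list (with defaults) that the model was trained on. `FeatureClient` and the `fetch` callable are hypothetical; `fetch` stands in for whatever storage read the gateway hides.

```python
from typing import Callable, Mapping


class FeatureClient:
    """Returns feature vectors in the exact order and shape the model expects."""

    def __init__(self, pinned_schema: Mapping[str, object],
                 fetch: Callable[[str], Mapping[str, object]]) -> None:
        # pinned_schema: feature name -> default used when storage has no value.
        # dict preserves insertion order, which fixes the vector layout.
        self._schema = dict(pinned_schema)
        self._fetch = fetch

    def get_vector(self, entity_key: str) -> list[object]:
        raw = self._fetch(entity_key)
        # Membership and order come from the pinned schema, not from storage,
        # so extra or missing storage columns cannot reshape the model input.
        return [raw.get(name, default) for name, default in self._schema.items()]


storage = {"user:1": {"clicks_7d": 5, "country": "DE"}}
client = FeatureClient({"clicks_7d": 0, "avg_order_usd": 0.0},
                       fetch=lambda key: storage.get(key, {}))
print(client.get_vector("user:1"))  # [5, 0.0]
print(client.get_vector("user:9"))  # [0, 0.0]
```

Silently substituting defaults hides missing data, so in practice the client would also count how often each default is used and feed that into the null-rate metrics.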

Quality and observability

Freshness, availability, skew, null-rate, and latency metrics, with alerts and dashboards for critical features.
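Freshness and null-rate alerting reduce to a few comparisons per feature. This sketch takes the 5-minute hot-feature freshness target from the requirements; the 5% null-rate threshold and the `FeatureHealth` and `alerts` names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureHealth:
    name: str
    last_update_s: float  # when materialization last wrote this feature
    null_count: int
    total_count: int


def alerts(h: FeatureHealth, now_s: float,
           freshness_budget_s: float = 300.0, max_null_rate: float = 0.05) -> list[str]:
    """Return alert messages for one feature; thresholds are illustrative defaults."""
    out = []
    lag = now_s - h.last_update_s
    if lag > freshness_budget_s:
        out.append(f"{h.name}: stale by {lag:.0f}s (budget {freshness_budget_s:.0f}s)")
    if h.total_count and h.null_count / h.total_count > max_null_rate:
        out.append(f"{h.name}: null rate {h.null_count / h.total_count:.1%}")
    return out


h = FeatureHealth("user_clicks_7d", last_update_s=1000.0, null_count=12, total_count=100)
for message in alerts(h, now_s=1400.0):
    print(message)
# user_clicks_7d: stale by 400s (budget 300s)
# user_clicks_7d: null rate 12.0%
```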

Feature contract strategy

Lock entity keys, transformation version, TTL, freshness expectations, and dataset lineage in the contract itself. That reduces hidden skew during updates and makes it easier to fall back, return to a simpler baseline, or roll back the dependent model when quality drops.
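A sketch of a contract record that locks these fields, under two illustrative rules: a TTL shorter than the freshness expectation is rejected (values would expire before the next refresh arrives), and any change to entity keys or transformation version counts as a new feature version rather than an in-place edit. `FeatureContract` and `needs_new_version` are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FeatureContract:
    name: str
    entity_keys: tuple[str, ...]
    transformation_version: str
    ttl_s: int                # how long an online value may be served
    freshness_s: int          # expected maximum age between refreshes
    lineage: tuple[str, ...]  # upstream datasets this feature is derived from

    def validate(self) -> None:
        if not self.entity_keys:
            raise ValueError(f"{self.name}: entity keys are required")
        if self.ttl_s < self.freshness_s:
            raise ValueError(f"{self.name}: TTL is shorter than the freshness expectation")
        if not self.lineage:
            raise ValueError(f"{self.name}: lineage is required")


def needs_new_version(old: FeatureContract, new: FeatureContract) -> bool:
    """Key or logic changes alter the feature's meaning, so they need a new version."""
    return (old.entity_keys != new.entity_keys
            or old.transformation_version != new.transformation_version)


v1 = FeatureContract("user_clicks_7d", ("user_id",), "clicks_7d@1",
                     ttl_s=3600, freshness_s=300, lineage=("events.clicks",))
v1.validate()
print(needs_new_version(v1, replace(v1, transformation_version="clicks_7d@2")))  # True
print(needs_new_version(v1, replace(v1, ttl_s=7200)))                             # False
```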

Key deep dives

Point-in-time correctness

Training sets must include only feature values that were truly available at event time. That requires time-aware joins and explicit rules for historical access.
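A minimal point-in-time ("as of") join in plain Python: for each labeled event, take the latest feature value whose timestamp is not after the event time. It assumes feature timestamps record when a value became available; `point_in_time_join` is a hypothetical helper, and at scale the same logic runs as a time-aware join in the offline engine.

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_join(labels, feature_rows):
    """
    labels:       iterable of (entity, event_time, label)
    feature_rows: iterable of (entity, available_time, value)
    Returns (entity, event_time, label, value) rows, where value is the latest
    feature value available at or before event_time, or None if none existed yet.
    """
    times = defaultdict(list)
    values = defaultdict(list)
    for entity, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        times[entity].append(ts)
        values[entity].append(value)

    joined = []
    for entity, event_time, label in labels:
        # Index of the first feature row strictly after event_time.
        i = bisect_right(times.get(entity, []), event_time)
        value = values[entity][i - 1] if i > 0 else None
        joined.append((entity, event_time, label, value))
    return joined


features = [("u1", 10, 1.0), ("u1", 20, 2.0), ("u1", 30, 3.0)]
labels = [("u1", 5, 0), ("u1", 25, 1), ("u2", 25, 0)]
for row in point_in_time_join(labels, features):
    print(row)
# ('u1', 5, 0, None)
# ('u1', 25, 1, 2.0)
# ('u2', 25, 0, None)
```

A naive "latest value per entity" join would attach 3.0 (available only at t=30) to the label at t=25, which is exactly the hidden leakage this section warns about.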

Materialization consistency

Batch and stream paths often overlap. You need idempotent updates, deduplication, and deterministic conflict resolution by version or time.
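One way to make upserts idempotent and order-independent is to compare every write against the stored value under a fixed total order and let only a strictly greater one replace it. The ordering below (event time, then version, then source name, then value as a last tiebreak) and the `Update`/`Materializer` names are assumptions for the sketch, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    entity_key: str
    feature: str
    value: float
    event_time: float
    version: int  # transformation or pipeline version that produced the value
    source: str   # e.g. "batch" or "stream"


def rank(u: Update) -> tuple:
    # Deterministic total order: newer event time wins, then higher version;
    # source name and value only break exact ties so every arrival order converges.
    return (u.event_time, u.version, u.source, u.value)


class Materializer:
    def __init__(self) -> None:
        self.store: dict[tuple[str, str], Update] = {}

    def upsert(self, u: Update) -> bool:
        key = (u.entity_key, u.feature)
        current = self.store.get(key)
        if current is None or rank(u) > rank(current):
            self.store[key] = u
            return True
        return False  # duplicate or older write: a no-op, so retries are safe


stream = Update("u1", "clicks_7d", 5.0, 100.0, 1, "stream")
batch = Update("u1", "clicks_7d", 4.0, 100.0, 2, "batch")
late = Update("u1", "clicks_7d", 3.0, 90.0, 3, "batch")

a, b = Materializer(), Materializer()
for u in (stream, batch, late, stream):
    a.upsert(u)
for u in (late, batch, stream):
    b.upsert(u)
print(a.store == b.store, a.store[("u1", "clicks_7d")].value)  # True 4.0
```

Here event time outranks version, so an older batch recomputation never overwrites a newer value; a team that wants corrected batch values to win at equal event time gets that from the version field.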

Training-serving skew control

Compare feature distributions between offline slices and live traffic, define an acceptable skew budget, and stop rollout when it is exceeded.
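The chapter does not prescribe a skew metric; one common choice is the Population Stability Index (PSI) over shared bucket edges, sketched below. The bucket edges, the epsilon for empty buckets, and the 0.2 budget (a frequently quoted rule of thumb) are assumptions to tune per feature.

```python
import math
from bisect import bisect_right


def psi(expected: list[float], actual: list[float], edges: list[float],
        eps: float = 1e-6) -> float:
    """PSI = sum over buckets of (a - e) * ln(a / e), using shared bucket edges."""
    def proportions(xs: list[float]) -> list[float]:
        counts = [0] * (len(edges) + 1)
        for x in xs:
            counts[bisect_right(edges, x)] += 1
        # Clamp empty buckets to eps so the logarithm stays finite.
        return [max(c / len(xs), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def rollout_allowed(offline: list[float], online: list[float], edges: list[float],
                    skew_budget: float = 0.2) -> bool:
    """Block the rollout when live traffic drifts past the skew budget."""
    return psi(offline, online, edges) <= skew_budget


edges = [10.0, 20.0, 30.0]
offline = [5.0, 15.0, 25.0, 35.0] * 25              # evenly spread training slice
print(rollout_allowed(offline, offline, edges))      # True
print(rollout_allowed(offline, [35.0] * 100, edges)) # False
```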

Online degradation plan

If the feature store fails, the inference path should fall back to cached features, a reduced feature set, or a rule-based baseline.
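The degradation order can be written as a small fallback chain that also reports which level served each request, so dashboards can show how often inference runs degraded. All callables here are hypothetical stand-ins, and catching only `TimeoutError` and `ConnectionError` is an assumption about how the online-store client signals failure.

```python
from typing import Callable, Optional

Features = dict[str, float]


def predict_with_fallback(
    entity_key: str,
    request_features: Features,  # available without any store read
    read_online: Callable[[str], Features],
    read_cache: Callable[[str], Optional[Features]],
    full_model: Callable[[Features], float],
    reduced_model: Callable[[Features], float],
    rule_baseline: Callable[[], float],
) -> tuple[float, str]:
    # 1. Normal path: fresh features from the online store.
    try:
        return full_model({**request_features, **read_online(entity_key)}), "online"
    except (TimeoutError, ConnectionError):
        pass
    # 2. Possibly stale features from a local cache.
    cached = read_cache(entity_key)
    if cached is not None:
        return full_model({**request_features, **cached}), "cached"
    # 3. Reduced feature set: a model trained only on request-time features.
    if request_features:
        return reduced_model(request_features), "reduced"
    # 4. Rule-based baseline when no features are available at all.
    return rule_baseline(), "baseline"


def store_down(_: str) -> Features:
    raise TimeoutError("online store timed out")


print(predict_with_fallback("user:1", {"basket_usd": 40.0}, store_down,
                            read_cache=lambda _: None,
                            full_model=lambda f: 0.9,
                            reduced_model=lambda f: 0.5,
                            rule_baseline=lambda: 0.1))  # (0.5, 'reduced')
```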

Trade-offs

Stronger normalization of feature definitions reduces duplication but slows down local experiment velocity.

A streaming-first design improves freshness but raises operational complexity and on-call cost.

One global feature store simplifies control and governance, but increases blast radius when materialization breaks.

Short TTLs reduce stale-value risk but increase recomputation pressure and cache churn.

Recommendations

  • Start with a limited set of truly high-impact features and explicit ownership for each one.
  • Version transformations as code and make schema and skew checks a mandatory part of CI/CD.
  • Break the SLA into separate budgets for intake, materialization, and online reads.
  • Design the fallback path before the system reaches production, not after the first incident.
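Splitting the SLA into stage budgets can be as simple as checking that the parts fit inside the whole. The stage names and per-stage numbers below are illustrative assumptions; only the 5-minute freshness and 30 ms p95 totals come from the requirements. Adding per-stage p95 latencies is a simplification, since percentiles do not add exactly.

```python
FRESHNESS_SLA_S = 300.0     # hot-feature freshness target (<= 5 minutes)
ONLINE_READ_P95_MS = 30.0   # online read latency target

# Illustrative stage budgets (assumed numbers, to be agreed with stage owners).
freshness_budget_s = {"intake": 60.0, "stream_transform": 120.0, "materialization": 90.0}
read_budget_ms = {"gateway": 3.0, "online_store": 20.0, "serialization": 2.0}


def headroom(stage_budgets: dict[str, float], total: float) -> float:
    """Return the unallocated slack, or fail if the stages overspend the total."""
    spent = sum(stage_budgets.values())
    if spent > total:
        raise ValueError(f"stage budgets ({spent}) exceed the SLA ({total})")
    return total - spent


print(headroom(freshness_budget_s, FRESHNESS_SLA_S))  # 30.0
print(headroom(read_budget_ms, ONLINE_READ_P95_MS))   # 5.0
```

Per-stage budgets also make alerts actionable: when one stage overspends, on-call knows which component to look at instead of only seeing the end-to-end SLA fail.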

Common mistakes

  • Feature logic is duplicated between notebooks and production code without a shared registry or versioning.
  • Training slices are built without point-in-time rules, creating hidden data leakage.
  • The online store is updated without freshness or skew monitoring, so degradation appears only in business metrics.
  • The team ships a model without a predefined safe behavior for feature-store outages.
