Feature stores and model serving become hard at the point where one feature definition has to behave the same in training and on the live request path.
The chapter ties point-in-time correctness, online feature access, materialization, freshness, and degradation planning into one workable platform architecture.
For interviews and architecture discussions, this case quickly shows whether you understand training-serving skew and the cost of every extra network dependency on the prediction path.
Offline/Online Parity
Keep feature semantics consistent across training and serving paths.
Rollout Safety
Canary, shadow, rollback, and drift alerting are baseline architecture requirements.
Data Quality
Use guardrails for freshness, lineage, and training-serving skew prevention.
Platform Efficiency
Balance pipeline cost, feature-store footprint, and inference latency.
Context
Machine Learning System Design
A foundational overview of ML architecture that makes feature-layer decisions easier to explain in interviews.
Design a Feature Store is a classic ML System Design case about keeping one meaning of features across training and live serving. Interviewers expect you to explain how point-in-time correctness is preserved, how data freshness is controlled, and what happens when the feature path for live inference starts to degrade.
Scope boundaries
Covered in this chapter
- Feature contracts: registry, offline and online storage, materialization, and feature-retrieval APIs.
- Consistency between training and runtime, point-in-time correctness, and freshness control.
- Reliability of the feature path in live inference: latency SLOs, fallback modes, and degradation rules.
Not covered here
- Training orchestration, model selection, and experiment-lifecycle management.
- Release control at the model-registry layer: approval rules, shadow mode, canary rollout, and rollback decisions.
- The end-to-end retraining loop, drift-driven release cadence, and ownership of the full ML delivery chain.
The end-to-end ML lifecycle and release loop are covered in ML Ops Pipeline.
Problem and context
Functional requirements
- One feature catalog with explicit ownership, schema, entity keys, source of truth, and transformation version.
- Offline feature retrieval for training and validation with strict point-in-time correctness.
- Online feature retrieval for live inference with low-latency access and stable API contracts.
- A materialization path that moves computed features from batch and streaming pipelines into the online store while preserving freshness.
- History replay and dataset rebuilds when feature logic changes, without one-off scripts.
Non-functional requirements
- Online read latency: p95 < 30 ms. Otherwise the feature layer starts constraining the user-facing prediction path.
- Availability: 99.95%+. Online-store outages directly block models in critical product flows.
- Freshness SLA: <= 5 minutes for hot features. Stale features quickly degrade ranking, personalization, and fraud-detection quality.
- Skew control: no critical skew goes unalerted. Training-serving mismatches must be detected before they cause user-visible degradation.
Load and scale
- Inference traffic: 40k-120k RPS. Peak load on the online store in recommendation and fraud scenarios.
- Feature vector size: 50-300 features per request. Requires batched entity reads and efficient response serialization.
- Entity cardinality: 100M+ users / devices / objects. High cardinality affects sharding strategy and online-index size.
- Streaming ingress: 1M-3M events/s. Needs backpressure protection and idempotent materialization logic.
- Daily offline snapshots: 2-8 TB/day. History replay and time-aware joins require deliberate storage and partitioning strategy.
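A quick back-of-envelope calculation shows what the upper bounds above imply for the online read path. This is a rough sketch: the traffic and vector-size figures come from the text, while the 8-byte average value size is an assumption chosen only to make the arithmetic concrete.

```python
PEAK_RPS = 120_000            # upper bound of inference traffic (from the text)
FEATURES_PER_REQUEST = 300    # upper bound of feature vector size (from the text)
BYTES_PER_VALUE = 8           # ASSUMPTION: average encoded size of one feature value


def peak_online_read_load(rps, features, bytes_per_value):
    """Return (feature values read per second, payload in MB per second)."""
    values_per_s = rps * features
    payload_mb_per_s = values_per_s * bytes_per_value / 1_000_000
    return values_per_s, payload_mb_per_s


values_per_s, mb_per_s = peak_online_read_load(
    PEAK_RPS, FEATURES_PER_REQUEST, BYTES_PER_VALUE
)
print(f"{values_per_s:,} feature values/s, ~{mb_per_s:.0f} MB/s of raw payload")
```

Under these assumptions the store serves tens of millions of values per second at peak, which is why per-feature lookups are impractical and batched entity reads matter.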
Related chapter
ETL/ELT Architecture
The feature layer depends on mature batch and stream pipelines plus reliable orchestration.
Architecture
This architecture should clearly separate data intake, the offline layer for training, and the online path for feature access. That makes retrieval easier to reason about, isolates materialization issues, preserves reproducibility for training slices, keeps training-serving skew visible, and protects the latency budget of the live request path.
Feature Store Architecture (diagram components)
- Event Sources: product events, CRM, billing, clicks, logs
- Batch ETL/ELT: daily/hourly pipelines and backfills
- Stream Processing: near real-time transforms with watermarking
- Offline Store: historical snapshots for train/validation
- Feature Registry: schemas, owners, versions, SLA, lineage
- Materialization Service: online-store upserts, dedup, conflict policy
- Online Store: low-latency key-value for inference
- Serving SDK / Gateway: stable feature API contract for models
Layer responsibilities
Feature registry
Catalog of feature definitions with schema, ownership, SLA, source references, and readiness status for live use.
Data intake and transformations
Batch ETL/ELT plus stream processing. Feature logic is packaged as reusable transformation contracts.
Offline store
Historical feature values for training, replay, and reproducible dataset generation.
Online store
Low-latency key-value reads for live inference, with TTL, selective invalidation, and hot-key protection.
Materialization service
Moves computed features into the online store and controls watermarks, late events, and conflict resolution rules.
Feature access SDK / gateway
A stable API for feature reads that pins request schema and shields clients from storage-level changes.
Quality and observability
Freshness, availability, skew, null-rate, and latency metrics, with alerts and dashboards for critical features.
Feature contract strategy
Lock entity keys, transformation version, TTL, freshness expectations, and dataset lineage in the contract itself. That reduces hidden skew during updates and makes it easier to fall back, return to a simpler baseline, or roll back the dependent model when quality drops.
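One way to picture such a contract is as an immutable record that the registry stores and that CI can compare across versions. The sketch below is illustrative only: the `FeatureContract` class, its field names, and the compatibility rule are assumptions, not the API of any particular feature store.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import timedelta


@dataclass(frozen=True)  # frozen: a published contract is never edited in place
class FeatureContract:
    name: str
    entity_keys: tuple[str, ...]
    transformation_version: str
    ttl: timedelta               # how long an online value may be served
    max_staleness: timedelta     # freshness expectation for monitoring
    lineage: tuple[str, ...]     # upstream datasets the feature is derived from

    def is_compatible_with(self, other: FeatureContract) -> bool:
        """ASSUMED rule: same name, entity keys, and transformation version."""
        return (
            self.name == other.name
            and self.entity_keys == other.entity_keys
            and self.transformation_version == other.transformation_version
        )


# Hypothetical feature used only for illustration.
v1 = FeatureContract(
    name="txn_count_1h",
    entity_keys=("user_id",),
    transformation_version="1.0.0",
    ttl=timedelta(hours=2),
    max_staleness=timedelta(minutes=5),
    lineage=("payments.events",),
)
v2 = replace(v1, transformation_version="2.0.0")  # new logic means a new contract

print(v1.is_compatible_with(v1), v1.is_compatible_with(v2))
```

A model trained against `v1` would be flagged as incompatible with `v2`, which makes the transformation change an explicit rollout decision instead of a silent source of skew.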
Key deep dives
Point-in-time correctness
Training sets must include only feature values that were truly available at event time. That requires time-aware joins and explicit rules for historical access.
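A minimal sketch of a time-aware lookup, using only the standard library: for each labeled event it picks the latest feature value that was already available at event time and never a later one. The integer timestamps, the `point_in_time_value` helper, and the sample data are assumptions made for illustration; a production system would run the same logic as an as-of join in its offline engine.

```python
from bisect import bisect_right


def point_in_time_value(history, event_ts):
    """Return the latest value with available_at <= event_ts, or None.

    `history` is a list of (available_at, value) pairs sorted by available_at.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, event_ts)  # count of values known at event_ts
    return history[idx - 1][1] if idx > 0 else None


# Feature history for one entity: (time the value became available, value).
history = [(100, 0.2), (200, 0.5), (300, 0.9)]

# Labeled training events: (event time, label).
labels = [(50, 0), (150, 1), (200, 0), (250, 1)]

training_rows = [
    (ts, point_in_time_value(history, ts), label) for ts, label in labels
]
for row in training_rows:
    print(row)
```

The event at time 50 gets no feature value because nothing was known yet, and the event at 250 gets 0.5 instead of the later 0.9. Joining on the latest value regardless of time would leak 0.9 into every row.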
Materialization consistency
Batch and stream paths often overlap. You need idempotent updates, deduplication, and deterministic conflict resolution by version or time.
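The sketch below shows one possible shape for such an update rule: a write is applied only if its `(event_ts, version)` pair is strictly newer than what is stored, so duplicates and late arrivals become no-ops. The in-memory dict standing in for the online store, the `upsert` helper, and the choice to order by time first and version second are all assumptions.

```python
def upsert(store, key, value, event_ts, version):
    """Apply the write only if it is strictly newer; return True if applied."""
    incoming = (event_ts, version)  # ASSUMED ordering: time first, then version
    current = store.get(key)
    if current is not None and incoming <= current["order"]:
        return False  # duplicate or late write: ignoring it keeps upserts idempotent
    store[key] = {"value": value, "order": incoming}
    return True


def materialize(writes):
    """Replay (key, value, event_ts, version) writes into a fresh store."""
    store = {}
    for key, value, event_ts, version in writes:
        upsert(store, key, value, event_ts, version)
    return store


writes = [
    ("user:1", 10, 100, 1),  # batch backfill
    ("user:1", 12, 200, 1),  # stream update
    ("user:1", 12, 200, 1),  # redelivered duplicate of the stream update
    ("user:1", 11, 150, 2),  # late event that arrives after a newer value
]

in_order = materialize(writes)
shuffled = materialize(list(reversed(writes)))
print(in_order["user:1"]["value"], in_order == shuffled)
```

Because the outcome depends only on the ordering key and not on arrival order, replaying the same writes in any order converges to the same stored value.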
Training-serving skew control
Compare feature distributions between offline slices and live traffic, define an acceptable skew budget, and stop rollout when it is exceeded.
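One common way to quantify the difference between an offline slice and live traffic is the Population Stability Index (PSI); the text does not prescribe a metric, so PSI is swapped in here as an example. The bucket edges, the sample values, and the 0.2 budget (a widely used rule of thumb, not a universal constant) are assumptions.

```python
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples over shared buckets."""

    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v > e for e in edges)] += 1  # bucket index for v
        # eps keeps empty buckets from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def rollout_allowed(score, budget=0.2):
    """ASSUMED policy: stop the rollout once PSI exceeds the skew budget."""
    return score <= budget


edges = [0.25, 0.5, 0.75]
offline = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
live_ok = list(offline)                       # same distribution as training
live_shifted = [0.8, 0.9, 0.85, 0.95, 0.9, 0.7, 0.99, 0.8, 0.6, 0.9]

print(rollout_allowed(psi(offline, live_ok, edges)))
print(rollout_allowed(psi(offline, live_shifted, edges)))
```

In practice the check would run per feature on a schedule and during canary stages, with the budget tuned per feature instead of a single global threshold.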
Online degradation plan
If the feature store fails, the inference path should fall back to cached features, a reduced feature set, or a rule-based baseline.
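The fallback order described above can be sketched as a small function that returns both the features and the degradation mode, so the caller can pick the matching model or rule set. Everything here is hypothetical: the `fetch_with_fallback` name, the feature names, the dict used as a cache, and the choice of which exceptions count as a store failure.

```python
def fetch_with_fallback(entity_id, online_read, cache, full_features, core_features):
    """Return (mode, features), trying progressively weaker sources."""
    try:
        features = online_read(entity_id)
        cache[entity_id] = dict(features)  # refresh the last-known-good copy
        return "online", features
    except (TimeoutError, ConnectionError):
        pass  # online store unavailable: degrade instead of failing the request

    cached = cache.get(entity_id, {})
    if full_features.issubset(cached):
        return "cached", {name: cached[name] for name in full_features}
    if core_features.issubset(cached):
        return "reduced", {name: cached[name] for name in core_features}
    return "baseline", {}  # caller switches to the rule-based baseline


def failing_read(entity_id):
    """Simulates an online-store outage."""
    raise TimeoutError("online store timed out")


FULL = {"txn_count_1h", "avg_amount_7d", "account_age_days"}
CORE = {"account_age_days"}
cache = {
    "user:1": {"txn_count_1h": 3, "avg_amount_7d": 41.5, "account_age_days": 900},
    "user:2": {"account_age_days": 12},
}

for uid in ("user:1", "user:2", "user:3"):
    mode, feats = fetch_with_fallback(uid, failing_read, cache, FULL, CORE)
    print(uid, mode, sorted(feats))
```

Returning the mode explicitly also gives observability a cheap signal: the share of requests served in each mode is a direct measure of how degraded the feature path is.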
Trade-offs
- Stronger normalization of feature definitions reduces duplication but slows local experimentation.
- A streaming-first design improves freshness but raises operational complexity and on-call cost.
- One global feature store simplifies control and governance but increases the blast radius when materialization breaks.
- Short TTLs reduce stale-value risk but increase recomputation pressure and cache churn.
Recommendations
- Start with a limited set of truly high-impact features and explicit ownership for each one.
- Version transformations as code and make schema and skew checks a mandatory part of CI/CD.
- Break the end-to-end SLA into separate intake, materialization, and online-read budgets, so that a breach can be attributed to one stage.
- Design the fallback path before the system reaches production, not after the first incident.
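As an illustration of the budget recommendation above, the sketch below splits the 5-minute freshness SLA for hot features into per-stage budgets and reports which stage overran. The specific split (120 s / 150 s / 30 s) and the observed values are assumptions; only the 300-second total comes from the text.

```python
FRESHNESS_SLA_S = 300  # from the text: <= 5 minutes for hot features

# ASSUMPTION: one possible split of the end-to-end budget across stages.
BUDGET_S = {"intake": 120, "materialization": 150, "online_read": 30}


def check_budgets(observed_s, budget_s=BUDGET_S, sla_s=FRESHNESS_SLA_S):
    """Return (per-stage overruns in seconds, whether the end-to-end SLA holds)."""
    if sum(budget_s.values()) > sla_s:
        raise ValueError("stage budgets add up to more than the SLA")
    overruns = {
        stage: observed_s[stage] - limit
        for stage, limit in budget_s.items()
        if observed_s[stage] > limit
    }
    return overruns, sum(observed_s.values()) <= sla_s


observed = {"intake": 90, "materialization": 200, "online_read": 5}
overruns, sla_met = check_budgets(observed)
print(overruns, sla_met)
```

In this example the end-to-end SLA still holds, but materialization has consumed the slack of the other stages. A single end-to-end metric would hide that until the next intake slowdown turned it into a breach.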
Common mistakes
- Feature logic is duplicated between notebooks and production code without a shared registry or versioning.
- Training slices are built without point-in-time rules, creating hidden data leakage.
- The online store is updated without freshness or skew monitoring, so degradation appears only in business metrics.
- The team ships a model without a predefined safe behavior for feature-store outages.
Related chapters
- How the System Design task section is structured - Entry map of the case-studies section and the shared framework this walkthrough follows.
- Machine Learning System Design (short summary) - A system-level ML architecture view where the feature layer connects data, training, and live inference.
- AI Engineering (short summary) - Evaluation, deployment, and operational maturity practices for production AI systems.
- ETL/ELT Architecture - Foundation for batch and stream pipelines, history replay, and orchestration of feature computation.
- Designing Event-Driven Systems (short summary) - Streaming ingestion patterns and delivery guarantees for near-real-time feature updates.
- Data Governance & Compliance - Control of personal data, dataset lineage, and audit requirements for sensitive feature pipelines.
- Observability & Monitoring Design - How to design freshness, skew, and latency metrics as part of a reliability contract.
- ML Platform at T-Bank - A practical platform-engineering context for ML workflows at enterprise scale.
