Feature Store & Model Serving — System Design Space

Feature stores and model serving get hard where one feature definition has to behave the same in training and on the live request path.

The chapter ties point-in-time correctness, online feature access, materialization, freshness, and degradation planning into one workable platform architecture.

For interviews and architecture discussions, this case quickly shows whether you understand training-serving skew and the cost of every extra network dependency on the prediction path.

Offline/Online Parity

Keep feature semantics consistent across training and serving paths.

Rollout Safety

Canary, shadow, rollback, and drift alerting are baseline architecture requirements.

Data Quality

Use guardrails for freshness, lineage, and training-serving skew prevention.

Platform Efficiency

Balance pipeline cost, feature-store footprint, and inference latency.

Context

Machine Learning System Design

A foundational overview of ML architecture that makes feature-layer decisions easier to explain in interviews.


Design a Feature Store is a classic ML System Design case about keeping one meaning of features across training and live serving. Interviewers expect you to explain how point-in-time correctness is preserved, how data freshness is controlled, and what happens when the feature path for live inference starts to degrade.

Scope boundaries

Covered in this chapter

Feature contracts: registry, offline and online storage, materialization, and feature-retrieval APIs.
Consistency between training and runtime, point-in-time correctness, and freshness control.
Reliability of the feature path in live inference: latency SLOs, fallback modes, and degradation rules.

Not covered here

Training orchestration, model selection, and experiment-lifecycle management.
Release control at the model-registry layer: approval rules, shadow mode, canary rollout, and rollback decisions.
The end-to-end retraining loop, drift-driven release cadence, and ownership of the full ML delivery chain.

The end-to-end ML lifecycle and release loop are covered in ML Ops Pipeline.

Problem and context

The product runs multiple ML use cases, such as personalization, fraud, and risk scoring, but teams compute features in separate pipelines and end up with inconsistent data between training and runtime. The goal of this chapter is to design the feature layer as a platform service with shared contracts and explicit SLAs.

Functional requirements

One feature catalog with explicit ownership, schema, entity keys, source of truth, and transformation version.
Offline feature retrieval for training and validation with strict point-in-time correctness.
Online feature retrieval for live inference with low-latency access and stable API contracts.
A materialization path that moves computed features from batch and streaming pipelines into the online store while preserving freshness.
History replay and dataset rebuilds when feature logic changes, without one-off scripts.

Non-functional requirements

Online read latency: p95 < 30 ms. Otherwise the feature layer starts constraining the user-facing prediction path.
Availability: 99.95%+. Online-store outages directly block models in critical product flows.
Freshness SLA: <= 5 minutes for hot features. Stale features quickly degrade ranking, personalization, and fraud quality.
Skew control: no critical skew goes unalerted. Training-serving mismatches must trigger an alert before they cause user-visible degradation.

Load and scale

Inference traffic

40k-120k RPS

Peak load on the online store in recommendation and fraud scenarios.

Feature vector size

50-300 features per request

Requires batched entity reads and efficient response serialization.

Entity cardinality

100M+ users / devices / objects

High cardinality affects sharding strategy and online-index size.

Streaming ingress

1M-3M events/s

Needs backpressure protection and idempotent materialization logic.

Daily offline snapshots

2-8 TB/day

History replay and time-aware joins require deliberate storage and partitioning strategy.
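The figures above can be turned into a quick back-of-envelope estimate. The sketch below uses the peak RPS, the largest feature vector, the daily snapshot volume, and the 30-90 day replay window from this chapter; the 8 bytes per feature value is an assumption for illustration only, and the payload estimate ignores keys and serialization overhead.

```python
# Back-of-envelope sizing from the chapter's load figures.
PEAK_RPS = 120_000
MAX_FEATURES_PER_REQUEST = 300
SNAPSHOT_TB_PER_DAY = (2, 8)      # low and high daily snapshot volume
REPLAY_WINDOW_DAYS = (30, 90)     # low and high replay window
ASSUMED_BYTES_PER_VALUE = 8       # assumption: one float64 per feature value


def peak_feature_reads_per_s(rps, features_per_request):
    """Individual feature values fetched per second at peak."""
    return rps * features_per_request


def peak_payload_mb_per_s(rps, features_per_request, bytes_per_value):
    """Raw value payload in MB/s, without keys or framing."""
    return rps * features_per_request * bytes_per_value / 1e6


def replay_storage_tb(tb_per_day, window_days):
    """Raw storage range needed to keep the replay window, in TB."""
    return tb_per_day[0] * window_days[0], tb_per_day[1] * window_days[1]


reads = peak_feature_reads_per_s(PEAK_RPS, MAX_FEATURES_PER_REQUEST)
payload_mb = peak_payload_mb_per_s(
    PEAK_RPS, MAX_FEATURES_PER_REQUEST, ASSUMED_BYTES_PER_VALUE
)
replay_tb = replay_storage_tb(SNAPSHOT_TB_PER_DAY, REPLAY_WINDOW_DAYS)

print(reads)       # 36000000
print(payload_mb)  # 288.0
print(replay_tb)   # (60, 720)
```

Tens of millions of value reads per second is the reason the online path needs batched entity reads rather than one lookup per feature.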

Related chapter

ETL/ELT Architecture

The feature layer depends on mature batch and stream pipelines plus reliable orchestration.


Architecture

This architecture should clearly separate data intake, the offline layer for training, and the online path for feature access. That makes retrieval easier to reason about, isolates materialization issues, preserves reproducibility for training slices, keeps training-serving skew visible, and protects the latency budget of the live request path.

Feature Store Architecture


Event Sources

Product events, CRM, billing, clicks, logs

Batch ETL/ELT

Daily/hourly pipelines and backfills

Stream Processing

Near real-time transforms with watermarking

Offline Store

Historical snapshots for train/validation

Feature Registry

Schemas, owners, versions, SLA, lineage

Materialization Service

Online-store upserts, dedup, conflict policy

Online Store

Low-latency key-value for inference

Serving SDK / Gateway

Stable feature API contract for models

Skew checks

Freshness SLA

Null-rate alerts

SLA

Latency budget: p95 < 30 ms
Freshness budget: <= 5 min (hot features)
Replay window: 30-90 days

Layer responsibilities

Feature registry

Catalog of feature definitions with schema, ownership, SLA, source references, and readiness status for live use.

Data intake and transformations

Batch ETL/ELT plus stream processing. Feature logic is packaged as reusable transformation contracts.

Offline store

Historical feature values for training, replay, and reproducible dataset generation.

Online store

Low-latency key-value reads for live inference, with TTL, selective invalidation, and hot-key protection.

Materialization service

Moves computed features into the online store and controls watermarks, late events, and conflict resolution rules.

Feature access SDK / gateway

A stable API for feature reads that pins request schema and shields clients from storage-level changes.

Quality and observability

Freshness, availability, skew, null-rate, and latency metrics, with alerts and dashboards for critical features.
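As a rough illustration of two of these signals, the sketch below computes a null rate and a worst-case value age over a sample of served feature values. The function name `feature_health`, the sample shape, and the alert labels are hypothetical, and timestamps are plain seconds to keep the example self-contained.

```python
def feature_health(samples, now, max_staleness_s, max_null_rate):
    """Return alert labels for one feature.

    samples: list of (value, written_at) pairs observed on the serving path,
             where written_at is the time the value landed in the online store.
    """
    if not samples:
        # No observations at all is itself worth an alert.
        return ["no_samples"]

    alerts = []

    null_rate = sum(1 for value, _ in samples if value is None) / len(samples)
    if null_rate > max_null_rate:
        alerts.append("null_rate")

    # Freshness is judged on the oldest value still being served.
    oldest_write = min(written_at for _, written_at in samples)
    if now - oldest_write > max_staleness_s:
        alerts.append("freshness")

    return alerts


samples = [(1.0, 990), (None, 995), (2.0, 600)]
print(feature_health(samples, now=1000, max_staleness_s=300, max_null_rate=0.5))
# ['freshness']
```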

Feature contract strategy

Lock entity keys, transformation version, TTL, freshness expectations, and dataset lineage in the contract itself. That reduces hidden skew during updates and makes it easier to fall back, return to a simpler baseline, or roll back the dependent model when quality drops.
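One way to make such a contract concrete is a small immutable record that the registry validates before a feature is marked ready for live use. This is a minimal sketch: the class `FeatureContract`, its field names, and the rule that TTL must not be shorter than the freshness budget are illustrative assumptions, not the schema of any particular feature store.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class FeatureContract:
    """Hypothetical registry entry; frozen so a change requires a new version."""
    name: str
    owner: str
    entity_keys: tuple[str, ...]
    dtype: str
    source: str
    transformation_version: str
    ttl: timedelta
    max_staleness: timedelta          # freshness expectation
    lineage: tuple[str, ...] = ()     # upstream datasets, for audits and replay

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the contract is usable."""
        problems = []
        if not self.entity_keys:
            problems.append("entity_keys must not be empty")
        if not self.owner:
            problems.append("owner must be set")
        # Assumed rule: a value should not expire before its refresh is due.
        if self.ttl < self.max_staleness:
            problems.append("ttl is shorter than max_staleness")
        return problems


contract = FeatureContract(
    name="user_txn_count_7d",
    owner="fraud-team",
    entity_keys=("user_id",),
    dtype="int64",
    source="billing.transactions",
    transformation_version="v3",
    ttl=timedelta(hours=1),
    max_staleness=timedelta(minutes=5),
)
print(contract.validate())  # []
```

Because the record is frozen, changing a transformation or TTL means publishing a new contract version, which is what lets dependent models pin or roll back to a known definition.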

Key deep dives

Point-in-time correctness

Training sets must include only feature values that were truly available at event time. That requires time-aware joins and explicit rules for historical access.
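A time-aware join can be sketched as an "as of" lookup: for each labeled event, take the latest feature value whose availability time is not after the event time. The example below is a standard-library illustration with integer timestamps; the function names and the in-memory data layout are hypothetical, and a real offline store would run the same logic as a partitioned join.

```python
from bisect import bisect_right


def point_in_time_value(history, event_time):
    """Latest value with available_at <= event_time, or None if none existed yet.

    history: list of (available_at, value) sorted by available_at ascending.
    """
    times = [available_at for available_at, _ in history]
    idx = bisect_right(times, event_time)
    if idx == 0:
        return None  # the feature did not exist yet at event time
    return history[idx - 1][1]


def build_training_rows(label_events, feature_history):
    """Join labels to features without looking into the future."""
    rows = []
    for entity_id, event_time, label in label_events:
        history = feature_history.get(entity_id, [])
        value = point_in_time_value(history, event_time)
        rows.append((entity_id, event_time, value, label))
    return rows


feature_history = {"u1": [(100, 1.0), (200, 2.0), (300, 3.0)]}
label_events = [("u1", 250, 1), ("u1", 50, 0), ("u1", 300, 1)]

for row in build_training_rows(label_events, feature_history):
    print(row)
# ('u1', 250, 2.0, 1)
# ('u1', 50, None, 0)
# ('u1', 300, 3.0, 1)
```

The event at time 250 gets the value written at 200, not the later value from 300; joining on the latest value instead would leak future information into training.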

Materialization consistency

Batch and stream paths often overlap. You need idempotent updates, deduplication, and deterministic conflict resolution by version or time.
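The sketch below shows one possible conflict rule, assuming every record carries an event time and a transformation version: the later event time wins, ties go to the higher version, and an equal or older record is dropped. The `upsert` helper and the dict-based store are hypothetical stand-ins for an online-store write path.

```python
def upsert(store, key, value, event_time, version):
    """Apply a write only if it is strictly newer than what is stored.

    Returns True if the store changed. Replaying the same record is a no-op,
    which makes retries and overlapping batch/stream writes safe.
    """
    incoming = (event_time, version)
    current = store.get(key)
    if current is not None:
        stored = (current["event_time"], current["version"])
        if stored >= incoming:
            return False  # duplicate or out-of-order write, drop it
    store[key] = {"value": value, "event_time": event_time, "version": version}
    return True


writes = [
    ("u1:txn_count", 5, 100, 1),   # stream write
    ("u1:txn_count", 7, 200, 1),   # newer stream write
    ("u1:txn_count", 5, 100, 1),   # replayed duplicate
    ("u1:txn_count", 6, 150, 1),   # late batch write, older than stored
    ("u1:txn_count", 8, 200, 2),   # same event time, higher version
]

store = {}
results = [upsert(store, *w) for w in writes]
print(results)                         # [True, True, False, False, True]
print(store["u1:txn_count"]["value"])  # 8
```

Because the outcome depends only on the (event time, version) pair, applying the same writes in any order leaves the store in the same final state.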

Training-serving skew control

Compare feature distributions between offline slices and live traffic, define an acceptable skew budget, and stop rollout when it is exceeded.
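One common way to quantify the difference between two samples of a numeric feature is the population stability index (PSI). The chapter does not prescribe a metric, so PSI is an illustrative choice here, and the 0.2 budget is a widely used rule of thumb rather than a value from this text. Bin edges and function names are likewise hypothetical.

```python
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population stability index between an offline and a live sample.

    edges: ascending bin boundaries; values fall into len(edges) + 1 bins.
    eps keeps empty bins from producing log(0) or division by zero.
    """
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(1 for e in edges if v >= e)] += 1
        return [max(c / len(values), eps) for c in counts]

    p, q = shares(expected), shares(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def rollout_allowed(score, budget=0.2):
    """Gate a rollout on the skew budget (0.2 is an assumed threshold)."""
    return score <= budget


offline = [1, 2, 3, 4]
same = psi(offline, [1, 2, 3, 4], edges=[2.5])
shifted = psi(offline, [4, 4, 4, 4], edges=[2.5])

print(same, rollout_allowed(same))    # 0.0 True
print(rollout_allowed(shifted))       # False
```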

Online degradation plan

If the feature store fails, the inference path should fall back to cached features, a reduced feature set, or a rule-based baseline.
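The fallback order can be sketched as a small resolver that reports which mode it ended up in, so the caller can pick the full model, a reduced model, or the rule-based baseline. Everything named here is hypothetical: `online_read` stands in for the online-store client, `request_features` for features that arrive with the request and need no lookup, and the cache is a plain dict.

```python
def resolve_features(entity_id, online_read, cache, request_features,
                     now, max_cache_age_s):
    """Return (mode, features), where mode is one of
    'online', 'cached', 'reduced', or 'baseline'."""
    try:
        features = dict(online_read(entity_id))
        cache[entity_id] = (now, features)   # refresh the local cache on success
        return "online", {**features, **request_features}
    except (ConnectionError, TimeoutError):
        pass  # store unavailable, continue down the fallback chain

    cached = cache.get(entity_id)
    if cached is not None and now - cached[0] <= max_cache_age_s:
        return "cached", {**cached[1], **request_features}

    if request_features:
        # Reduced feature set: only what the request itself carries.
        return "reduced", dict(request_features)

    # Nothing usable: the caller should apply the rule-based baseline.
    return "baseline", {}


def healthy(entity_id):
    return {"txn_count_7d": 12}


def failing(entity_id):
    raise TimeoutError("online store timed out")


cache = {}
print(resolve_features("u1", healthy, cache, {"amount": 40.0}, 1000, 60)[0])  # online
print(resolve_features("u1", failing, cache, {"amount": 40.0}, 1030, 60)[0])  # cached
print(resolve_features("u1", failing, cache, {"amount": 40.0}, 1100, 60)[0])  # reduced
print(resolve_features("u2", failing, cache, {}, 1100, 60)[0])                # baseline
```

Returning the mode alongside the features also gives observability a direct counter for how often the prediction path runs degraded.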

Trade-offs

Stronger normalization of feature definitions reduces duplication but slows local experimentation.

A streaming-first design improves freshness but raises operational complexity and on-call cost.

One global feature store simplifies control and governance, but increases blast radius when materialization breaks.

Short TTLs reduce stale-value risk but increase recomputation pressure and cache churn.

Recommendations

Start with a limited set of truly high-impact features and explicit ownership for each one.
Version transformations as code and make schema and skew checks a mandatory part of CI/CD.
Break the end-to-end SLA into separate intake, materialization, and online-read budgets so that a breach can be traced to one stage.
Design the fallback path before the system reaches production, not after the first incident.
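The recommendation to split the SLA into stage budgets can be illustrated with the 5-minute freshness target for hot features. In the sketch below the per-stage split (120 s, 150 s, 30 s) is purely an assumed example, and `breached_stages` is a hypothetical helper; the point is only that stage budgets must sum to no more than the end-to-end target and that each stage is checked on its own.

```python
FRESHNESS_SLA_S = 300  # <= 5 minutes for hot features

# Assumed split of the freshness budget across stages.
STAGE_BUDGETS_S = {
    "intake": 120,           # event produced -> available to transforms
    "materialization": 150,  # transform output -> written to the online store
    "online_read": 30,       # serving-side cache age before a value is read
}


def breached_stages(measured_s, budgets_s, total_sla_s):
    """Return the stages whose measured lag exceeds their budget.

    A stage with no measurement is reported as breached, because a missing
    metric should not pass silently.
    """
    if sum(budgets_s.values()) > total_sla_s:
        raise ValueError("stage budgets exceed the end-to-end SLA")
    return sorted(
        stage
        for stage, budget in budgets_s.items()
        if measured_s.get(stage, float("inf")) > budget
    )


measured = {"intake": 90, "materialization": 200, "online_read": 5}
print(breached_stages(measured, STAGE_BUDGETS_S, FRESHNESS_SLA_S))
# ['materialization']
```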

Common mistakes

Feature logic is duplicated between notebooks and production code without a shared registry or versioning.
Training slices are built without point-in-time rules, creating hidden data leakage.
The online store is updated without freshness or skew monitoring, so degradation appears only in business metrics.
The team ships a model without a predefined safe behavior for feature-store outages.

References

Feast documentation - Open documentation for feature registries, offline and online storage, and materialization jobs.
Hopsworks Feature Store docs - An approach to feature groups, training datasets, and online feature access inside one platform.
Tecton docs - Practical patterns for feature design, streaming transformations, and production feature access.
Google Cloud MLOps architecture guide - A system-level view of MLOps pipelines, release control, and operational loops.

Related chapters

How the System Design task section is structured - Entry map of the case-studies section and the shared framework this walkthrough follows.
Machine Learning System Design (short summary) - A system-level ML architecture view where the feature layer connects data, training, and live inference.
AI Engineering (short summary) - Evaluation, deployment, and operational maturity practices for production AI systems.
ETL/ELT Architecture - Foundation for batch and stream pipelines, history replay, and orchestration of feature computation.
Designing Event-Driven Systems (short summary) - Streaming ingestion patterns and delivery guarantees for near-real-time feature updates.
Data Governance & Compliance - Control of personal data, dataset lineage, and audit requirements for sensitive feature pipelines.
Observability & Monitoring Design - How to design freshness, skew, and latency metrics as part of a reliability contract.
ML Platform at T-Bank - A practical platform-engineering context for ML workflows at enterprise scale.