System Design Space

Updated: May 1, 2026 at 6:48 PM

Kappa Architecture: stream-first alternative to Lambda

hard

A single streaming path over an event log: materialized views, historical replay, backfill, and a practical comparison with Lambda.

Kappa becomes interesting when a team is tired of paying for duplicate batch and speed paths and is willing to make the event log the center of the architecture.

In practice, this chapter helps decide when one streaming path truly simplifies the system and when historical replay, retention, and durable-log requirements cost more than the simplification is worth.

In interviews and architecture reviews, it is especially useful when you need to explain why the batch layer was removed, which complexity disappeared, and which new requirements around log storage, backfill, and materialized views appeared instead.

Practical value of this chapter

Design in practice

Helps design one streaming path without duplicating batch and stream logic.

Decision quality

Guides the decision to adopt Kappa when historical replay and data freshness matter more than keeping a separate batch layer.

Interview articulation

Supports explaining backfill, bootstrap, and historical recomputation in Kappa.

Risk and trade-offs

Highlights retention, storage, and durable-log requirements of the model.

Related book

Big Data (Nathan Marz)

The basic formulation of Lambda Architecture and the context from which Kappa grew.

Kappa Architecture puts an event log at the center of data processing: events are stored once, and serving views are rebuilt from that log through the same path that handles the live stream.

Why Kappa emerged

One path instead of two

Lambda requires both a batch path and a speed path, so the same logic has to be built, operated, and kept semantically aligned in two stacks.

Replay as a standard operation

Recalculation is done by replaying events from the log through the same code that serves the live stream.
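As a minimal sketch of this idea, the toy example below keeps one function, apply_event, and calls it from both the live path and the replay path. The names Event, EventLog, apply_event, and offset_for_time are hypothetical and exist only for illustration; an in-memory list stands in for a real log, and the sketch assumes events are appended in time order.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: int        # event time in seconds (hypothetical field)
    user: str
    amount: int


def apply_event(view, event):
    """The single piece of processing logic: running total per user."""
    view[event.user] += event.amount


class EventLog:
    """Toy append-only log; the list index stands in for the offset."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1          # offset of the new event

    def read_from(self, offset=0):
        return iter(self._events[offset:])

    def offset_for_time(self, ts):
        """First offset whose event time is >= ts (assumes time-ordered appends)."""
        return bisect_left([e.ts for e in self._events], ts)


def replay(log, offset=0):
    """Rebuild a view from the log using the same function as the live path."""
    view = defaultdict(int)
    for event in log.read_from(offset):
        apply_event(view, event)
    return dict(view)


log = EventLog()
live_view = defaultdict(int)
for e in [Event(100, "ann", 5), Event(110, "bob", 7), Event(120, "ann", 3)]:
    log.append(e)
    apply_event(live_view, e)                  # live path

rebuilt = replay(log)                          # full recomputation from offset 0
recent = replay(log, log.offset_for_time(110)) # partial replay starting at a time
```

Because both paths call the same function, a full replay yields the same view as live processing, and a replay from a later offset covers only the events from that point on.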

Streaming-native platform

Modern streaming platforms allow you to build stable stateful pipelines without a separate batch stack.

Kappa flow

[Diagram: Producers (apps, services, CDC) → Immutable log (Kafka / Pulsar / Redpanda; partitioned event stream) → Stream processor (enrich, join, aggregate; same path for live + replay) → Serving store (low-latency reads), OLAP view (analytics), Search index (specialized query). Replay / backfill (by offset or time) goes through the same path. One processing model: no separate batch layer.]

Kappa removes the separate batch layer: the live stream and historical replay move through the same processing path.
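One common way to run a backfill without a batch layer, shown here as an assumption rather than something this chapter prescribes, is to replay the log into a new version of the view while the old one keeps serving reads, and switch readers only after the new view has caught up. The sketch below uses hypothetical names (ServingStore, total_per_user, max_per_user) and a plain list in place of a real log.

```python
from collections import defaultdict

# Hypothetical log contents; a real system would read a Kafka/Pulsar topic.
LOG = [
    {"user": "ann", "amount": 5},
    {"user": "bob", "amount": 7},
    {"user": "ann", "amount": 3},
]


def total_per_user(view, event):
    """Current logic: running total per user."""
    view[event["user"]] += event["amount"]


def max_per_user(view, event):
    """Changed logic that needs a backfill: largest single amount per user."""
    view[event["user"]] = max(view[event["user"]], event["amount"])


class ServingStore:
    """Readers only see the active view; a backfill builds a shadow view."""

    def __init__(self):
        self.views = {}
        self.active = None

    def build(self, name, processor, log):
        view = defaultdict(int)
        for event in log:              # replay from offset 0 through the stream code
            processor(view, event)
        self.views[name] = dict(view)

    def switch(self, name):
        previous, self.active = self.active, name
        if previous is not None and previous != name:
            del self.views[previous]   # drop the old view once readers have moved

    def get(self, key):
        return self.views[self.active].get(key)


store = ServingStore()
store.build("totals_v1", total_per_user, LOG)
store.switch("totals_v1")

store.build("max_v2", max_per_user, LOG)   # backfill runs beside the live view
before_switch = store.get("ann")            # still served from totals_v1
store.switch("max_v2")
after_switch = store.get("ann")             # now served from max_v2
```

The cost of this approach is that two copies of the view exist during the backfill, so serving storage has to be sized for that overlap.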

Lambda and Kappa: key differences

Criterion | Lambda | Kappa
Compute model | Batch, speed, and serving layers. | One streaming path plus materialized views.
Code paths | Two separate paths for batch recomputation and fast updates. | One processing path for the live stream and historical replay.
Reprocessing | Often through a batch recompute over the full historical dataset. | Replay events from an immutable log through the same stream processor.
Latency | Low through the speed layer, then reconciled with the batch result. | Low if the stream processor and state store can handle the load.
Operational complexity | Higher because two stacks must keep their semantics aligned. | Fewer paths, but stricter requirements for the streaming platform.
Best fit | When batch, ETL, and streaming workloads are already mature and separate. | When the platform is built around an event log (Kafka/Pulsar).

How Kappa is implemented in practice

  1. Make the event log the source of truth: events should be immutable and schemas should be versioned.
  2. Move key materialized views into the streaming processing path.
  3. Make historical replay and backfill standard operations that run through the same code.
  4. Separate stateful processing from serving APIs with clear data contracts.
  5. Define SLAs for late events, ordering, and exactly-once / at-least-once delivery guarantees.
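Steps 1 and 5 above can be sketched together. The example assumes two things the chapter does not specify: every event carries a schema_version and a unique event_id, and the log delivers at least once, so duplicates are possible. Old events are upcast to the latest schema at read time, which leaves the log immutable, and the view ignores event IDs it has already applied. All field and class names are hypothetical.

```python
from collections import defaultdict


def upcast(event):
    """Return the event in the latest schema (v2) without touching the stored log."""
    version = event["schema_version"]
    if version == 1:
        # Hypothetical change: v1 stored 'amount' as a float, v2 stores integer cents.
        return {
            "schema_version": 2,
            "event_id": event["event_id"],
            "user": event["user"],
            "amount_cents": round(event["amount"] * 100),
        }
    if version == 2:
        return event
    raise ValueError(f"unknown schema_version: {version}")


class BalanceView:
    """Materialized view that is idempotent under at-least-once delivery."""

    def __init__(self):
        self.totals = defaultdict(int)
        # Unbounded here for brevity; a real processor would bound or expire this set.
        self._seen = set()

    def apply(self, raw_event):
        event = upcast(raw_event)
        if event["event_id"] in self._seen:
            return False                      # duplicate delivery, ignored
        self._seen.add(event["event_id"])
        self.totals[event["user"]] += event["amount_cents"]
        return True


view = BalanceView()
deliveries = [
    {"schema_version": 1, "event_id": "e1", "user": "ann", "amount": 2.5},
    {"schema_version": 2, "event_id": "e2", "user": "ann", "amount_cents": 100},
    {"schema_version": 2, "event_id": "e2", "user": "ann", "amount_cents": 100},  # redelivery
]
applied = [view.apply(e) for e in deliveries]
```

Upcasting in the processor means a replay of years-old events still runs through the current code, which is what makes the single path workable as schemas evolve.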

When to choose Kappa and what to watch for

Kappa is suitable if

  • Core domain data is already generated as events.
  • The system needs one logic path for live processing and historical recomputation.
  • The team is ready to operate a streaming stack and stateful processors.

Risk areas

  • Poor event schemas and weak schema governance.
  • Heavy joins or windowed computations without state-size control.
  • Underestimating historical replay costs: CPU, storage I/O, and backpressure.
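A rough way to size the replay risk is to estimate how long a full replay takes to reach the head of the log. New events keep arriving while the replay runs, so the gap closes at the replay rate minus the live ingest rate. The function name and the numbers below are illustrative assumptions, not measurements from any real system.

```python
def replay_catch_up_seconds(backlog_events, replay_rate, live_rate):
    """Seconds until a replay starting at the oldest offset reaches the log head.

    backlog_events: events already in the log when the replay starts
    replay_rate:    events per second the processor sustains during replay
    live_rate:      events per second still being appended by producers
    """
    if replay_rate <= live_rate:
        raise ValueError("replay never catches up: replay_rate must exceed live_rate")
    return backlog_events / (replay_rate - live_rate)


# Assumed workload: 30 days of retention at 10,000 events/s,
# replayed by a processor that sustains 100,000 events/s.
backlog = 30 * 86_400 * 10_000
seconds = replay_catch_up_seconds(backlog, replay_rate=100_000, live_rate=10_000)
hours = seconds / 3600
```

Under these assumed numbers the replay takes 80 hours, which shows why replay throughput, storage I/O, and backpressure need to be planned before replay is treated as a routine operation.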
