Related book
Big Data (Nathan Marz)
The basic formulation of Lambda Architecture and the context from which Kappa grew.
Kappa Architecture is a stream-first approach to data processing in which there is a single computation path: events are written to an immutable log, and all views are built by stream processing and materialized into target storage.
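The data flow can be sketched in plain Python (illustrative names, no real streaming framework): events land in an append-only log, and a stream processor folds them into a materialized view.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: events are immutable once written
class Event:
    key: str
    amount: int

log: list[Event] = []  # stands in for a Kafka/Pulsar topic

def append(event: Event) -> None:
    log.append(event)  # append-only: no updates, no deletes

def materialize(events) -> dict[str, int]:
    """The single processing path: fold events into a per-key view."""
    view: dict[str, int] = {}
    for e in events:
        view[e.key] = view.get(e.key, 0) + e.amount
    return view

append(Event("user-1", 10))
append(Event("user-2", 5))
append(Event("user-1", 7))

view = materialize(log)
print(view)  # {'user-1': 17, 'user-2': 5}
```

The view is always a pure function of the log, which is what makes replay safe: delete the view, re-run the fold, and you get the same answer.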
Why Kappa came to be
One processing path instead of two
Lambda requires maintaining two computation paths (batch and speed), which complicates both development and operations.
Replay as a standard mechanism
Recomputation is performed by replaying events from the log through the same code that serves realtime traffic.
Stream-native platform
Modern streaming platforms make it possible to build reliable stateful pipelines without a separate batch stack.
Kappa Basic Stream
Kappa does not have a separate batch layer: the same stream pipeline processes both the live stream and the historical replay.
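A minimal sketch of "replay is a normal operation" (plain Python, hypothetical names): the same fold runs over the historical prefix of the log and then continues on the live tail, and a full replay from offset 0 yields an identical view.

```python
def fold(view: dict, event: str) -> dict:
    """The single processing function: count events per key."""
    view[event] = view.get(event, 0) + 1
    return view

log = ["a", "b", "a", "c", "b"]  # immutable event log

# Live operation: consume from a checkpoint onward.
view: dict = {}
for event in log[:3]:        # prefix processed earlier
    view = fold(view, event)
for event in log[3:]:        # live tail
    view = fold(view, event)

# Backfill/recompute: replay the whole log through the same code,
# instead of running a separate batch job with duplicated logic.
replayed: dict = {}
for event in log:
    replayed = fold(replayed, event)

assert replayed == view  # same code, same input, same result
```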
Lambda vs Kappa
| Criterion | Lambda | Kappa |
|---|---|---|
| Compute model | Batch + Speed + Serving layers. | Single stream pipeline + materialized views. |
| Code paths | Two separate computation paths (batch and realtime). | One processing path for both online and replay scenarios. |
| Reprocessing | Often through batch-recompute of the entire dataset. | Replay from immutable log via the same stream pipeline. |
| Latency | Low via speed layer + eventual merge with batch. | Low if the stream processor and state store can handle the load. |
| Operational complexity | Higher due to two stacks and the need to align their semantics. | Fewer moving parts (one path), but higher demands on the stream stack. |
| Best fit | When batch/ETL and stream worlds are already strong and separated. | When the platform is built around an event log (Kafka/Pulsar). |
How Kappa is implemented in practice
- Make the event log the source of truth: immutable events with versioned schemas.
- Move key materialized views onto the stream processing path.
- Add a replay/backfill pipeline: replaying events should be a standard operation.
- Separate stateful processing from serving APIs through clear data contracts.
- Pin down the SLA for late events, ordering, and exactly-once/at-least-once semantics.
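The first two points hinge on schema versioning: a replay over years of history must still flow through the current processing code. One common pattern, sketched below with hypothetical field names, is to upcast old schema versions on read.

```python
def upcast(event: dict) -> dict:
    """Normalize any stored schema version to the current one (v2)."""
    if event["schema_version"] == 1:
        # Assumed example: v1 used a "name" field and had no currency.
        return {
            "schema_version": 2,
            "key": event["name"],
            "amount": event["amount"],
            "currency": "USD",  # assumed default for legacy events
        }
    return event

old = {"schema_version": 1, "name": "user-1", "amount": 10}
new = {"schema_version": 2, "key": "user-2", "amount": 5, "currency": "EUR"}

# During both live consumption and replay, the pipeline only ever
# sees current-version events.
events = [upcast(e) for e in (old, new)]
assert all(e["schema_version"] == 2 for e in events)
```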
When to choose and what to look for
Kappa is suitable if:
- Core domain data is already produced as events.
- We need a single logic for realtime and historical recalculation.
- The team is ready to operate a stream-first stack and stateful processors.
Risk areas
- Poor-quality event schemas and lack of schema governance.
- Heavy join/window computations without state-size control.
- Underestimation of the cost of replay: CPU, storage I/O, backpressure.
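The state-size risk can be illustrated with a bounded sliding window (a minimal sketch, assuming events arrive ordered by timestamp): expired entries are evicted as the window advances, so replaying a long history does not grow memory without limit.

```python
from collections import deque

WINDOW_SECONDS = 60

def windowed_count(events) -> dict[str, int]:
    """Count events per key over a sliding 60s window with bounded state."""
    buffers: dict[str, deque] = {}
    for key, ts in events:                    # (key, timestamp) pairs, ordered
        buf = buffers.setdefault(key, deque())
        buf.append(ts)
        while buf and buf[0] <= ts - WINDOW_SECONDS:
            buf.popleft()                     # evict expired state
    return {k: len(v) for k, v in buffers.items()}

events = [("a", 0), ("a", 30), ("a", 90), ("b", 95)]
result = windowed_count(events)
print(result)  # {'a': 1, 'b': 1}: the ts=0 and ts=30 events fell out of the window
```

Without the eviction loop, per-key state grows linearly with history length, which is exactly what makes a full replay expensive in CPU, storage I/O, and backpressure.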
