Kappa becomes interesting when a team is tired of paying for duplicate batch and speed paths and is willing to make the event log the center of the architecture.
In practice, this chapter helps you decide when a single streaming path genuinely simplifies the system and when the cost of historical replay, retention, and a durable log outweighs that simplification.
In interviews and architecture reviews, it is especially useful when you need to explain why the batch layer was removed, which complexity disappeared, and which new requirements around log storage, backfill, and materialized views appeared instead.
Practical value of this chapter
Design in practice
Helps design one streaming path without duplicating batch and stream logic.
Decision quality
Guides Kappa adoption in systems where historical replay and fresh data justify dropping the separate batch layer.
Interview articulation
Supports explaining backfill, bootstrap, and historical recomputation in Kappa.
Risk and trade-offs
Highlights retention, storage, and durable-log requirements of the model.
Related book
Big Data (Nathan Marz)
The basic formulation of Lambda Architecture and the context from which Kappa grew.
Kappa Architecture puts an event log at the center of data processing: events are stored once, and serving views are rebuilt from that log through the same path that handles the live stream.
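To make the idea concrete, here is a minimal in-memory sketch that assumes a single log partition and a toy balance view. `Event`, `EventLog`, `apply`, `BalanceView`, and `rebuild` are hypothetical names, and the Python list only stands in for a durable log such as a Kafka topic partition. The point is that the live consumer and the rebuild share one `apply` function.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """An immutable fact appended to the log (hypothetical shape)."""
    key: str     # e.g. an account id
    amount: int  # signed change to apply


@dataclass
class EventLog:
    """Append-only, in-memory stand-in for a durable log partition."""
    events: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> int:
        self.events.append(event)
        return len(self.events) - 1  # offset of the new event

    def read_from(self, offset: int) -> List[Event]:
        return self.events[offset:]


def apply(view: Dict[str, int], event: Event) -> None:
    """The single processing path, used for both live events and replay."""
    view[event.key] = view.get(event.key, 0) + event.amount


class BalanceView:
    """A materialized view that tracks the next log offset it must consume."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.next_offset = 0

    def catch_up(self, log: EventLog) -> None:
        for event in log.read_from(self.next_offset):
            apply(self.state, event)
            self.next_offset += 1


def rebuild(log: EventLog) -> BalanceView:
    """Rebuild by replaying from offset 0 through exactly the same code as the live path."""
    view = BalanceView()
    view.catch_up(log)
    return view
```

Rebuilding is just another consumer that starts at offset 0, so after the replay its state matches the live view. A real system would also persist the consumer offset and the view state together so that a restart does not double-apply events.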
Why Kappa emerged
One circuit instead of two
Lambda requires both a batch path and a speed path, so the same logic has to be implemented, tested, and operated twice.
Replay as a standard operation
Recalculation is done by replaying events from the log through the same code that serves the live stream.
Streaming-native platform
Modern streaming platforms allow you to build stable stateful pipelines without a separate batch stack.
Kappa flow
Kappa removes the separate batch layer: the live stream and historical replay move through the same processing path.
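Because replay and live traffic share one path, a replaying consumer has to outrun new arrivals before it reaches the head of the log. As a back-of-envelope sketch that assumes constant rates: if the backlog is B events, the replay processes r events/s, and new events arrive at λ events/s, the lag B + λt - rt reaches zero at t = B / (r - λ), and never does if r ≤ λ. `catch_up_seconds` is a hypothetical helper, and the numbers below are illustrative, not measurements.

```python
def catch_up_seconds(backlog_events: int, process_rate: float, ingest_rate: float) -> float:
    """Time for a replaying consumer to reach the head of the log while new events keep arriving.

    Lag after t seconds is backlog + ingest_rate * t - process_rate * t,
    so it reaches zero at t = backlog / (process_rate - ingest_rate).
    Assumes both rates are constant over the whole replay.
    """
    if process_rate <= ingest_rate:
        return float("inf")  # the replay never catches up with live traffic
    return backlog_events / (process_rate - ingest_rate)
```

For example, an illustrative 30-day backlog at 2,000 events/s is 5,184,000,000 events; with a replay throughput of 50,000 events/s, `catch_up_seconds(5_184_000_000, 50_000, 2_000)` returns 108000.0 seconds, about 30 hours. Real replays rarely run at a constant rate, so treat this as a rough planning estimate.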
Lambda and Kappa: key differences
| Criterion | Lambda | Kappa |
|---|---|---|
| Compute model | Batch, speed, and serving layers. | One streaming path plus materialized views. |
| Code paths | Two separate paths for batch recomputation and fast updates. | One processing path for the live stream and historical replay. |
| Reprocessing | Often through a batch recompute over the full historical dataset. | Replay events from an immutable log through the same stream processor. |
| Latency | Low through the speed layer, then reconciled with the batch result. | Low if the stream processor and state store can handle the load. |
| Operational complexity | Higher because two stacks must keep their semantics aligned. | Fewer paths, but stricter requirements for the streaming platform. |
| Best fit | When batch, ETL, and streaming workloads are already mature and separate. | When the platform is built around an event log (Kafka/Pulsar). |
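The Reprocessing row is commonly implemented by running the new version of the job side by side with the old one: the new version starts from offset 0 and writes to its own view, and the read path switches only after it has caught up. The following is a minimal sketch that uses an in-memory list as the log. `Job`, `ServingLayer`, `reprocess`, `sum_v1`, and `sum_v2` are hypothetical names, and `sum_v2` simply stands for changed business logic.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, int]  # (key, amount); hypothetical event shape
View = Dict[str, int]
Processor = Callable[[View, Event], None]


def sum_v1(view: View, event: Event) -> None:
    key, amount = event
    view[key] = view.get(key, 0) + amount


def sum_v2(view: View, event: Event) -> None:
    # Hypothetical new logic: ignore negative adjustments.
    key, amount = event
    if amount > 0:
        view[key] = view.get(key, 0) + amount


class Job:
    """One version of the stream processor with its own view and offset."""

    def __init__(self, processor: Processor) -> None:
        self.processor = processor
        self.view: View = {}
        self.offset = 0

    def poll(self, log: List[Event], max_events: int) -> None:
        batch = log[self.offset:self.offset + max_events]
        for event in batch:
            self.processor(self.view, event)
        self.offset += len(batch)

    def caught_up(self, log: List[Event]) -> bool:
        return self.offset == len(log)


class ServingLayer:
    """Reads always go to whichever job is currently active."""

    def __init__(self, active: Job) -> None:
        self.active = active

    def get(self, key: str) -> int:
        return self.active.view.get(key, 0)


def reprocess(log: List[Event], serving: ServingLayer,
              new_processor: Processor, batch_size: int = 1000) -> Job:
    candidate = Job(new_processor)  # new version replays from offset 0
    while not candidate.caught_up(log):
        candidate.poll(log, batch_size)
        serving.active.poll(log, batch_size)  # old version keeps serving meanwhile
    serving.active = candidate  # switch the read path once caught up
    return candidate
```

Keeping the old job running until the switch means readers never see a half-rebuilt view; the price is temporarily running two jobs and storing two copies of the view.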
How Kappa is implemented in practice
- Make the event log the source of truth: events should be immutable and schemas should be versioned.
- Move key materialized views into the streaming processing path.
- Make historical replay and backfill standard operations that run through the same code.
- Separate stateful processing from serving APIs with clear data contracts.
- Define explicit guarantees for late events and ordering, and choose between exactly-once and at-least-once delivery semantics.
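Two of these points, versioned schemas and delivery guarantees, can be sketched together. Under at-least-once delivery the same event may arrive twice, so the handler below deduplicates by `event_id`. Older events are upcast to the current schema before the shared logic runs, which keeps replays of old log segments working. The field names, the v1-to-v2 change, and `IdempotentBalances` are all hypothetical.

```python
from typing import Any, Dict, Set

Raw = Dict[str, Any]


def upcast(raw: Raw) -> Raw:
    """Normalize older schema versions to the current one (v2) before processing."""
    version = raw.get("schema_version", 1)
    if version == 1:
        # Hypothetical change: v1 stored whole units in "amount", v2 uses "amount_cents".
        return {
            "schema_version": 2,
            "event_id": raw["event_id"],
            "key": raw["key"],
            "amount_cents": raw["amount"] * 100,
        }
    if version == 2:
        return raw
    raise ValueError(f"unknown schema_version {version}")


class IdempotentBalances:
    """Applies each event_id at most once, so redeliveries are harmless."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}
        self.seen: Set[str] = set()

    def handle(self, raw: Raw) -> None:
        event = upcast(raw)
        if event["event_id"] in self.seen:
            return  # duplicate delivery: already applied
        self.seen.add(event["event_id"])
        key = event["key"]
        self.balances[key] = self.balances.get(key, 0) + event["amount_cents"]
```

The in-memory `seen` set grows without bound and is lost on restart; a production processor would store deduplication state together with the view state and bound it, for example by a retention window.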
When to choose and what to look for
Kappa is suitable if
- Core domain data is already generated as events.
- The system needs one logic path for live processing and historical recomputation.
- The team is ready to operate a streaming stack and stateful processors.
Risk areas
- Poor event schemas and weak schema governance.
- Heavy joins or windowed computations without state-size control.
- Underestimating historical replay costs: CPU, storage I/O, and backpressure.
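State-size control for windowed computations usually comes down to evicting windows once the watermark has passed them. The sketch below computes per-key tumbling-window counts. It assumes non-negative integer event times, an externally supplied watermark, and a fixed allowed lateness. `TumblingWindowCounter` and its methods are hypothetical, not the API of any particular stream processor.

```python
from typing import Dict, List, Tuple


class TumblingWindowCounter:
    """Counts events per key in fixed windows and evicts windows the watermark has passed."""

    def __init__(self, window_size: int, allowed_lateness: int) -> None:
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.watermark = 0  # assumes non-negative event times
        self.open: Dict[Tuple[str, int], int] = {}    # (key, window_start) -> count
        self.emitted: List[Tuple[str, int, int]] = []  # (key, window_start, count)
        self.dropped_late = 0

    def _window_start(self, ts: int) -> int:
        return ts - ts % self.window_size

    def _expired(self, start: int) -> bool:
        # A window is final once the watermark passes its end plus the allowed lateness.
        return start + self.window_size + self.allowed_lateness <= self.watermark

    def on_event(self, key: str, ts: int) -> None:
        start = self._window_start(ts)
        if self._expired(start):
            self.dropped_late += 1  # window already emitted and evicted
            return
        self.open[(key, start)] = self.open.get((key, start), 0) + 1

    def advance_watermark(self, watermark: int) -> None:
        self.watermark = max(self.watermark, watermark)  # watermarks never move back
        for key, start in sorted(self.open):  # sorted() copies keys, so popping is safe
            if self._expired(start):
                self.emitted.append((key, start, self.open.pop((key, start))))
```

Only windows that can still receive events stay in `open`, so state grows with the number of active keys times the lateness horizon rather than with the full history. Events for already-evicted windows are counted in `dropped_late` instead of silently reopening state.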
Related chapters
- Big Data: Principles and best practices of scalable realtime data systems (short summary) - Kappa's roots in the Lambda comparison and the move from two processing paths to a single event log.
- Streaming Data (short summary) - Hands-on stream-processing practices: delivery semantics, windows, stream state, and operational limits.
- Kafka: The Definitive Guide, 2nd Edition (short summary) - Technology foundation for Kappa with partitioned logs, retention, and historical replay as a standard mechanism.
- Event-driven architecture: Event Sourcing, CQRS, Saga - Architectural integration layer where Kappa naturally builds on event logs and workflow patterns.
- Data Pipeline / ETL / ELT Architecture - How to embed Kappa pipelines into end-to-end data platforms with orchestration and quality controls.
- Distributed message queue - Queue design case covering event order, data durability, and consumer scaling under high load.
- Designing Data-Intensive Applications, 2nd Edition (short summary) - Core theory on stream/table duality, replication, and distributed data processing.
- Enterprise Integration Patterns (short summary) - Pattern language for robust service integration in streaming architectures and event workflows.
- Data Mesh in Action (short summary) - Organizational view on scaling Kappa-like platforms through domain-owned data products.
- Google Global Network: Evolution and Architectural Principles for the AI Era - Network context for cross-region stream processing and latency-sensitive workloads.
