Kappa becomes interesting when a team is tired of paying for duplicate batch and speed layers and is willing to make the immutable log the center of the architecture.
In practice, this chapter helps decide when a stream-only model truly simplifies the system and when replay, retention, and durable-log requirements cost more than the simplification is worth.
In interviews and architecture reviews, it is especially useful when you need to explain why the batch layer was removed, which complexity disappeared, and which new demands on log storage, backfill, and materialized views appeared instead.
Practical value of this chapter
- **Design in practice.** Helps design stream-only systems without duplicating batch and stream logic.
- **Decision quality.** Guides Kappa adoption where replay and near-real-time behavior are primary.
- **Interview articulation.** Supports explaining backfill, bootstrap, and historical recomputation in Kappa.
- **Risk and trade-offs.** Highlights the retention, storage, and durable-log requirements of the model.
Related book
- **Big Data (Nathan Marz).** The basic formulation of the Lambda Architecture and the context from which Kappa grew.
Kappa Architecture is a stream-first approach to data processing with a single computation path: events are written to an immutable log, and all views are built by stream processing and materialized into target storage.
Why Kappa came to be
- **One path instead of two.** Lambda requires maintaining two computation paths (batch and speed), which complicates development and operations.
- **Replay as a standard mechanism.** Recomputation is performed by replaying events from the log with the same code that serves the real-time path.
- **Stream-native platforms.** Modern streaming platforms make it possible to build reliable stateful pipelines without a separate batch stack.
Kappa Basic Stream
Kappa has no separate batch layer: the same stream pipeline processes both the live stream and historical replays.
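The "one code path for live and replay" idea can be sketched in a few lines. This is a deliberately minimal illustration with hypothetical names (`process`, `run_pipeline`) and in-memory lists standing in for a real log and stream processor such as Kafka plus Flink or Kafka Streams:

```python
# Minimal sketch: one processing function serves both live traffic and replay.
# All names are illustrative; a real system would read from a durable log
# (e.g. Kafka) and run inside a stream processor, not over Python lists.

def process(event, view):
    """The single Kappa code path: fold one event into a materialized view."""
    key = event["user"]
    view[key] = view.get(key, 0) + event["amount"]
    return view

def run_pipeline(events, view=None):
    """Works identically for the live stream and a historical replay."""
    view = {} if view is None else view
    for event in events:
        process(event, view)
    return view

log = [  # immutable event log: append-only, never updated in place
    {"user": "a", "amount": 5},
    {"user": "b", "amount": 3},
    {"user": "a", "amount": 2},
]

live_view = run_pipeline(log)      # built incrementally as events arrive
replayed_view = run_pipeline(log)  # rebuilt from offset 0 with the same code
assert live_view == replayed_view == {"a": 7, "b": 3}
```

The point of the sketch is that `process` is written once: there is no second batch implementation whose semantics could drift from the real-time one.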
Lambda vs Kappa
| Criterion | Lambda | Kappa |
|---|---|---|
| Compute model | Batch + Speed + Serving layers. | Single stream pipeline + materialized views. |
| Code paths | Two separate computation paths (batch and real-time). | One processing path for both online and replay scenarios. |
| Reprocessing | Often through batch-recompute of the entire dataset. | Replay from immutable log via the same stream pipeline. |
| Latency | Low via speed layer + eventual merge with batch. | Low if the stream processor and state store can handle the load. |
| Operational complexity | Higher due to two stacks and keeping their semantics aligned. | Fewer moving parts, but stricter requirements on the stream stack. |
| Best fit | When batch/ETL and stream worlds are already strong and separated. | When the platform is built around an event log (Kafka/Pulsar). |
How Kappa is implemented in practice
- Make the event log the source of truth: immutable events with versioned schemas.
- Move key materialized views onto the stream-processing path.
- Add a replay/backfill pipeline: replaying events should be a routine operation.
- Separate stateful processing from serving APIs through clear data contracts.
- Pin down SLAs for late events, ordering, and exactly-once/at-least-once semantics.
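To make the replay/backfill step above concrete, here is a hedged sketch of rebuilding a materialized view from offset 0 and then switching serving to the new version. `EventLog`, `build_view`, and the switch-over are illustrative assumptions, not a specific product's API:

```python
# Sketch of a backfill: build a new version of a view by replaying the log
# from offset 0, then switch serving to it. All names are illustrative.
from dataclasses import dataclass, field

@dataclass
class EventLog:
    """Toy append-only log; a real one would be Kafka/Pulsar with retention."""
    events: list = field(default_factory=list)

    def append(self, event):
        self.events.append(event)

    def replay(self, from_offset=0):
        # An immutable log can always be re-read from any retained offset.
        return iter(self.events[from_offset:])

def build_view(events, transform):
    """Fold events into a key-value view using the given per-key transform."""
    view = {}
    for e in events:
        view[e["key"]] = transform(view.get(e["key"], 0), e["value"])
    return view

log = EventLog()
for k, v in [("x", 1), ("y", 4), ("x", 2)]:
    log.append({"key": k, "value": v})

view_v1 = build_view(log.replay(), lambda acc, v: acc + v)      # original logic: sum
view_v2 = build_view(log.replay(), lambda acc, v: max(acc, v))  # changed logic: max
serving = view_v2  # switch reads only after the backfill has caught up
assert view_v1 == {"x": 3, "y": 4} and view_v2 == {"x": 2, "y": 4}
```

Because the log is immutable, changing the view logic never requires touching a batch stack: the new version is simply replayed into existence alongside the old one, which stays available until the cut-over.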
When to choose and what to look for
Kappa is suitable if
- Core domain data is already produced as events.
- You need a single logic path for real-time processing and historical recomputation.
- The team is ready to operate a stream-first stack and stateful processors.
Risk areas
- Poor-quality event schemas and a lack of schema governance.
- Heavy join/window computations without state-size control.
- Underestimating the cost of replay: CPU, storage I/O, backpressure.
Related chapters
- Big Data: Principles and best practices of scalable realtime data systems (short summary) - Origins of Kappa through the Lambda comparison and stream-first platform design context.
- Streaming Data (short summary) - Hands-on stream-processing practices: delivery semantics, windows, stateful processing, and operational limits.
- Kafka: The Definitive Guide (short summary) - Technology foundation for Kappa with immutable logs, partitioning, and replay as a first-class mechanism.
- Event-driven architecture: Event Sourcing, CQRS, Saga - Architectural integration layer where Kappa naturally aligns with event log and workflow patterns.
- Data pipeline / ETL / ELT architecture - How to embed Kappa pipelines into end-to-end data platforms with orchestration and quality controls.
- Distributed message queue - Queue design case covering ordering, durability, and consumer scaling under high load.
- Designing Data-Intensive Applications (short summary) - Core theory on stream/table duality, replication, and distributed data processing trade-offs.
- Enterprise Integration Patterns (short summary) - Pattern language for robust service integration in stream-first architecture and event workflows.
- Data Mesh in Action (short summary) - Organizational view on scaling Kappa-like platforms through domain-owned data products.
- Google Global Network: evolution and architecture principles for the AI era - Network context for cross-region stream processing and latency-sensitive workloads.
