Big Data still matters, not because it once popularized Lambda Architecture, but because it remains a sharp lens on the cost of separating the batch, serving, and speed layers.
In engineering practice, the book helps show where immutable data, approximate algorithms, and separate compute paths are justified, and where the architecture starts to lose to its own complexity.
In interviews and architecture discussions, it is especially useful when you need to speak honestly about the point where latency, correctness, and complexity stop coexisting peacefully in one design.
Practical value of this chapter
Design in practice
Builds an end-to-end view of batch paths, stream processing, and serving layers for high-volume analytics.
Decision quality
Improves architecture-style choices around latency, recomputation cost, and result correctness.
Interview articulation
Adds concrete criteria for Lambda, Kappa, or hybrid decisions in interview answers.
Risk and trade-offs
Shows where a data architecture starts to degrade as complexity grows and input data changes.
Source
Book Review
Original review by Alexander Polomodov on tellmeabout.tech
Big Data: Principles and Best Practices of Scalable Realtime Data Systems
Authors: Nathan Marz, James Warren
Publisher: Manning Publications
Length: 328 pages
Nathan Marz on Lambda Architecture: batch, serving, and speed layers, immutable event history, batch and realtime views, HyperLogLog, and the cost of complexity.
Lambda Architecture
Related topic
DDIA: Batch & Stream Processing
DDIA explains batch recomputation, stream processing, and materialized views in depth.
The book explains Lambda Architecture as a way to combine accurate historical recomputation with fast answers over fresh events. The classic design has three layers:
Batch Layer
Stores the master dataset as immutable event history and recomputes accurate views over the full history.
Serving Layer
Indexes precomputed views and serves fast responses while the batch layer prepares the next full recomputation.
Speed Layer
Processes events that arrived since the last batch recomputation and incrementally updates realtime views, often accepting approximate results to keep latency low.
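To make the division of labor concrete, here is a minimal in-memory sketch of how a query can combine the layers, assuming page-view counts per URL as the view and integer timestamps. The class `ToyLambda` and its methods are hypothetical illustrations, not code from the book; a real system keeps the master dataset in distributed storage and the views in dedicated databases.

```python
from collections import Counter, defaultdict


class ToyLambda:
    """In-memory sketch of Lambda-style page-view counting (hypothetical names)."""

    def __init__(self) -> None:
        self.master: list[tuple[int, str]] = []   # append-only (ts, url) event history
        self.batch_view: Counter = Counter()      # url -> count for events with ts < batch_cutoff
        self.batch_cutoff = 0
        # Speed layer: ts -> Counter(url -> count), updated incrementally.
        self.realtime: defaultdict = defaultdict(Counter)

    def ingest(self, ts: int, url: str) -> None:
        self.master.append((ts, url))   # never updated or deleted
        self.realtime[ts][url] += 1      # cheap incremental update for fresh data

    def run_batch(self, cutoff: int) -> None:
        # Batch layer: recompute the view from scratch over all history before cutoff.
        self.batch_view = Counter(url for ts, url in self.master if ts < cutoff)
        self.batch_cutoff = cutoff
        # The new batch view now covers these timestamps, so the speed layer drops them.
        for ts in [t for t in self.realtime if t < cutoff]:
            del self.realtime[ts]

    def query(self, url: str) -> int:
        # Query-time merge: accurate batch view plus fresh realtime counts.
        fresh = sum(counts[url] for counts in self.realtime.values())
        return self.batch_view[url] + fresh


lam = ToyLambda()
for ts, url in [(1, "/a"), (2, "/a"), (3, "/b")]:
    lam.ingest(ts, url)
lam.run_batch(cutoff=3)   # batch view covers ts 1 and 2
lam.ingest(4, "/a")       # arrives after the batch run
print(lam.query("/a"))    # 2 from the batch view + 1 from the speed layer
```

The point of the design: the speed layer only has to be right for the short window since the last batch run, so its state can be thrown away every time a fresh batch view is published.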
Lambda Architecture map (figure): master dataset + batch views + realtime views.
Lambda Architecture combines accurate batch recomputation, a fast speed layer, and a single serving layer for queries.
"The Lambda Architecture provides a general-purpose approach to implementing an arbitrary function on an arbitrary dataset and having the function return its results with low latency"
— Nathan Marz
Desired properties of a big data processing system
The authors identify the key properties that a big data processing system should have:
Horizontal scaling
Capacity grows by adding nodes, not by redesigning the whole system.
Fault tolerance
Resilience to hardware failures without data loss.
Human-error recovery
Bad code or bad data can be repaired by recomputing from the original history.
Low latency
User-facing queries can return fresh answers without waiting for a full batch run.
Flexible computation
New views and algorithms can be added over already accumulated data.
Controlled complexity
The team can reason about which path owns accuracy, freshness, and serving.
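Human-error recovery is the property that most directly follows from keeping raw history immutable. The sketch below assumes a made-up master dataset of page-view facts and a hypothetical bug in a view function; it shows that fixing the bug only requires recomputing the view from the same untouched facts.

```python
from collections import Counter

# Immutable master dataset of raw page-view facts (sample data is made up).
MASTER = (
    (1, "/docs/intro"),
    (2, "/docs/api"),
    (3, "/docs/api"),
    (4, "/pricing"),
)


def buggy_view(events):
    # Hypothetical bug shipped in v1: URLs are truncated to their first path
    # segment, so "/docs/api" and "/docs/intro" are both counted as "/docs".
    return Counter("/" + url.split("/")[1] for _, url in events)


def fixed_view(events):
    # v2: count full URLs.
    return Counter(url for _, url in events)


view_v1 = buggy_view(MASTER)   # what the serving layer exposed before the fix
view_v2 = fixed_view(MASTER)   # recomputed from the same untouched history
print(view_v1)
print(view_v2)
```

If the system had stored only the output of `buggy_view`, the split of `/docs` into per-page counts would be unrecoverable; with an immutable master dataset, the fix is a redeploy plus a recomputation.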
Book structure
The book is divided into parts that map to the layers of Lambda Architecture:
Part 1: Batch Layer
Data model, master dataset, and accurate view computation over the full history.
Part 2: Serving Layer
Indexing and serving precomputed views for fast queries.
Part 3: Speed Layer
Fast processing of fresh events and compensation for the delay between batch recomputations.
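The data model behind Part 1 rests on immutable, timestamped records from which current state is derived rather than overwritten. A minimal sketch, assuming a hypothetical `LocationFact` record (the class and field names are illustrative, not the book's schema):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocationFact:
    """One atomic, timestamped fact. frozen=True: instances cannot be mutated."""
    user_id: str
    city: str
    ts: int


FACTS = (
    LocationFact("u1", "Berlin", 100),
    LocationFact("u2", "Lisbon", 120),
    LocationFact("u1", "Paris", 200),   # u1 moved: a new fact, not an update
)


def current_locations(facts):
    """Derive current state: the latest city per user."""
    latest = {}
    for fact in sorted(facts, key=lambda f: f.ts):
        latest[fact.user_id] = fact.city
    return latest


def location_at(facts, user_id, ts) -> Optional[str]:
    """History stays queryable: the user's last known city at time ts."""
    known = [f for f in facts if f.user_id == user_id and f.ts <= ts]
    return max(known, key=lambda f: f.ts).city if known else None


print(current_locations(FACTS))
print(location_at(FACTS, "u1", 150))
```

A mutable `users` table with a single `city` column could answer the first question but would have lost the Berlin record needed for the second.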
Practical examples
The authors go beyond theory and walk through representative large-scale data problems:
URL Page Views
Counting page views by URL and time interval.
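For the URL Page Views example, a batch view keyed by URL and time bucket lets the serving layer answer range queries by summing a few precomputed counts. A minimal sketch, assuming hourly buckets and `(url, unix_ts)` input pairs; the function names are hypothetical:

```python
from collections import Counter

SECONDS_PER_HOUR = 3600


def build_hourly_view(pageviews):
    """Batch view: (url, hour_bucket) -> count. pageviews are (url, unix_ts) pairs."""
    return Counter((url, ts // SECONDS_PER_HOUR) for url, ts in pageviews)


def pageviews_in_range(view, url, start_hour, end_hour):
    """Sum hourly buckets for url over the half-open range [start_hour, end_hour)."""
    return sum(view[(url, hour)] for hour in range(start_hour, end_hour))


events = [("/a", 10), ("/a", 3599), ("/a", 3600), ("/b", 7200), ("/a", 7300)]
view = build_hourly_view(events)
print(pageviews_in_range(view, "/a", 0, 2))  # hours 0 and 1
```

Long ranges touch many hourly buckets; one common refinement is to also precompute coarser buckets (day, week) so a query reads fewer keys.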
Unique Visitors
Estimating unique visitors with HyperLogLog.
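For the Unique Visitors example, HyperLogLog trades exactness for a fixed-size sketch that can be merged. Below is a minimal illustrative implementation, not a production library and not the book's code; it assumes a 64-bit BLAKE2b hash and 2^10 registers (standard error of roughly 1.04/√1024 ≈ 3%), with linear counting as the small-range correction.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative, not a production library)."""

    def __init__(self, p: int = 10) -> None:
        self.p = p                  # number of index bits
        self.m = 1 << p             # number of registers
        self.registers = [0] * self.m
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    @staticmethod
    def _hash64(item: str) -> int:
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, item: str) -> None:
        x = self._hash64(item)
        rest_bits = 64 - self.p
        idx = x >> rest_bits                     # top p bits pick a register
        w = x & ((1 << rest_bits) - 1)           # remaining bits
        rank = rest_bits - w.bit_length() + 1    # 1-based position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> None:
        # Union of two sketches = element-wise max of registers.
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        z = sum(2.0 ** -r for r in self.registers)
        raw = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog(p=10)
for i in range(10_000):
    hll.add(f"visitor-{i}")
print(round(hll.estimate()))  # close to 10000, within a few percent
```

Because `merge` is an element-wise max, a batch view can store one sketch per URL and hour and combine them at query time, which exact distinct counts do not allow without rescanning raw data.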
Bounce Rate
Calculating bounce rate across a site or domain.
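For the Bounce Rate example, the computation needs sessionization: grouping each visitor's pageviews into visits and counting visits with a single pageview. A minimal sketch for one site, assuming a 30-minute inactivity gap ends a visit (a common web-analytics convention; the exact rule used in the book may differ) and hypothetical `(visitor_id, unix_ts)` input:

```python
from collections import defaultdict

# Assumption: a visit ends after 30 minutes without a pageview.
SESSION_GAP_SECONDS = 30 * 60


def bounce_rate(pageviews, gap=SESSION_GAP_SECONDS):
    """pageviews: iterable of (visitor_id, unix_ts) for one site.

    Returns the share of visits that contain exactly one pageview.
    """
    times_by_visitor = defaultdict(list)
    for visitor, ts in pageviews:
        times_by_visitor[visitor].append(ts)

    visits = 0
    bounces = 0
    for times in times_by_visitor.values():
        times.sort()
        pages_in_visit = 1
        for prev, cur in zip(times, times[1:]):
            if cur - prev > gap:
                # Inactivity gap: close the current visit, start a new one.
                visits += 1
                if pages_in_visit == 1:
                    bounces += 1
                pages_in_visit = 1
            else:
                pages_in_visit += 1
        # Close this visitor's last visit.
        visits += 1
        if pages_in_visit == 1:
            bounces += 1
    return bounces / visits if visits else 0.0


sample = [("v1", 0), ("v1", 60), ("v2", 0), ("v1", 10_000)]
print(bounce_rate(sample))  # v1: one 2-page visit and one bounce; v2: one bounce
```

Sessionization needs all of a visitor's pageviews in time order, which suits full batch recomputation; maintaining it incrementally is harder because a late-arriving pageview can merge two visits into one.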
Technology stack examples
Storage: HDFS
Batch layer: Hadoop
Serving layer: ElephantDB
Speed layer: Storm
* Technologies from the 2015 book. Modern alternatives include Spark, Flink, and Kafka Streams.
Related chapters
- Designing Data-Intensive Applications, 2nd Edition (short summary) - Foundational distributed-data theory that complements the Lambda model and clarifies core trade-offs.
- Streaming Data (short summary) - Hands-on stream processing and modern operational practice around Lambda's speed-layer ideas.
- Kafka: The Definitive Guide, 2nd Edition (short summary) - Event-log platform foundations for ingestion and streaming backbones in large-scale data systems.
- Kappa Architecture: stream-first alternative to Lambda - Evolution of Lambda ideas toward one stream-first processing path without a separate batch branch.
- Data Pipeline / ETL / ELT Architecture - Operational perspective on data pipelines, orchestration strategy, and data quality controls.
- Distributed message queue - Practical queueing case focused on ordering, durability, and throughput under real load.
- Distributed file system (GFS/HDFS) - Storage-layer fundamentals behind the batch side of Lambda and distributed file-system architecture.
- Data Mesh in Action (short summary) - Organizational evolution from centralized Lambda-era platforms to domain-oriented data ownership.
- T-Bank data platform overview - Real platform case combining batch and stream processing, lakehouse patterns, and product thinking around data.
- Google Global Network: Evolution and Architectural Principles for the AI Era - Network context for cross-region transfer and low-latency processing of large-scale data streams.
