Big Data still matters not because it once popularized Lambda Architecture, but because it remains a sharp way to see the cost of separating batch, serving, and speed layers.
In real engineering practice, the book helps show where immutable data, approximate algorithms, and separate compute paths are justified and where the architecture starts losing to its own complexity.
In interviews and architecture discussions, it is especially useful when you need to speak honestly about the point where latency, correctness, and complexity stop coexisting peacefully in one design.
Practical value of this chapter
Design in practice
Builds an end-to-end view of batch, stream, and serving layers for high-volume analytics.
Decision quality
Sharpens the choice of architecture style against latency, cost, and correctness constraints.
Interview articulation
Adds concrete criteria for Lambda, Kappa, or hybrid decisions in interview answers.
Risk and trade-offs
Shows where data architecture degrades under complexity growth and data drift.
Source
Book Review
Original review by Alexander Polomodov on tellmeabout.tech
Big Data: Principles and Best Practices of Scalable Realtime Data Systems
Authors: Nathan Marz, James Warren
Publisher: Manning Publications
Length: 328 pages
Nathan Marz on Lambda Architecture: batch, serving, and speed layers, data immutability, HyperLogLog, and practical examples.
Lambda Architecture
Related topic
DDIA: Batch & Stream Processing
Chapters 10-11 of DDIA cover batch and stream processing in detail.
The book is dedicated to Lambda Architecture, an architectural pattern for big data processing systems that consists of three layers:
Batch Layer
Stores the master dataset as immutable, append-only events. Computes arbitrary views over the complete dataset.
Serving Layer
Fast queries on precomputed views. The views can stay immutable between batch layer recomputations.
Speed Layer
Stream processing that keeps results up to date between batch layer recomputations. Approximate aggregates in real time.
Lambda Architecture Map
master dataset + batch views + realtime views
Lambda Architecture combines the accuracy of batch recomputation with a low-latency streaming layer through a single serving path.
"The Lambda Architecture provides a general-purpose approach to implementing an arbitrary function on an arbitrary dataset and having the function return its results with low latency"
— Nathan Marz
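To make the interplay of the three layers concrete, here is a minimal single-process sketch in Python. It is an illustration under stated assumptions, not the book's implementation: the `PageView` schema, the `LambdaSketch` class, and the hourly bucketing are hypothetical names chosen for this example, and a real system would run each layer on separate infrastructure.

```python
from collections import Counter
from dataclasses import dataclass

HOUR = 3600  # bucket width in seconds (assumption for this sketch)


@dataclass(frozen=True)
class PageView:
    """One immutable fact in the master dataset (hypothetical schema)."""
    url: str
    user_id: str
    timestamp: int  # seconds since epoch


def hour_bucket(timestamp: int) -> int:
    return timestamp // HOUR


def compute_view(events) -> Counter:
    """Batch function: page views per (url, hour bucket) over all events."""
    view = Counter()
    for event in events:
        view[(event.url, hour_bucket(event.timestamp))] += 1
    return view


class LambdaSketch:
    def __init__(self):
        self.master = []                # append-only master dataset
        self.batch_view = Counter()     # precomputed by the batch layer
        self.realtime_view = Counter()  # maintained by the speed layer

    def ingest(self, event: PageView) -> None:
        # The master dataset is only appended to, never updated in place.
        self.master.append(event)
        # The speed layer updates its view incrementally.
        self.realtime_view[(event.url, hour_bucket(event.timestamp))] += 1

    def run_batch(self) -> None:
        # Recompute the batch view from scratch over the complete dataset,
        # then drop the realtime state that the new batch view now covers.
        self.batch_view = compute_view(self.master)
        self.realtime_view = Counter()

    def query(self, url: str, hour: int) -> int:
        # The serving path merges the batch view with the realtime view.
        key = (url, hour)
        return self.batch_view[key] + self.realtime_view[key]


system = LambdaSketch()
system.ingest(PageView("/home", "u1", 10))
system.ingest(PageView("/home", "u2", 20))
system.run_batch()                          # batch view now covers both events
system.ingest(PageView("/home", "u3", 30))  # visible only via the speed layer
total = system.query("/home", 0)
```

The sketch runs in one thread, so `run_batch` is effectively atomic. In a distributed deployment, events that arrive while a batch run is in progress have to stay in the realtime view until the next run, which is one source of the operational complexity the review mentions.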
Desired properties of a Big Data system
The authors identify the key properties that a big data processing system should have:
Horizontal scaling
Ability to add nodes to increase capacity
Fault tolerance
Resilience to hardware failures without data loss
Human fault tolerance
Ability to recover from human errors
Low latency
Quick responses to user requests
Ad hoc queries
Supports arbitrary computations over the data
Minimal maintenance
The system is simple to operate and support
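The ability to correct human errors follows from keeping raw data immutable: a view built by buggy code can be thrown away and recomputed. The sketch below illustrates that idea with URL normalization. The events, `buggy_normalize`, and `fixed_normalize` are hypothetical and exist only for this example.

```python
from collections import Counter

# Hypothetical raw events (url, user_id). They are appended, never edited.
MASTER = [
    ("http://example.com/home", "u1"),
    ("http://example.com/home/", "u2"),
    ("http://EXAMPLE.com/home", "u3"),
]


def buggy_normalize(url: str) -> str:
    # Bug: the URL is returned untouched, so variants are counted separately.
    return url


def fixed_normalize(url: str) -> str:
    # Lowercase the host and strip trailing slashes from the path.
    scheme, rest = url.split("://", 1)
    host, _, path = rest.partition("/")
    return f"{scheme}://{host.lower()}/{path.rstrip('/')}"


def build_view(events, normalize) -> Counter:
    """Page views per normalized URL, derived purely from raw events."""
    return Counter(normalize(url) for url, _ in events)


bad_view = build_view(MASTER, buggy_normalize)   # three separate keys
good_view = build_view(MASTER, fixed_normalize)  # one key after the fix
```

Because the view is a pure function of the master dataset, deploying the fix and rerunning the batch job is enough to repair it. No in-place correction of stored records is needed.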
Book structure
We recommend
Streaming Data
A modern view of the architecture of streaming systems
The book is divided into parts that correspond to the layers of Lambda Architecture:
Part 1: Batch Layer
Data model, storing master data, computing views on a complete data set.
Part 2: Serving Layer
Indexing and serving precomputed views for fast queries.
Part 3: Speed Layer
Real-time data processing that compensates for the latency of the batch layer.
Practical examples
The authors go beyond theory and work through typical tasks for big data systems:
URL Page Views
Counting website URL views over time
Unique Visitors
Calculating the number of unique users with HyperLogLog
Bounce Rate
Calculating the share of visits to a domain that end after a single page view
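Of the three tasks, unique visitors is the one where an approximate algorithm pays off, because exact sets of user IDs do not merge cheaply across time ranges. Below is a minimal HyperLogLog sketch in Python. It is a simplified teaching version, not the implementation used in the book: the `HyperLogLog` class, the SHA-1 based hash, and the default precision `p=12` are choices made for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog with 2**p registers (illustrative only)."""

    def __init__(self, p: int = 12):
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16 in this sketch")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    @staticmethod
    def _hash64(item: str) -> int:
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item: str) -> None:
        x = self._hash64(item)
        width = 64 - self.p
        index = x >> width               # first p bits select a register
        rest = x & ((1 << width) - 1)    # remaining bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def merge(self, other: "HyperLogLog") -> None:
        """Union of two sketches: element-wise maximum of the registers."""
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self) -> float:
        harmonic = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / harmonic
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return estimate


# One sketch per hour bucket; merging gives uniques over a longer range.
hour_1, hour_2 = HyperLogLog(), HyperLogLog()
for i in range(6000):
    hour_1.add(f"user-{i}")
for i in range(4000, 10000):
    hour_2.add(f"user-{i}")
hour_1.merge(hour_2)
approx_uniques = hour_1.count()  # estimate of 10000 distinct users
```

With `p=12` the sketch keeps 4096 small registers regardless of how many users it sees, and the expected relative error is roughly 1.04 / sqrt(4096), or about 1.6%. Merging per-hour sketches by taking register maxima is what makes range queries over precomputed views practical.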
Technology stack examples
Storage
HDFS
Batch
Hadoop
Serving
ElephantDB
Speed
Storm
* Technologies as of the book's 2015 publication. Modern alternatives include Spark, Flink, and Kafka Streams.
Related chapters
- Designing Data-Intensive Applications (short summary) - Foundational distributed-data theory that complements the Lambda model and clarifies core trade-offs.
- Streaming Data (short summary) - Hands-on stream-processing practices as an extension of Lambda's speed-layer concepts.
- Kafka: The Definitive Guide (short summary) - Event-log platform foundations for ingestion and streaming backbones in big data systems.
- Kappa architecture: a stream-first alternative to Lambda - Evolution of Lambda ideas toward a single stream-first processing path without a separate batch branch.
- Data pipeline / ETL / ELT architecture - Operational perspective on end-to-end pipelines, orchestration strategy, and data quality controls.
- Distributed message queue - Practical queueing case focused on ordering, durability, and throughput under real load.
- Distributed file system (GFS/HDFS) - Storage-layer fundamentals behind Lambda batch processing and distributed file-system architecture.
- Data Mesh in Action (short summary) - Organizational evolution from centralized Lambda-era platforms to domain-oriented data ownership.
- T-Bank data platform overview - Real platform case combining batch and stream processing, lakehouse patterns, and product data thinking.
- Google Global Network: evolution and architecture principles for the AI era - Network context for cross-region transfer and low-latency processing of large-scale data streams.
