System Design Space

Updated: May 2, 2026 at 7:31 AM

Big Data (short summary)

hard

Big Data still matters not because it once popularized Lambda Architecture, but because it remains a sharp way to see the cost of separating batch, serving, and speed layers.

In engineering practice, the book shows where immutable data, approximate algorithms, and separate compute paths are justified, and where the architecture starts losing to its own complexity.

In interviews and architecture discussions, it is especially useful when you need to speak honestly about the point where latency, correctness, and complexity stop coexisting peacefully in one design.

Practical value of this chapter

Design in practice

Builds an end-to-end view of batch paths, stream processing, and serving layers for high-volume analytics.

Decision quality

Improves architecture-style choices around latency, recomputation cost, and result correctness.

Interview articulation

Adds concrete criteria for Lambda, Kappa, or hybrid decisions in interview answers.

Risk and trade-offs

Shows where a data architecture starts to degrade as complexity grows and input data changes.

Source

Book Review

Original review by Alexander Polomodov on tellmeabout.tech


Big Data: Principles and Best Practices of Scalable Realtime Data Systems

Authors: Nathan Marz, James Warren
Publisher: Manning Publications
Length: 328 pages

Nathan Marz on Lambda Architecture: batch, serving, and speed layers, immutable event history, batch and realtime views, HyperLogLog, and the cost of complexity.


Lambda Architecture

Related topic

DDIA: Batch & Stream Processing

DDIA explains batch recomputation, stream processing, and materialized views in depth.


The book explains Lambda Architecture as a way to combine accurate historical recomputation with fast answers over fresh events. The classic design has three layers:

Batch Layer

Stores the master dataset as immutable event history and recomputes accurate views over the full history.

Serving Layer

Indexes precomputed views and serves fast responses while the batch layer prepares the next full recomputation.

Speed Layer

Processes fresh events between batch recomputations and builds approximate aggregates for low-latency answers.
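The interaction between the three layers can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the book's implementation: the class name `LambdaPageViews`, the `(timestamp, url)` event shape, and the page-view counting use case are all assumptions chosen for brevity.

```python
from collections import Counter


class LambdaPageViews:
    """Toy Lambda Architecture for counting page views per URL (hypothetical)."""

    def __init__(self):
        self.master_dataset = []        # batch layer input: append-only history
        self.batch_view = Counter()     # serving layer: last full recomputation
        self.realtime_view = Counter()  # speed layer: events since the last batch run

    def ingest(self, timestamp, url):
        """Every event is appended to the history and applied to the speed layer."""
        self.master_dataset.append((timestamp, url))
        self.realtime_view[url] += 1

    def run_batch(self):
        """Recompute the batch view from the full history.

        Simplification: the batch run is treated as instantaneous, so the
        realtime view can simply be reset. A real system must handle events
        that arrive while the batch job is still running.
        """
        self.batch_view = Counter(url for _, url in self.master_dataset)
        self.realtime_view = Counter()

    def query(self, url):
        """Serving path: merge the batch view with the realtime view."""
        return self.batch_view[url] + self.realtime_view[url]


system = LambdaPageViews()
system.ingest(1, "/home")
system.ingest(2, "/home")
system.run_batch()          # both events are now covered by the batch view
system.ingest(3, "/home")   # only the speed layer has seen this one
print(system.query("/home"))  # 3
```

Even in this toy version, the counting logic exists twice (once in `ingest`, once in `run_batch`), which hints at the maintenance cost of keeping two compute paths in agreement.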

Lambda Architecture map: master dataset + batch views + realtime views

Raw event log: immutable append-only source
Batch layer -> batch views: accurate aggregates over the full dataset
Speed layer -> realtime views: low latency between batch recomputations
Serving layer -> Query API: merge batch and realtime views

Lambda Architecture combines accurate batch recomputation, a fast speed layer, and a single serving layer for queries.

"The Lambda Architecture provides a general-purpose approach to implementing an arbitrary function on an arbitrary dataset and having the function return its results with low latency"

— Nathan Marz
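The idea in the quote is often condensed into three relations. The version below is a paraphrase: the function names f, g, and h are placeholders introduced here, not the book's notation.

```latex
\begin{align*}
\text{batch view}    &= f(\text{all data}) \\
\text{realtime view} &= g(\text{realtime view},\ \text{new data}) \\
\text{query}         &= h(\text{batch view},\ \text{realtime view})
\end{align*}
```

The batch view is a pure function of the full history, so it can always be rebuilt from scratch. The realtime view is updated incrementally, which makes it fast but harder to repair.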

Desired properties of a big data processing system

The authors identify the key properties that a big data processing system should have:

Horizontal scaling

Capacity grows by adding nodes, not by redesigning the whole system.

Fault tolerance

The system survives hardware failures without losing data.

Human-error recovery

Bad code or bad data can be repaired by recomputing from the original history.

Low latency

User-facing queries can return fresh answers without waiting for a full batch run.

Flexible computation

New views and algorithms can be added over already accumulated data.

Controlled complexity

The team can reason about which path owns accuracy, freshness, and serving.
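Human-error recovery is the property that depends most directly on immutable history. The sketch below assumes a hypothetical bug (URLs counted without case normalization) and shows that the repair is a recomputation, not a patch to stored results. The event shape and function names are illustrative.

```python
from collections import Counter

# Hypothetical raw events: (user_id, url). The history is never mutated.
MASTER_DATASET = (
    ("u1", "/home"),
    ("u2", "/Home"),
    ("u1", "/pricing"),
)


def buggy_view(events):
    """First deployment: forgets to normalize URL case."""
    return Counter(url for _, url in events)


def fixed_view(events):
    """Corrected logic: normalizes case before counting."""
    return Counter(url.lower() for _, url in events)


view = buggy_view(MASTER_DATASET)  # "/home" and "/Home" are counted separately
view = fixed_view(MASTER_DATASET)  # rebuild from raw history with the fixed code
print(view["/home"])  # 2
```

Because the view is derived and the raw events are untouched, the bad result is simply replaced. A system that had stored only the aggregated counts could not separate correct data from the effects of the bug.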

Book structure

We recommend

Streaming Data

A modern view of the architecture of streaming systems


The book is divided into parts that map to the layers of Lambda Architecture:

Part 1: Batch Layer

Data model, master dataset, and accurate view computation over the full history.

Data Model, Master Dataset, Batch Views, MapReduce

Part 2: Serving Layer

Indexing and serving precomputed views for fast queries.

Indexing, Batch Views Serving, ElephantDB

Part 3: Speed Layer

Fast processing of fresh events and compensation for the delay between batch recomputations.

Realtime Views, Stream Processing, Apache Storm, Micro-batching
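Micro-batching, one of the speed-layer topics listed for Part 3, can be illustrated without any streaming framework. The sketch below is a plain-Python approximation, not Storm code: it assumes events are `(timestamp, url)` tuples that arrive sorted by timestamp, and the function names are hypothetical.

```python
from itertools import groupby


def micro_batches(events, window_seconds):
    """Group time-ordered (timestamp, url) events into fixed tumbling windows.

    Assumes events are sorted by timestamp. Real stream processors also
    handle late and out-of-order data, which this sketch ignores.
    """
    for window, group in groupby(events, key=lambda e: e[0] // window_seconds):
        yield window * window_seconds, list(group)


def update_realtime_view(view, batch):
    """Apply one micro-batch to an in-memory realtime view (url -> count)."""
    for _, url in batch:
        view[url] = view.get(url, 0) + 1
    return view


events = [(0, "/a"), (3, "/b"), (5, "/a"), (11, "/a")]
view = {}
for window_start, batch in micro_batches(events, window_seconds=5):
    update_realtime_view(view, batch)
print(view)  # {'/a': 3, '/b': 1}
```

Processing a small batch at a time trades a little latency for simpler bookkeeping: the view is updated once per window instead of once per event.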

Practical examples

The authors go beyond theory and walk through representative large-scale data problems:

📊

URL Page Views

Counting page views by URL and time interval.
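As a rough illustration of this query, the sketch below buckets page views by hour and answers range queries by summing buckets. The hourly granularity, the `(epoch_seconds, url)` event shape, and the function names are assumptions made for the example.

```python
from collections import Counter

SECONDS_PER_HOUR = 3600


def hour_bucket(epoch_seconds):
    """Truncate a Unix timestamp to the start of its hour."""
    return epoch_seconds - (epoch_seconds % SECONDS_PER_HOUR)


def pageviews_by_hour(events):
    """Count views keyed by (url, hour_start) from (epoch_seconds, url) events."""
    return Counter((url, hour_bucket(ts)) for ts, url in events)


def views_in_range(counts, url, start, end):
    """Sum the hourly buckets for one URL with start <= hour_start < end."""
    return sum(
        n for (u, hour), n in counts.items()
        if u == url and start <= hour < end
    )


events = [(3600, "/home"), (3700, "/home"), (7300, "/home"), (3601, "/docs")]
counts = pageviews_by_hour(events)
print(counts[("/home", 3600)])                   # 2
print(views_in_range(counts, "/home", 0, 10800))  # 3
```

Scanning every bucket is fine for a sketch. At scale, a view would typically keep coarser rollups (day, week) so that long ranges touch fewer buckets.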

👥

Unique Visitors

Estimating unique visitors with HyperLogLog.
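To make the trade-off concrete, here is a compact HyperLogLog sketch in Python. It is a teaching version under stated assumptions, not the book's code: SHA-1 stands in for a fast 64-bit hash, the default precision `p=12` is arbitrary, and only the small-range (linear counting) correction is implemented.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog cardinality estimator (illustrative only)."""

    def __init__(self, p=12):
        self.p = p                     # number of bits used to pick a register
        self.m = 1 << p                # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant; this formula is valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")    # use 64 bits of the hash
        width = 64 - self.p
        idx = x >> width                         # top p bits select the register
        rest = x & ((1 << width) - 1)            # remaining bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        """Union two sketches of equal precision via register-wise max."""
        if self.p != other.p:
            raise ValueError("precision mismatch")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)


hll = HyperLogLog()
for i in range(10_000):
    hll.add(f"user-{i}")
print(hll.count())  # close to 10,000; typical error is a few percent at p=12
```

The useful property for analytics is `merge`: sketches built per time bucket can be combined into an estimate for any range without revisiting the raw events, which exact sets can only do at much higher memory cost.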

🚨

Bounce Rate

Calculating bounce rate across a site or domain.
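A bounce-rate calculation needs a definition of a visit first. The sketch below assumes that page views by the same user less than 30 minutes apart belong to one visit, and that a visit with a single page view is a bounce. The timeout value, event shape, and function name are assumptions for illustration.

```python
from collections import defaultdict

VISIT_GAP_SECONDS = 30 * 60  # assumed session timeout


def bounce_rate(pageviews):
    """Return bounced_visits / total_visits.

    `pageviews` is an iterable of (user_id, epoch_seconds) for one domain.
    """
    by_user = defaultdict(list)
    for user_id, ts in pageviews:
        by_user[user_id].append(ts)

    visits = 0
    bounces = 0
    for timestamps in by_user.values():
        timestamps.sort()
        views_in_visit = 0
        last_ts = None
        for ts in timestamps:
            if last_ts is not None and ts - last_ts >= VISIT_GAP_SECONDS:
                # Gap too large: close the previous visit and start a new one.
                visits += 1
                bounces += 1 if views_in_visit == 1 else 0
                views_in_visit = 0
            views_in_visit += 1
            last_ts = ts
        # Close the final visit for this user.
        visits += 1
        bounces += 1 if views_in_visit == 1 else 0

    return bounces / visits if visits else 0.0


pageviews = [
    ("u1", 0), ("u1", 60),  # one visit with two views: not a bounce
    ("u1", 4000),           # new visit after a long gap: bounce
    ("u2", 100),            # single view: bounce
]
rate = bounce_rate(pageviews)  # 2 of 3 visits are bounces
print(f"{rate:.2f}")  # 0.67
```

Unlike page-view counts, this metric needs events grouped per user and ordered in time, which is why it is awkward to maintain incrementally and a natural fit for batch recomputation.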

Technology stack examples

Storage: HDFS

Batch layer: Hadoop

Serving layer: ElephantDB

Speed layer: Storm

* Technologies from the 2015 book. Modern alternatives include Spark, Flink, and Kafka Streams.
