System Design Space

Updated: March 25, 2026 at 3:00 AM

Big Data (short summary)

hard

Big Data still matters not because it once popularized Lambda Architecture, but because it remains a sharp way to see the cost of separating batch, serving, and speed layers.

In real engineering practice, the book helps show where immutable data, approximate algorithms, and separate compute paths are justified and where the architecture starts losing to its own complexity.

In interviews and architecture discussions, it is especially useful when you need to speak honestly about the point where latency, correctness, and complexity stop coexisting peacefully in one design.

Practical value of this chapter

Design in practice

Builds an end-to-end view of batch, stream, and serving layers for high-volume analytics.

Decision quality

Improves the choice of architecture style under latency, cost, and correctness constraints.

Interview articulation

Adds concrete criteria for Lambda, Kappa, or hybrid decisions in interview answers.

Risk and trade-offs

Shows where data architecture degrades under complexity growth and data drift.

Source

Book Review

Original review by Alexander Polomodov on tellmeabout.tech


Big Data: Principles and Best Practices of Scalable Realtime Data Systems

Authors: Nathan Marz, James Warren
Publisher: Manning Publications
Length: 328 pages

Nathan Marz on Lambda Architecture: batch, serving, and speed layers, data immutability, HyperLogLog, and practical examples.

Original

Lambda Architecture

Related topic

DDIA: Batch & Stream Processing

Chapters 10-11 of DDIA cover batch and stream processing in detail.


The book is dedicated to Lambda Architecture, an architectural pattern for big data processing systems that consists of three layers:

Batch Layer

Stores the master dataset as immutable, append-only events and computes arbitrary views over the complete dataset.

Serving Layer

Serves fast queries over precomputed views. The views can stay immutable between batch layer recomputations.

Speed Layer

Stream processing that keeps results up to date between batch layer recomputations. Produces approximate aggregates in real time.
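To make the batch layer concrete, here is a minimal sketch, assuming an in-memory list stands in for distributed storage. `PageView`, `MasterDataset`, and `compute_batch_view` are hypothetical names chosen for illustration, not identifiers from the book.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PageView:
    """One immutable fact: a user viewed a URL at a given time."""
    url: str
    user_id: str
    timestamp: int  # seconds since epoch


class MasterDataset:
    """Append-only event log: facts are added, never updated or deleted."""

    def __init__(self) -> None:
        self._events: list[PageView] = []

    def append(self, event: PageView) -> None:
        self._events.append(event)

    def scan(self) -> tuple[PageView, ...]:
        # Return a read-only snapshot of the complete dataset.
        return tuple(self._events)


def compute_batch_view(events) -> dict[str, int]:
    """Recompute page views per URL from scratch over all events."""
    return dict(Counter(event.url for event in events))


log = MasterDataset()
log.append(PageView("/home", "u1", 100))
log.append(PageView("/home", "u2", 160))
log.append(PageView("/pricing", "u1", 200))

batch_view = compute_batch_view(log.scan())
print(batch_view)  # {'/home': 2, '/pricing': 1}
```

Because the view is always derived from the raw events, a bug in `compute_batch_view` can be fixed and the view rebuilt without losing data.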

Lambda Architecture Map

master dataset + batch views + realtime views
Raw Event Log: immutable append-only source
Batch Layer -> Batch Views: exact aggregates over the full dataset
Speed Layer -> Realtime Views: low latency between batch recomputations
Serving Layer -> Query API: merges batch and speed views

Lambda Architecture combines the accuracy of batch recomputation with a low-latency streaming layer through a single serving path.
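The merge at query time can be sketched as follows. This is a simplified illustration under stated assumptions: `RealtimeView` and `query_page_views` are hypothetical names, the batch view is a plain dictionary, and the realtime view is cleared wholesale after a batch run, whereas a real system would expire only the events the batch run has actually absorbed.

```python
from collections import defaultdict


class RealtimeView:
    """Incremental counts for events the batch layer has not absorbed yet."""

    def __init__(self) -> None:
        self._counts: defaultdict[str, int] = defaultdict(int)

    def record(self, url: str) -> None:
        self._counts[url] += 1

    def get(self, url: str) -> int:
        return self._counts.get(url, 0)

    def reset(self) -> None:
        # Simplification: drop everything once a batch recomputation lands.
        self._counts.clear()


def query_page_views(url: str, batch_view: dict[str, int], realtime: RealtimeView) -> int:
    """Serving-layer query: batch result plus the recent realtime delta."""
    return batch_view.get(url, 0) + realtime.get(url)


batch_view = {"/home": 120, "/pricing": 45}  # produced by the last batch run
realtime = RealtimeView()
for url in ["/home", "/home", "/docs"]:      # events that arrived since then
    realtime.record(url)

print(query_page_views("/home", batch_view, realtime))     # 122
print(query_page_views("/docs", batch_view, realtime))     # 1
print(query_page_views("/pricing", batch_view, realtime))  # 45
```

The cost the architecture pays is visible even here: the same counting logic exists in two places, and both must agree on what a page view is.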

"The Lambda Architecture provides a general-purpose approach to implementing an arbitrary function on an arbitrary dataset and having the function return its results with low latency"

— Nathan Marz

Desired properties of a Big Data system

The authors identify the key properties that a big data processing system should have:

Horizontal scaling

Ability to add nodes to increase capacity

Fault tolerance

Resilience to hardware failures without data loss

Human-fault tolerance

Ability to correct human errors

Low Latency

Quick responses to user requests

Ad hoc queries

Support for arbitrary computations over the data

Minimal maintenance

The system is simple to operate and support

Book structure

We recommend

Streaming Data

A modern view of the architecture of streaming systems


The book is divided into parts corresponding to the layers of Lambda Architecture:

Part 1: Batch Layer

Data model, storing master data, computing views on a complete data set.

Data Model, Master Dataset, Batch Views, MapReduce

Part 2: Serving Layer

Indexing and serving precomputed views for fast queries.

Indexing, Batch Views Serving, ElephantDB

Part 3: Speed Layer

Real-time data processing that compensates for the delay of the batch layer.

Realtime Views, Stream Processing, Apache Storm, Micro-batching
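Micro-batching, one of the speed layer topics listed above, can be illustrated with a small sketch. It groups a stream into fixed-size batches and applies each batch to the realtime view in a single update. The helper `micro_batches` and the sample data are hypothetical, and the sketch batches by size only; production stream processors typically also batch by time and track which batches have already been applied.

```python
from collections import Counter
from itertools import islice


def micro_batches(stream, batch_size):
    """Yield consecutive lists of up to batch_size items from the stream."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


realtime_counts = Counter()
incoming_urls = ["/home", "/home", "/docs", "/home", "/pricing"]

for batch in micro_batches(incoming_urls, batch_size=2):
    realtime_counts.update(batch)  # one view update per micro-batch

print(realtime_counts["/home"])  # 3
```

Larger batches reduce per-update overhead at the price of higher latency, which is the trade-off the speed layer has to tune.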

Practical examples

The authors do not limit themselves to theory, but also analyze typical tasks for big data systems:

📊

URL Page Views

Counting page views per URL over time

👥

Unique Visitors

Calculating the number of unique users with HyperLogLog

🚨

Bounce Rate

Calculating the share of visits that bounce (leave after viewing a single page), per domain
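The unique visitors task is where approximate algorithms pay off: HyperLogLog estimates the number of distinct items in fixed memory instead of storing every user ID. Below is a simplified sketch of the general technique, not the book's implementation. The class name, the choice of SHA-1 as the hash, and the default precision `p=12` are assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog with 2**p registers and a 64-bit hash."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant; this formula is valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash value
        index = h >> (64 - self.p)                     # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self) -> float:
        harmonic_sum = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / harmonic_sum
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog(p=12)
for i in range(50_000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")  # repeat visits do not change the estimate

estimate = hll.count()
print(round(estimate))  # close to 50000, not exact
```

With `m` registers the standard error is roughly 1.04 / sqrt(m), so about 1.6% for `p=12`. Registers from two sketches can be merged by taking the element-wise maximum, which is what makes the structure convenient for combining per-hour or per-partition views.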

Technology stack examples

Storage

HDFS

Batch

Hadoop

Serving

ElephantDB

Speed

Storm

* Technologies from the 2015 book. Modern alternatives: Spark, Flink, Kafka Streams

