System Design Space

Updated: February 21, 2026 at 11:59 PM

Big Data (short summary)

hard

Source

Book Review

Original review by Alexander Polomodov on tellmeabout.tech


Big Data: Principles and Best Practices of Scalable Realtime Data Systems

Authors: Nathan Marz, James Warren
Publisher: Manning Publications
Length: 328 pages

Nathan Marz on the Lambda Architecture: batch, serving, and speed layers, data immutability, HyperLogLog, and practical examples.


Lambda Architecture

Related topic

DDIA: Batch & Stream Processing

Chapters 10-11 of DDIA cover batch and stream processing in detail.


The book is devoted to the Lambda Architecture, an architectural pattern for big data processing systems that consists of three layers:

Batch Layer

Stores the master dataset as immutable, append-only events and computes arbitrary views over the complete dataset.

Serving Layer

Serves fast queries on precomputed views. The views can remain immutable between batch layer recomputations.

Speed Layer

Stream processing that keeps results up to date between batch layer recomputations, using approximate aggregates computed in real time.

Lambda Architecture Map

master dataset + batch views + realtime views
Raw Event Log: immutable append-only source
Batch Layer -> Batch Views: exact aggregates over the full dataset
Speed Layer -> Realtime Views: low latency between batch recomputations
Serving Layer -> Query API: merges the batch and speed views

The Lambda Architecture combines the accuracy of batch recomputation with a low-latency streaming layer through a single serving path.
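The interplay of the three layers can be sketched in a few lines of plain Python. This is a minimal in-memory illustration, not the book's implementation: the `PageView` schema, the `LambdaSketch` class, and its method names are all hypothetical, and the sketch assumes a single thread, so it ignores events that arrive while a batch run is in progress.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PageView:
    """One immutable raw event (hypothetical schema for this sketch)."""
    url: str
    timestamp: int  # seconds since epoch


class LambdaSketch:
    def __init__(self):
        self.master = []                 # append-only master dataset
        self.batch_view = Counter()      # exact counts as of the last batch run
        self.realtime_view = Counter()   # counts for events since the last batch run

    def append(self, event):
        # New data is only ever appended; stored events are never modified.
        self.master.append(event)
        # Speed layer: update the realtime view incrementally.
        self.realtime_view[event.url] += 1

    def run_batch(self):
        # Batch layer: recompute the view from scratch over the full dataset.
        self.batch_view = Counter(e.url for e in self.master)
        # The batch view now covers every event, so the realtime view is reset.
        self.realtime_view = Counter()

    def query(self, url):
        # Serving layer: merge the batch and realtime views.
        return self.batch_view[url] + self.realtime_view[url]
```

Calling `query` returns the same total before and after `run_batch`; only the split between the batch and realtime views changes.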

"The Lambda Architecture provides a general-purpose approach to implementing an arbitrary function on an arbitrary dataset and having the function return its results with low latency"

— Nathan Marz

Desired properties of a Big Data system

The authors identify the key properties that a big data processing system should have:

Horizontal scaling

Ability to add nodes to increase capacity

Fault tolerance

Resilience to hardware failures without data loss

Human-fault tolerance

Ability to recover from human errors, such as a deployed bug

Low Latency

Quick responses to user requests

Ad hoc queries

Support for arbitrary computations over the data

Minimal maintenance

The system is simple to operate and support
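The ability to correct human errors follows from keeping raw events immutable: a view produced by buggy code can be thrown away and recomputed from the untouched master data. The sketch below illustrates this under assumptions of its own; the sample URLs, the function names, and the normalization rule are hypothetical and chosen only to make the bug visible.

```python
from collections import Counter

# Immutable raw events: a tuple cannot be modified in place.
MASTER = (
    "http://example.com/a",
    "http://example.com/a/",
    "http://EXAMPLE.com/a",
)


def buggy_view(events):
    # Bug: different spellings of the same URL are counted as distinct pages.
    return Counter(events)


def fixed_view(events):
    # Fix (hypothetical rule): lowercase and drop the trailing slash before counting.
    return Counter(e.lower().rstrip("/") for e in events)


# The bad view is discarded and rebuilt from the same raw data.
view = buggy_view(MASTER)
view = fixed_view(MASTER)
```

Because the fix is applied by recomputation rather than by patching stored results, no corrupted state survives the bug.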

Book structure

We recommend

Streaming Data

A modern view of the architecture of streaming systems


The book is divided into parts corresponding to the layers of the Lambda Architecture:

Part 1: Batch Layer

Data model, storage of the master dataset, and computing views over the complete dataset.

Data Model, Master Dataset, Batch Views, MapReduce

Part 2: Serving Layer

Indexing and serving precomputed views for fast queries.

Indexing, Batch Views Serving, ElephantDB

Part 3: Speed Layer

Real-time data processing that compensates for the latency of the batch layer.

Realtime Views, Stream Processing, Apache Storm, Micro-batching
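Micro-batching, listed among the speed layer topics, means processing a stream in small groups of events rather than one event at a time. The following is a plain-Python sketch of the idea only. It does not use Apache Storm or any of its APIs, and the function names and the batch size are assumptions made for illustration.

```python
from collections import Counter


def micro_batches(stream, size):
    """Yield consecutive lists of at most `size` items from `stream`."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def update_realtime_view(view, stream, size=3):
    """Apply one view update per micro-batch instead of one per event."""
    for batch in micro_batches(stream, size):
        view.update(batch)
    return view
```

For example, `update_realtime_view(Counter(), ["/a", "/b", "/a", "/a"])` performs two updates (one for the first three URLs, one for the last) and ends with a count of 3 for `/a`.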

Practical examples

The authors do not limit themselves to theory, but also analyze typical tasks for big data systems:

📊

URL Page Views

Counting views of each URL over time

👥

Unique Visitors

Calculating the number of unique users with HyperLogLog

🚨

Bounce Rate

Computing the share of visitors who view only a single page on a domain
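The three queries above can be sketched over a small in-memory list of pageview events. This is an illustration under stated assumptions rather than the book's code: events are hypothetical `(url, user_id, timestamp_seconds)` tuples, the HyperLogLog is a minimal variant without the bias corrections used by production libraries, and bounce rate is simplified to the share of users with exactly one pageview on the domain, with no session handling.

```python
import hashlib
import math
from collections import Counter
from urllib.parse import urlsplit


def pageviews_per_hour(events):
    """Count views per (url, hour bucket)."""
    counts = Counter()
    for url, _user, ts in events:
        counts[(url, ts // 3600)] += 1
    return counts


class HyperLogLog:
    """Minimal HyperLogLog with 2**p registers."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # Take a 64-bit hash from the first 8 bytes of SHA-1.
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)               # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)  # constant valid for m >= 128
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate


def bounce_rate(events, domain):
    """Share of users with exactly one pageview on `domain` (simplified)."""
    views_per_user = Counter(
        user for url, user, _ts in events if urlsplit(url).hostname == domain
    )
    if not views_per_user:
        return 0.0
    bounced = sum(1 for n in views_per_user.values() if n == 1)
    return bounced / len(views_per_user)
```

HyperLogLog trades exactness for a fixed memory footprint: with `p=12` the sketch keeps 4096 small registers regardless of how many distinct users it sees, and adding the same user twice does not change the estimate.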

Technology stack examples

Storage

HDFS

Batch

Hadoop

Serving

ElephantDB

Speed

Storm

* Technologies as of the 2015 book. Modern alternatives include Spark, Flink, and Kafka Streams.



© 2026 Alexander Polomodov