System Design Space

Updated: February 21, 2026 at 11:59 PM

Streaming Data (short summary)

hard

Source

Book Review

Original review by Alexander Polomodov on tellmeabout.tech


Streaming Data: Understanding the Real-Time Pipeline

Author: Andrew Psaltis
Publisher: Manning Publications, 2017 (Russian edition: DMK Press, 2018)
Length: 216 pages

Andrew Psaltis on stream processing: the collection, message queue and analysis tiers, delivery semantics, data windows, and streaming algorithms.


Streaming System Architecture

The book walks through the entire data pipeline, from the source to the final consumer. The reference architecture consists of the following tiers:

Related topic

Kafka: The Definitive Guide

Deep dive into one of the key stream processing technologies

Collection Tier

Collecting data from sources

Message Queue Tier

Buffering and Routing

Analysis Tier

Stream processing and analysis

In-Memory Store

In-memory storage

Data Access Tier

Access to processed data

Consumer Tier

Data consumers

Collection Tier - Data collection

Author's recommendation

Enterprise Integration Patterns

Classic book on integration patterns referenced by the author


The chapter looks at interaction patterns for data collection:

Request/Response — Classic request-response
Request/Acknowledge — Request with confirmation of receipt
Publish/Subscribe — Publisher-subscriber
One-way — One-way interaction
Stream — Continuous Data Flow

Fault tolerance

The author considers two approaches: checkpointing and logging. For streaming systems, logging is the better fit:

RBML

Receiver-based message logging

SBML

Sender-based message logging

HML

Hybrid message logging
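
To make the idea of receiver-based message logging concrete, here is a minimal sketch. It assumes the receiver appends every incoming message to a local file before processing it and replays that file after a restart. The names `ReceiverLog`, `receive` and `recover` are hypothetical and do not come from the book.

```python
import json
import os


class ReceiverLog:
    """Append-only log that the receiver writes to before processing a message."""

    def __init__(self, path):
        self.path = path

    def append(self, message):
        # Persist the message to stable storage before it is processed.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(message) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        # Return every logged message in arrival order.
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]


def receive(log, message, process):
    log.append(message)  # 1. log first
    process(message)     # 2. then process; a crash here does not lose the message


def recover(log, process):
    # After a restart, reprocess everything that was logged.
    for message in log.replay():
        process(message)
```

Because `recover` replays the whole log, a message that was already processed before the crash is processed again, so this sketch gives at-least-once processing unless the processing step is idempotent.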

Message Queuing Tier

The purpose of this tier is to decouple data collection from analysis. Key concepts: producer, broker and consumer.

Message delivery semantics

At most once

A message is delivered no more than once and may be lost

Implementation complexity: low

At least once

Delivery is guaranteed, but duplicates are possible

Implementation complexity: medium

Exactly once

Each message is delivered exactly once; the hardest option to implement

Implementation complexity: high
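
A common way to get an exactly-once effect on top of at-least-once delivery is to make the consumer idempotent. The sketch below is only an illustration under stated assumptions: `FlakyChannel` is a hypothetical in-memory channel that loses the first acknowledgement of every message, and messages are assumed to carry a unique `id` field.

```python
class FlakyChannel:
    """Hypothetical channel that loses the first acknowledgement of each message."""

    def __init__(self):
        self._delivered_before = set()

    def send(self, consumer, message):
        consumer.handle(message)
        if message["id"] not in self._delivered_before:
            self._delivered_before.add(message["id"])
            return False  # the message arrived, but the ack was lost
        return True


class IdempotentConsumer:
    """Applies each message once, no matter how many times it is delivered."""

    def __init__(self):
        self.seen_ids = set()
        self.total = 0
        self.deliveries = 0

    def handle(self, message):
        self.deliveries += 1
        if message["id"] in self.seen_ids:
            return  # duplicate: already applied, ignore it
        self.seen_ids.add(message["id"])
        self.total += message["amount"]


def send_at_least_once(channel, consumer, message, max_attempts=5):
    # Retry until an acknowledgement comes back; duplicates are possible.
    for _ in range(max_attempts):
        if channel.send(consumer, message):
            return True
    return False
```

With two messages sent through this channel, the consumer sees four deliveries but updates `total` only twice. In a real system the set of seen ids would have to be persisted and bounded.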

Analysis Tier - Streaming Data Analysis

Related chapter

DDIA: Stream Processing

Chapter 11 of DDIA covers the topic of stream processing in depth.


The most substantial part of the book. It starts with the concept of in-flight data and the inversion of the traditional data management model.

Processing technologies

Spark Streaming, Storm, Flink, Samza

Common Components

  • Application Driver
  • Streaming Manager
  • Stream Processor
  • Data Sources

Key features when choosing a system

  • Message delivery (delivery semantics)
  • State management
  • Fault tolerance

Constraints on streaming algorithms

  • Single pass - there is only one chance to process each message
  • Concept drift - the properties of a model can change as new data arrives
  • Limited resources - there is not always enough memory or processing power
  • Time - the difference between stream (processing) time and event time
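
The single-pass and limited-resources constraints can be illustrated with a running mean and variance that never stores the stream. This is a minimal sketch using Welford's online algorithm, chosen here as an illustration and not taken from the book; the class name `RunningStats` is hypothetical.

```python
class RunningStats:
    """Mean and population variance computed in one pass with O(1) memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Each value is seen exactly once and then discarded.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / self.count if self.count else 0.0
```

Feeding the values 2, 4, 4, 4, 5, 5, 7, 9 one at a time gives a mean of 5 and a population variance of 4, the same result as a batch computation over the stored list.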

Data windows and summary

Sliding Window

Sliding window - overlapping intervals for continuous analysis

Tumbling Window

Tumbling window - non-overlapping fixed-size intervals
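
The difference between the two window types can be sketched as a window-assignment function. The code below assumes integer timestamps (for example, seconds), half-open windows `[start, end)`, and window starts aligned to multiples of the slide; the function names are hypothetical.

```python
def tumbling_window(ts, size):
    """Return the single [start, end) window of length `size` that contains `ts`."""
    start = (ts // size) * size
    return (start, start + size)


def sliding_windows(ts, size, slide):
    """Return every [start, end) window of length `size` that contains `ts`.

    Windows start at multiples of `slide`, so they overlap when slide < size.
    """
    windows = []
    start = (ts // slide) * slide  # latest window start that is <= ts
    while start > ts - size:       # the window still covers ts
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)
```

For a timestamp of 125 with 60-second windows, the tumbling assignment is the single window (120, 180), while a 30-second slide puts the same event into both (90, 150) and (120, 180). When the slide equals the window size, the sliding assignment degenerates to the tumbling one.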

Methods for summarizing data on a stream

Random sampling

Representative part of the stream

LogLog / MinCount

Counting unique elements

Count-Min Sketch

Element occurrence frequency

Bloom filter

Set membership test (false positives are possible, false negatives are not)
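
Two of the summaries above, the Bloom filter and the Count-Min Sketch, fit in a few lines each. The sketch below is illustrative only: the sizes are arbitrary, and deriving the hash functions from salted SHA-256 digests is an assumption made for simplicity, not a recommendation from the book.

```python
import hashlib


def _bucket_indexes(item, count, modulo):
    """Derive `count` bucket indexes for `item` from salted SHA-256 digests."""
    for seed in range(count):
        digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
        yield int.from_bytes(digest[:8], "big") % modulo


class BloomFilter:
    """Set membership: may report false positives, never false negatives."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def add(self, item):
        for i in _bucket_indexes(item, self.num_hashes, self.size):
            self.bits[i] = True

    def might_contain(self, item):
        return all(self.bits[i]
                   for i in _bucket_indexes(item, self.num_hashes, self.size))


class CountMinSketch:
    """Frequency estimate: may overcount on collisions, never undercounts."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def add(self, item, count=1):
        for row, col in enumerate(_bucket_indexes(item, self.depth, self.width)):
            self.table[row][col] += count

    def estimate(self, item):
        # The smallest counter across rows is the one least inflated by collisions.
        return min(self.table[row][col]
                   for row, col in enumerate(_bucket_indexes(item, self.depth, self.width)))
```

Both structures use fixed memory regardless of stream length, which is the point: the Bloom filter trades exactness for a bounded bit array, and the Count-Min Sketch trades it for a bounded table of counters.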

Data storage

Long-term Storage

  • Direct writes - writing straight to long-term storage slows the stream down
  • Indirect writes - ETL with batch loading

In-Memory Storage

SQLite, RocksDB, LevelDB, Memcached, Redis, MemSQL, Aerospike, Apache Ignite

Caching Strategies

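Read-through

On a cache miss, the cache itself loads the value from the backing store

Refresh-ahead

Entries are refreshed proactively before they expire

Write-through

Each write goes to the cache and to the backing store synchronously

Write-around

Writes bypass the cache and go straight to the backing store

Write-behind

Writes go to the cache first and reach the backing store later, asynchronously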
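
Two of these strategies, read-through and write-behind, can be sketched with a plain dictionary standing in for the backing store. The class names are hypothetical, and the explicit `flush` call is an assumption that replaces the background writer a real write-behind cache would use.

```python
class ReadThroughCache:
    """On a miss, the cache loads the value from the backing store itself."""

    def __init__(self, store):
        self.store = store  # backing store: any dict-like object
        self.cache = {}
        self.store_reads = 0

    def get(self, key):
        if key not in self.cache:
            self.store_reads += 1
            self.cache[key] = self.store[key]
        return self.cache[key]


class WriteBehindCache:
    """Writes land in the cache first and reach the store on a later flush."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.dirty = set()  # keys written to the cache but not yet to the store

    def put(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()
```

The trade-off is visible in the sketch: write-behind keeps writes fast, but anything still in `dirty` is lost if the process dies before `flush` runs.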

Data Access Tier - Data Access

Interaction Patterns

  • Data Sync
  • RPC / RMI
  • Simple Messaging
  • Publish-Subscribe

Delivery protocols

Webhooks, Long Polling, SSE, WebSocket

Protocol Selection Factors

Update Frequency
Direction
Latency
Efficiency
Fault Tolerance

Consumer Tier - Data Consumers

Information applications

Dashboards, reports, visualization

Integration with third party systems

API, webhooks, synchronization

Stream processing

Downstream processing

Key questions for a streaming client

  1. How can a client tell that it is not reading fast enough?
  2. What happens if it does not know?
  3. How can the client be scaled so that it keeps up with the stream?
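
One hedged way to approach these questions is to track consumer lag, the gap between the newest offset in the queue and the last offset the client has processed. The sketch below assumes offsets are monotonically increasing integers and that throughput scales linearly with the number of consumers; all function names are hypothetical.

```python
import math


def consumer_lag(latest_offset, committed_offset):
    """Number of messages written to the queue but not yet processed."""
    return latest_offset - committed_offset


def is_falling_behind(lag_samples, min_samples=3):
    """True when the lag grew between every pair of consecutive samples."""
    if len(lag_samples) < min_samples:
        return False
    return all(later > earlier
               for earlier, later in zip(lag_samples, lag_samples[1:]))


def consumers_needed(produce_rate, per_consumer_rate):
    """Smallest consumer count whose combined rate matches the producers."""
    return math.ceil(produce_rate / per_consumer_rate)
```

For example, a producer rate of 1000 messages per second against consumers that each handle 300 per second calls for four consumers. In partitioned queues the useful number of consumers is also capped by the partition count, which this sketch ignores.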

Conclusions

“Brevity is the sister of talent” - A.P. Chekhov

The book is useful and short (about 200 pages), which makes it even better. Conceptually, it has not become outdated in the years since its release - the architectural patterns of stream processing remain relevant.

© 2026 Alexander Polomodov