Streaming becomes easier to reason about once you stop seeing it as Kafka plus a consumer and start treating the collection, queue, and analysis tiers, event time, and stream state as one system.
In engineering practice, this book helps you design stream-first pipelines with explicit ordering, stateful processing, late-event handling, and materialization boundaries, so the architecture survives real replay and backfill behavior.
In interviews and architecture reviews, it is especially useful when you need to show the cost of stream processing: reprocessing overhead, the correctness impact of late events, and the way backfill puts pressure on SLAs.
Practical value of this chapter
Design in practice
Supports stream-first pipeline design with event-time, ordering, and stateful processing.
Decision quality
Improves batch-vs-stream and materialization-boundary decisions.
Interview articulation
Enables clear discussion of offsets, windowing, and exactly-once limitations.
Risk and trade-offs
Focuses on late events, reprocessing cost, and backfill impact on SLOs.
Source
Book Review
Original review by Alexander Polomodov on tellmeabout.tech
Streaming Data: Understanding the Real-Time Pipeline
Author: Andrew Psaltis
Publisher: Manning Publications, 2017 (Russian edition: DMK Press, 2018)
Length: 216 pages
Andrew Psaltis on stream processing: collection/queue/analysis tiers, delivery semantics, data windows, and streaming algorithms.
Streaming System Architecture
The book examines the entire data pipeline from source to final consumer. The reference architecture includes the following tiers:
Related topic
Kafka: The Definitive Guide
Deep dive into one of the key stream processing technologies
Collecting data from sources
Buffering and Routing
Stream processing and analysis
In-memory storage
Access to processed data
Data consumers
Collection Tier - Collecting Data from Sources
Author's recommendation
Enterprise Integration Patterns
Classic book on integration patterns referenced by the author
The chapter looks at interaction patterns for collecting data from sources.
Fault tolerance
The author considers two approaches: checkpointing and logging. For streaming systems, logging is more applicable:
Receiver-based message logging
Sender-based message logging
Hybrid message logging
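The sender-based variant can be sketched in a few lines: the sender appends each message to a durable log before transmitting, trims the log on acknowledgment, and replays whatever is still unacknowledged after a failure. This is a minimal illustration with invented names, not the book's implementation.

```python
# Sender-based message logging sketch: log first, then send;
# forget on ack; unacked entries are replay candidates after a
# receiver crash. The dict stands in for a durable on-disk log.

class LoggingSender:
    def __init__(self):
        self.log = {}          # msg_id -> payload (stands in for disk)
        self.next_id = 0

    def send(self, payload, transmit):
        msg_id = self.next_id
        self.next_id += 1
        self.log[msg_id] = payload   # persist before transmitting
        transmit(msg_id, payload)
        return msg_id

    def ack(self, msg_id):
        self.log.pop(msg_id, None)   # safe to forget once acknowledged

    def unacked(self):
        return dict(self.log)        # messages to replay after a crash

delivered = []
sender = LoggingSender()
a = sender.send("event-1", lambda i, p: delivered.append(p))
b = sender.send("event-2", lambda i, p: delivered.append(p))
sender.ack(a)                        # receiver confirmed event-1 only

print(sender.unacked())  # {1: 'event-2'}
```

Receiver-based logging mirrors this on the other side (log before processing); the hybrid approach combines both to shorten recovery.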
Message Queuing Tier
The purpose of this tier is to decouple data collection from analysis. Key concepts: producer, broker, and consumer.
Message delivery semantics
At-most-once - the message is delivered no more than once and may be lost
At-least-once - delivery is guaranteed, but duplicates are possible
Exactly-once - each message is delivered exactly once; the hardest to implement
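The practical workaround for exactly-once's difficulty is at-least-once delivery plus an idempotent consumer: the broker may redeliver, but deduplication makes the observable effect exactly-once. A minimal broker-agnostic sketch, with illustrative names:

```python
# At-least-once delivery with idempotent processing: the consumer
# deduplicates by message id before applying the effect, so a
# redelivered message changes nothing.

class IdempotentConsumer:
    def __init__(self):
        self.seen_ids = set()   # in production: a persistent store
        self.total = 0

    def handle(self, msg_id, amount):
        if msg_id in self.seen_ids:
            return False        # duplicate: already applied
        self.seen_ids.add(msg_id)
        self.total += amount
        return True

consumer = IdempotentConsumer()
# A broker retry delivers message "m1" twice:
for msg_id, amount in [("m1", 10), ("m2", 5), ("m1", 10)]:
    consumer.handle(msg_id, amount)

print(consumer.total)  # 15 - the duplicate "m1" is ignored
```

The deduplication set must survive consumer restarts (and be trimmed by retention), which is where the real engineering cost hides.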
Analysis Tier - Streaming Data Analysis
Related chapter
DDIA: Stream Processing
Chapter 11 of DDIA covers the topic of stream processing in depth.
The most meaningful part of the book. It opens with the concept of in-flight data and the inversion of the traditional data management model.
Processing technologies
Common Components
- Application Driver
- Streaming Manager
- Stream Processor
- Data Sources
Key criteria when choosing a system
Message delivery semantics
State management
Fault tolerance
Constraints of streaming algorithms
- Single pass — one chance to process each message
- Concept drift — model properties can change as new data arrives
- Limited resources — processing power is not always sufficient
- Time — the difference between stream time and event time
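The single-pass constraint forces algorithms that maintain a small summary updated once per message. A standard illustration (not taken from the book) is Welford's online algorithm for mean and variance in O(1) memory:

```python
# Welford's online algorithm: running mean and population variance
# computed in a single pass over the stream, one O(1) update per
# element - no buffering of the stream required.

def welford(stream):
    n = 0
    mean = 0.0
    m2 = 0.0  # sum of squared deviations from the current mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / n if n else 0.0
    return mean, variance

mean, var = welford([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(round(mean, 6), round(var, 6))  # 5.0 4.0
```

Compared with the naive two-pass formula, this respects both the single-pass and the limited-resources constraints, and it is numerically stable.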
Data windows and summarization
Sliding Window
Sliding window - overlapping intervals for continuous analysis
Tumbling Window
Tumbling window - non-overlapping, fixed-size intervals
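The difference between the two window types is easiest to see in code. A minimal sketch over integer event timestamps (seconds), assuming windows aligned to multiples of the slide:

```python
# Tumbling vs sliding windows over event timestamps (seconds).
# A tumbling window of size W assigns each event to exactly one
# bucket; a sliding window of size W and slide S assigns it to
# every window whose interval [start, start + W) covers it.

from collections import defaultdict

def tumbling_counts(events, size):
    buckets = defaultdict(int)
    for ts in events:
        buckets[(ts // size) * size] += 1   # key = window start
    return dict(buckets)

def sliding_counts(events, size, slide):
    buckets = defaultdict(int)
    for ts in events:
        # earliest aligned window start that still covers ts
        start = max(((ts - size) // slide + 1) * slide, 0)
        while start <= ts:
            buckets[start] += 1
            start += slide
    return dict(buckets)

events = [1, 2, 5, 7, 11]
print(tumbling_counts(events, 5))     # {0: 2, 5: 2, 10: 1}
print(sliding_counts(events, 5, 2))   # {0: 2, 2: 2, 4: 2, 6: 1, 8: 1, 10: 1}
```

Each event lands in exactly one tumbling bucket but in up to `size // slide` sliding buckets, which is why sliding windows cost more state.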
Methods for summarizing data on a stream
Random sampling
Keeping a representative sample of the stream
LogLog / MinCount
Counting unique elements
Count-Min Sketch
Element occurrence frequency
Bloom filter
Approximate membership testing (is an element present?)
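As one concrete example of these sketches, a Bloom filter answers "possibly present" or "definitely absent" using k hash functions over an m-bit array. A minimal, untuned sketch (parameters and names are illustrative):

```python
# Minimal Bloom filter: false positives are possible, false
# negatives are not. Derives k positions per item from sha256.

import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.m = m
        self.k = k
        self.bits = [False] * m

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
for user in ["alice", "bob"]:
    bf.add(user)

print(bf.might_contain("alice"))  # True
print(bf.might_contain("carol"))  # almost certainly False at this load
```

The same pattern, a fixed-size array updated per message, underlies HyperLogLog-style counters and the Count-Min sketch; only the query they answer differs.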
Data storage
Long-term Storage
- Direct writing - reduces stream throughput
- Indirect writing - ETL with batch loading
In-Memory Storage
Caching Strategies
Read-through
On a miss, the cache loads the value from backing storage
Refresh-ahead
The cache proactively refreshes entries before they expire
Write-through
Writes go to the cache and backing storage synchronously
Write-around
Writes bypass the cache and go directly to storage
Write-behind
Writes land in the cache and are flushed to storage asynchronously
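Read-through is the simplest of these to show in code: the cache itself owns the loading logic, so callers never talk to storage directly. A minimal sketch where a dict stands in for the backing store:

```python
# Read-through cache sketch: on a miss, the cache invokes its
# loader against backing storage and retains the value for
# subsequent reads.

class ReadThroughCache:
    def __init__(self, loader):
        self.loader = loader    # called only on a cache miss
        self.cache = {}
        self.loads = 0          # how many times storage was hit

    def get(self, key):
        if key not in self.cache:
            self.cache[key] = self.loader(key)
            self.loads += 1
        return self.cache[key]

storage = {"user:1": "alice", "user:2": "bob"}
cache = ReadThroughCache(storage.get)

print(cache.get("user:1"))  # alice - miss, loaded from storage
print(cache.get("user:1"))  # alice - hit, served from cache
print(cache.loads)          # 1
```

The write-side strategies differ only in where this sketch's storage call happens: synchronously on write (write-through), never via the cache (write-around), or deferred in the background (write-behind).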
Data Access Tier - Access to Processed Data
Interaction Patterns
- Data Sync
- RPC / RMI
- Simple Messaging
- Publish-Subscribe
Delivery protocols
Protocol Selection Factors
Consumer Tier - Data Consumers
Information applications
Dashboards, reports, visualization
Integration with third party systems
API, webhooks, synchronization
Stream processing
Downstream processing
Key questions for a streaming client
1. How can a client know that it is not reading fast enough?
2. What happens if it does not know?
3. How can the client scale so that it keeps up with the stream?
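The standard answer to the first question is consumer lag: the distance between the newest offset in the stream and the consumer's committed offset. A minimal sketch with simulated offsets (not tied to any specific broker):

```python
# Consumer lag = log-end offset minus committed offset. A lag that
# grows across successive checks means the consumer is falling
# behind and needs scaling (more partitions / more consumers).

def consumer_lag(log_end_offset, committed_offset):
    return log_end_offset - committed_offset

# Producer has appended up to offset 1000; consumer committed 850.
lag = consumer_lag(1000, 850)
print(lag)  # 150 messages behind

# Simple alert rule over periodic lag samples:
samples = [120, 135, 150]
falling_behind = all(b > a for a, b in zip(samples, samples[1:]))
print(falling_behind)  # True - lag is monotonically growing
```

This also answers the second question: a client that never checks its lag silently drifts behind until retention expires and data is lost.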
Results
“Brevity is the sister of talent” - A.P. Chekhov
The book is useful and short (about 200 pages), which only works in its favor. Conceptually it has not aged in the years since its release: the architectural patterns of stream processing remain relevant.
Related chapters
- Kafka: The Definitive Guide (short summary) - Hands-on focus on brokers, partitions, and delivery semantics as a foundation for stream-first architecture.
- Kappa architecture: a stream-first alternative to Lambda - Single processing path for realtime and replay as a direct continuation of the book's stream-processing model.
- Data pipeline / ETL / ELT architecture - How to embed streaming workloads into an end-to-end data platform and operating model.
- Event-driven architecture: Event Sourcing, CQRS, Saga - Architectural context where event streams become the default integration mechanism across services.
- Distributed message queue - System design case focused on throughput, ordering, durability, and peak-load behavior.
- Designing Data-Intensive Applications (short summary) - Core foundation for stream processing, stateful computation, and consistency trade-offs in data-intensive systems.
- Enterprise Integration Patterns (short summary) - Pattern language for designing reliable event and stream interactions across heterogeneous systems.
- Big Data: Principles and best practices of scalable realtime data systems (short summary) - Strategic perspective on realtime data-system architecture and platform-level evolution.
- Data Mesh in Action (short summary) - Organizational layer for decomposing streaming platforms into product-oriented domains and federated governance.
- Google Global Network: evolution and architecture principles for the AI era - Network foundation for high-throughput streams: latency budgets, cross-region transport, and WAN resilience.
