System Design Space

Updated: May 1, 2026 at 6:48 PM

Kafka: The Definitive Guide, 2nd Edition (short summary)

medium

Kafka matters not because it is a famous broker, but because the append-only log changes how services integrate, how streaming is built, and how data can be replayed.

In real engineering work, this book helps design partitioning, retention policy, consumer groups, delivery guarantees, and lag control as parts of one event flow rather than a pile of unrelated settings.

In interviews, reviews, and architecture conversations, it is especially useful when you need to show how per-partition ordering, lag spikes, rebalancing, and storage growth affect whole-system reliability, not just the messaging layer.

Practical value of this chapter

Design in practice

Provides a practical framework for Kafka as an event-flow foundation at scale.

Decision quality

Improves partitioning, retention-policy, and consumer-group choices for the workload.

Interview articulation

Helps explain delivery guarantees, replay, and DLQ strategy in production terms.

Risk and trade-offs

Surfaces ordering, consumer-lag spike, and storage-growth risks.

Source

Post in Book Cube

Original review by Alexander Polomodov

Read post

Kafka: The Definitive Guide, 2nd Edition

Authors: Gwen Shapira, Todd Palino, Rajini Sivaram, Krit Petty
Publisher: O'Reilly Media, Inc.
Length: 485 pages

Practical guide to Kafka as a broker and partitioned log: producers, consumer groups, replication, delivery guarantees, Kafka Connect, Kafka Streams, and cluster operations.


Kafka is useful not just as a message queue, but as a partitioned log: producers write records to topics, while consumer groups read partitions in parallel and keep track of their committed offsets.

The book is most valuable when it connects delivery semantics, replication, rebalancing, consumer lag, and stream processing into one operational model. That is where Kafka stops being just a broker and becomes a data platform.

Book editions

1st Edition

Fall 2017: 11 chapters covering Kafka fundamentals, producers, consumers, administration, and stream processing.

2nd Edition

Late 2021: expanded edition with dedicated chapters on programmatic cluster management, transactions, security, and cross-cluster replication.

Core Kafka concepts

Messages and record batches

A record carries a key, value, headers, and timestamp. Records are grouped into batches to reduce network and disk overhead.
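A minimal sketch of the record and batch idea, using plain Python rather than a Kafka client. The `Record` class, its `size` method, `batch_records`, and the `max_batch_bytes` parameter are hypothetical names chosen for illustration; a real producer batches per partition and also flushes on a time limit.

```python
import time
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Toy stand-in for a Kafka record (not the client library's class)."""
    key: Optional[bytes]
    value: bytes
    headers: Tuple[Tuple[str, bytes], ...] = ()
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))

    def size(self) -> int:
        # Rough payload size: key + value only, ignoring headers and framing.
        return len(self.key or b"") + len(self.value)


def batch_records(records, max_batch_bytes):
    """Group records into batches no larger than max_batch_bytes.

    A record bigger than the limit still gets a batch of its own.
    """
    batches, current, current_bytes = [], [], 0
    for record in records:
        if current and current_bytes + record.size() > max_batch_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(record)
        current_bytes += record.size()
    if current:
        batches.append(current)
    return batches


records = [Record(key=b"k", value=b"x" * 10) for _ in range(5)]  # 11 bytes each
batches = batch_records(records, max_batch_bytes=25)
print([len(b) for b in batches])  # [2, 2, 1]
```

Five records travel as three batches instead of five separate requests, which is where the network and disk savings come from.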

Topics and partitions

A topic defines a logical stream of records, while partitions split that stream into independent ordered logs that can scale horizontally.
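The relationship between keys, partitions, and ordering can be sketched with an in-memory model. This is an illustration only: the `Topic` class is hypothetical, and CRC32 is an assumed stand-in for the hash a real client applies to the key.

```python
import zlib


class Topic:
    """Toy topic: a list of independent append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Assumption: CRC32 stands in for the real client's key hash.
        return zlib.crc32(key) % len(self.partitions)

    def append(self, key: bytes, value: str):
        """Append to the key's partition and return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


orders = Topic("orders", num_partitions=3)
for step in ("created", "paid", "shipped"):
    for order_id in (b"order-1", b"order-2"):
        orders.append(order_id, step)

# All events of one key sit in one partition, in the order they were written.
p = orders.partition_for(b"order-1")
history = [v for k, v in orders.partitions[p] if k == b"order-1"]
print(history)  # ['created', 'paid', 'shipped']
```

Ordering holds per key because a key always maps to the same partition. Across partitions there is no global order, which is why the choice of key is a design decision and not a detail.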

Producers

Clients that publish records to Kafka, choose topics and partition keys, and configure write acknowledgements.

Consumers

Clients that read records from partitions. Consumer groups divide partitions across members and scale processing.
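How a group divides partitions can be sketched as a simple round-robin assignment. The function `assign_round_robin` is hypothetical; real clients choose among several assignors (range, round-robin, sticky, cooperative sticky) whose results may differ from this one.

```python
def assign_round_robin(partitions, consumers):
    """Spread partitions over group members one at a time.

    Returns {consumer: [partitions]}; members beyond the partition
    count end up with an empty list, i.e. they sit idle.
    """
    assignment = {c: [] for c in consumers}
    members = sorted(consumers)
    if not members:
        return assignment
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


partitions = ["P0", "P1", "P2"]
print(assign_round_robin(partitions, ["C1", "C2"]))
# {'C1': ['P0', 'P2'], 'C2': ['P1']}
print(assign_round_robin(partitions, ["C1", "C2", "C3", "C4"]))
# {'C1': ['P0'], 'C2': ['P1'], 'C3': ['P2'], 'C4': []}
```

The second call shows the scaling limit: with three partitions, a fourth consumer receives nothing.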

Related topic

Designing Data-Intensive Applications, 2nd Edition

Chapter 11 takes a deep dive into stream processing.

Read review

We recommend

Streaming Data

Architecture of streaming systems: from data collection to data consumption

Read review

Book structure (2nd Edition, 14 chapters)

1

Meet Kafka

Introduction to publish/subscribe messaging, Kafka's origins at LinkedIn, and the core vocabulary: messages, batches, schemas, topics, partitions, producers, consumers, and brokers.

2

Installing Kafka

Broker installation and configuration, server sizing, and ZooKeeper or KRaft setup. The 2nd edition adds more emphasis on cloud deployments.

3

Kafka Producers

Producer configuration, serialization with Avro or JSON, partitioners, headers, interceptors, quotas, and write-throughput control.

4

Kafka Consumers

Consumer groups, partition assignment, offset management (auto-commit, sync, async), rebalance listeners, and standalone consumers.

5

Managing Apache Kafka Programmatically (NEW)

AdminClient API as an asynchronous interface for managing topics, configurations, consumer groups, and cluster metadata, plus leader election and replica reassignment.

6

Kafka Internals (under the hood)

Cluster membership, the controller role, replication, ISR, request processing, physical storage, log segments, and indexes.

7

Reliable Data Delivery

Delivery guarantees: at-most-once, at-least-once, and exactly-once. Producer acknowledgements, retries, consumer behavior, and broker settings that determine reliability in practice.

8

Exactly-Once Semantics (NEW)

Exactly-once guarantees, the transactional producer API, read_committed isolation, idempotency, and atomic writes across multiple partitions.

9

Building Data Pipelines

Kafka Connect source and sink connectors, standalone and distributed modes, transformations, converters, and dead letter queues.

10

Cross-Cluster Data Mirroring

MirrorMaker 2.0, multi-datacenter architectures (Active-Active, Active-Passive), and replication of topics and consumer offsets between clusters.

11

Securing Kafka (NEW)

SSL/TLS encryption, SASL authentication (GSSAPI, PLAIN, SCRAM, OAUTHBEARER), ACL-based authorization, auditing, and operational security.

12

Administering Kafka

Topic operations, consumer-group management, partition reassignment, production configuration, and day-to-day cluster operations.

13

Monitoring Kafka

JMX metrics and the key broker, producer, and consumer signals, such as under-replicated partitions and consumer lag, plus an overview of monitoring tools.

14

Stream Processing

Kafka Streams API: stateless and stateful operations, windowing, stream joins, KTables and KStreams, exactly-once processing, and testing.
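The windowing idea listed for stream processing can be illustrated without the library. The sketch below is plain Python, not the Kafka Streams API (which is Java); `tumbling_window_counts` and the sample events are made up for illustration, and late or out-of-order events are ignored.

```python
from collections import Counter


def tumbling_window_counts(events, window_ms):
    """Count events per (key, window start) for fixed, non-overlapping windows.

    events: iterable of (timestamp_ms, key).
    """
    counts = Counter()
    for timestamp_ms, key in events:
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        counts[(key, window_start)] += 1
    return dict(counts)


events = [(1_000, "page-a"), (1_500, "page-a"), (59_999, "page-b"),
          (60_000, "page-a"), (61_000, "page-a")]
counts = tumbling_window_counts(events, window_ms=60_000)
print(counts)
# {('page-a', 0): 2, ('page-b', 0): 1, ('page-a', 60000): 2}
```

This is a stateful operation: the counter has to survive between events, which is the kind of state a stream processor must store and restore after a failure.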

New in 2nd edition

  • AdminClient API: programmatic cluster management
  • Transactions: exactly-once guarantees and atomic writes
  • Securing Kafka: SSL/TLS, SASL, ACLs, and operational security
  • MirrorMaker 2.0: improved cross-cluster replication
  • KRaft: coverage of the ZooKeeper-free control-plane mode

Message delivery semantics

At-most-once

The consumer commits its offset before processing, or the producer sends without retrying failed writes. Data can be lost, but latency stays low; this can be acceptable for some metrics and technical logs.

At-least-once

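The consumer commits its offset only after processing, so a crash or rebalance before the commit makes the group read the same records again. Producer retries can likewise write a record twice. Duplicates are possible, so consumers must make their side effects idempotent.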
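The difference between at-most-once and at-least-once comes down to the order of "commit" and "process". The following simulation is a sketch under simplifying assumptions: the `consume` and `run_with_one_crash` helpers are hypothetical, the log is a Python list, and a single crash is injected between the two steps.

```python
def consume(log, committed, commit_first, crash_at=None):
    """Process log[committed:] once and return (processed, committed offset).

    commit_first=True  -> commit, then process (at-most-once).
    commit_first=False -> process, then commit (at-least-once).
    crash_at simulates a crash at that offset, between the two steps.
    """
    processed = []
    for offset in range(committed, len(log)):
        if commit_first:
            committed = offset + 1
            if offset == crash_at:
                return processed, committed  # committed but never processed
            processed.append(log[offset])
        else:
            processed.append(log[offset])
            if offset == crash_at:
                return processed, committed  # processed but never committed
            committed = offset + 1
    return processed, committed


def run_with_one_crash(log, commit_first):
    first, committed = consume(log, 0, commit_first, crash_at=1)
    second, _ = consume(log, committed, commit_first)  # restart after crash
    return first + second


log = ["a", "b", "c"]
print(run_with_one_crash(log, commit_first=True))   # ['a', 'c']  'b' is lost
print(run_with_one_crash(log, commit_first=False))  # ['a', 'b', 'b', 'c']  'b' repeats

# Idempotent side effect: applying the same record twice changes nothing.
applied = set()
for record in run_with_one_crash(log, commit_first=False):
    applied.add(record)
print(sorted(applied))  # ['a', 'b', 'c']
```

The set at the end stands in for any idempotent sink, such as an upsert keyed by record ID.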

Exactly-once

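Kafka prevents duplicate writes through the idempotent producer and makes consume-process-produce cycles atomic through the transactional API. This is a guarantee about results written back to Kafka, not a removal of every retry: side effects in external systems still need their own idempotency.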
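The idea behind the idempotent producer can be sketched as sequence-number deduplication on the broker side. This is a deliberately simplified model: the `PartitionLog` class is hypothetical, and the real protocol also handles producer epochs and out-of-order sequences, which are ignored here.

```python
class PartitionLog:
    """Toy broker-side partition that drops retried duplicates.

    Simplified model: remembers only the last sequence number
    accepted from each producer id.
    """

    def __init__(self):
        self.records = []
        self.last_seq = {}  # producer_id -> last accepted sequence number

    def append(self, producer_id, seq, value):
        """Return True if the record was written, False if it was a duplicate."""
        if seq <= self.last_seq.get(producer_id, -1):
            return False  # retry of something already written
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return True


log = PartitionLog()
sends = [(0, "debit 10"), (1, "credit 10"), (1, "credit 10"), (2, "debit 5")]
results = [log.append("producer-A", seq, value) for seq, value in sends]
print(results)      # [True, True, False, True]
print(log.records)  # ['debit 10', 'credit 10', 'debit 5']
```

The retry still happens on the wire; it just has no effect on the log, which matches the point that exactly-once is about outcomes rather than the absence of retries.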

Kafka cluster architecture

Diagram summary:

Producers: App 1, App 2, App 3
Kafka cluster: brokers B1, B2, B3
Topic orders: partitions P0, P1, P2
Consumer group: C1 reads P0; C2 reads P1 and P2
ZooKeeper / KRaft controller: metadata and coordination

Partition replication

Leader accepts writes, followers replicate.

Topic: orders (replication factor = 3)

Broker 1: P0 (Leader), P1 (Follower), P2 (Follower)
Broker 2: P0 (Follower), P1 (Leader), P2 (Follower)
Broker 3: P0 (Follower), P1 (Follower), P2 (Leader)

ISR (In-Sync Replicas): Broker 1, Broker 2, Broker 3
min.insync.replicas = 2: writes allowed
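The "writes allowed" condition in the replication example can be expressed as a small check. This is a toy model with a hypothetical function name; it assumes the partition leader is alive and that `min.insync.replicas` is enforced only for producers using `acks=all`.

```python
def can_accept_write(isr, min_insync_replicas, acks="all"):
    """Return True if a produce request with the given acks can succeed.

    Toy model: min.insync.replicas is enforced only for acks="all",
    and the partition leader is assumed to be alive.
    """
    if acks != "all":
        return True
    return len(isr) >= min_insync_replicas


isr = {"broker-1", "broker-2", "broker-3"}
print(can_accept_write(isr, min_insync_replicas=2))                 # True
print(can_accept_write(isr - {"broker-3"}, min_insync_replicas=2))  # True
print(can_accept_write({"broker-1"}, min_insync_replicas=2))        # False
print(can_accept_write({"broker-1"}, 2, acks="1"))                  # True
```

With a replication factor of 3 and `min.insync.replicas = 2`, the topic tolerates one failed broker without rejecting `acks=all` writes. Losing a second broker trades availability for durability: writes are refused rather than accepted with a single copy.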

Key takeaways for system design

  • Partitioning is the key to horizontal scaling. The partition key determines both load distribution and ordering boundaries.
  • Replication provides fault tolerance. ISR shows which replicas are synchronized enough to participate in write acknowledgement.
  • Consumer groups scale processing. Within one group each partition is read by at most one consumer, so consumers beyond the partition count stay idle.
  • Retention policy determines how long Kafka keeps records and therefore bounds replay and consumer recovery.
  • Kafka Connect simplifies integration with external systems through source and sink connectors without bespoke application code.
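Consumer lag and the replay window set by retention both fall out of three offsets per partition. The sketch below uses a hypothetical `partition_status` helper and made-up offset values; in a real cluster these numbers would come from consumer-group and log metadata.

```python
def partition_status(log_start, log_end, committed):
    """Describe one partition from three offsets.

    log_start: oldest offset still retained by the broker.
    log_end:   offset the next written record will get.
    committed: next offset the consumer group will read.
    """
    return {
        "lag": log_end - committed,
        "replayable": log_end - log_start,
        # Records between committed and log_start were deleted unread.
        "lost_to_retention": max(0, log_start - committed),
    }


print(partition_status(log_start=100, log_end=500, committed=450))
# {'lag': 50, 'replayable': 400, 'lost_to_retention': 0}
print(partition_status(log_start=100, log_end=500, committed=40))
# {'lag': 460, 'replayable': 400, 'lost_to_retention': 60}
```

The second case is the one retention sizing should prevent: a consumer that falls behind by more than the retained range cannot recover the records that were deleted.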
