System Design Space
Knowledge graph

Updated: March 2, 2026 at 3:42 PM

Kafka: The Definitive Guide (short summary)


Source

Post in Book Cube

Original book review from Alexander Polomodov


Kafka: The Definitive Guide, 2nd Edition

Authors: Gwen Shapira, Todd Palino, Rajini Sivaram, Krit Petty
Publisher: O'Reilly Media, Inc.
Length: 485 pages

Distributed stream processing platform: producers, consumers, partitions, replication, delivery semantics and Kafka Streams.

Covers: original edition and translated edition

Book editions

1st Edition

Fall 2017 - 11 chapters covering the basics of Kafka, producers, consumers, administration and stream processing.

2nd Edition

Late 2021 - expanded edition with an emphasis on cloud deployments and new platform capabilities.

Key Concepts of Kafka

Messages and batches

Basic data units in Kafka. Messages are grouped into batches for efficient transmission over the network.
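A minimal sketch of how a producer-side buffer can group messages into batches. This is a toy class, not the real client: the actual producer batches per partition and also flushes on a time threshold (the `linger.ms` setting), which is only hinted at in a comment here. The `BatchBuffer` name and byte threshold are illustrative assumptions.

```python
class BatchBuffer:
    """Toy analogue of producer-side batching (cf. Kafka's batch.size)."""

    def __init__(self, max_bytes=64):
        self.max_bytes = max_bytes
        self.pending = []
        self.pending_bytes = 0
        self.flushed = []  # batches "sent" over the network

    def append(self, message: bytes):
        self.pending.append(message)
        self.pending_bytes += len(message)
        if self.pending_bytes >= self.max_bytes:  # size threshold reached
            self.flush()

    def flush(self):
        # In the real client, the linger.ms timer also triggers a flush,
        # so small batches still go out under low load.
        if self.pending:
            self.flushed.append(list(self.pending))
            self.pending.clear()
            self.pending_bytes = 0

buf = BatchBuffer(max_bytes=10)
for m in [b"order-1", b"order-2", b"order-3"]:
    buf.append(m)
buf.flush()
# buf.flushed == [[b"order-1", b"order-2"], [b"order-3"]]
```

Batching trades a little latency for far fewer network round trips, which is a large part of Kafka's write throughput.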

Topics and partitions

Topics are logical message channels divided into partitions for parallel processing and scaling.
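The routing of a keyed message to a partition can be sketched as hash-then-modulo. The real default partitioner hashes the key with murmur2; MD5 is used below only to get a stable, unsalted hash in plain Python, and the function name is illustrative.

```python
import hashlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    """Illustrative stand-in for Kafka's default partitioner
    (the real client uses murmur2; MD5 here is just a stable hash)."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# All messages with the same key land in the same partition,
# which is what preserves per-key ordering.
p1 = pick_partition(b"customer-42", 3)
p2 = pick_partition(b"customer-42", 3)
assert p1 == p2
```

This is also why the choice of key matters for load: a hot key concentrates traffic on one partition.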

Producers

Clients that write messages to Kafka. Kafka is optimized for the write path and sustains high write throughput.

Consumers

Clients reading messages from Kafka. Consumer groups provide parallel processing and fault tolerance.
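How a group's partitions get spread across its members can be sketched with a round-robin assignment. Kafka ships several assignment strategies (range, round-robin, sticky); the function below mirrors only the round-robin idea and its name is an assumption.

```python
def round_robin_assign(partitions, consumers):
    """Sketch of round-robin partition assignment in a consumer group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

a = round_robin_assign(["P0", "P1", "P2"], ["C1", "C2"])
# a == {"C1": ["P0", "P2"], "C2": ["P1"]}
```

Each partition goes to exactly one consumer in the group, so a third consumer added to this group would receive nothing until more partitions exist.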

Related topic

Designing Data-Intensive Applications

Chapter 11 takes a deep dive into stream processing.

Read review

We recommend

Streaming Data

Architecture of streaming systems: from data collection to data consumption

Read review

Book structure (2nd Edition - 14 chapters)

1

Meet Kafka

Introduction to publish/subscribe messaging, history of creation on LinkedIn, basic concepts: messages, batches, schemas, topics, partitions, producers, consumers, brokers.

2

Managing Apache Kafka Programmatically (NEW)

AdminClient API: asynchronous interface for managing topics, configurations, consumer groups, cluster metadata. Leader election and reassigning replicas.

3

Installing Kafka

Installing and configuring brokers, hardware selection, ZooKeeper/KRaft configuration. The 2nd edition places more emphasis on cloud deployments.

4

Kafka Producers

Configuration of producers, serialization (Avro, JSON), partitioners, headers, interceptors, quotas and bandwidth management.

5

Kafka Consumers

Consumer groups, partition assignment, offset management (auto-commit, sync, async), rebalance listeners, standalone consumers.

6

Transactions (NEW)

Exactly-once semantics, transactional producer API, read_committed isolation, idempotency and atomic writes.

7

Kafka Internals (under the hood)

Cluster membership, controller, replication, ISR, request processing, physical storage, log segments and indexes.

8

Reliable Data Delivery

Delivery guarantees: at-least-once, at-most-once, exactly-once. Configuration of producer (acks, retries), consumer and broker for reliability.
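The "configure for reliability" part of this chapter can be summarized as a handful of settings. The keys below are real Kafka configuration names; the values are illustrative choices for a durability-first setup, not universal defaults, and grouping them into two Python dicts is just a presentation device.

```python
# Illustrative durability-first settings (keys are real Kafka config names).
producer_config = {
    "acks": "all",                  # wait for all in-sync replicas to ack
    "enable.idempotence": True,     # broker dedupes retried sends
    "retries": 2147483647,          # keep retrying transient failures
    "delivery.timeout.ms": 120000,  # overall bound on one send attempt
}

broker_or_topic_config = {
    "replication.factor": 3,
    "min.insync.replicas": 2,       # acks=all fails if ISR shrinks below this
    "unclean.leader.election.enable": False,  # never elect a stale leader
}
```

Together these mean an acknowledged write survives the loss of one broker, at the cost of extra latency per produce request.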

9

Securing Kafka (NEW)

SSL/TLS encryption, SASL authentication (GSSAPI, PLAIN, SCRAM, OAUTHBEARER), authorization with ACLs, audit and security in production.

10

Building Data Pipelines

Kafka Connect: source and sink connectors, standalone and distributed mode, transformations, converters, dead letter queues.

11

Cross-Cluster Data Mirroring

MirrorMaker 2.0, multi-datacenter architecture (Active-Active, Active-Passive), replication of topics and consumer offsets between clusters.

12

Administering Kafka

Topic operations, consumer group management, partition reassignment, configuration for production, cluster operations.

13

Monitoring Kafka

JMX metrics, critical metrics for brokers, producers and consumers. Under-replicated partitions, lag monitoring, monitoring tools.

14

Stream Processing

Kafka Streams API: stateless and stateful operations, windowing, joins, KTables vs KStreams, exactly-once processing, testing.
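A windowed aggregation can be sketched in pure Python. This is the kind of stateful operation Kafka Streams expresses with `groupByKey().windowedBy(...).count()`; the function below implements only the tumbling-window bucketing idea, and its name and event format are assumptions.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_ms):
    """Count (key, timestamp_ms) events per non-overlapping time window."""
    counts = defaultdict(int)
    for key, timestamp_ms in events:
        # Tumbling windows: each timestamp falls in exactly one bucket.
        window_start = (timestamp_ms // window_ms) * window_ms
        counts[(key, window_start)] += 1
    return dict(counts)

events = [("clicks", 100), ("clicks", 900), ("clicks", 1100), ("views", 150)]
result = tumbling_window_counts(events, window_ms=1000)
# {("clicks", 0): 2, ("clicks", 1000): 1, ("views", 0): 1}
```

Real Kafka Streams additionally handles late-arriving events, state stores, and fault-tolerant recovery of this state, which the sketch omits.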

New in 2nd edition

  • AdminClient API — programmatic cluster management
  • Transactions — exactly-once semantics and atomic operations
  • Security — SSL/TLS, SASL, ACLs for production
  • MirrorMaker 2.0 — improved cross-cluster replication
  • KRaft — a first mention of the new ZooKeeper-less mode

Message delivery semantics

At-most-once

The message is delivered at most once; data loss is possible. Suitable for metrics and logs.

At-least-once

The message is delivered at least once; duplicates are possible. This is Kafka's default mode.

Exactly-once

The message is delivered exactly once. Requires an idempotent producer and the transactional API.
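The three semantics can be contrasted with a toy channel. The sketch below simulates at-least-once delivery (a lost ack causes a resend) and shows that deduplicating by message id on the receiving side yields effectively exactly-once processing, which is the same idea Kafka's idempotent producer applies broker-side. All names here are illustrative.

```python
def deliver(messages, drop_acks_for=frozenset()):
    """Toy at-least-once channel: a lost ack makes the sender resend."""
    received = []
    for msg_id, payload in messages:
        received.append((msg_id, payload))      # first delivery
        if msg_id in drop_acks_for:             # ack lost -> sender retries
            received.append((msg_id, payload))  # duplicate delivery
    return received

def idempotent_consume(received):
    """Deduplicate by message id to process each message exactly once."""
    seen, processed = set(), []
    for msg_id, payload in received:
        if msg_id not in seen:
            seen.add(msg_id)
            processed.append(payload)
    return processed

raw = deliver([(1, "a"), (2, "b")], drop_acks_for={1})
# raw has 3 entries (message 1 arrived twice), but processing sees each once
assert idempotent_consume(raw) == ["a", "b"]
```

At-most-once would correspond to never resending (and losing message 1 if its delivery, rather than its ack, was dropped).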

Kafka cluster architecture

Diagram: producer apps (App 1, App 2, App 3) write to a Kafka cluster of brokers B1-B3. The topic orders is split into partitions P0-P2. A consumer group reads them, with C1 assigned P0 and C2 assigned P1 and P2. A ZooKeeper / KRaft controller handles metadata and coordination.

Partition replication

Leader accepts writes, followers replicate

Topic: orders (replication factor = 3)

  • Broker 1: P0 (Leader), P1 (Follower), P2 (Follower)
  • Broker 2: P0 (Follower), P1 (Leader), P2 (Follower)
  • Broker 3: P0 (Follower), P1 (Follower), P2 (Leader)

ISR (In-Sync Replicas): Broker 1, Broker 2, Broker 3
min.insync.replicas = 2 → writes allowed
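The `min.insync.replicas` check in the diagram can be sketched as a predicate: with `acks=all`, the broker rejects produce requests (with a NotEnoughReplicas-style error) once the in-sync replica set shrinks below the configured minimum. The function below is an illustrative simplification, not broker code.

```python
def writes_allowed(isr_size: int, min_insync_replicas: int, acks: str) -> bool:
    """Sketch of the broker-side check behind min.insync.replicas."""
    if acks != "all":
        return True  # acks=0 and acks=1 do not consult min.insync.replicas
    return isr_size >= min_insync_replicas

# replication.factor=3, min.insync.replicas=2:
assert writes_allowed(isr_size=3, min_insync_replicas=2, acks="all")  # healthy
assert writes_allowed(isr_size=2, min_insync_replicas=2, acks="all")  # one down
assert not writes_allowed(isr_size=1, min_insync_replicas=2, acks="all")
```

This is the availability/durability trade-off in one line: the cluster stops accepting `acks=all` writes rather than accept data that only one replica holds.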

Key Takeaways for System Design

  • Partitioning is the key to horizontal scaling. The choice of partition key determines the load distribution.
  • Replication provides fault tolerance. ISR (In-Sync Replicas) guarantees consistency.
  • Consumer groups allow scaling of processing. The number of active consumers in a group is capped by the number of partitions.
  • Retention policy determines how long data is stored. Kafka can act as a log store.
  • Kafka Connect simplifies integration with external systems without writing code (source and sink connectors).



© 2026 Alexander Polomodov