Kafka: The Definitive Guide, 2nd Edition (short summary)

Kafka matters not because it is a famous broker, but because the append-only log changes how services integrate, how streaming is built, and how data can be replayed.

In real engineering work, this book helps design partitioning, retention policy, consumer groups, delivery guarantees, and lag control as parts of one event flow rather than a pile of unrelated settings.

In interviews, reviews, and architecture conversations, it is especially useful when you need to show how per-partition ordering, lag spikes, rebalancing, and storage growth affect whole-system reliability, not just the messaging layer.

Practical value of this chapter

Design in practice

Provides a practical framework for Kafka as an event-flow foundation at scale.

Decision quality

Improves partitioning, retention-policy, and consumer-group choices for the workload.

Interview articulation

Helps explain delivery guarantees, replay, and DLQ strategy in production terms.

Risk and trade-offs

Surfaces ordering, consumer-lag spike, and storage-growth risks.

Source

Post in Book Cube

Original review by Alexander Polomodov

Kafka: The Definitive Guide, 2nd Edition

Authors: Gwen Shapira, Todd Palino, Rajini Sivaram, Krit Petty
Publisher: O'Reilly Media, Inc.
Length: 485 pages

Practical guide to Kafka as a broker and partitioned log: producers, consumer groups, replication, delivery guarantees, Kafka Connect, Kafka Streams, and cluster operations.

Treating Kafka as a message queue misses the point. Underneath it is a partitioned log: producers append records to topics, while consumer groups read partitions in parallel and hold their own committed offsets. Whether you grasp that difference shapes how you design ordering, reliability, and scale.
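The difference from a queue can be sketched with a toy in-memory model. This is not Kafka client code: the class `PartitionedLog`, its method names, and the group names `billing` and `analytics` are hypothetical, chosen only to illustrate that reading never removes records and that each consumer group tracks its own committed offset.

```python
from collections import defaultdict


class PartitionedLog:
    """Toy in-memory model of one topic: a list of append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        # committed[group][partition] -> next offset that group will read
        self.committed = defaultdict(lambda: defaultdict(int))

    def append(self, partition, record):
        """Append a record and return its offset within the partition."""
        log = self.partitions[partition]
        log.append(record)
        return len(log) - 1

    def poll(self, group, partition, max_records=10):
        """Read from the group's committed offset; nothing is deleted."""
        start = self.committed[group][partition]
        return self.partitions[partition][start:start + max_records]

    def commit(self, group, partition, next_offset):
        """Record how far this group has processed the partition."""
        self.committed[group][partition] = next_offset

    def lag(self, group, partition):
        """Records appended but not yet committed by this group."""
        return len(self.partitions[partition]) - self.committed[group][partition]


log = PartitionedLog(num_partitions=2)
for i in range(3):
    log.append(0, f"order-{i}")

batch = log.poll("billing", 0)
log.commit("billing", 0, next_offset=len(batch))

billing_lag = log.lag("billing", 0)      # billing has caught up
analytics_lag = log.lag("analytics", 0)  # analytics has read nothing yet
replayed = log.poll("analytics", 0)      # same records are still in the log
```

In this model the second group replays everything the first group already consumed, which is the property a destructive queue does not give you.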

From there the book rests on one chain: delivery semantics, replication, rebalancing, consumer lag, and stream processing. Lose the link between them and Kafka stays just a broker; hold it and Kafka becomes a data platform.

Book editions

1st Edition

Fall 2017: 11 chapters covering Kafka fundamentals, producers, consumers, administration, and stream processing.

2nd Edition

Late 2021: expanded edition with dedicated chapters on programmatic cluster management, transactions, security, and cross-cluster replication.

Core Kafka concepts

Messages and record batches

A record carries a key, value, headers, and timestamp. Records are grouped into batches to reduce network and disk overhead.

Topics and partitions

A topic defines a logical stream of records, while partitions split that stream into independent ordered logs that can scale horizontally.
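How a key selects one of those ordered logs can be sketched as a hash of the key modulo the partition count. The function `partition_for` is hypothetical, and `zlib.crc32` is only a stand-in: the Java client's default partitioner uses a murmur2 hash for keyed records, so real partition numbers will differ.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition. crc32 is a stand-in hash, not Kafka's."""
    return zlib.crc32(key) % num_partitions


events = [
    ("customer-1", "created"),
    ("customer-2", "created"),
    ("customer-1", "paid"),
    ("customer-1", "shipped"),
]

# Route each event to the partition chosen by its key.
partitions = {p: [] for p in range(3)}
for key, value in events:
    partitions[partition_for(key.encode(), 3)].append((key, value))

# All events for one key land in one partition, so their order is preserved.
customer_1_events = [
    value
    for records in partitions.values()
    for key, value in records
    if key == "customer-1"
]
```

Ordering holds only within a partition, so records with different keys have no guaranteed relative order, and changing the partition count changes which partition a key maps to.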

Producers

Clients that publish records to Kafka, choose topics and partition keys, and configure write acknowledgements.
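As a rough illustration of the acknowledgement trade-off, the two property sets below contrast a durability-first and a latency-first producer. The property names (`acks`, `enable.idempotence`, `bootstrap.servers`) are standard Kafka producer settings; the broker address and the choice of values are assumptions made for the example, not recommendations from the book.

```python
# Durability-first: wait until all in-sync replicas have the record.
durable_producer = {
    "bootstrap.servers": "localhost:9092",  # hypothetical broker address
    "acks": "all",                # leader waits for the in-sync replicas
    "enable.idempotence": True,   # broker discards duplicates caused by retries
}

# Latency-first: do not wait for any broker acknowledgement.
fire_and_forget_producer = {
    "bootstrap.servers": "localhost:9092",  # hypothetical broker address
    "acks": "0",                  # records can be lost without any error
}
```

Between the two sits `acks=1`, where only the partition leader confirms the write, so a record can still be lost if the leader fails before followers copy it.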

Consumers

Clients that read records from partitions. Consumer groups divide partitions across members and scale processing.
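A minimal sketch of how a group might divide partitions, assuming a simple round-robin strategy. The function `assign_round_robin` and the member names are hypothetical; real Kafka clients ship several assignment strategies and this is not a reimplementation of any of them.

```python
def assign_round_robin(partitions, members):
    """Deal partitions out to group members one at a time."""
    ordered = sorted(members)
    assignment = {member: [] for member in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


# Three partitions, two consumers: every partition has exactly one owner.
two_members = assign_round_robin([0, 1, 2], ["c1", "c2"])

# Three partitions, four consumers: one member receives nothing.
four_members = assign_round_robin([0, 1, 2], ["c1", "c2", "c3", "c4"])
idle = [m for m, owned in four_members.items() if not owned]

# A rebalance after c1 leaves hands all partitions to the survivor.
after_rebalance = assign_round_robin([0, 1, 2], ["c2"])
```

The idle member in the second case is why adding consumers beyond the partition count does not add throughput.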

Book structure (2nd Edition, 14 chapters)

Meet Kafka

Introduction to publish/subscribe messaging, Kafka's origins at LinkedIn, and the core vocabulary: messages, batches, schemas, topics, partitions, producers, consumers, and brokers.

Managing Apache Kafka Programmatically (NEW)

AdminClient API as an asynchronous interface for managing topics, configurations, consumer groups, and cluster metadata, plus leader election and replica reassignment.

Installing Kafka

Broker installation and configuration, server sizing, and ZooKeeper or KRaft setup. The 2nd edition adds more emphasis on cloud deployments.

Kafka Producers

Producer configuration, serialization with Avro or JSON, partitioners, headers, interceptors, quotas, and write-throughput control.

Kafka Consumers

Consumer groups, partition assignment, offset management (auto-commit, sync, async), rebalance listeners, and standalone consumers.

Transactions (NEW)

Exactly-once guarantees, the transactional producer API, read_committed isolation, idempotency, and atomic writes across multiple partitions.

Kafka Internals (under the hood)

Cluster membership, the controller role, replication, ISR, request processing, physical storage, log segments, and indexes.

Reliable Data Delivery

Delivery guarantees: at-most-once, at-least-once, and exactly-once. Producer acknowledgements, retries, consumer behavior, and broker settings that determine reliability in practice.

Securing Kafka (NEW)

SSL/TLS encryption, SASL authentication (GSSAPI, PLAIN, SCRAM, OAUTHBEARER), ACL-based authorization, auditing, and operational security.

Building Data Pipelines

Kafka Connect source and sink connectors, standalone and distributed modes, transformations, converters, and dead letter queues.

Cross-Cluster Data Mirroring

MirrorMaker 2.0, multi-datacenter architectures (Active-Active, Active-Passive), and replication of topics and consumer offsets between clusters.

Administering Kafka

Topic operations, consumer-group management, partition reassignment, production configuration, and day-to-day cluster operations.

Monitoring Kafka

JMX metrics and the key broker, producer, and consumer signals: under-replicated partitions, consumer lag, and monitoring tools.

Stream Processing

Kafka Streams API: stateless and stateful operations, windowing, stream joins, KTables and KStreams, exactly-once processing, and testing.
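The windowing idea from the stream-processing chapter can be illustrated without the Kafka Streams API. The sketch below is plain Python, and the function name `tumbling_window_counts`, the event tuples, and the 60-second window are all assumptions made for the example: each event is bucketed into a fixed, non-overlapping window by rounding its timestamp down.

```python
from collections import Counter


def tumbling_window_counts(events, window_ms):
    """Count events per (key, window start) in fixed, non-overlapping windows."""
    counts = Counter()
    for key, timestamp_ms in events:
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        counts[(key, window_start)] += 1
    return dict(counts)


events = [
    ("page-a", 1_000),
    ("page-a", 59_000),
    ("page-a", 61_000),
    ("page-b", 61_000),
]
counts = tumbling_window_counts(events, window_ms=60_000)
```

This is a stateful operation: the counts must be kept somewhere between events, which is the state that a stream-processing framework has to store and recover.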

New in 2nd edition

▸AdminClient API - programmatic cluster management
▸Transactions - exactly-once guarantees and atomic writes
▸Securing Kafka - SSL/TLS, SASL, ACLs, and operational security
▸MirrorMaker 2.0 - improved cross-cluster replication
▸KRaft - coverage of the ZooKeeper-free control-plane mode

Message delivery semantics

At-most-once

The consumer commits its offset before processing, or the producer sends without retries. Data can be lost, but latency stays low; this can be acceptable for some metrics and technical logs.

At-least-once

The consumer commits its offset only after processing, so after a restart or rebalance it re-reads every record whose offset was not committed. Duplicates are possible, so consumers must make their side effects idempotent.

Exactly-once

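The idempotent producer and the transactional API let a read-process-write cycle inside Kafka commit its output records and its consumed offsets atomically, so retries do not produce visible duplicates. The price is more configuration and higher latency. It is a processing guarantee within Kafka, not magic removal of every retry: side effects in external systems still need to be idempotent.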
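The first two semantics differ only in whether the commit happens before or after processing. The simulation below is a minimal sketch under stated assumptions: the function `consume`, the single simulated crash, and the record values are hypothetical, and no Kafka client is involved.

```python
def consume(records, commit_before_processing, crash_at):
    """Process a log with one simulated crash; return the side effects seen."""
    committed = 0        # next offset to read after a restart
    side_effects = []
    crashed = False
    while committed < len(records):
        offset = committed
        if commit_before_processing:
            committed = offset + 1
            if offset == crash_at and not crashed:
                crashed = True
                continue  # crash after commit, before processing: record lost
            side_effects.append(records[offset])
        else:
            side_effects.append(records[offset])
            if offset == crash_at and not crashed:
                crashed = True
                continue  # crash after processing, before commit: re-read
            committed = offset + 1
    return side_effects


records = ["a", "b", "c"]
at_most_once = consume(records, commit_before_processing=True, crash_at=1)
at_least_once = consume(records, commit_before_processing=False, crash_at=1)

# An idempotent consumer applies each record ID once, hiding the duplicate.
deduplicated = list(dict.fromkeys(at_least_once))
```

The deduplication step assumes each record carries a unique identifier; here the record value itself plays that role.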

Kafka cluster architecture

[Interactive diagram: producers App 1, App 2, and App 3 write to the topic "orders" in the Kafka cluster. A consumer group reads the topic, with one consumer assigned P0 and another assigned P1 and P2. A ZooKeeper / KRaft controller handles metadata and coordination.]

Partition replication

Leader accepts writes, followers replicate.

Topic: orders (replication factor = 3)

Broker 1: P0 (Leader), P1 (Follower), P2 (Follower)
Broker 2: P0 (Follower), P1 (Leader), P2 (Follower)
Broker 3: P0 (Follower), P1 (Follower), P2 (Leader)

ISR (In-Sync Replicas): Broker 1, Broker 2, Broker 3
min.insync.replicas = 2 (writes allowed)
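The "writes allowed" state in the replication example follows from a simple comparison between the ISR size and min.insync.replicas. The sketch below is a simplified model, not broker code: the function `can_accept_write` is hypothetical, and it assumes the partition leader is alive and part of the ISR.

```python
def can_accept_write(isr, min_insync_replicas, acks="all"):
    """Return True if a produce request with the given acks would be accepted."""
    if acks != "all":
        # min.insync.replicas is only enforced for acks=all producers.
        return True
    return len(isr) >= min_insync_replicas


all_healthy = can_accept_write({1, 2, 3}, min_insync_replicas=2)
one_broker_down = can_accept_write({1, 2}, min_insync_replicas=2)
two_brokers_down = can_accept_write({1}, min_insync_replicas=2)
leader_only_ack = can_accept_write({1}, min_insync_replicas=2, acks="1")
```

With replication factor 3 and min.insync.replicas = 2, the partition tolerates one failed broker for acks=all writes; losing a second makes it reject those writes while staying readable.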

Key takeaways for system design

▸Partitioning is the key to horizontal scaling. The partition key determines both load distribution and ordering boundaries.
▸Replication provides fault tolerance. ISR shows which replicas are synchronized enough to participate in write acknowledgement.
▸Consumer groups scale processing. Within one group each partition is read by exactly one consumer, so consumers beyond the number of partitions sit idle.
▸Retention policy determines how long Kafka keeps records and therefore bounds replay and consumer recovery.
▸Kafka Connect simplifies integration with external systems through source and sink connectors without bespoke application code.
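The retention takeaway can be turned into back-of-the-envelope arithmetic: retained data is roughly write rate times retention time times replication factor. The helper names and the workload numbers below (5 MB/s, 72 hours, replication factor 3) are hypothetical, and the estimate ignores compression, log compaction, and index overhead.

```python
def retained_bytes(write_bytes_per_sec, retention_hours, replication_factor):
    """Rough cluster-wide disk usage for one topic under time-based retention."""
    return write_bytes_per_sec * retention_hours * 3600 * replication_factor


def can_replay(outage_hours, retention_hours):
    """A consumer can catch up only if its outage fits inside retention."""
    return outage_hours < retention_hours


storage = retained_bytes(
    write_bytes_per_sec=5_000_000,  # hypothetical 5 MB/s ingest
    retention_hours=72,
    replication_factor=3,
)
storage_tb = storage / 1e12  # about 3.9 TB

weekend_outage_ok = can_replay(outage_hours=60, retention_hours=72)
long_outage_ok = can_replay(outage_hours=96, retention_hours=72)
```

Longer retention widens the window for replay and consumer recovery, but the storage cost grows linearly with it.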

Related chapters

Streaming Data (short summary) - End-to-end streaming architecture perspective, from event ingestion to consumers and windowed processing.
Designing Data-Intensive Applications, 2nd Edition (short summary) - Foundational model of replication, consistency, and stream processing behind Kafka's design trade-offs.
Distributed message queue - Practical case study on ordering, throughput, durability, and behavior under failure conditions.
Event-driven architecture: Event Sourcing, CQRS, Saga - Architectural context where Kafka is often used as the transport backbone for event-driven workflows.
Kappa Architecture: stream-first alternative to Lambda - Single processing path model where the Kafka log serves as the source of truth for live processing and historical replay.
Data Pipeline / ETL / ELT Architecture - How Kafka fits into production data platforms across ingestion, orchestration, data quality, and operations.
Enterprise Integration Patterns (short summary) - Integration pattern language for designing robust producer/consumer and routing interactions.
Big Data: Principles and best practices of scalable realtime data systems (short summary) - Strategic context for real-time data systems where Kafka frequently becomes a central platform component.
Google Global Network: Evolution and Architectural Principles for the AI Era - Network context for cross-region replication and high-throughput stream transport at global scale.
Google TPU: architecture evolution and impact on ML systems - AI workload context where Kafka-style logs and streams feed data and ML pipelines.

Where to find the book

Original

oreilly.com

Kafka: The Definitive Guide, 2nd Edition

Translated

piter.com

Apache Kafka. Потоковая обработка и анализ данных, 2-е издание (Apache Kafka: Stream Processing and Data Analysis, 2nd edition)