Distributed Systems, 4th Edition (short summary)

Tanenbaum and van Steen's classic matters not out of nostalgia but because it returns you to the fundamentals that managed services make too easy to overlook.

In real engineering work, it helps separate enduring principles of coordination, consistency, and fault tolerance from platform-specific details.

In interviews and architecture discussions, it is especially useful when you need to show that you understand not only modern tools, but also the limits of classic models.

Practical value of this chapter

Design in practice

Provides the theoretical foundation for deliberate architecture-model selection.

Decision quality

Helps separate textbook assumptions from real operational constraints.

Interview articulation

Strengthens technical depth when explaining distributed-system mechanisms.

Risk and trade-offs

Highlights where classic models need adaptation for modern workloads and operational limits.

Detailed analysis

Code of Architecture

Chapter-by-chapter companion notes from Alexander and the Code of Architecture club.

Read the analysis

Distributed Systems, 4th Edition

Authors: Andrew S. Tanenbaum, Maarten van Steen
Publisher: distributed-systems.net, 2023
Length: ~1000 pages

Tanenbaum and van Steen's foundational textbook: distribution transparency, architectures, communication, coordination, replication, fault tolerance, and security.

Chapter 1: Introduction

Definition of a distributed system

"A collection of autonomous computing elements that appears to the user as a single coherent system"

Transparency

Hiding distribution complexity from the user

Openness

Standard interfaces and protocols

Scalability

Growth without unacceptable performance degradation

8 types of transparency

Access

Hiding how a resource is accessed

Location

Hiding physical location

Migration

Hiding that a resource may move to another location

Relocation

Hiding that a resource may move while it is in use

Replication

Hiding the existence of copies

Concurrency

Hiding concurrent sharing

Failure

Hiding failures and recovery

Persistence

Hiding whether a resource is in memory or on disk

Chapter 2: Architectures

Architectural styles

Layered

Vertical organization of components

Object-based

Distributed objects with RMI

SOA / Microservices

Service-oriented architecture

Publish-Subscribe

Event-driven model

System organizations

Centralized

Client-server model

Decentralized (P2P)

Peer nodes without center

Hybrid

Combination of approaches (CDN, BitTorrent)

Role of Middleware

Middleware sits between the OS and applications and helps the system behave coherently. Examples include CORBA, RMI, message brokers, and web services. It hides platform heterogeneity and exposes a unified API.

Chapter 3: Processes

Threads

User-level vs Kernel-level threads
Multi-threaded servers
Thread pools
Reactor/Proactor model

Virtualization

Virtual machines
Containers
Resource isolation
Process migration

Clients and servers

Stateless and stateful servers
Server clustering
Code migration
Mobile agents

Chapter 4: Communications

RPC (Remote Procedure Call)

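Remote calls made to look like local procedure calls; the caller typically blocks until the reply arrives, though latency and partial failures mean the semantics are never truly local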

Call semantics

at-least-once, at-most-once, exactly-once

Stubs

Client and server proxies for marshalling
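
The difference between the call semantics shows up when a reply is lost and the client retries. The following is a minimal sketch, assuming a server-side reply cache keyed by request ID, of how at-most-once execution can be approximated. The names `DedupServer`, `ClientStub`, and `withdraw` are hypothetical and not taken from the book, and the "network" is simulated with a plain method call.

```python
import itertools


class DedupServer:
    """Executes each request ID at most once; replays the cached reply on retries."""

    def __init__(self, procedure):
        self.procedure = procedure      # the remote procedure being exposed
        self.replies = {}               # request_id -> cached reply
        self.executions = 0             # how many times the procedure really ran

    def handle(self, request_id, *args):
        if request_id in self.replies:  # duplicate request: do not execute again
            return self.replies[request_id]
        self.executions += 1
        reply = self.procedure(*args)
        self.replies[request_id] = reply
        return reply


class ClientStub:
    """Client-side proxy: assigns request IDs and retries when a reply is lost."""

    def __init__(self, server, lose_first_reply=False):
        self.server = server
        self.ids = itertools.count(1)
        self.lose_first_reply = lose_first_reply  # simulated omission failure

    def call(self, *args, max_attempts=3):
        request_id = next(self.ids)     # the same ID is reused across retries
        for attempt in range(max_attempts):
            reply = self.server.handle(request_id, *args)
            if self.lose_first_reply and attempt == 0:
                continue                # pretend the reply was dropped in transit
            return reply
        raise TimeoutError("no reply after retries")


balance = {"value": 100}


def withdraw(amount):
    """A non-idempotent operation: running it twice would be a bug."""
    balance["value"] -= amount
    return balance["value"]


server = DedupServer(withdraw)
stub = ClientStub(server, lose_first_reply=True)
result = stub.call(30)
```

Without the reply cache the retry would run `withdraw` a second time, which is the at-least-once behavior. A real server would also have to bound the cache and survive its own crashes, which this sketch ignores.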

Message Queues

Asynchronous communication via message brokers

AMQP, RabbitMQ, Kafka, WebSphere MQ

Multicast

Delivering messages to a group of recipients

IP Multicast

Network level, best-effort

Application-level Multicast

Overlay networks, delivery guarantees

Gossip protocols

Each node periodically forwards updates to a few randomly chosen peers, with no central coordinator and no need for an exact membership list. The price of this low cost is that the guarantees are probabilistic: the system converges eventually, but no bound on the convergence time is given up front.
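
A small simulation can make the probabilistic convergence concrete. This is a sketch under simplifying assumptions: synchronous rounds, push-only dissemination, and peers drawn uniformly from all nodes (a real deployment would sample from a partial view). The function name `push_gossip` and its parameters are hypothetical.

```python
import random


def push_gossip(num_nodes, fanout=2, seed=0, max_rounds=1000):
    """Simulate push gossip; return how many nodes are informed after each round."""
    rng = random.Random(seed)           # seeded so that a run is reproducible
    informed = {0}                      # node 0 starts out holding the update
    history = [len(informed)]
    for _round in range(max_rounds):
        if len(informed) == num_nodes:
            break
        pushed_to = set()
        for _node in informed:
            # Every informed node pushes to `fanout` peers chosen at random.
            for _push in range(fanout):
                pushed_to.add(rng.randrange(num_nodes))
        informed |= pushed_to
        history.append(len(informed))
    return history


history = push_gossip(num_nodes=100)
```

Printing `history` for different seeds shows the typical shape: rapid growth while most pushes reach uninformed nodes, then a slow tail as the last few nodes are hit by chance. The number of rounds varies from run to run.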

Chapter 5: Coordination

Clock synchronization

Physical clock

NTP (Network Time Protocol)
GPS synchronization
Clock drift and correction
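
The core of NTP-style synchronization is a four-timestamp exchange. The sketch below computes the estimated clock offset and round-trip delay from those timestamps, assuming the network delay is roughly symmetric. The function name and the sample values are illustrative.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and round-trip delay from one request/reply exchange.

    t1: request sent     (client clock)
    t2: request received (server clock)
    t3: reply sent       (server clock)
    t4: reply received   (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2   # how far the client lags the server
    delay = (t4 - t1) - (t3 - t2)          # time spent on the network, both ways
    return offset, delay


# Illustrative scenario: the client clock is 5 s behind the server, each
# direction takes 1 s, and the server spends 2 s handling the request.
offset, delay = ntp_offset_and_delay(t1=100, t2=106, t3=108, t4=104)
```

Here `offset` comes out as 5.0 and `delay` as 2, matching the scenario. If the two directions have different delays, half of the asymmetry shows up as error in the offset.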

Logical clock

Lamport's algorithm (happens-before)
Vector clock (causality)
Hybrid Logical Clocks
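
The update rules for logical clocks are short enough to write out. The following is a minimal sketch of a Lamport clock plus two helper functions for comparing vector timestamps; the class and function names are hypothetical.

```python
class LamportClock:
    """Scalar logical clock: if a happens-before b, then time(a) < time(b)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event or message send: advance and return the new timestamp."""
        self.time += 1
        return self.time

    def receive(self, message_time):
        """Message receipt: jump past the timestamp carried by the message."""
        self.time = max(self.time, message_time) + 1
        return self.time


def vc_happened_before(a, b):
    """True if vector timestamp a causally precedes vector timestamp b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def vc_concurrent(a, b):
    """True if neither vector timestamp causally precedes the other."""
    return a != b and not vc_happened_before(a, b) and not vc_happened_before(b, a)


p, q = LamportClock(), LamportClock()
send_ts = p.tick()             # p sends a message stamped 1
q.tick()                       # q does unrelated local work
recv_ts = q.receive(send_ts)   # q receives it and moves to 2
```

Lamport timestamps only guarantee the one direction shown in the docstring. Vector timestamps also let you detect concurrency: `vc_concurrent((2, 0, 0), (0, 1, 0))` is true because neither event could have influenced the other.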

Coordination algorithms

Mutual exclusion

Centralized algorithm
Distributed algorithm (Ricart-Agrawala)
Token ring algorithm
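
For the Ricart-Agrawala algorithm, the interesting part is the rule each node applies when another node's request arrives. A minimal sketch of that rule follows, assuming requests are ordered by `(lamport_timestamp, node_id)` pairs; the state names and the function name are hypothetical.

```python
def should_reply_now(state, own_request, incoming_request):
    """Decide whether to grant permission immediately or defer the reply.

    state:            "RELEASED", "WANTED", or "HELD"
    own_request:      (lamport_timestamp, node_id) of our pending request, or None
    incoming_request: (lamport_timestamp, node_id) of the request just received
    """
    if state == "HELD":
        return False                    # we are in the critical section: defer
    if state == "WANTED" and own_request < incoming_request:
        return False                    # our request is older, so we go first
    return True                         # not interested, or the other is older


deferred = should_reply_now("WANTED", (3, 1), (5, 2))
granted = should_reply_now("WANTED", (5, 2), (3, 1))
```

Tuple comparison breaks timestamp ties by node ID, so two nodes never both defer to each other or both proceed. A node enters the critical section once every other node has replied, and sends its deferred replies on exit.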

Leader election

Bully Algorithm: the live node with the highest ID becomes coordinator
Ring Algorithm: an election message travels around the ring collecting IDs, and the highest ID wins
Raft: election through terms and votes
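
The outcome of a bully election can be sketched in a few lines. This version assumes the initiator is alive, that `alive` is an accurate set of responsive node IDs, and that no node fails during the election; messages and timeouts are not modeled. The function name is hypothetical.

```python
def bully_election(initiator, alive):
    """Return the coordinator elected when `initiator` starts an election.

    `alive` is the set of node IDs that currently respond to messages.
    """
    higher = sorted(node for node in alive if node > initiator)
    if not higher:
        return initiator    # nobody with a higher ID answered: initiator wins
    # Each higher node that answers takes over and runs its own election.
    # Following any one of them leads to the same winner.
    return bully_election(higher[0], alive)


# Nodes 1..7 exist, but 3, 6, and 7 have crashed. Node 2 notices and starts.
coordinator = bully_election(initiator=2, alive={1, 2, 4, 5})
```

Under these assumptions the result is always the highest live ID, here 5, regardless of who initiates.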

Chapter 6: Naming

Flat names

Identifiers without structure

Broadcasting: ARP, DHCP

Forwarding: Pointer chains

DHT: Chord, Pastry
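
In a Chord-style DHT, a key is stored at its successor: the first node whose identifier is greater than or equal to the key, wrapping around the ring. The sketch below assumes global knowledge of all node IDs, which real Chord avoids by routing through finger tables. The function names and the 16-bit ring size are illustrative choices.

```python
import bisect
import hashlib


def chord_id(name, bits=16):
    """Map a name onto the identifier ring [0, 2**bits) using SHA-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


def successor(node_ids, key):
    """Return the first node ID >= key, wrapping around to the smallest ID."""
    ring = sorted(node_ids)
    index = bisect.bisect_left(ring, key)
    return ring[index % len(ring)]


nodes = [1, 4, 9, 11, 14, 18, 20, 21, 28]   # node IDs on a small 5-bit ring
owner_of_26 = successor(nodes, 26)
owner_of_30 = successor(nodes, 30)          # past the last node: wraps around
```

Key 26 lands on node 28, and key 30 wraps to node 1. When a node joins or leaves, only the keys between it and its predecessor change owner, which is what makes the scheme attractive for flat names.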

Structured names

Hierarchical namespaces

DNS: Domain names

File systems: File paths

X.500: Directory services

Attribute names

Search by attributes

LDAP: Directory queries

RDF: Semantic web

Service Discovery: Consul, etcd

Chapter 7: Consistency and Replication

Data-centric models

Strict Consistency

Every read returns the most recent write, ordered by absolute global time

Sequential Consistency

All processes see the same interleaving of operations, consistent with each process's program order

Causal Consistency

Preservation of cause-and-effect relationships

Eventual Consistency

Replicas converge to the same state once updates stop

Client-centric models

Read Your Writes

The client sees its own writes

Monotonic Reads

Once a client has read a value, later reads never return an older one

Monotonic Writes

A client's writes are applied in the order it issued them

Writes Follow Reads

A write that follows a read is applied to a copy at least as recent as the value read

Consistency Protocols

Primary-based

Remote-write, Local-write protocols

Replicated-write

Active replication, Quorum-based
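
Quorum-based replication rests on two inequalities over the number of replicas N, the read quorum N_R, and the write quorum N_W: N_R + N_W > N prevents read-write conflicts, and N_W > N/2 prevents write-write conflicts. A minimal check, with a hypothetical function name and illustrative values:

```python
def quorum_is_valid(n, read_quorum, write_quorum):
    """Check the two quorum constraints for n replicas."""
    # Every read quorum overlaps every write quorum, so a read sees the latest write.
    no_read_write_conflict = read_quorum + write_quorum > n
    # Any two write quorums overlap, so two writes cannot both succeed unseen.
    no_write_write_conflict = 2 * write_quorum > n
    return no_read_write_conflict and no_write_write_conflict


balanced = quorum_is_valid(n=12, read_quorum=3, write_quorum=10)
broken = quorum_is_valid(n=12, read_quorum=7, write_quorum=6)
read_one_write_all = quorum_is_valid(n=12, read_quorum=1, write_quorum=12)
```

The second configuration fails because two disjoint groups of six replicas could each accept a different write. The third is the read-one, write-all extreme: cheap reads, but a single unavailable replica blocks every write.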

Cache coherence

Write-through, Write-back protocols

Chapter 8: Fault Tolerance

Failure Models

Crash failure

The server suddenly stops

Omission failure

Lost requests or responses

Timing failure

Violation of timing bounds

Byzantine failure

Arbitrary, possibly malicious, behavior

Consensus

Paxos

Classical Lamport consensus algorithm. Phases: Prepare → Promise → Accept → Accepted.
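
The four phases can be sketched as single-decree Paxos with in-process acceptors. This is a simplified illustration, not a usable implementation: there is no network, no crash recovery, and no learner role, and the names `Acceptor` and `propose` are hypothetical.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1      # highest proposal number promised so far
        self.accepted = None    # (number, value) of the last accepted proposal

    def on_prepare(self, number):
        """Prepare -> Promise: agree to ignore lower-numbered proposals."""
        if number > self.promised:
            self.promised = number
            return ("promise", self.accepted)
        return ("reject", None)

    def on_accept(self, number, value):
        """Accept -> Accepted: accept unless a higher number was promised."""
        if number >= self.promised:
            self.promised = number
            self.accepted = (number, value)
            return "accepted"
        return "reject"


def propose(acceptors, number, value):
    """Run one proposal round; return the chosen value, or None on failure."""
    majority = len(acceptors) // 2 + 1
    replies = [a.on_prepare(number) for a in acceptors]
    granted = [prior for kind, prior in replies if kind == "promise"]
    if len(granted) < majority:
        return None
    # Safety rule: adopt the value of the highest-numbered accepted proposal.
    prior = [p for p in granted if p is not None]
    if prior:
        value = max(prior, key=lambda p: p[0])[1]
    acks = [a.on_accept(number, value) for a in acceptors]
    return value if acks.count("accepted") >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, number=1, value="A")
second = propose(acceptors, number=2, value="B")   # must re-propose "A"
stale = propose(acceptors, number=1, value="C")    # number too low: rejected
```

The second proposer asks for "B" but ends up confirming "A", because a majority has already accepted it. That adoption rule is what keeps a chosen value from ever changing.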

Two-Phase Commit (2PC)

Atomic commit protocol for distributed transactions. Coordinator → Prepare → Vote → Commit/Abort.
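
The message flow above reduces to a simple rule: commit only if every participant votes yes. A minimal sketch follows, with no timeouts, logging, or coordinator failure handling, which are precisely the parts that make real 2PC hard. The class and function names are hypothetical.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self):
        """Phase 1: vote yes (and become READY) or vote no (and abort)."""
        self.state = "READY" if self.can_commit else "ABORT"
        return self.can_commit

    def commit(self):
        self.state = "COMMIT"

    def abort(self):
        self.state = "ABORT"


def two_phase_commit(participants):
    """Coordinator logic: collect votes, then broadcast the global decision."""
    votes = [p.prepare() for p in participants]        # phase 1: prepare / vote
    decision = "COMMIT" if all(votes) else "ABORT"
    for p in participants:                             # phase 2: commit / abort
        if decision == "COMMIT":
            p.commit()
        else:
            p.abort()
    return decision


ok_group = [Participant("db"), Participant("queue")]
bad_group = [Participant("db"), Participant("queue", can_commit=False)]
ok_decision = two_phase_commit(ok_group)
bad_decision = two_phase_commit(bad_group)
```

A participant in the READY state cannot decide on its own. If the coordinator crashes between the two phases, READY participants block until it recovers, which is the classic weakness of the protocol.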

Process resilience

Replication

Active and Passive replication for high availability

Process groups

Virtual synchrony for consistent state

Recovery

Checkpointing and message logging

Chapter 9: Security

Cryptographic basics

Symmetric encryption

AES, one key for encryption/decryption

Asymmetric encryption

RSA, public/private keys

Hash functions

SHA-256 integrity check
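
An integrity check with SHA-256 takes only a few lines using Python's standard library. This sketch assumes the expected digest was obtained over a trusted channel; a bare hash detects corruption but does not authenticate the sender. The helper names are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected_hex: str) -> bool:
    """Compare digests in constant time to avoid leaking where they differ."""
    return hmac.compare_digest(sha256_hex(data), expected_hex)


original = b"transfer 100 to account 42"
digest = sha256_hex(original)
unchanged = is_intact(original, digest)
tampered = is_intact(b"transfer 900 to account 42", digest)
```

Changing a single byte of the message produces a completely different digest, so `tampered` is false. To also prove who produced the message, the digest would need to be keyed (an HMAC) or signed.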

Protecting distributed systems

Secure channels

TLS/SSL, mutual authentication, perfect forward secrecy

Access control

ACL, capabilities, role-based access control (RBAC)

Key management

PKI, Certificate Authorities, key distribution

Results and recommendations

Related book

Designing Data-Intensive Applications, 2nd Edition (DDIA)

Practical continuation: how to apply the foundational ideas in real data systems.

Read the review

Strengths

Fundamental coverage of distributed-systems theory
Algorithmic approach with proofs
Examples in Python make the material more accessible
Deep coverage of consistency and coordination
Strong chapter on security

Who is it suitable for?

Students studying distributed systems
Engineers who want the theoretical foundations
Readers who want a deeper understanding of consensus algorithms
Staff+ Engineer interview preparation
Researchers and academic practitioners

Verdict: Tanenbaum and van Steen's book is a foundational textbook for understanding distributed systems through models, algorithms, and network limits. Unlike how-to guides, it explains why systems behave the way they do. It pairs well with DDIA: Tanenbaum gives the theoretical foundation, while DDIA shows the engineering application.

Related chapters

Why distributed systems and consistency matter - Section entry map for invariants, partial failures, and consistency boundaries before diving into the textbook.
CAP theorem - The consistency-versus-availability choice under network partition, explained at system level in the book.
PACELC theorem - CAP extended to normal operation: how latency and consistency compete even outside outages.
Consensus: Paxos and Raft - Practical continuation of coordination, fault-tolerance, and agreement patterns from the book.
Leslie Lamport: causality, Paxos, and engineering thinking - Historical and conceptual context behind causality, logical time, and consensus algorithms.
Clock synchronization in distributed systems - Applied view on physical/logical clocks and why time semantics affect protocol correctness.
Jepsen and consistency models - Validation of theoretical consistency guarantees against real-world failures and anomalies.
Testing distributed systems - How to verify fault tolerance and algorithmic correctness under realistic failure scenarios.
Designing Data-Intensive Applications, 2nd Edition (short summary) - Engineering-focused companion on applying distributed-systems theory in modern data-intensive architecture.
Multi-region and global systems - Applying distributed-systems theory to cross-region replication, latency budgets, and regional resilience.

Where to find the book

Original

distributed-systems.net

Distributed Systems, 4th Edition

Translated

dmkpress.com

Распределённые системы (Russian translation)