
Updated: May 7, 2026 at 6:26 PM

Distributed Systems, 4th Edition (short summary)

expert

Tanenbaum and van Steen's classic matters not out of nostalgia, but because it returns you to the fundamentals that managed services tend to hide.

In real engineering work, it helps separate enduring principles of coordination, consistency, and fault tolerance from platform-specific details.

In interviews and architecture discussions, it is especially useful when you need to show that you understand not only modern tools, but also the limits of classic models.

Practical value of this chapter

Design in practice

Provides the theoretical foundation for choosing an architecture model deliberately.

Decision quality

Helps separate textbook assumptions from real operational constraints.

Interview articulation

Strengthens technical depth when explaining distributed-system mechanisms.

Risk and trade-offs

Highlights where classic models need adaptation for modern workloads and operational limits.

Detailed analysis

Code of Architecture

Chapter-by-chapter companion notes from Alexander and the Code of Architecture club.


Distributed Systems, 4th Edition

Authors: Andrew S. Tanenbaum, Maarten van Steen
Publisher: distributed-systems.net, 2023
Length: ~1000 pages

Tanenbaum and van Steen's foundational textbook: distribution transparency, architectures, communication, coordination, replication, fault tolerance, and security.


Chapter 1: Introduction

Definition of a distributed system

"A collection of autonomous computing elements that appears to the user as a single coherent system"

Transparency

Hiding distribution complexity from the user

Openness

Standard interfaces and protocols

Scalability

Growth without unacceptable performance degradation

8 types of transparency

Access
Hiding how a resource is accessed
Location
Hiding physical location
Migration
Hiding resource movement
Relocation
Hiding movement while working
Replication
Hiding the existence of copies
Concurrency
Hiding concurrent sharing
Failure
Hiding failures and recovery
Persistence
Hiding data storage

Detailed analysis

Code of Architecture

Companion notes on architecture styles, system organizations, and middleware.


Chapter 2: Architectures

Architectural styles

Layered
Vertical organization of components
Object-based
Distributed objects with RMI
SOA / Microservices
Service-oriented architecture
Publish-Subscribe
Event-driven model

System organizations

Centralized
Client-server model
Decentralized (P2P)
Peer nodes without center
Hybrid
Combination of approaches (CDN, BitTorrent)

Role of Middleware

Middleware sits between the OS and applications and helps the system behave coherently. Examples include CORBA, RMI, message brokers, and web services. It hides platform heterogeneity and exposes a unified API.

Detailed analysis

Code of Architecture

Companion notes on execution models, threads, virtualization, and code migration.


Chapter 3: Processes

Threads

  • User-level vs Kernel-level threads
  • Multi-threaded servers
  • Thread pools
  • Reactor/proactor model
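
The thread-pool item can be illustrated with Python's standard `concurrent.futures` module. This is a minimal sketch, not the book's code: `handle_request` and `serve` are hypothetical names, and the handler does no real I/O.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id: int) -> str:
    # Hypothetical handler; a real server would read from a socket here.
    return f"response-{request_id}"


def serve(request_ids: list[int], workers: int = 4) -> list[str]:
    # A fixed-size pool bounds the number of threads regardless of load.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order even though work runs concurrently.
        return list(pool.map(handle_request, request_ids))
```

Bounding the pool trades peak throughput for predictable memory use, which is the usual reason to prefer it over one thread per request.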

Virtualization

  • Virtual machines
  • Containers
  • Resource isolation
  • Process migration

Clients and servers

  • Stateless and stateful servers
  • Server clustering
  • Code migration
  • Mobile agents

Detailed analysis

Code of Architecture

Companion notes on remote calls, queues, multicast, and epidemic dissemination.


Chapter 4: Communication

RPC (Remote Procedure Call)

Remote calls that look like local procedure calls to the caller, usually synchronous

Call semantics
at-least-once, at-most-once, exactly-once
Stubs
Client and server proxies for marshalling
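
One common way to approximate at-most-once semantics is for the server to cache replies by request ID, so a retried request is answered from the cache instead of being executed again. The sketch below assumes JSON as the wire format and replaces the network with a direct method call; `CounterServer`, `CounterStub`, and the message fields are hypothetical.

```python
import json


class CounterServer:
    def __init__(self) -> None:
        self.counter = 0
        self._replies = {}  # request ID -> cached marshalled reply

    def increment(self, amount: int) -> int:
        self.counter += amount
        return self.counter

    def dispatch(self, raw: bytes) -> bytes:
        # Server stub: unmarshal, suppress duplicates, invoke, marshal the reply.
        msg = json.loads(raw)
        if msg["id"] in self._replies:
            return self._replies[msg["id"]]  # retry: replay, do not re-execute
        if msg["method"] != "increment":
            raise ValueError("unknown method")
        result = self.increment(*msg["args"])
        reply = json.dumps({"id": msg["id"], "result": result}).encode()
        self._replies[msg["id"]] = reply
        return reply


class CounterStub:
    """Client stub: makes the remote call look like a local one."""

    def __init__(self, server: CounterServer) -> None:
        self._server = server  # stands in for a network connection

    def increment(self, amount: int, request_id: str) -> int:
        raw = json.dumps(
            {"id": request_id, "method": "increment", "args": [amount]}
        ).encode()
        reply = self._server.dispatch(raw)
        return json.loads(reply)["result"]
```

Without the reply cache, a client that retries after a lost response would get at-least-once behavior and the counter would be incremented twice.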

Message Queues

Asynchronous communication via message brokers

AMQP, RabbitMQ, Kafka, WebSphere MQ

Multicast

Delivering messages to a group of recipients

IP Multicast
Network level, best-effort
Application-level Multicast
Overlay networks, delivery guarantees

Gossip protocols

Epidemic dissemination: each node periodically forwards updates to random neighbors. With high probability, the system converges with minimal coordination.
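
A small simulation can make the convergence claim concrete. This sketch assumes a push-only variant in which every informed node contacts `fanout` random peers per round; the function name, the fanout, and the fixed seed are illustrative choices, not taken from the book.

```python
import random


def gossip_rounds(n: int, fanout: int = 2, seed: int = 0) -> int:
    """Count rounds until all n nodes hold an update that starts at node 0."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    informed = {0}
    rounds = 0
    while len(informed) < n:
        rounds += 1
        # Snapshot the set: nodes informed this round start pushing next round.
        for _node in list(informed):
            for peer in rng.sample(range(n), fanout):
                informed.add(peer)
    return rounds
```

With a fanout of 2 the informed set can at most triple per round, so 100 nodes need at least 5 rounds. The last few uninformed nodes typically add a handful more.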

Detailed analysis

Code of Architecture

Companion notes on time, mutual exclusion, leader election, and agreement between nodes.


Chapter 5: Coordination

Clock synchronization

Physical clock
  • NTP (Network Time Protocol)
  • GPS synchronization
  • Clock drift and correction
Logical clock
  • Lamport's algorithm (happens-before)
  • Vector clock (causality)
  • Hybrid Logical Clocks
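
The two classic logical clocks fit in a few lines. The sketch below assumes integer Lamport timestamps and represents a vector clock as a plain dictionary from process name to counter; `LamportClock` and `happened_before` are hypothetical names.

```python
class LamportClock:
    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Local event or send: advance the clock by one.
        self.time += 1
        return self.time

    def receive(self, sender_time: int) -> int:
        # Jump past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, sender_time) + 1
        return self.time


def happened_before(a: dict, b: dict) -> bool:
    """True if vector clock a causally precedes vector clock b."""
    keys = a.keys() | b.keys()
    no_greater = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    strictly_less = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_greater and strictly_less
```

Lamport timestamps respect happens-before but cannot detect concurrency. With vector clocks, two events are concurrent exactly when `happened_before` is false in both directions.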

Coordination algorithms

Mutual exclusion
  • Centralized algorithm
  • Distributed algorithm (Ricart-Agrawala)
  • Token ring algorithm
Leader election
  • Bully Algorithm — selecting the node with the highest ID
  • Ring Algorithm — ring traversal
  • Raft — election through terms and votes
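
The Bully algorithm's outcome can be sketched without any networking. This simplified model assumes the set of live nodes is known to the simulation, and it lets a single higher node take over each election, whereas in the real algorithm every higher node that answers starts its own election. `bully_election` is a hypothetical name.

```python
def bully_election(initiator: int, alive: set) -> int:
    """Return the ID of the coordinator elected after `initiator` starts."""
    if initiator not in alive:
        raise ValueError("initiator must be a live node")
    candidate = initiator
    while True:
        # The candidate sends ELECTION to all live nodes with a higher ID.
        higher = [node for node in alive if node > candidate]
        if not higher:
            # Nobody answered: the candidate announces itself as coordinator.
            return candidate
        # A higher node answers and takes over the election.
        candidate = min(higher)
```

Because the candidate ID strictly increases on every pass, the loop ends at the live node with the highest ID.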

Detailed analysis

Code of Architecture

Companion notes on flat, structured, and attribute-based names.


Chapter 6: Naming

Flat names

Identifiers without structure

Broadcasting: ARP, DHCP
Forwarding: Pointer chains
DHT: Chord, Pastry
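
In a Chord-style DHT, a key is stored on its successor: the first node whose identifier is greater than or equal to the key's identifier, wrapping around the ring. The sketch below assumes a 16-bit identifier space and global knowledge of all node IDs. Real Chord resolves the successor through finger tables in a logarithmic number of hops, which is omitted here. `ident` and `successor` are hypothetical names.

```python
import bisect
import hashlib

M = 16  # identifier bits; an arbitrary choice for this sketch


def ident(name: str) -> int:
    # Hash a flat name onto the ring of size 2**M.
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def successor(sorted_node_ids: list, key_id: int) -> int:
    # First node ID >= key_id, wrapping to the smallest ID past the end.
    index = bisect.bisect_left(sorted_node_ids, key_id)
    return sorted_node_ids[index % len(sorted_node_ids)]
```

For example, on a ring with nodes 10, 200, and 4000, key 150 lands on node 200 and key 5000 wraps around to node 10.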

Structured names

Hierarchical namespaces

DNS: Domain names
File systems: File paths
X.500: Directory services

Attribute names

Search by attributes

LDAP: Directory queries
RDF: Semantic web
Service Discovery: Consul, etcd

Detailed analysis

Code of Architecture

Companion notes on consistency models, client guarantees, and replication protocols.


Chapter 7: Consistency and Replication

Data-centric models

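Strict Consistency
Every read returns the most recent write, ordered by absolute global time
Sequential Consistency
All processes see the same interleaving of operations, with each process's operations in program order
Causal Consistency
Causally related writes are seen in the same order by everyone; concurrent writes may be seen in different orders
Eventual Consistency
If updates stop, all replicas eventually converge to the same value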

Client-centric models

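Read Your Writes
A client's write is always visible to that client's later reads
Monotonic Reads
Once a client has read a value, its later reads never return an older one
Monotonic Writes
A client's writes are applied at every replica in the order the client issued them
Writes Follow Reads
A client's write that follows a read is applied to a copy at least as recent as the value read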

Consistency Protocols

Primary-based

Remote-write, Local-write protocols

Replicated-write

Active replication, Quorum-based
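
For quorum-based replication with N replicas, a read quorum R and a write quorum W are usually constrained by R + W > N (every read overlaps the latest write) and W > N/2 (two writes cannot both succeed on disjoint sets). A minimal sketch of that check, plus picking the freshest value from a read quorum, assuming each replica returns a hypothetical `(version, value)` pair:

```python
def quorum_ok(n: int, r: int, w: int) -> bool:
    # r + w > n prevents read-write conflicts; 2 * w > n prevents write-write conflicts.
    return r + w > n and 2 * w > n


def read_latest(replies: list) -> str:
    # Each reply is (version, value); the highest version in the quorum wins.
    return max(replies, key=lambda reply: reply[0])[1]
```

With N = 12, the choice R = 3, W = 10 is valid, while R = 7, W = 6 is not, because two write quorums of 6 can be disjoint.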

Cache coherence

Write-through, Write-back protocols

Video analysis

Code of Architecture

Video companion on failure models, recovery, and fault-tolerant process groups.


Chapter 8: Fault Tolerance

Failure Models

Crash failure
The server suddenly stops
Omission failure
Lost requests or responses
Timing failure
Violation of timing bounds
Byzantine failure
Arbitrary, possibly malicious, behavior

Consensus

Paxos

Classical Lamport consensus algorithm. Phases: Prepare → Promise → Accept → Accepted.
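
The acceptor's side of those phases is the part that makes Paxos safe, and it is small. The sketch below covers single-decree Paxos only and assumes integer proposal numbers; proposers, learners, networking, and durable storage are omitted, and the `Acceptor` class and its method names are hypothetical.

```python
class Acceptor:
    def __init__(self) -> None:
        self.promised = -1    # highest proposal number promised so far
        self.accepted = None  # (number, value) of the last accepted proposal

    def on_prepare(self, number: int):
        # Promise only if this proposal number is higher than any seen before.
        if number > self.promised:
            self.promised = number
            # Report the previously accepted proposal so the proposer must reuse its value.
            return ("promise", self.accepted)
        return None  # reject: a higher-numbered proposal was already promised

    def on_accept(self, number: int, value: str) -> bool:
        # Accept unless a higher-numbered prepare has been promised since.
        if number >= self.promised:
            self.promised = number
            self.accepted = (number, value)
            return True
        return False
```

A proposer that collects promises from a majority must propose the value of the highest-numbered accepted proposal it was told about, and may use its own value only if none was reported.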

Two-Phase Commit (2PC)

Atomic commit protocol for distributed transactions. Coordinator → Prepare → Vote → Commit/Abort.
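
The coordinator's decision rule can be sketched in memory. This is an illustration under strong assumptions: no timeouts, no write-ahead logging, and no coordinator crash, which is exactly the case where real 2PC can block. `Participant`, `Vote`, and `two_phase_commit` are hypothetical names.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ABORT = "abort"


class Participant:
    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit  # stands in for local validation
        self.state = "init"

    def prepare(self) -> Vote:
        # Phase 1: vote, moving to "ready" only if a commit is possible.
        if self.can_commit:
            self.state = "ready"
            return Vote.COMMIT
        self.state = "aborted"
        return Vote.ABORT

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list) -> str:
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only on a unanimous yes, otherwise abort everywhere.
    if all(vote is Vote.COMMIT for vote in votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

A single ABORT vote is enough to abort the whole transaction, which is what makes the commit atomic across participants.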

Process resilience

Replication

Active and Passive replication for high availability

Process groups

Virtual synchrony for consistent state

Recovery

Checkpointing and message logging

Detailed analysis

Code of Architecture

Companion notes on cryptographic foundations, secure channels, access control, and key management.


Chapter 9: Security

Cryptographic basics

Symmetric encryption
AES, one key for encryption/decryption
Asymmetric encryption
RSA, public/private keys
Hash functions
SHA-256 integrity check
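
An integrity check with SHA-256 takes a few lines of Python's standard library. A plain hash only detects accidental corruption, because an attacker who can change the data can also recompute the hash, so the sketch also shows a keyed HMAC. The function names are hypothetical and key handling is out of scope here.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    # Unkeyed hash: detects accidental corruption only.
    return hashlib.sha256(data).hexdigest()


def sign(key: bytes, data: bytes) -> bytes:
    # Keyed MAC: only holders of the shared key can produce a valid tag.
    return hmac.new(key, data, hashlib.sha256).digest()


def verify(key: bytes, data: bytes, tag: bytes) -> bool:
    # compare_digest runs in constant time to avoid timing side channels.
    return hmac.compare_digest(sign(key, data), tag)
```

A receiver recomputes the digest or tag over the received bytes and rejects the message if it does not match.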

Protecting distributed systems

Secure channels
TLS/SSL, mutual authentication, perfect forward secrecy
Access control
ACL, capabilities, role-based access control (RBAC)
Key management
PKI, Certificate Authorities, key distribution

Results and recommendations

Strengths

  • Fundamental coverage of distributed-systems theory
  • Algorithmic approach with proofs
  • Examples in Python make the material more accessible
  • Deep coverage of consistency and coordination
  • Strong chapter on security

Who is it suitable for?

  • Students studying distributed systems
  • Engineers who want the theoretical foundations
  • Readers who want a deeper understanding of consensus algorithms
  • Staff+ Engineer interview preparation
  • Researchers and academic practitioners

Verdict: Tanenbaum and van Steen's book is a foundational textbook for understanding distributed systems through models, algorithms, and network limits. Unlike how-to guides, it explains why systems behave the way they do. It pairs well with Designing Data-Intensive Applications (DDIA): Tanenbaum gives the theoretical foundation, while DDIA shows the engineering application.
