PostgreSQL from the inside (short summary)

PostgreSQL looks straightforward until you need to reason about MVCC, WAL, locks, and planner behavior. This book is valuable because it walks into that inner layer without turning it into mysticism.

In real work, it connects xmin/xmax, snapshots, buffer cache, index types, and crash recovery to concurrent correctness and operating cost.

In interviews and architecture discussions, this material is strongest when you need to show that you can do more than write SQL and can troubleshoot PostgreSQL at the system level.

Practical value of this chapter

MVCC visibility model

Understanding xmin/xmax and snapshot behavior helps design safe concurrent business operations.

WAL and checkpoint impact

Model write amplification, recovery time, and durability requirements when choosing topology and settings.

Planner and index strategy

Connect statistics, planner behavior, and index types to critical query paths before scale hits.

Interview root-cause reasoning

Demonstrate PostgreSQL troubleshooting at the system-mechanics level, not only at SQL syntax level.

Official source

Postgres Pro

Free electronic edition of Egor Rogov's book on PostgreSQL internals.

Download for free

PostgreSQL 17 from the Inside

Authors: Egor Rogov
Publisher: DMK Press, 2025
Length: 668 pages

Analysis of Egor Rogov's book on PostgreSQL internals: MVCC, WAL, buffer cache, locks, query planning, indexes, and crash recovery.

Original

Why read this book?

Understanding the internals of PostgreSQL is critical to:

Query performance - understanding the planner helps you write SQL that matches the data shape
Root-cause analysis - MVCC and lock behavior explain many concurrency incidents
Architecture discussions - real DBMS mechanics make storage, indexing, and recovery trade-offs concrete
Index and partitioning choices - internals explain why the same SQL behaves differently as data grows

Related chapter

Database Internals

Book by Alex Petrov about B-Trees, LSM-Trees and general principles of DBMS design.

Читать обзор

Book structure: 5 parts

Part 1: Isolation and multi-version concurrency

PostgreSQL is built around MVCC, a mechanism that lets ordinary reads and writes avoid waiting on each other in many common cases.

Key Concepts

Transaction isolation levels
Structure of pages and row versions (tuples)
Consistent snapshots
Version visibility rules

Cleanup and maintenance

HOT updates that avoid touching indexes
Vacuum and Autovacuum processes
Freezing transaction IDs
Rebuilding tables and indexes

MVCC: Multi-Versioning in PostgreSQL

1 / 4

Initial state

The row was created by transaction 100. xmin=100 means it is visible to all transactions with ID >= 100.

Heap (table storage)

id: 1Alice

xmin:100

xmax:∅

xmin— transaction ID that created the version

xmax— transaction ID that deleted/updated the version

Transactions (select to check visibility)

Part 2: Buffer cache and log

How PostgreSQL manages memory and makes committed changes durable through Write-Ahead Logging (WAL).

Shared buffer cache

Shared buffers and their organization
Clock sweep algorithm
Buffer pins and reference counters
Local buffers for temporary tables

WAL (Write-Ahead Log)

Structure of WAL records
Checkpoints and recovery
Synchronization modes (fsync)
Archiving and replication

Part 3: Locks

MVCC reduces many conflicts, but PostgreSQL still uses several lock families to protect data structures and consistency.

Object locks

8 table locking modes
Compatibility matrix
Advisory locks

Row locks

FOR UPDATE / FOR SHARE
Row-version locks
Multixact for shared locks

Locks in memory

Spinlocks
Lightweight locks (LWLocks)
Buffer pins

Part 4: Query execution

From SQL parsing to a returned result, the book walks through the main stages of PostgreSQL query processing.

Processing stages

Parsing → Rewriting → Planning
Collection and use of statistics
Planner cost model
EXPLAIN ANALYZE in detail

Execution Operations

Seq Scan vs Index Scan vs Bitmap Scan
Nested Loop / Hash / Merge Join
In-memory and disk-based sorting
Aggregation and grouping

Part 5: Types of Indexes

PostgreSQL supports many types of indexes for different use cases.

Classic

B-tree - universal, default
Hash - only equality

Specialized

GiST — geometry, full text
SP-GiST - sparse data
GIN - arrays, JSONB, FTS

For analytics

BRIN — Block Range Index
Ideal for time-series data
Minimum index size

Key takeaways for system design

🔄MVCC and isolation

Each transaction sees its own “snapshot” of data
Old versions of rows are stored in the same table
VACUUM removes “dead” versions
Trade-off: high concurrency at the cost of table and index bloat

📝WAL and durability

Changes are first written to the log, then to the data
After a crash, PostgreSQL recovers from WAL
Streaming replication is built on WAL flow
Checkpoints balance recovery time and I/O pressure

🎯Index selection

B-tree: ranges, sorting, uniqueness
GIN: full text search, JSONB, arrays
GiST: geodata, R-tree for coordinates
BRIN: huge append-only tables (logs, metrics)

⚡Query optimization

The planner chooses a path based on statistics
ANALYZE updates table statistics
Index-only scans avoid heap access when visibility data allows it
Parallel queries for large tables

Related chapter

Guide to Databases

Tutorial from PostgreSQL - fundamentals of the relational model, SQL and DBMS architecture.

Читать обзор

PostgreSQL vs other DBMS

Aspect	PostgreSQL	MySQL (InnoDB)
MVCC	Heap versions, requires VACUUM	Undo log, automatic cleaning
Indexes	B-tree, Hash, GiST, GIN, BRIN, SP-GiST	B-tree, Full-text, Spatial
Replication	Streaming (physical), Logical	Async, Semi-sync, Group Replication
Extensibility	Extensions, custom types, operators	Plugins, storage engines

About the author

Egor Rogov is a PostgreSQL expert at Postgres Professional, an author of PostgreSQL administration and development courses, and an active member of the PostgreSQL community. The book distills years of teaching, consulting, and explaining difficult mechanisms in approachable language.

Related chapters

PostgreSQL: history and architecture - High-level PostgreSQL overview: evolution, architecture principles, and its role in the modern OLTP landscape.
Database Internals: A Deep Dive (short summary) - Comparison of PostgreSQL internals with broader storage-engine, transaction, and recovery design principles.
Replication and sharding - Operational continuation for HA and scale: replication topologies, read/write paths, and failover behavior.
Database Selection Framework - How MVCC, WAL, and index internals can inform concrete DBMS selection decisions in production systems.
Designing Data-Intensive Applications, 2nd Edition (short summary) - Theoretical context for transactions, consistency, and replication to better interpret PostgreSQL internals.
Guide to Databases (short summary) - Foundational SQL and DBMS architecture track that complements this deeper PostgreSQL internals chapter.

Verdict

"PostgreSQL from the inside" is essential reading for engineers who work seriously with PostgreSQL. The book is free, written in Russian, and explains mechanisms that often remain a black box. It is especially useful when you want to discuss storage design through real trade-offs rather than surface-level SQL knowledge.

Where to find the book

Original

dmkpress.com

PostgreSQL 17 изнутри