System Design Space

Updated: March 2, 2026 at 3:45 PM

Distributed Transactions: 2PC and 3PC

hard

Practical analysis of distributed transactions: coordinator, prepare/commit phases, failure modes, blocking trade-offs and alternatives via Saga/outbox.

Context

Consistency and idempotency

Distributed transactions are one way to ensure consistency, but not the only one.


Distributed transactions (2PC/3PC) are needed when a business invariant requires a coordinated change across several independent resources. The cost is added latency, possible blocking, and complex recovery logic for partial failures.

When is a distributed transaction needed?

  • One business operation affects several independent resources or services.
  • Temporary inconsistency is unacceptable for a particular class of operations.
  • A partial commit would create significant financial or regulatory risk.

2PC flow

2PC: two-phase commit

Prepare -> votes -> global decision (commit/abort)

The coordinator collects participant votes and makes one global commit/abort decision for the whole transaction.
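The flow above can be sketched in a few lines. This is a minimal in-memory illustration, not a production protocol: the `Participant` class, its `can_commit` flag and the `two_phase_commit` function are hypothetical names, and the sketch leaves out networking, timeouts and durable logging.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """In-memory stand-in for a resource manager (hypothetical interface)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # A real participant would persist its changes and hold locks here.
        if self.can_commit:
            self.state = "prepared"
            return Vote.YES
        self.state = "aborted"
        return Vote.NO

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Collect votes, then apply one global decision to every participant."""
    # Phase 1: prepare. A single NO vote (or, in practice, a timeout) means abort.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(v is Vote.YES for v in votes) else "abort"

    # A real coordinator would write the decision to a durable log at this point.
    # Phase 2: deliver the same decision to everyone.
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision
```

The window between the two phases is where blocking comes from: a participant in the `prepared` state cannot decide on its own and has to wait for the coordinator.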

Strengths

  • Simple and easy-to-understand coordination model.
  • Clearly separates preparation from the final decision.

Risks

  • Participants that have voted to commit can block, holding their locks, if the coordinator fails before delivering its decision.
  • Highly sensitive to timeout/retry tuning.

Protocol steps (interactive walkthrough): the coordinator drives 3 participants, Order (participant A), Payment (participant B) and Inventory (participant C), through 8 steps.

3PC flow

3PC: three-phase commit

CanCommit -> PreCommit -> DoCommit

Adds an intermediate pre-commit phase to reduce the risk of blocking when the coordinator fails.
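The participant side of this flow can be written as a small state machine. The sketch below follows the textbook description of 3PC for a participant that votes yes; the state names, message names and the `step`/`run` helpers are illustrative assumptions. The timeout rules (abort before pre-commit, commit after it) are only safe when the network does not partition, which is part of why the timeout handling needs so much care.

```python
from enum import Enum


class State(Enum):
    INIT = "init"
    READY = "ready"                # voted yes on CanCommit
    PRECOMMITTED = "precommitted"  # received PreCommit
    COMMITTED = "committed"
    ABORTED = "aborted"


# Allowed participant transitions, keyed by (current state, message).
TRANSITIONS = {
    (State.INIT, "can_commit"): State.READY,
    (State.INIT, "abort"): State.ABORTED,
    (State.READY, "pre_commit"): State.PRECOMMITTED,
    (State.READY, "abort"): State.ABORTED,
    (State.PRECOMMITTED, "do_commit"): State.COMMITTED,
    (State.PRECOMMITTED, "abort"): State.ABORTED,
    # Timeouts: before PreCommit nobody can have committed, so abort;
    # after PreCommit every participant voted yes, so commit.
    (State.READY, "timeout"): State.ABORTED,
    (State.PRECOMMITTED, "timeout"): State.COMMITTED,
}


def step(state, message):
    """Return the next state, or raise if the message is invalid here."""
    try:
        return TRANSITIONS[(state, message)]
    except KeyError:
        raise ValueError(f"{message!r} is not valid in state {state.name}") from None


def run(messages):
    """Feed a sequence of messages to a fresh participant."""
    state = State.INIT
    for message in messages:
        state = step(state, message)
    return state
```

For example, `run(["can_commit", "pre_commit", "timeout"])` ends in `State.COMMITTED`, while `run(["can_commit", "timeout"])` ends in `State.ABORTED`.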

Strengths

  • Reduces probability of hanging in an uncertain state.
  • Explicitly separates intent from final commit.

Risks

  • More network rounds and a more complex state machine.
  • Requires very careful timeout and recovery tuning.

Protocol steps (interactive walkthrough): the coordinator drives 3 participants, Order (participant A), Payment (participant B) and Inventory (participant C), through 12 steps.

Alternative

Event-Driven Architecture

In many scenarios, Saga + outbox gives a better balance than global 2PC/3PC.


Trade-offs and alternatives

2PC is simple in concept, but prepared participants can stay blocked if the coordinator fails before announcing its decision.

3PC reduces the likelihood of blocking, but adds network rounds and state machine complexity.

Both approaches are sensitive to network partitions, timeout tuning, and correct recovery logic.

In a microservice architecture, a full ACID transaction between services is often too expensive and fragile.

Saga (orchestration/choreography)

Breaks the transaction into local steps with compensating actions instead of a global lockstep commit.
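An orchestrated Saga can be sketched as a loop over local steps that remembers what to undo. The `run_saga` function and the order/payment/inventory steps below are hypothetical; a real orchestrator would also persist its progress and retry failed compensations.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    If an action raises, run the compensations of the already completed
    steps in reverse order and report which step failed.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            for _done_name, undo in reversed(completed):
                undo()
            return {"status": "compensated", "failed_step": name, "error": str(exc)}
        completed.append((name, compensate))
    return {"status": "completed", "failed_step": None, "error": None}


log = []


def fail_inventory():
    raise RuntimeError("out of stock")


result = run_saga([
    ("order", lambda: log.append("create order"), lambda: log.append("cancel order")),
    ("payment", lambda: log.append("charge card"), lambda: log.append("refund card")),
    ("inventory", fail_inventory, lambda: log.append("release stock")),
])
# log is now: create order, charge card, refund card, cancel order
```

Between the charge and the refund, other services can observe the intermediate state. That is the trade a Saga makes: no global locks, but no isolation either.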

Transactional outbox

Writes the state change and the outgoing event in one local database transaction; a separate relay then publishes the event, so no distributed XA transaction is needed.
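A minimal sketch of the pattern using Python's built-in `sqlite3` module. The table layout, the `order.placed` topic and the `publish` callback are assumptions made for the example; in practice the relay would hand events to a message broker.

```python
import json
import sqlite3


def create_schema(conn):
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "topic TEXT NOT NULL, payload TEXT NOT NULL, "
        "published INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(conn, order_id):
    # One local transaction: the order row and its event commit or roll back together.
    with conn:
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order.placed", json.dumps({"order_id": order_id})),
        )


def relay_once(conn, publish):
    """Publish pending events in insertion order, then mark each as published.

    A crash between publish() and the UPDATE causes a re-send on the next run,
    so delivery is at-least-once and consumers need to be idempotent.
    """
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

The relay gives at-least-once delivery rather than exactly-once, which is why the outbox is usually paired with idempotent consumers.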

Idempotent commands + reconciliation

Repeatable operations and background reconciliation reduce the effects of partial failures.
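Both ideas fit in a short sketch. The `PaymentService` class, its `deposit` command and the `reconcile` helper are hypothetical; a real service would store idempotency keys durably, in the same transaction as the state change.

```python
class PaymentService:
    """Hypothetical service that deduplicates commands by idempotency key."""

    def __init__(self):
        self.balance = 0
        self._results = {}  # idempotency key -> result of the first execution

    def deposit(self, idempotency_key, amount):
        # A retried command returns the stored result instead of applying twice.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.balance += amount
        result = {"key": idempotency_key, "balance": self.balance}
        self._results[idempotency_key] = result
        return result


def reconcile(expected, actual):
    """Return the keys whose values differ between two systems' records."""
    keys = expected.keys() | actual.keys()
    return sorted(k for k in keys if expected.get(k) != actual.get(k))
```

Calling `deposit("cmd-1", 100)` twice leaves the balance at 100, so a client can retry safely after a timeout. A periodic `reconcile` run between, say, the order and payment records surfaces whatever the retries did not fix.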

Domain redesign

Sometimes it is cheaper to change the boundaries of aggregates and remove the cross-service atomic requirement.

Practical checklist

  • It is explicitly defined where strict atomicity is required and where eventual consistency is acceptable.
  • There is a coordinator recovery strategy and a durable transaction log.
  • Timeout policies are tested against partition and delay scenarios.
  • All participants support idempotent commit/abort processing.
  • There is a business mechanism for compensation and for manual resolution of disputed cases.
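The coordinator recovery item can be illustrated with a replay over an append-only log. This is a sketch under assumptions: the record types `begin`, `decision` and `end` are invented for the example, and treating a transaction with no logged decision as aborted is one common convention (presumed abort), not the only option. It relies on the coordinator logging its decision before sending it, and on participants handling repeated commit/abort messages idempotently.

```python
import io
import json


def append_record(log, record):
    """Append one JSON record per line to the coordinator log."""
    log.write(json.dumps(record) + "\n")
    # On a real file, follow with log.flush() and os.fsync(log.fileno())
    # before sending any message that depends on this record.


def recover(lines):
    """Replay the log and return {tx_id: decision} to re-send after a restart."""
    pending = {}
    for line in lines:
        rec = json.loads(line)
        if rec["type"] == "begin":
            pending[rec["tx"]] = "abort"  # no decision logged yet: presume abort
        elif rec["type"] == "decision":
            pending[rec["tx"]] = rec["decision"]
        elif rec["type"] == "end":
            pending.pop(rec["tx"], None)  # all participants acknowledged
    return pending


log = io.StringIO()  # stand-in for an append-only file
append_record(log, {"type": "begin", "tx": "t1"})
append_record(log, {"type": "decision", "tx": "t1", "decision": "commit"})
append_record(log, {"type": "begin", "tx": "t2"})  # crash before deciding
append_record(log, {"type": "begin", "tx": "t3"})
append_record(log, {"type": "decision", "tx": "t3", "decision": "abort"})
append_record(log, {"type": "end", "tx": "t3"})

to_resend = recover(log.getvalue().splitlines())
```

Here `to_resend` maps `t1` to `commit` and `t2` to `abort`, while the finished transaction `t3` needs no further messages.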

Frequent anti-pattern: introducing 2PC between services without evaluating blocking, the retry model, and recovery cost.


© 2026 Alexander Polomodov