Updated: April 26, 2026 at 10:27 AM

Distributed Transactions: 2PC and 3PC

Difficulty: hard

A practical look at distributed transactions: coordinator and participants, two-phase and three-phase commit, blocking, partial failures, and alternatives via Saga and transactional outbox.

Distributed transactions become painful exactly where the business wants atomicity but the architecture has already split across multiple services and stores.

In real engineering work, this chapter helps choose between 2PC, 3PC, Saga, and transactional outbox by domain boundaries, acceptable failure behavior, blocking risk, and coordination cost.

In interviews and design conversations, it is especially useful when you need to speak plainly about timeout semantics, partial commit, compensations, and idempotency requirements.

Practical value of this chapter

Design in practice

Helps choose transaction patterns by domain boundaries and acceptable failure behavior.

Decision quality

Compares 2PC, 3PC, and Saga by latency, locking impact, and operational complexity.

Interview articulation

Provides a clear narrative for coordinator, participants, commit point, and recovery.

Risk and trade-offs

Makes blocking, partial-commit, timeout, and idempotency trade-offs explicit.

Context

Consistency and idempotency

Distributed transactions are one way to ensure consistency, but not the only one.


Distributed transactions (2PC/3PC) are needed when a business invariant requires a coordinated change across several independent resources. The cost is extra latency, locks held while participants wait for the decision, and complex recovery after partial failures.

When is a distributed transaction needed?

  • One business operation must atomically change several independent resources or services.
  • Even temporary divergence is unacceptable for this class of operation.
  • A partially committed operation would create serious financial, regulatory, or user-facing risk.

How two-phase commit (2PC) works

2PC: two-phase commit

Prepare -> votes -> global decision (commit/abort)

The coordinator collects participant votes and makes one global commit/abort decision for the whole transaction.
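
The flow above can be sketched in a few lines. This is a minimal in-memory illustration, not a production protocol: the `Participant` class, the `Vote` enum, and the `two_phase_commit` function are hypothetical names introduced here, and the sketch leaves out timeouts, networking, and durable logging.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical in-memory participant; a real one would persist its state."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: promise to commit if asked, or refuse.
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.YES if self.can_commit else Vote.NO

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: collect votes. Any NO means abort; a real coordinator
    # would treat a missing vote (timeout) the same way.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(v is Vote.YES for v in votes) else "abort"

    # A real coordinator would write `decision` to a durable log here,
    # before telling anyone, so it can be replayed after a crash.

    # Phase 2: broadcast the single global decision.
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision
```

With three participants that all vote yes, `two_phase_commit` returns `"commit"`; if any one is created with `can_commit=False`, every participant ends in the `aborted` state.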

Strengths

  • Simple coordination model that is easy to reason about.
  • Clearly separates preparation from the final decision.

Risks

  • Blocking: participants that have voted yes must wait, holding their locks, if the coordinator fails before delivering the decision.
  • Highly sensitive to timeout/retry tuning.

Protocol steps (interactive walkthrough on the original page): the coordinator drives 3 participants, Order (participant A), Payment (participant B), and Inventory (participant C), through 8 steps.

How three-phase commit (3PC) works

3PC: three-phase commit

CanCommit -> PreCommit -> DoCommit

Adds an intermediate PreCommit phase so that participants can reach a decision on timeout, which reduces blocking risk when the coordinator fails.
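
The participant side of this protocol can be sketched as a small state machine. The class and method names below are hypothetical, and the timeout rule shown (abort before PreCommit, commit after it) is the textbook behavior, which is only safe under the usual 3PC assumptions of bounded message delay and no network partition.

```python
class ThreePCParticipant:
    """Hypothetical participant state machine for CanCommit -> PreCommit -> DoCommit."""

    def __init__(self):
        self.state = "init"

    def on_can_commit(self, ok=True):
        # Phase 1: vote. A yes vote moves the participant to "ready".
        self.state = "ready" if ok else "aborted"
        return ok

    def on_pre_commit(self):
        # Phase 2: the coordinator reports that everyone voted yes.
        if self.state == "ready":
            self.state = "precommitted"

    def on_do_commit(self):
        # Phase 3: final commit.
        if self.state == "precommitted":
            self.state = "committed"

    def on_abort(self):
        # Abort is accepted in any state that has not finished yet.
        if self.state in ("init", "ready", "precommitted"):
            self.state = "aborted"

    def on_timeout(self):
        # The coordinator went silent. Before PreCommit nobody can have
        # committed, so aborting is safe. After PreCommit the participant
        # knows every vote was yes, so it proceeds to commit.
        if self.state in ("init", "ready"):
            self.state = "aborted"
        elif self.state == "precommitted":
            self.state = "committed"
```

The contrast with 2PC is in `on_timeout`: a 2PC participant that has voted yes has no safe local choice, while here each non-final state has a defined timeout action.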

Strengths

  • Reduces the chance of getting stuck in an uncertain state.
  • Explicitly separates intent from final commit.

Risks

  • More network rounds and a more complex state machine.
  • Requires very careful timeout and recovery tuning.

Protocol steps (interactive walkthrough on the original page): the coordinator drives 3 participants, Order (participant A), Payment (participant B), and Inventory (participant C), through 12 steps.

Alternative

Event-Driven Architecture

In many scenarios, Saga plus transactional outbox is a better fit than global 2PC/3PC.


Trade-offs and alternatives

2PC is easy to explain and adopt, but participants that have voted yes must hold their locks and wait if the coordinator fails after prepare.

3PC reduces the chance of getting stuck, but adds a network round and a more complex protocol state machine.

Both approaches are sensitive to network partitions, timeouts, and correct recovery after partial failures.

In a microservice architecture, a shared atomic transaction across services is often too expensive and fragile.

Saga (orchestration/choreography)

Breaks the operation into local steps with compensating actions instead of a global synchronized commit.
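
A minimal orchestration-style sketch of that idea follows. The `run_saga` function and the step names in the usage note are hypothetical; a real orchestrator would also persist its progress and retry compensations, which must themselves be idempotent.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, the compensations of all previously completed
    steps are run in reverse order and "compensated" is returned.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Undo only what actually happened, newest first.
            for undo in reversed(completed):
                undo()
            return "compensated"
        completed.append(compensate)
    return "completed"
```

For example, if the steps are create order, charge payment, and reserve stock, and the stock reservation fails, the saga runs the refund and then the order cancellation. The system passes through intermediate states that other readers can observe, which is the price paid for avoiding a global commit.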

Transactional outbox

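Writes the outgoing event to an outbox table in the same local database transaction as the business change; a separate relay then publishes it, typically with at-least-once delivery, so consumers must tolerate duplicates.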
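
A small sketch with Python's built-in `sqlite3` module shows the shape of the pattern. The table names, the `OrderPlaced` event, and the `publish` callback are assumptions made for illustration; a real relay would usually be a separate process or a change-data-capture pipeline.

```python
import json
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
    conn.execute(
        "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT, published INTEGER DEFAULT 0)"
    )
    return conn


def place_order(conn, order_id):
    # One local transaction covers both the business row and the event row,
    # so either both are stored or neither is.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'placed')", (order_id,))
        event = json.dumps({"type": "OrderPlaced", "order_id": order_id})
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (event,))


def relay_once(conn, publish):
    # Publish pending events in order. If publish() raises, the row stays
    # unpublished and is picked up again on the next run (at-least-once).
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because the relay marks a row as published only after `publish` returns, a crash between the two can cause the same event to be sent twice, which is why the consuming side needs idempotent handling.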

Idempotent commands + reconciliation

Safe retries and background reconciliation reduce the impact of partial failures.
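
One common way to make retries safe is an idempotency key stored next to the result, paired with a periodic comparison of what two systems believe. The sketch below is an in-memory illustration; `PaymentService`, `charge`, and `reconcile` are hypothetical names, and a real service would keep the key table in durable storage.

```python
class PaymentService:
    """Hypothetical service that deduplicates commands by idempotency key."""

    def __init__(self, balance=100):
        self.balance = balance
        self.results = {}  # idempotency_key -> result of the first execution

    def charge(self, idempotency_key, amount):
        # A retried command returns the stored result without side effects.
        if idempotency_key in self.results:
            return self.results[idempotency_key]
        self.balance -= amount
        result = {"status": "charged", "amount": amount}
        self.results[idempotency_key] = result
        return result


def reconcile(expected, actual):
    """Return the sorted keys whose recorded state differs between two systems."""
    keys = expected.keys() | actual.keys()
    return sorted(k for k in keys if expected.get(k) != actual.get(k))
```

Calling `charge` twice with the same key changes the balance once. `reconcile` only reports mismatches; deciding how to repair them (retry, compensate, or escalate to a person) is a business decision.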

Domain redesign

Sometimes it is cheaper to move aggregate boundaries and remove the cross-service atomicity requirement.

Practical checklist

  • It is explicitly defined where strict atomicity is needed and where eventual consistency is acceptable.
  • There is a coordinator recovery strategy and a durable transaction log.
  • Timeout policies are tested under partition and delay scenarios.
  • All participants handle repeated COMMIT and ABORT commands safely.
  • There is a business mechanism for compensation and manual resolution of disputed cases.
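
Two of the checklist items, coordinator recovery from a log and safe handling of repeated COMMIT and ABORT, can be sketched together. Everything here is an assumption made for illustration: the log is a plain list of `(tx_id, record)` tuples standing in for durable storage, and transactions with no logged decision are aborted on recovery.

```python
class IdempotentParticipant:
    """Hypothetical participant that remembers the outcome per transaction."""

    def __init__(self):
        self.outcome = {}  # tx_id -> "committed" | "aborted"
        self.applied = 0   # counts real state changes, for demonstration

    def commit(self, tx_id):
        # A repeated COMMIT finds the recorded outcome and changes nothing.
        if tx_id not in self.outcome:
            self.outcome[tx_id] = "committed"
            self.applied += 1
        return self.outcome[tx_id]

    def abort(self, tx_id):
        # A repeated ABORT is likewise a no-op.
        if tx_id not in self.outcome:
            self.outcome[tx_id] = "aborted"
        return self.outcome[tx_id]


def recover(log, participants):
    """Replay the coordinator log after a restart and re-send every decision."""
    started, decided = [], {}
    for tx_id, record in log:
        if record == "begin":
            started.append(tx_id)
        elif record in ("commit", "abort"):
            decided[tx_id] = record
    for tx_id in started:
        # No logged decision means no participant was told to commit.
        decision = decided.get(tx_id, "abort")
        for p in participants:
            if decision == "commit":
                p.commit(tx_id)
            else:
                p.abort(tx_id)
        decided[tx_id] = decision
    return decided
```

Running `recover` twice over the same log leaves each participant unchanged after the first pass, which is the property the checklist asks for: the coordinator can always re-send a decision without knowing whether the previous attempt arrived.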

Frequent anti-pattern: introducing 2PC between services without evaluating blocking, retry behavior, and recovery cost.
