System Design Space

Updated: February 21, 2026 at 11:59 PM

Event-Driven Architecture: Event Sourcing, CQRS, Saga

mid

Practical analysis of the event-driven approach: how to design event flows, when to use Event Sourcing and CQRS, and how to implement Saga for distributed transactions.

Reference

Martin Fowler: Event Sourcing

A classic explanation of event sourcing fundamentals and why teams adopt it.


Event-Driven Architecture (EDA) shifts focus from synchronous calls to domain event streams. It improves flexibility and scalability, but requires strict discipline in event contracts, idempotency, and observability. In practice, EDA commonly revolves around three patterns: Event Sourcing, CQRS, and Saga.

Event-Driven fundamentals

Event is a fact

Events describe what has already happened in the domain and should be immutable.

Async by default

Producers and consumers are loosely coupled and communicate through a broker or log.

Eventual consistency

The read model lags behind the write model, and UX and APIs must account for that lag.
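The first fundamental above, an event as an immutable fact, can be sketched with a frozen dataclass. This is a minimal illustration, not a prescribed model: the `OrderCreated` fields and the `new_order_created` helper are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone
from uuid import uuid4


@dataclass(frozen=True)
class OrderCreated:
    """A domain fact named in the past tense (hypothetical event)."""
    order_id: str
    amount_cents: int
    event_id: str
    occurred_at: datetime


def new_order_created(order_id: str, amount_cents: int) -> OrderCreated:
    # Identity and timestamp are assigned once, when the fact is recorded.
    return OrderCreated(order_id, amount_cents, str(uuid4()), datetime.now(timezone.utc))


event = new_order_created("order-1", 4999)
try:
    event.amount_cents = 0  # an attempt to rewrite history
except FrozenInstanceError:
    mutation_rejected = True
else:
    mutation_rejected = False
```

A correction is then expressed as a new event rather than an edit of the old one.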

Data flows

The flows below cover three key scenarios: the base event pipeline, the CQRS read/write split, and step coordination in a Saga.


Event-Driven Pipeline

1. Command API: receives business intent.
2. Domain Aggregate: validates and makes a decision.
3. Event Store: append-only event log.
4. Broker/Log: fan-out to consumers.
5. Consumers: projections, integrations, side effects.
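The five pipeline stages above can be sketched in memory as follows. This is an illustration under simplifying assumptions: `FundsDeposited`, `EventStore`, `Broker`, and `handle_deposit` are hypothetical names, delivery is synchronous, and the append and publish steps are not made atomic as a production system would require.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FundsDeposited:
    account_id: str
    amount: int


class EventStore:
    """Append-only log: events are added, never updated or deleted."""

    def __init__(self) -> None:
        self._log: list[FundsDeposited] = []

    def append(self, event: FundsDeposited) -> None:
        self._log.append(event)

    def read_all(self) -> tuple[FundsDeposited, ...]:
        return tuple(self._log)


class Broker:
    """Delivers each published event to every subscribed consumer."""

    def __init__(self) -> None:
        self._consumers: list[Callable[[FundsDeposited], None]] = []

    def subscribe(self, consumer: Callable[[FundsDeposited], None]) -> None:
        self._consumers.append(consumer)

    def publish(self, event: FundsDeposited) -> None:
        for consumer in self._consumers:
            consumer(event)


def handle_deposit(store: EventStore, broker: Broker, account_id: str, amount: int) -> FundsDeposited:
    # Stages 1-2: the command carries intent; the aggregate validates it.
    if amount <= 0:
        raise ValueError("deposit amount must be positive")
    event = FundsDeposited(account_id, amount)
    store.append(event)    # stage 3: record the fact
    broker.publish(event)  # stage 4: fan out
    return event


store, broker = EventStore(), Broker()
balances: dict[str, int] = {}
audit_trail: list[str] = []


def project_balance(event: FundsDeposited) -> None:
    balances[event.account_id] = balances.get(event.account_id, 0) + event.amount


def record_audit(event: FundsDeposited) -> None:
    audit_trail.append(f"{event.account_id}:+{event.amount}")


# Stage 5: two independent consumers of the same event.
broker.subscribe(project_balance)
broker.subscribe(record_audit)
handle_deposit(store, broker, "acc-1", 100)
handle_deposit(store, broker, "acc-1", 50)
```

Adding a third consumer requires no change to `handle_deposit`, which is the loose coupling the pipeline is meant to provide.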

CQRS: write path and read path

The two paths run in parallel and are independent of each other.

Write path (commands)

1. Client: POST/PUT command.
2. Command Model: invariants and business rules.
3. Event Stream: domain facts.
4. Projector: updates the read model.

Read path (queries)

1. Client: GET query.
2. Query API: read-only endpoint.
3. Read Model: denormalized projection.
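The write and read paths above can be sketched as two small classes connected only by the event stream. This is a minimal sketch: the cart domain, `MAX_QUANTITY`, and all class names are assumptions made for the example, and the projector is invoked by hand to make the lag between the two models visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemAdded:
    cart_id: str
    sku: str
    quantity: int


class CartCommandModel:
    """Write side: enforces invariants and emits events."""

    MAX_QUANTITY = 10  # hypothetical business rule

    def __init__(self) -> None:
        self.event_stream: list[ItemAdded] = []

    def add_item(self, cart_id: str, sku: str, quantity: int) -> ItemAdded:
        if not 0 < quantity <= self.MAX_QUANTITY:
            raise ValueError("quantity out of allowed range")
        event = ItemAdded(cart_id, sku, quantity)
        self.event_stream.append(event)
        return event


class CartReadModel:
    """Read side: a denormalized projection shaped for queries."""

    def __init__(self) -> None:
        self._totals: dict[str, int] = {}
        self._position = 0  # index of the next unprocessed event

    def project(self, event_stream: list[ItemAdded]) -> None:
        # Catch up from the last processed position only.
        for event in event_stream[self._position:]:
            self._totals[event.cart_id] = self._totals.get(event.cart_id, 0) + event.quantity
        self._position = len(event_stream)

    def total_items(self, cart_id: str) -> int:
        return self._totals.get(cart_id, 0)


commands = CartCommandModel()
queries = CartReadModel()

commands.add_item("cart-1", "sku-42", 2)
stale_total = queries.total_items("cart-1")  # projector has not run yet
queries.project(commands.event_stream)
fresh_total = queries.total_items("cart-1")
```

The gap between `stale_total` and `fresh_total` is the eventual consistency that clients of the Query API have to tolerate.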

Saga: orchestration flow

The orchestrator centrally decides the next step and when to run compensations. All key commands and decisions pass through this one coordinator.

1. Order Service → OrderCreated → Orchestrator
2. Orchestrator → ReserveFunds → Payment Service
3. Payment Service → PaymentReserved → Orchestrator
4. Orchestrator → ReserveStock → Inventory Service
5. Inventory Service → InventoryReserved → Orchestrator
6. Orchestrator → CreateShipment → Shipping Service
7. Shipping Service → ShipmentCreated → Orchestrator
8. Orchestrator → OrderCompleted → Order Service
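The orchestrated flow above can be sketched as a function that runs local steps in order and, on failure, compensates the completed ones in reverse. This is a simplified, synchronous illustration: `run_saga`, the `Step` tuple shape, and the compensation lambdas are assumptions for the example, and the steps are plain callables rather than messages sent to real services.

```python
from typing import Callable

# (step name, action, compensation); the shape is an assumption of this sketch.
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> tuple[bool, list[str]]:
    """Runs steps in order; on failure, compensates completed steps in reverse."""
    log: list[str] = []
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"{name}:failed")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"{done_name}:compensated")
            return False, log
        completed.append(step)
        log.append(f"{name}:ok")
    return True, log


state = {"funds_reserved": False, "stock_reserved": False}


def fail_shipment() -> None:
    raise RuntimeError("carrier unavailable")


steps: list[Step] = [
    ("ReserveFunds",
     lambda: state.update(funds_reserved=True),
     lambda: state.update(funds_reserved=False)),
    ("ReserveStock",
     lambda: state.update(stock_reserved=True),
     lambda: state.update(stock_reserved=False)),
    ("CreateShipment", fail_shipment, lambda: None),
]

succeeded, saga_log = run_saga(steps)
```

Here the third step fails, so the stock and funds reservations are undone in that order and the saga reports failure.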

Related

Microservice Patterns [RU]

A dedicated chapter on integration patterns and distributed transactions.


Event Sourcing + CQRS + Saga

Event Sourcing

State is stored as a sequence of events, not as a final value snapshot.

When to use

  • You need full auditability and reproducible change history.
  • You need to rebuild projections from historical events.
  • Your domain is naturally expressed through events.

Trade-offs

  • Event schema migration and versioning are harder.
  • You need a snapshot and replay strategy to keep state reconstruction fast.
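State as a fold over events, with a snapshot to shorten replay, can be sketched as follows. The account domain, the `Snapshot` shape, and the `replay` function are assumptions made for this example; a real store would persist snapshots and read events by offset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


@dataclass(frozen=True)
class Snapshot:
    balance: int
    version: int  # number of events already folded into `balance`


def apply(balance: int, event: object) -> int:
    """Pure transition function: current state plus one event gives the next state."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    raise TypeError(f"unknown event type: {type(event).__name__}")


def replay(events: list, snapshot: Optional[Snapshot] = None) -> Snapshot:
    """Rebuilds state from the full history, or from a snapshot plus newer events."""
    balance = snapshot.balance if snapshot is not None else 0
    start = snapshot.version if snapshot is not None else 0
    for event in events[start:]:
        balance = apply(balance, event)
    return Snapshot(balance, len(events))


history = [Deposited(100), Withdrawn(30), Deposited(5)]

full = replay(history)               # folds all three events
checkpoint = replay(history[:2])     # snapshot taken after two events
resumed = replay(history, checkpoint)  # folds only the third event
```

Both routes end in the same state; the snapshot only changes how many events have to be read to get there.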

CQRS

Split the write model (commands) from the read model (queries).

When to use

  • Read/write profiles differ significantly.
  • You need dedicated read-optimized projections.
  • The system is growing and the read and write paths need to scale independently.

Trade-offs

  • Operational complexity and component count increase.
  • The read model is often only eventually consistent with the write model.

Saga

Manage a distributed transaction through local steps plus compensations.

When to use

  • The operation spans multiple services or data stores.
  • 2PC is unavailable or impractical.
  • You need controlled rollback via compensating actions.

Trade-offs

  • You must design idempotency and redelivery behavior.
  • Long-running and partially completed flows are harder to debug.

Related Book

Software Architecture: The Hard Parts [RU]

Deep dive into trade-offs, distributed workflows, orchestration/choreography, and Saga practice.


Saga: coordination styles

Choreography

Services subscribe to events and react without a central coordinator.

Pros

  • Loose coupling
  • Fewer central bottlenecks

Risks

  • Harder end-to-end tracing
  • Risk of event spaghetti
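Choreography can be sketched with an in-memory bus where each service only subscribes to the event it cares about and publishes its own result. This is a toy illustration: `EventBus` and the `(event type, order id)` tuple are hypothetical, and the event names are borrowed from the order flow earlier in this chapter.

```python
from collections import defaultdict, deque
from typing import Callable

Event = tuple[str, str]  # (event type, order id); shape assumed for this sketch


class EventBus:
    """Queues published events and dispatches them to subscribers by type."""

    def __init__(self) -> None:
        self._handlers: defaultdict[str, list[Callable[[Event], None]]] = defaultdict(list)
        self._queue: deque[Event] = deque()
        self.history: list[Event] = []

    def subscribe(self, event_type: str, handler: Callable[[Event], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: Event) -> None:
        self._queue.append(event)

    def run(self) -> None:
        # Drain the queue; handlers may publish follow-up events while it runs.
        while self._queue:
            event = self._queue.popleft()
            self.history.append(event)
            for handler in self._handlers[event[0]]:
                handler(event)


bus = EventBus()

# Each "service" reacts to one event and emits the next; none knows the whole flow.
bus.subscribe("OrderCreated", lambda e: bus.publish(("PaymentReserved", e[1])))
bus.subscribe("PaymentReserved", lambda e: bus.publish(("InventoryReserved", e[1])))
bus.subscribe("InventoryReserved", lambda e: bus.publish(("ShipmentCreated", e[1])))

bus.publish(("OrderCreated", "order-1"))
bus.run()
flow = [event_type for event_type, _ in bus.history]
```

The end-to-end sequence exists only in `bus.history`, not in any single service, which is why tracing is harder in this style.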

Orchestration

An orchestrator explicitly drives steps and compensations.

Pros

  • Transparent workflow and observability
  • Easier process verification

Risks

  • Central complexity hotspot
  • Orchestration layer must scale

Decision matrix

| Need | Recommendation | Why |
| Full audit trail and replay | Event Sourcing | History of changes is first-class data. |
| Separate read/write SLA | CQRS | Independent optimization and scaling of read/write paths. |
| Distributed transaction without 2PC | Saga | Local transactions + compensating actions. |
| Simple CRUD system with low complexity | Do not force EDA | Operational complexity may exceed business benefit. |

Common mistakes

  • Trying to adopt all patterns at once without explicit SLAs and bounded contexts.
  • Claiming that events are facts while publishing technical noise with no business meaning.
  • Skipping consumer idempotency under at-least-once delivery.
  • Ignoring schema versioning and backward compatibility of event contracts.
  • Not tracking DLQ, lag, duplicate rate, and saga completion time.
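The idempotency mistake in the list above can be illustrated with a consumer that remembers processed event IDs, so a redelivered event has no second effect. This is a minimal in-memory sketch: `IdempotentConsumer` and its fields are hypothetical, and a real consumer would persist the seen IDs in the same transaction as the state change.

```python
class IdempotentConsumer:
    """Applies each event at most once, keyed by event ID."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.balance = 0
        self.duplicates = 0  # feeds a duplicate-rate metric

    def handle(self, event_id: str, amount: int) -> bool:
        if event_id in self._seen:
            self.duplicates += 1
            return False  # already applied; ignore the redelivery
        self.balance += amount
        self._seen.add(event_id)
        return True


consumer = IdempotentConsumer()

# At-least-once delivery: "evt-1" arrives twice.
deliveries = [("evt-1", 100), ("evt-2", 50), ("evt-1", 100)]
for event_id, amount in deliveries:
    consumer.handle(event_id, amount)
```

Without the `_seen` check the balance would end at 250 instead of 150.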

Related

Resilience Patterns

DLQ complements retry/backoff and limits cascading failures in consumer pipelines.


Dead Letter Queue (DLQ)

A DLQ is a quarantine for messages that cannot be processed after retries. Its purpose is not to hide failures but to preserve problematic events for safe triage and controlled replay.

When to send to DLQ

When a message exceeds the retry limit, violates the schema contract, or consistently fails because of data issues.

What to store

`messageId`, `attempts`, `lastError`, `originalTopic`, `failedAt`, contract version, and payload reference.

What to do next

Run triage, fix the root cause, and re-drive messages in batches with rate limiting and idempotency checks.
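The fields listed under "What to store" can be captured in a small record type. This is one possible shape, not a standard: the source names `messageId`, `attempts`, `lastError`, `originalTopic`, and `failedAt`, while `contractVersion`, `payloadRef`, and the `to_dead_letter` helper are names assumed for the example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DeadLetter:
    messageId: str
    attempts: int
    lastError: str
    originalTopic: str
    failedAt: str         # ISO 8601 timestamp
    contractVersion: str  # assumed field name for the contract version
    payloadRef: str       # assumed field name; points at the stored payload


def to_dead_letter(message_id: str, topic: str, attempts: int, error: Exception,
                   contract_version: str, payload_ref: str, failed_at: str) -> DeadLetter:
    # Keep the error type with the message so triage can group failures by cause.
    return DeadLetter(
        messageId=message_id,
        attempts=attempts,
        lastError=f"{type(error).__name__}: {error}",
        originalTopic=topic,
        failedAt=failed_at,
        contractVersion=contract_version,
        payloadRef=payload_ref,
    )


record = to_dead_letter(
    message_id="msg-17",
    topic="orders.created",
    attempts=5,
    error=ValueError("missing field 'amount'"),
    contract_version="2",
    payload_ref="blob://dlq/msg-17",  # hypothetical storage location
    failed_at="2026-02-21T23:59:00+00:00",
)
serialized = json.dumps(asdict(record))
```

Storing a payload reference instead of the payload itself keeps DLQ entries small and lets the original bytes be replayed unchanged.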

Practical DLQ checklist

  • Use a dedicated DLQ per critical flow to avoid mixing domains and priorities.
  • Store the error reason, retry count, source topic/queue, and payload reference for investigations.
  • Separate transient and non-transient failures: not all errors should reach the DLQ after the same retry count.
  • Set up a re-drive process: manual or automatic message replay after the root cause is fixed.
  • Add alerts on DLQ growth rate and define an SLA for poison-message triage.
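Two items from the checklist above, separating transient from non-transient failures and re-driving after a fix, can be sketched as follows. This is an in-memory illustration with assumed names (`TransientError`, `PermanentError`, `consume`, `redrive`); backoff delays and rate limiting are left out to keep it short.

```python
from typing import Callable


class TransientError(Exception):
    """Failure that may succeed on retry, such as a timeout."""


class PermanentError(Exception):
    """Failure that retrying cannot fix, such as a schema violation."""


def consume(message: dict, handler: Callable[[dict], None], dlq: list, max_attempts: int = 3) -> bool:
    """Processes one message; dead-letters it when retries are exhausted or pointless."""
    attempts = 0
    while True:
        attempts += 1
        try:
            handler(message)
            return True
        except PermanentError as error:
            last_error: Exception = error
            break  # retrying would not help
        except TransientError as error:
            last_error = error
            if attempts >= max_attempts:
                break
    dlq.append({"message": message, "attempts": attempts, "lastError": str(last_error)})
    return False


def redrive(dlq: list, handler: Callable[[dict], None], batch_size: int) -> int:
    """Replays up to batch_size dead letters; messages that fail again return to the DLQ."""
    batch = dlq[:batch_size]
    del dlq[:batch_size]
    recovered = 0
    for entry in batch:
        if consume(entry["message"], handler, dlq):
            recovered += 1
    return recovered


def broken_handler(message: dict) -> None:
    if "amount" not in message:
        raise PermanentError("missing amount")
    raise TransientError("downstream timeout")


dlq: list = []
consume({"id": "m1"}, broken_handler, dlq)               # permanent: one attempt
consume({"id": "m2", "amount": 5}, broken_handler, dlq)  # transient: three attempts
attempts_recorded = [entry["attempts"] for entry in dlq]

processed: list = []
recovered = redrive(dlq, processed.append, batch_size=10)  # handler fixed
```

The permanent failure reaches the DLQ after a single attempt, while the transient one uses the full retry budget first.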

Mini implementation checklist

1. Define event contracts and versioning policy.
2. Ensure producer/consumer idempotency.
3. Configure DLQ, retry policy, and poison-message handling.
4. Monitor lag, replay time, and saga success rate.
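For item 1, one common way to keep old events readable after a contract change is to upcast them on read. The sketch below assumes a hypothetical change, where version 2 replaced a decimal `amount` with an integer `amount_cents`; the field names and the `upcast` function are illustrative only.

```python
CURRENT_VERSION = 2


def upcast(event: dict) -> dict:
    """Returns a copy of the event upgraded to CURRENT_VERSION."""
    upgraded = dict(event)  # never mutate the stored event
    version = upgraded.get("version", 1)  # events without a version are treated as v1
    if version == 1:
        # Hypothetical v1 -> v2 change: decimal `amount` became integer `amount_cents`.
        upgraded["amount_cents"] = round(upgraded.pop("amount") * 100)
        version = 2
    if version != CURRENT_VERSION:
        raise ValueError(f"unsupported event version: {version}")
    upgraded["version"] = version
    return upgraded


stored_v1 = {"type": "OrderCreated", "amount": 12.5}
stored_v2 = {"type": "OrderCreated", "version": 2, "amount_cents": 1250}

from_v1 = upcast(stored_v1)
from_v2 = upcast(stored_v2)
```

Consumers then handle only the current shape, and the stored history stays untouched.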

Practical approach: start with events for one or two critical processes, add observability and retry rules, and only then scale EDA to the remaining bounded contexts.

Contract and operational discipline first, then scale the pattern.



© 2026 Alexander Polomodov