Event-driven architecture matters not because loose coupling sounds attractive, but because it lets teams separate the moment of decision, the publication of a fact, and downstream reactions across time.
The chapter breaks Event Sourcing, CQRS, and Saga into distinct engineering choices, so it becomes clear where asynchrony truly simplifies the system and where it brings schema evolution, replay complexity, convergence lag, and higher operational cost.
In engineering discussions, this helps you reason calmly about orchestration versus choreography, event boundaries, stuck workflows, and the real price of asynchrony instead of collapsing everything into 'let's add a broker and it will be more flexible.'
Practical value of this chapter
Event Contracts
Design events as stable business facts with compatible schema evolution instead of accidental transport packets.
Flow Coordination
Choose orchestration versus choreography deliberately and define where an explicit control loop is required.
Replay and DLQ
Plan replay, DLQ handling, and handler idempotency so failures stay recoverable instead of becoming data loss.
Decision Rationale
Explain when event-driven flow truly lowers coupling and when it only adds lag and operational cost.
Reference
Martin Fowler: Event Sourcing
A classic text on why teams store history as events and where that trade really pays off.
Event-Driven Architecture (EDA) matters not because a broker magically makes systems flexible, but because it separates the moment of decision from the moment of reaction. That freedom improves scalability and decoupling, yet it only pays off when event contracts, replay strategy, and failure recovery are designed deliberately.
Core principles of event-driven architecture
This chapter separates Event Sourcing, CQRS, and Saga into three distinct tools rather than treating them as one mandatory package. They solve different problems: durable history, independent read and write scaling, and coordination of long-running business flows.
Using them well means agreeing up front on event contracts, schema evolution, snapshots, auditability, orchestration versus choreography, and how the system should behave under at-least-once delivery, consumer lag, and dead-letter-queue recovery. Those decisions belong inside each bounded context instead of being copied mechanically across the whole platform.
Events record facts
An event should describe something that already happened in the domain and should not be rewritten later.
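One way to make "a fact that is not rewritten" concrete is an immutable record. The sketch below is only an illustration: the `OrderPlaced` event, its fields, and the `order_placed` helper are hypothetical names, not part of any particular framework.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone
from uuid import uuid4


@dataclass(frozen=True)
class OrderPlaced:
    """A domain fact named in the past tense (hypothetical example event)."""
    order_id: str
    total_cents: int
    event_id: str          # unique identity, useful later for deduplication
    occurred_at: datetime  # when the fact happened, not when it was consumed
    schema_version: int = 1


def order_placed(order_id: str, total_cents: int) -> OrderPlaced:
    """Build the event with a fresh identity and a UTC timestamp."""
    return OrderPlaced(
        order_id=order_id,
        total_cents=total_cents,
        event_id=str(uuid4()),
        occurred_at=datetime.now(timezone.utc),
    )


event = order_placed("order-1", 4200)
try:
    event.total_cents = 0  # attempting to rewrite history
except FrozenInstanceError:
    rewrite_rejected = True  # frozen dataclasses refuse attribute assignment
```

If the total later turns out to be wrong, the usual move under this model is to publish a new correcting event rather than to edit the old one.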
Async flow reduces coupling
Producers and consumers can evolve independently as long as the event contract stays stable.
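Keeping a contract stable usually still requires some evolution. A common technique is to upgrade old payloads on read (often called upcasting), so consumers only deal with the current shape. The following is a minimal sketch under assumptions: the `schema_version` field, the `currency` field added in version 2, and the USD default are all invented for illustration.

```python
from typing import Any, Callable, Dict

Payload = Dict[str, Any]
CURRENT_VERSION = 2


def _v1_to_v2(payload: Payload) -> Payload:
    # Assumption: v2 added an explicit currency and v1 events were implicitly USD.
    return {**payload, "currency": "USD", "schema_version": 2}


# One upgrade function per old version, applied in sequence.
UPCASTERS: Dict[int, Callable[[Payload], Payload]] = {1: _v1_to_v2}


def upcast(payload: Payload) -> Payload:
    """Upgrade a stored payload step by step to the current schema version."""
    version = payload.get("schema_version", 1)  # missing version is treated as v1
    while version < CURRENT_VERSION:
        payload = UPCASTERS[version](payload)
        version = payload["schema_version"]
    return payload
```

The stored history is left untouched; only the in-memory view handed to the consumer is upgraded.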
Convergence is not immediate
There is often a lag between the write side and the final user-facing view, and product behavior must account for it.
Data flow visualizations
Below are three characteristic scenarios: a base event pipeline, a CQRS read/write split, and step coordination inside a Saga.
Base event pipeline
- Command API: receives the business command.
- Domain Aggregate: validates invariants and makes a decision.
- Event Store: appends events in order.
- Broker or log: fans events out to consumers.
- Consumers: build projections, integrations, and side effects.
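The pipeline above can be sketched in memory to show how the stages hand off to each other. Everything here is a simplification for illustration: `EventStore`, `Broker`, `FundsDeposited`, and `handle_deposit` are hypothetical names, and a real system would use a durable store and an actual broker or log.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FundsDeposited:
    account_id: str
    amount: int


class EventStore:
    """Append-only, ordered list of events per stream."""

    def __init__(self) -> None:
        self._streams: Dict[str, List[FundsDeposited]] = defaultdict(list)

    def append(self, stream_id: str, event: FundsDeposited) -> int:
        self._streams[stream_id].append(event)
        return len(self._streams[stream_id])  # 1-based position in the stream

    def read(self, stream_id: str) -> List[FundsDeposited]:
        return list(self._streams[stream_id])


class Broker:
    """Fans each published event out to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[FundsDeposited], None]] = []

    def subscribe(self, handler: Callable[[FundsDeposited], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: FundsDeposited) -> None:
        for handler in self._subscribers:
            handler(event)


def handle_deposit(store: EventStore, broker: Broker,
                   account_id: str, amount: int) -> FundsDeposited:
    """Command entry point: validate the invariant, record the fact, fan it out."""
    if amount <= 0:
        raise ValueError("deposit must be positive")
    event = FundsDeposited(account_id, amount)
    store.append(account_id, event)  # the decision becomes a stored fact first
    broker.publish(event)            # then consumers react
    return event
```

Appending and publishing as two separate calls is the part this sketch glosses over: if the process dies between them, the store and the broker disagree, which is the gap an outbox-style flow is meant to close.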
CQRS: write path and read path
Write path
- Client: sends a POST or PUT command.
- Command model: applies business rules and invariants.
- Event stream: captures domain facts.
- Projector: updates the read model.
Read path
- Client: sends a GET query.
- Query API: serves read-only responses.
- Read model: stores a denormalized projection.
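The two paths can be sketched as two small classes that share only the event stream. This is an illustrative sketch, not a reference design: the cart domain, the `MAX_QTY` rule, and the `catch_up` method are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ItemAdded:
    cart_id: str
    sku: str
    quantity: int


class CartWriteModel:
    """Write path: enforces business rules and appends facts to the stream."""
    MAX_QTY = 10  # hypothetical invariant

    def __init__(self) -> None:
        self.event_stream: List[ItemAdded] = []

    def add_item(self, cart_id: str, sku: str, quantity: int) -> None:
        if not 0 < quantity <= self.MAX_QTY:
            raise ValueError("quantity out of range")
        self.event_stream.append(ItemAdded(cart_id, sku, quantity))


class CartReadModel:
    """Read path: a denormalized projection kept up to date by a projector."""

    def __init__(self) -> None:
        self.item_counts: Dict[str, int] = {}
        self.position = 0  # number of events already applied

    def catch_up(self, event_stream: List[ItemAdded]) -> None:
        """Projector step: apply only the events not seen yet."""
        for event in event_stream[self.position:]:
            current = self.item_counts.get(event.cart_id, 0)
            self.item_counts[event.cart_id] = current + event.quantity
        self.position = len(event_stream)

    def total_items(self, cart_id: str) -> int:
        return self.item_counts.get(cart_id, 0)
```

Until `catch_up` runs, a query still returns the old total even though the write has been accepted. That window is the convergence lag the read side has to be designed around.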
Saga: coordination styles
A dedicated coordinator decides which step should run next and when compensations must begin.
All key commands and decisions pass through a single coordinator.
OrderCreated → ReserveFunds → PaymentReserved → ReserveStock → InventoryReserved → CreateShipment → ShipmentCreated → OrderCompleted
Related chapter
Microservice Patterns
A dedicated chapter on integration patterns, Saga, and the limits of distributed transactions.
Event Sourcing, CQRS, and Saga
Event Sourcing
System state is rebuilt from a sequence of events rather than stored only as the latest snapshot.
When to use
- You need a full audit trail and reproducible change history.
- You need to rebuild projections or derive new read models from historical facts.
- Your domain is naturally expressed as events that happened over time.
Trade-offs
- Schema evolution and old-history migrations become harder.
- You need snapshots and a replay strategy to keep recovery and startup fast.
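Rebuilding state from events is essentially a fold over the history, and a snapshot is a saved intermediate result of that fold. The sketch below assumes a toy account domain; `Deposited`, `Withdrawn`, `Snapshot`, and `rehydrate` are hypothetical names.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Optional, Union


@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


Event = Union[Deposited, Withdrawn]


@dataclass(frozen=True)
class Snapshot:
    balance: int
    version: int  # how many events are already folded into the balance


def apply_event(balance: int, event: Event) -> int:
    """Pure transition: previous state plus one fact gives the next state."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    raise TypeError(f"unknown event type: {type(event).__name__}")


def rehydrate(events: List[Event], snapshot: Optional[Snapshot] = None) -> Snapshot:
    """Replay from the snapshot if one exists, otherwise from the beginning."""
    start = snapshot if snapshot is not None else Snapshot(balance=0, version=0)
    balance = reduce(apply_event, events[start.version:], start.balance)
    return Snapshot(balance=balance, version=len(events))
```

With a snapshot, only the tail of the stream is replayed, which is what keeps startup time bounded as history grows. The result should match a full replay, and checking that equality is a cheap way to validate snapshots.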
CQRS
Write path and read path are split so each side can be shaped around its own workload and data model.
When to use
- Read load and write load differ significantly.
- You need read-optimized projections for fast queries.
- The system has grown enough that reads and writes should scale independently.
Trade-offs
- Operational complexity and component count increase.
- The read model usually converges with the write side with some lag rather than instantly.
Saga
A distributed business operation is broken into local steps and compensations instead of one shared transaction.
When to use
- One business flow touches multiple services or data stores.
- Two-phase commit (2PC) is unavailable or too expensive in latency and coupling.
- You need controlled recovery from partially completed work through compensating actions.
Trade-offs
- Every step and every compensation must survive retries without a double effect.
- Long-running and partially completed flows are harder to debug and observe.
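The core mechanic of a Saga, local steps paired with compensations that run in reverse on failure, can be sketched in a few lines. This is a deliberately minimal, in-process illustration: `SagaStep` and `run_saga` are hypothetical names, and a real implementation would persist saga state and retry compensations instead of running them once.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]        # local transaction in one service
    compensation: Callable[[], None]  # undoes the action's business effect


def run_saga(steps: List[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse.

    Returns True if every step completed, False if the saga was rolled back.
    """
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # The failed step is assumed to have had no effect, so only
            # previously completed steps are compensated.
            for done in reversed(completed):
                done.compensation()
            return False
        completed.append(step)
    return True
```

Because both actions and compensations may be retried in practice, each one needs to be safe to run more than once, which is the first trade-off listed above.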
Related book
Software Architecture: The Hard Parts
A strong deep dive into engineering trade-offs, distributed workflows, and real Saga practice.
Saga: coordination styles
Orchestration makes the workflow more explicit and centralizes decisions, while choreography lowers coupling further but makes tracing and debugging more demanding.
Choreography
Services subscribe to events and trigger the next step themselves without a central coordinator.
Pros
- Lower coupling between services
- No single control point that automatically becomes a bottleneck
Risks
- Harder to trace the full business flow end to end
- Easy to drift into event spaghetti without explicit boundaries
Orchestration
A dedicated coordinator explicitly decides which step runs next and when compensations should start.
Pros
- The whole workflow is easier to observe
- Business rules and transitions are easier to verify
Risks
- A central hotspot of complexity appears
- The orchestration layer must be scaled and protected separately
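To make the contrast tangible, here is a sketch of the choreographed variant: no coordinator exists, and each service reacts to one event by publishing the next. The `EventBus` class and `wire_services` function are hypothetical, the handlers are reduced to one-liners, and the event names follow the order flow used earlier in this chapter.

```python
from collections import defaultdict, deque
from typing import Any, Callable, Deque, Dict, List, Tuple

Handler = Callable[[Dict[str, Any]], None]


class EventBus:
    """Tiny in-memory stand-in for a broker; records what passed through it."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)
        self._queue: Deque[Tuple[str, Dict[str, Any]]] = deque()
        self.history: List[str] = []

    def subscribe(self, event_name: str, handler: Handler) -> None:
        self._handlers[event_name].append(handler)

    def publish(self, event_name: str, payload: Dict[str, Any]) -> None:
        self._queue.append((event_name, payload))

    def drain(self) -> None:
        """Deliver queued events until no handler publishes anything new."""
        while self._queue:
            name, payload = self._queue.popleft()
            self.history.append(name)
            for handler in self._handlers[name]:
                handler(payload)


def wire_services(bus: EventBus) -> None:
    # Each subscription stands in for a separate service that knows only
    # its own trigger and its own output event.
    bus.subscribe("OrderCreated", lambda p: bus.publish("PaymentReserved", p))
    bus.subscribe("PaymentReserved", lambda p: bus.publish("InventoryReserved", p))
    bus.subscribe("InventoryReserved", lambda p: bus.publish("ShipmentCreated", p))
    bus.subscribe("ShipmentCreated", lambda p: bus.publish("OrderCompleted", p))
```

Note that the full flow is visible only in `bus.history` after the fact. No single piece of code states the sequence, which is exactly the tracing cost listed under the choreography risks.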
Choosing the right pattern
| Need | Pick | Why |
|---|---|---|
| You need full history and state reconstruction | Event Sourcing | Change history becomes primary data instead of a side log. |
| Reads and writes live under different SLAs and traffic profiles | CQRS | You can optimize and scale write and read paths independently. |
| One business operation spans multiple services without a shared transaction | Saga | Local steps plus compensations give you controlled completion and recovery. |
| The system is still a simple CRUD application with low integration complexity | Do not force EDA | Operational complexity can easily exceed the business value. |
Common mistakes
- Adopting Event Sourcing, CQRS, and Saga together before naming the concrete problem each one solves.
- Publishing technical noise as events instead of domain facts.
- Skipping consumer idempotency under at-least-once delivery.
- Leaving schema evolution without a backward-compatibility policy.
- Ignoring DLQ growth, processing lag, duplicate rate, and Saga completion time.
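Of the mistakes above, missing idempotency is the one that is easiest to show in code. A minimal sketch of a deduplicating consumer follows; `IdempotentConsumer` is a hypothetical name, and the in-memory set stands in for what would normally be a durable inbox table.

```python
from typing import Any, Callable, Dict, Set


class IdempotentConsumer:
    """Wraps a handler so a redelivered event has no second effect."""

    def __init__(self, handler: Callable[[Dict[str, Any]], None]) -> None:
        self._handler = handler
        # Assumption: in production this is durable storage, not process memory.
        self._processed_ids: Set[str] = set()

    def handle(self, event_id: str, payload: Dict[str, Any]) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        if event_id in self._processed_ids:
            return False
        self._handler(payload)
        # Recorded only after success, so a failed attempt can be retried.
        self._processed_ids.add(event_id)
        return True
```

For this to hold across crashes, recording the event ID and applying the side effect generally need to commit together, for example in the same database transaction.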
Related chapter
Resilience Patterns
DLQ complements retries, timeouts, and failure isolation in asynchronous processing loops.
Dead Letter Queue (DLQ)
DLQ exists not to hide failures, but to isolate problematic messages safely, preserve them for investigation, and replay them in a controlled way once the root cause is fixed. It becomes essential when handlers touch unstable dependencies, messy data, or expensive side effects.
When to send to DLQ
When a message exhausts retries, violates the event contract, or keeps failing because of the data itself.
What to store
Message identity, retry count, last error, source, failure time, and a reference to the payload.
What to do next
Triage the root cause, fix it, and replay messages back into the main flow with rate control and idempotency checks.
Practical DLQ checklist
- Use a dedicated DLQ for each critical flow so domains and priorities do not get mixed together.
- Store the failure reason, retry count, original topic or queue, and a payload reference for investigation.
- Separate transient failures from persistent ones: not every message should land in DLQ after the same number of retries.
- Define how messages return from DLQ into the main flow after the root cause is fixed, either manually or automatically.
- Track DLQ growth rate and agree on a target time for triaging poison messages.
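Several checklist items can be combined in one small sketch: retry only transient failures, dead-letter everything else with its metadata, and replay once the cause is fixed. All names here (`TransientError`, `DeadLetter`, `process_with_dlq`, `replay`) are hypothetical, the DLQ is a plain list, and backoff between retries is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List


class TransientError(Exception):
    """A failure that may succeed on retry, such as a timeout."""


@dataclass
class DeadLetter:
    message_id: str
    source: str        # original topic or queue
    payload_ref: str   # pointer to the payload, not the payload itself
    retry_count: int
    last_error: str
    failed_at: datetime


def process_with_dlq(message_id: str, source: str, payload_ref: str,
                     handler: Callable[[str], None], dlq: List[DeadLetter],
                     max_retries: int = 3) -> bool:
    """Retry transient failures; dead-letter persistent or exhausted ones."""
    retries = 0
    while True:
        try:
            handler(payload_ref)
            return True
        except TransientError as exc:
            if retries < max_retries:
                retries += 1
                continue  # a real consumer would back off before retrying
            error: Exception = exc
        except Exception as exc:
            error = exc  # persistent failure: retrying would not help
        dlq.append(DeadLetter(message_id, source, payload_ref, retries,
                              repr(error), datetime.now(timezone.utc)))
        return False


def replay(dlq: List[DeadLetter], handler: Callable[[str], None]) -> List[DeadLetter]:
    """Re-run dead letters after a fix; return the ones that still fail."""
    still_failing: List[DeadLetter] = []
    for letter in dlq:
        try:
            handler(letter.payload_ref)
        except Exception:
            still_failing.append(letter)
    return still_failing
```

Replay goes through the same handler as normal traffic, so it relies on that handler being idempotent, and in a live system it would also be rate-limited.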
Checks before rollout
A practical rollout path: start with one or two critical flows, define the contract, observability, and retry rules first, and only then expand event-driven architecture into the rest of the system.
Related chapters
- Consistency and idempotency - helps design duplicate-event handling, outbox and inbox flow, and correctness under at-least-once delivery.
- Fault tolerance patterns: Circuit Breaker, Bulkhead, Retry - extends event-driven systems with retry rules, timeouts, and controlled degradation for async handlers.
- Distributed Message Queue - shows practical queue architecture, partitioning, ordering guarantees, and throughput limits in event flows.
- Inter-service communication patterns - compares synchronous and asynchronous integration and clarifies where event-driven flow beats direct RPC.
- Design principles for scalable systems - frames latency-versus-throughput trade-offs and backpressure techniques that matter in event-driven systems.
- Kafka: The Definitive Guide, 2nd Edition (short summary) - provides the foundation for log-based event streaming, partitioning, delivery semantics, and Kafka operations.
- Microservice Patterns (short summary) - goes deeper into Saga, distributed transactions, and integration patterns for microservice architectures.
