Event-driven architecture matters not because 'loose coupling' sounds good, but because it redistributes responsibility across time, across services, and across data models.
This chapter treats Event Sourcing, CQRS, and Saga as distinct engineering decisions, which makes it easier to see where asynchrony truly simplifies the system and where it introduces replay complexity, schema evolution, eventual-consistency lag, and more expensive observability.
In engineering discussions, this lets you reason calmly about choreography vs orchestration, auditability, business-process correctness, and the risk of stuck workflows instead of collapsing everything into 'let's add a broker and it will be more flexible.'
Practical value of this chapter
Event boundaries
Design domain events with stable schemas and versioning so evolution does not break downstream consumers.
Flow coordination
Choose orchestration vs choreography deliberately and define where central coordination is required.
Replay and recovery
Plan the replay strategy, dead letter queue (DLQ) handling, and event-path observability so failures remain recoverable.
Interview storytelling
Explain when EDA truly reduces coupling and when it adds unjustified operational complexity.
Reference
Martin Fowler: Event Sourcing
A classic explanation of event sourcing fundamentals and why teams adopt it.
Event-Driven Architecture (EDA) shifts focus from synchronous calls to domain event streams. It improves flexibility and scalability, but requires strict discipline in event contracts, idempotency, and observability. In practice, EDA commonly revolves around three patterns: Event Sourcing, CQRS, and Saga.
Event-Driven fundamentals
An event is a fact
Events describe what has already happened in the domain and should be immutable.
Async by default
Producers and consumers are loosely coupled and communicate through a broker or log.
Eventual consistency
The read model lags behind the write model, and that lag must be accounted for in UX and APIs.
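The "event is a fact" rule can be made concrete with an immutable type. The following is a minimal sketch, assuming a hypothetical `OrderPlaced` event; the `event_id` and `occurred_at` metadata fields are illustrative and not prescribed by this chapter.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from datetime import datetime, timezone
from uuid import uuid4


@dataclass(frozen=True)
class OrderPlaced:
    """A domain fact: named in the past tense, never modified after creation."""
    order_id: str
    total_cents: int
    # Hypothetical metadata, useful for deduplication and tracing.
    event_id: str = field(default_factory=lambda: str(uuid4()))
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


event = OrderPlaced(order_id="o-1", total_cents=2500)

try:
    event.total_cents = 0  # frozen dataclass: assignment is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False
```

A correction is expressed as a new event (for example, a hypothetical `OrderAmended`) rather than as an update to the old one.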
Data flows
The three key scenarios below are the base event pipeline, the CQRS read/write split, and step coordination in a Saga.
Event-Driven Pipeline
Command API
Receives business intent
Domain Aggregate
Validates and makes a decision
Event Store
Append-only event log
Broker/Log
Fan-out to consumers
Consumers
Projections, integrations, side effects
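The pipeline stages above can be sketched in memory to show how they hand off to each other. This is an illustration under stated assumptions, not a production design: `EventStore`, `Broker`, `handle_deposit`, and the `FundsDeposited` event are hypothetical names, and the broker is a plain in-process fan-out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FundsDeposited:
    account_id: str
    amount: int


class EventStore:
    """Append-only log: events are added, never updated or deleted."""
    def __init__(self) -> None:
        self._log: list[FundsDeposited] = []

    def append(self, event: FundsDeposited) -> None:
        self._log.append(event)

    def read_all(self) -> tuple[FundsDeposited, ...]:
        return tuple(self._log)


class Broker:
    """Fans each event out to every subscribed consumer."""
    def __init__(self) -> None:
        self._consumers: list[Callable[[FundsDeposited], None]] = []

    def subscribe(self, consumer: Callable[[FundsDeposited], None]) -> None:
        self._consumers.append(consumer)

    def publish(self, event: FundsDeposited) -> None:
        for consumer in self._consumers:
            consumer(event)


def handle_deposit(account_id: str, amount: int, store: EventStore, broker: Broker) -> None:
    """Command handler: validate the intent, then record and publish the fact."""
    if amount <= 0:
        raise ValueError("deposit amount must be positive")
    event = FundsDeposited(account_id, amount)
    store.append(event)    # persist the fact first
    broker.publish(event)  # then fan it out


balances: dict[str, int] = {}
audit: list[str] = []


def project_balance(event: FundsDeposited) -> None:
    balances[event.account_id] = balances.get(event.account_id, 0) + event.amount


def write_audit(event: FundsDeposited) -> None:
    audit.append(f"{event.account_id}:{event.amount}")


store, broker = EventStore(), Broker()
broker.subscribe(project_balance)
broker.subscribe(write_audit)
handle_deposit("acc-1", 100, store, broker)
handle_deposit("acc-1", 50, store, broker)
```

In this sketch, appending and publishing are two separate steps with nothing making them atomic; a real system needs a mechanism such as the outbox flow covered in the related consistency chapter.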
CQRS: write path and read path
The write path and the read path are separate lanes that operate independently.
write path (commands)
Client
POST/PUT command
Command Model
Invariants and business rules
Event Stream
Domain facts
Projector
Updates read model
read path (queries)
Client
GET query
Query API
Read-only endpoint
Read Model
Denormalized projection
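The two lanes can be sketched as a command model that appends events and a projector that folds them into a read model. All names here (`CartCommandModel`, `CartSummaryProjector`, `get_cart_summary`, the `MAX_QTY` rule) are hypothetical, and a plain list stands in for the event stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemAdded:
    cart_id: str
    sku: str
    qty: int


class CartCommandModel:
    """Write side: enforces invariants and emits events."""
    MAX_QTY = 10  # hypothetical business rule

    def __init__(self, stream: list) -> None:
        self._stream = stream

    def add_item(self, cart_id: str, sku: str, qty: int) -> None:
        if not 0 < qty <= self.MAX_QTY:
            raise ValueError("quantity out of range")
        self._stream.append(ItemAdded(cart_id, sku, qty))


class CartSummaryProjector:
    """Read side: folds events into a denormalized view."""
    def __init__(self) -> None:
        self.read_model: dict[str, dict] = {}
        self._position = 0  # index of the next unprocessed event

    def catch_up(self, stream: list) -> None:
        for event in stream[self._position:]:
            view = self.read_model.setdefault(event.cart_id, {"lines": 0, "units": 0})
            view["lines"] += 1
            view["units"] += event.qty
        self._position = len(stream)


def get_cart_summary(projector: CartSummaryProjector, cart_id: str) -> dict:
    """Query API: read-only, never touches the command model."""
    return projector.read_model.get(cart_id, {"lines": 0, "units": 0})


stream: list = []
commands = CartCommandModel(stream)
projector = CartSummaryProjector()

commands.add_item("c-1", "sku-a", 2)
commands.add_item("c-1", "sku-b", 3)
stale = dict(get_cart_summary(projector, "c-1"))  # projector has not run yet
projector.catch_up(stream)
fresh = dict(get_cart_summary(projector, "c-1"))
```

The gap between `stale` and `fresh` is the eventual-consistency lag in miniature: the write has succeeded, but the query cannot see it until the projector catches up.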
Saga: orchestrated flow
In the orchestrated variant, one central coordinator issues every key command, decides the next step, and decides when to run compensations. The flow, in order:
OrderCreated -> ReserveFunds -> PaymentReserved -> ReserveStock -> InventoryReserved -> CreateShipment -> ShipmentCreated -> OrderCompleted
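The orchestrated flow above can be sketched as a loop over steps, each paired with a compensation. This is a simplified, synchronous illustration: `Step`, `run_saga`, the `:ok` / `:failed` / `:compensated` markers, and the `OrderCancelled` outcome are assumptions made for the example, and a real orchestrator would persist its state between steps.

```python
from typing import Callable, NamedTuple


class Step(NamedTuple):
    name: str
    action: Callable[[], None]
    compensation: Callable[[], None]


def run_saga(steps: list[Step]) -> list[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    history: list[str] = ["OrderCreated"]
    completed: list[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            history.append(f"{step.name}:failed")
            for done in reversed(completed):
                done.compensation()
                history.append(f"{done.name}:compensated")
            history.append("OrderCancelled")  # hypothetical terminal event
            return history
        completed.append(step)
        history.append(f"{step.name}:ok")
    history.append("OrderCompleted")
    return history


def ok() -> None:
    pass


def out_of_stock() -> None:
    raise RuntimeError("no stock")


happy = run_saga([
    Step("ReserveFunds", ok, ok),
    Step("ReserveStock", ok, ok),
    Step("CreateShipment", ok, ok),
])
failed = run_saga([
    Step("ReserveFunds", ok, ok),
    Step("ReserveStock", out_of_stock, ok),
    Step("CreateShipment", ok, ok),
])
```

Only steps that completed are compensated, and in reverse order, so a failure in `ReserveStock` releases the funds but never touches shipment.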
Related
Microservice Patterns
A dedicated chapter on integration patterns and distributed transactions.
Event Sourcing + CQRS + Saga
Event Sourcing
State is stored as a sequence of events, not as a final value snapshot.
When to use
- You need full auditability and reproducible change history.
- You need to rebuild projections from historical events.
- Your domain is naturally expressed through events.
Trade-offs
- Event schema migration and versioning are harder.
- You need a snapshot and replay strategy to keep state rebuilds fast.
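Rebuilding state from events is a fold over the log, and a snapshot simply lets the fold start later. A minimal sketch, assuming hypothetical `Deposited` / `Withdrawn` events and a snapshot represented as a `(version, balance)` pair:

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


def apply_event(balance: int, event) -> int:
    """Pure state transition: current state + one event -> next state."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    raise TypeError(f"unknown event type: {type(event).__name__}")


def replay(events: list, snapshot: tuple = (0, 0)) -> int:
    """Rebuild state from a (version, balance) snapshot plus all later events."""
    version, balance = snapshot
    return reduce(apply_event, events[version:], balance)


log = [Deposited(100), Withdrawn(30), Deposited(5), Withdrawn(25)]

full = replay(log)                     # fold the whole history
snapshot = (2, replay(log[:2]))        # state after the first two events
from_snapshot = replay(log, snapshot)  # fold only the events after the snapshot
```

Both paths must produce the same state; checking that equality is a cheap way to validate a snapshot before relying on it.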
CQRS
Split the write model (commands) from the read model (queries).
When to use
- Read/write profiles differ significantly.
- You need dedicated read-optimized projections.
- The system is growing and needs independent path scaling.
Trade-offs
- Operational complexity and component count increase.
- The read model is often only eventually consistent with the write model.
Saga
Manage a distributed transaction through local steps plus compensations.
When to use
- Operation spans multiple services or data stores.
- 2PC is unavailable or impractical.
- You need controlled rollback via compensating actions.
Trade-offs
- You must design for idempotency and message redelivery.
- Long-running and partially completed flows are harder to debug.
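Idempotency under redelivery usually comes down to remembering which messages have already been handled. A minimal sketch, assuming each message carries a unique `messageId`; the `IdempotentConsumer` class is hypothetical, and the in-memory set stands in for what would need to be durable storage.

```python
class IdempotentConsumer:
    """Skips messages whose ID was already processed (at-least-once delivery)."""

    def __init__(self, handler) -> None:
        self._handler = handler
        self._seen: set[str] = set()  # assumption: durable storage in a real system
        self.duplicates = 0

    def consume(self, message: dict) -> bool:
        message_id = message["messageId"]
        if message_id in self._seen:
            self.duplicates += 1
            return False
        self._handler(message)
        # Record the ID only after the handler succeeds, so a failure
        # mid-handler leads to a retry rather than a silently lost message.
        self._seen.add(message_id)
        return True


reserved: list[int] = []
consumer = IdempotentConsumer(lambda m: reserved.append(m["amount"]))

delivery = {"messageId": "m-1", "amount": 40}
first = consumer.consume(delivery)
second = consumer.consume(delivery)  # the broker redelivers the same message
```

The `duplicates` counter doubles as the duplicate-rate metric that is worth tracking alongside lag and saga completion time.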
Related Book
Software Architecture: The Hard Parts
Deep dive into trade-offs, distributed workflows, orchestration/choreography, and Saga practice.
Saga: coordination styles
Choreography
Services subscribe to events and react without a central coordinator.
Pros
- Loose coupling
- Fewer central bottlenecks
Risks
- Harder end-to-end tracing
- Risk of event spaghetti
Orchestration
An orchestrator explicitly drives steps and compensations.
Pros
- Transparent workflow and observability
- Easier process verification
Risks
- Central complexity hotspot
- Orchestration layer must scale
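For contrast with an orchestrator, choreography can be sketched as services that each subscribe to one event and emit the next. The `EventBus` class below is a hypothetical in-memory stand-in for a broker, and the lambdas stand in for separate services.

```python
from collections import defaultdict


class EventBus:
    """In-memory stand-in for a broker: routes events by type name."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)
        self.trace: list[str] = []  # order in which events were published

    def subscribe(self, event_type: str, handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self.trace.append(event_type)
        for handler in list(self._handlers[event_type]):
            handler(payload)


bus = EventBus()

# Each service knows only the event it reacts to and the event it emits.
bus.subscribe("OrderCreated", lambda p: bus.publish("PaymentReserved", p))
bus.subscribe("PaymentReserved", lambda p: bus.publish("InventoryReserved", p))
bus.subscribe("InventoryReserved", lambda p: bus.publish("ShipmentCreated", p))
bus.subscribe("ShipmentCreated", lambda p: bus.publish("OrderCompleted", p))

bus.publish("OrderCreated", {"order_id": "o-1"})
```

No single place in this code describes the whole workflow; it exists only as the sum of the subscriptions. That is the source of both the loose coupling and the tracing difficulty listed above.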
Decision matrix
| Need | Recommendation | Why |
|---|---|---|
| Full audit trail and replay | Event Sourcing | History of changes is first-class data. |
| Separate read/write SLA | CQRS | Independent optimization and scaling of read/write paths. |
| Distributed transaction without 2PC | Saga | Local transactions + compensating actions. |
| Simple CRUD system with low complexity | Do not force EDA | Operational complexity may exceed business benefit. |
Common mistakes
- Trying to adopt all patterns at once without an explicit SLA and bounded context.
- Calling events 'facts' while publishing technical noise that carries no business meaning.
- Skipping consumer idempotency under at-least-once delivery.
- Ignoring schema versioning and backward compatibility of event contracts.
- Not tracking DLQ, lag, duplicate rate, and saga completion time.
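One common way to keep old events readable after a schema change is to upcast them to the current version at read time. The sketch below assumes events are dictionaries with a `version` field; the `CustomerRegistered` event and its v1-to-v2 change (splitting `name` into two fields) are hypothetical.

```python
def upcast_v1_to_v2(event: dict) -> dict:
    """v2 splits `name` into `first_name` and `last_name` (hypothetical change)."""
    first, _, last = event["name"].partition(" ")
    return {"version": 2, "type": event["type"], "first_name": first, "last_name": last}


# Maps a source version to the function that produces the next version.
UPCASTERS = {1: upcast_v1_to_v2}
CURRENT_VERSION = 2


def upcast(event: dict) -> dict:
    """Apply upcasters step by step until the event is at the current version."""
    while event["version"] < CURRENT_VERSION:
        event = UPCASTERS[event["version"]](event)
    return event


def handle_customer_registered(raw: dict) -> str:
    """Consumer logic is written against the current version only."""
    event = upcast(raw)
    return f'{event["last_name"]}, {event["first_name"]}'


old = {"version": 1, "type": "CustomerRegistered", "name": "Ada Lovelace"}
new = {"version": 2, "type": "CustomerRegistered", "first_name": "Alan", "last_name": "Turing"}

old_result = handle_customer_registered(old)
new_result = handle_customer_registered(new)
```

The stored v1 event is never rewritten; the conversion happens on the way in, so replaying history keeps working as the contract evolves.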
Related
Resilience Patterns
DLQ complements retry/backoff and limits cascading failures in consumer pipelines.
Dead Letter Queue (DLQ)
A DLQ is a quarantine for messages that still cannot be processed after retries. It is not a place to hide failures; its purpose is to preserve problematic events for safe triage and controlled replay.
When to send to DLQ
When a message exceeds the retry limit, violates the schema contract, or consistently fails because of data issues.
What to store
`messageId`, `attempts`, `lastError`, `originalTopic`, `failedAt`, contract version, and payload reference.
What to do next
Run triage, fix root cause, and execute re-drive batches with rate limiting and idempotency checks.
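The routing rule and the stored fields can be sketched together. This is an illustration only: `process_with_dlq`, `MAX_ATTEMPTS`, and the `contractVersion` / `payloadRef` key names are assumptions, a plain list stands in for the DLQ, and the backoff between attempts is omitted.

```python
from datetime import datetime, timezone

MAX_ATTEMPTS = 3  # hypothetical retry limit


def process_with_dlq(message: dict, handler, dlq: list, topic: str) -> bool:
    """Try the handler up to MAX_ATTEMPTS times; quarantine the message on final failure."""
    last_error = ""
    for _ in range(MAX_ATTEMPTS):
        try:
            handler(message)
            return True
        except Exception as exc:  # a real consumer would back off between attempts
            last_error = f"{type(exc).__name__}: {exc}"
    dlq.append({
        "messageId": message["messageId"],
        "attempts": MAX_ATTEMPTS,
        "lastError": last_error,
        "originalTopic": topic,
        "failedAt": datetime.now(timezone.utc).isoformat(),
        "contractVersion": message.get("version"),
        "payloadRef": f"{topic}/{message['messageId']}",  # a pointer, not the payload itself
    })
    return False


def reject_negative(message: dict) -> None:
    if message["amount"] < 0:
        raise ValueError("negative amount")


dlq: list[dict] = []
ok = process_with_dlq({"messageId": "m-1", "version": 1, "amount": 10}, reject_negative, dlq, "payments")
bad = process_with_dlq({"messageId": "m-2", "version": 1, "amount": -5}, reject_negative, dlq, "payments")
```

Storing a payload reference instead of the payload keeps DLQ records small and avoids copying sensitive data into a second location.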
Practical DLQ checklist
- Use a dedicated DLQ per critical flow to avoid mixing domains and priorities.
- Store error reason, retry count, source topic/queue, and payload reference for investigations.
- Separate transient and non-transient failures: not all errors should hit DLQ after the same retry count.
- Set up a re-drive process: manual or automatic message replay after the root cause is fixed.
- Add alerts for DLQ growth rate and SLA for poison-message triage.
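Two checklist items lend themselves to a short sketch: separating transient from non-transient failures, and re-driving in bounded batches with an idempotency check. Everything named here (`TransientError`, `PermanentError`, `max_attempts_for`, `redrive`, the retry counts) is hypothetical, and the batch size per run stands in for rate limiting.

```python
class TransientError(Exception):
    """Worth retrying: timeouts, temporary unavailability."""


class PermanentError(Exception):
    """Retrying cannot help: schema violation, bad data."""


def max_attempts_for(error: Exception) -> int:
    """Hypothetical policy: several attempts for transient errors, one for the rest."""
    return 5 if isinstance(error, TransientError) else 1


def redrive(dlq: list, handler, processed_ids: set, batch_size: int) -> int:
    """Replay up to batch_size dead letters; returns how many were actually handled."""
    batch, handled = dlq[:batch_size], 0
    for record in batch:
        if record["messageId"] not in processed_ids:  # idempotency check
            handler(record)
            processed_ids.add(record["messageId"])
            handled += 1
        dlq.remove(record)  # reached only if the handler did not raise
    return handled


dlq = [{"messageId": "m-1"}, {"messageId": "m-2"}, {"messageId": "m-3"}]
processed = {"m-2"}  # m-2 was already handled through another path
replayed: list[str] = []


def record_replay(record: dict) -> None:
    replayed.append(record["messageId"])


first_batch = redrive(dlq, record_replay, processed, batch_size=2)
remaining_after_first = len(dlq)
second_batch = redrive(dlq, record_replay, processed, batch_size=2)
```

If the handler raises during a re-drive, the record stays in the DLQ, so a fix that turns out to be incomplete does not lose the message.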
Mini implementation checklist
Practical approach: start with events for one or two critical processes, add observability and retry rules, and only then scale EDA to the remaining bounded contexts.
Related chapters
- Data consistency patterns and idempotency - helps design duplicate-event handling, outbox/inbox flow, and correctness under at-least-once delivery.
- Fault tolerance patterns: Circuit Breaker, Bulkhead, Retry - extends EDA with retry/backoff policies, timeouts, and controlled degradation for consumer pipelines.
- Distributed Message Queue - shows practical queue architecture, partitioning, ordering guarantees, and throughput constraints in event flows.
- Inter-service communication patterns - compares sync and async integration models and clarifies where event-driven flow is better than direct RPC.
- Design principles for scalable systems - frames latency/throughput trade-offs and backpressure techniques that are critical for event-driven systems.
- Kafka: The Definitive Guide (short summary) - provides fundamentals for log-based event streaming and Kafka operational practices.
- Microservice Patterns (short summary) - deepens Saga, distributed transaction, and integration-pattern topics for microservice architectures.
