Reference
Martin Fowler: Event Sourcing
A classic explanation of event sourcing fundamentals and why teams adopt it.
Event-Driven Architecture (EDA) shifts focus from synchronous calls to domain event streams. It improves flexibility and scalability, but requires strict discipline in event contracts, idempotency, and observability. In practice, EDA commonly revolves around three patterns: Event Sourcing, CQRS, and Saga.
Event-Driven fundamentals
Event is a fact
Events describe what has already happened in the domain and should be immutable.
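One way to make the immutability explicit in code is a frozen dataclass. This is a minimal sketch; the `OrderPlaced` event, its fields, and the `order_placed` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone
from uuid import uuid4


@dataclass(frozen=True)
class OrderPlaced:
    """A domain fact, named in the past tense (hypothetical example event)."""
    order_id: str
    total_cents: int
    event_id: str
    occurred_at: datetime


def order_placed(order_id: str, total_cents: int) -> OrderPlaced:
    # Identity and timestamp are assigned once, when the fact is recorded.
    return OrderPlaced(order_id, total_cents, str(uuid4()), datetime.now(timezone.utc))


event = order_placed("order-1", 4200)
try:
    event.total_cents = 0  # attempted mutation of a recorded fact
    mutated = True
except FrozenInstanceError:
    mutated = False  # frozen dataclasses reject attribute assignment
```

A correction is then expressed as a new event (for example a hypothetical `OrderAmended`) rather than as an edit to an old one.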
Async by default
Producers and consumers are loosely coupled and communicate through a broker or log.
Eventual consistency
The read model lags behind the write model, and UX and APIs must account for that lag.
Data flows
The flows below cover three key scenarios: the base event pipeline, the CQRS read/write split, and step coordination in Saga.
Event-Driven Pipeline
- Command API: receives business intent
- Domain Aggregate: validates and makes a decision
- Event Store: append-only event log
- Broker/Log: fan-out to consumers
- Consumers: projections, integrations, side effects
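The pipeline stages above can be wired together in memory to show how they hand off to each other. This is a sketch under stated assumptions: `EventStore`, `Broker`, `handle_place_order`, and the dict-based event shape are illustrative names, and a real system would use a durable store and an actual broker.

```python
from collections import defaultdict
from typing import Callable

Event = dict  # assumed event shape: {"type": str, ...payload fields}
Handler = Callable[[Event], None]


class EventStore:
    """Append-only log: events are added, never updated or deleted."""

    def __init__(self) -> None:
        self._log: list[Event] = []

    def append(self, event: Event) -> None:
        self._log.append(event)

    def read_all(self) -> tuple[Event, ...]:
        return tuple(self._log)


class Broker:
    """Fans each event out to every consumer subscribed to its type."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: Event) -> None:
        for handler in self._subscribers[event["type"]]:
            handler(event)


def handle_place_order(command: dict, store: EventStore, broker: Broker) -> Event:
    # Aggregate step: validate the intent and decide which fact to record.
    if command["total_cents"] <= 0:
        raise ValueError("order total must be positive")
    event = {"type": "OrderPlaced", "order_id": command["order_id"],
             "total_cents": command["total_cents"]}
    store.append(event)    # record the fact first
    broker.publish(event)  # then fan out to consumers
    return event


store, broker = EventStore(), Broker()
emails, projection = [], {}
broker.subscribe("OrderPlaced", lambda e: emails.append(e["order_id"]))
broker.subscribe("OrderPlaced", lambda e: projection.update({e["order_id"]: e["total_cents"]}))
handle_place_order({"order_id": "order-1", "total_cents": 4200}, store, broker)
```

Here publishing happens synchronously right after the append. With a real broker the two steps are separate operations, and keeping them consistent needs its own design.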
CQRS: write path and read path
Write path (commands)
- Client: POST/PUT command
- Command Model: invariants and business rules
- Event Stream: domain facts
- Projector: updates read model
Read path (queries)
- Client: GET query
- Query API: read-only endpoint
- Read Model: denormalized projection
Saga: coordination styles
The orchestrator centrally decides the next step and when to run compensations.
All key commands and decisions pass through one central coordinator.
OrderCreated -> ReserveFunds -> PaymentReserved -> ReserveStock -> InventoryReserved -> CreateShipment -> ShipmentCreated -> OrderCompleted
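The orchestrated flow above can be sketched as a coordinator that runs each step in order and, on failure, compensates the completed steps in reverse. The `run_saga` function, the context dict, and the compensation names (`release_funds`, `release_stock`, `cancel_shipment`) are assumptions made for illustration; only the forward step names come from the flow.

```python
from typing import Callable

# (name, action, compensation); all three are illustrative
Step = tuple[str, Callable[[dict], None], Callable[[dict], None]]


def run_saga(steps: list[Step], ctx: dict) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action(ctx)
        except Exception as exc:
            ctx["log"].append(f"{name} failed: {exc}")
            for done_name, _, compensate in reversed(completed):
                compensate(ctx)
                ctx["log"].append(f"compensated {done_name}")
            return False
        completed.append(step)
        ctx["log"].append(f"{name} ok")
    return True


def reserve_funds(ctx: dict) -> None:
    ctx["funds_reserved"] = True


def release_funds(ctx: dict) -> None:
    ctx["funds_reserved"] = False


def reserve_stock(ctx: dict) -> None:
    if ctx["stock"] < 1:
        raise RuntimeError("out of stock")
    ctx["stock"] -= 1


def release_stock(ctx: dict) -> None:
    ctx["stock"] += 1


def create_shipment(ctx: dict) -> None:
    ctx["shipment"] = "created"


def cancel_shipment(ctx: dict) -> None:
    ctx["shipment"] = "cancelled"


steps = [
    ("ReserveFunds", reserve_funds, release_funds),
    ("ReserveStock", reserve_stock, release_stock),
    ("CreateShipment", create_shipment, cancel_shipment),
]

ok_ctx = {"stock": 1, "log": []}
failed_ctx = {"stock": 0, "log": []}
succeeded = run_saga(steps, ok_ctx)        # all three steps complete
rolled_back = not run_saga(steps, failed_ctx)  # ReserveStock fails, funds are released
```

In a real deployment each action would be a command sent to another service and the saga state would be persisted between steps; both are omitted here to keep the control flow visible.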
Related
Microservice Patterns [RU]
A dedicated chapter on integration patterns and distributed transactions.
Event Sourcing + CQRS + Saga
Event Sourcing
State is stored as a sequence of events rather than as a snapshot of the current value.
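In code, current state becomes a fold over the event history. The sketch below assumes a hypothetical account domain with `Deposited` and `Withdrawn` events represented as dicts.

```python
from functools import reduce


def apply(balance: int, event: dict) -> int:
    """Pure transition function: previous state + one event -> next state."""
    if event["type"] == "Deposited":
        return balance + event["amount"]
    if event["type"] == "Withdrawn":
        return balance - event["amount"]
    return balance  # event types this model does not care about are ignored


events = [
    {"type": "Deposited", "amount": 100},
    {"type": "Withdrawn", "amount": 30},
    {"type": "Deposited", "amount": 5},
]

# Replaying the full history from an empty state yields the current balance.
balance = reduce(apply, events, 0)
```

Because `apply` is pure, replaying the same events always produces the same state, which is what makes audit and projection rebuilds possible.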
When to use
- You need full auditability and reproducible change history.
- You need to rebuild projections from historical events.
- Your domain is naturally expressed through events.
Trade-offs
- Event schema migration and versioning are harder.
- You need a snapshot and replay strategy to keep load times acceptable as the event log grows.
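The snapshot trade-off can be illustrated as follows: a snapshot records how many events it already covers, so loading replays only the tail of the log. This is a sketch with hypothetical names (`Snapshot`, `load`, and an account domain with `Deposited`/`Withdrawn` events), not a prescribed storage layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    version: int  # number of events already folded into `balance`
    balance: int


def apply(balance: int, event: dict) -> int:
    if event["type"] == "Deposited":
        return balance + event["amount"]
    if event["type"] == "Withdrawn":
        return balance - event["amount"]
    return balance


def load(events: list[dict], snapshot: Optional[Snapshot] = None) -> Snapshot:
    """Rebuild state, starting from the snapshot when one is available."""
    start = snapshot.version if snapshot else 0
    balance = snapshot.balance if snapshot else 0
    for event in events[start:]:  # only events newer than the snapshot
        balance = apply(balance, event)
    return Snapshot(version=len(events), balance=balance)


events = [
    {"type": "Deposited", "amount": 100},
    {"type": "Withdrawn", "amount": 30},
    {"type": "Deposited", "amount": 5},
]

snapshot = load(events[:2])       # taken after the first two events
current = load(events, snapshot)  # replays only the third event
```

A full replay and a snapshot-based load must agree; checking that equality is a cheap way to catch a stale or corrupted snapshot.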
CQRS
Split the write model (commands) from the read model (queries).
When to use
- Read/write profiles differ significantly.
- You need dedicated read-optimized projections.
- The system is growing and needs independent path scaling.
Trade-offs
- Operational complexity and component count increase.
- The read model is often only eventually consistent with the write model.
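The split, and the consistency lag it introduces, can be shown with two small classes. This is an in-memory sketch; `WriteModel`, `ReadModel`, and the `StockAdded` event are illustrative names, and the projector is called by hand where a real system would run it asynchronously.

```python
class WriteModel:
    """Command side: enforces invariants and emits events."""

    def __init__(self) -> None:
        self.events: list[dict] = []

    def add_stock(self, sku: str, qty: int) -> None:
        if qty <= 0:
            raise ValueError("qty must be positive")
        self.events.append({"type": "StockAdded", "sku": sku, "qty": qty})


class ReadModel:
    """Query side: a denormalized projection, updated only by `project`."""

    def __init__(self) -> None:
        self.stock_by_sku: dict[str, int] = {}
        self.position = 0  # number of events already projected

    def project(self, events: list[dict]) -> None:
        for event in events[self.position:]:
            if event["type"] == "StockAdded":
                sku = event["sku"]
                self.stock_by_sku[sku] = self.stock_by_sku.get(sku, 0) + event["qty"]
            self.position += 1

    def get_stock(self, sku: str) -> int:
        return self.stock_by_sku.get(sku, 0)


write, read = WriteModel(), ReadModel()
write.add_stock("sku-1", 5)
stale = read.get_stock("sku-1")  # projector has not run yet: read side lags
read.project(write.events)
fresh = read.get_stock("sku-1")  # read side has caught up
```

Tracking `position` lets the projector be re-run safely: events that were already projected are skipped instead of being counted twice.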
Saga
Manage a distributed transaction through local steps plus compensations.
When to use
- The operation spans multiple services or data stores.
- 2PC is unavailable or impractical.
- You need controlled rollback via compensating actions.
Trade-offs
- You must design every step for idempotency and redelivery.
- Long-running and partially completed flows are harder to debug.
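For the idempotency trade-off, a common approach is to deduplicate on a message identifier so that a redelivered step has no second effect. The sketch below is an assumption-laden illustration: `IdempotentConsumer` is a hypothetical wrapper, and the in-memory set stands in for durable storage.

```python
from typing import Callable


class IdempotentConsumer:
    """Wraps a handler so that each `messageId` takes effect at most once."""

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self._handler = handler
        self._seen: set[str] = set()  # stand-in for a durable processed-IDs table

    def handle(self, message: dict) -> bool:
        """Return True if processed, False if skipped as a duplicate."""
        message_id = message["messageId"]
        if message_id in self._seen:
            return False
        self._handler(message)
        self._seen.add(message_id)  # recorded only after the handler succeeds
        return True


reserved: list[int] = []
consumer = IdempotentConsumer(lambda m: reserved.append(m["amount"]))

message = {"messageId": "m-1", "amount": 50}
first = consumer.handle(message)   # processed
second = consumer.handle(message)  # redelivery: skipped
```

In a real service the handler's side effect and the processed-ID record should be committed together (for example in one database transaction); otherwise a crash between the two can still cause a duplicate.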
Related Book
Software Architecture: The Hard Parts [RU]
Deep dive into trade-offs, distributed workflows, orchestration/choreography, and Saga practice.
Saga: coordination styles
Choreography
Services subscribe to events and react without a central coordinator.
Pros
- Loose coupling
- Fewer central bottlenecks
Risks
- Harder end-to-end tracing
- Risk of event spaghetti
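A choreographed version of an order flow can be sketched with an in-memory bus: each service subscribes to one event and emits the next, and nothing holds the whole workflow. `EventBus` is a hypothetical helper and the one-line lambdas stand in for real services.

```python
from collections import defaultdict, deque
from typing import Callable


class EventBus:
    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self._queue: deque = deque()
        self.history: list[str] = []  # delivery order, kept for tracing

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self._queue.append((event_type, payload))

    def run(self) -> None:
        """Deliver queued events until no handler publishes anything new."""
        while self._queue:
            event_type, payload = self._queue.popleft()
            self.history.append(event_type)
            for handler in self._handlers[event_type]:
                handler(payload)


bus = EventBus()
# Each service knows only the event it reacts to and the event it emits.
bus.subscribe("OrderCreated", lambda p: bus.publish("PaymentReserved", p))       # payments
bus.subscribe("PaymentReserved", lambda p: bus.publish("InventoryReserved", p))  # inventory
bus.subscribe("InventoryReserved", lambda p: bus.publish("ShipmentCreated", p))  # shipping
bus.subscribe("ShipmentCreated", lambda p: bus.publish("OrderCompleted", p))     # orders

bus.publish("OrderCreated", {"order_id": "order-1"})
bus.run()
```

The workflow exists only as the sum of the subscriptions, which is the source of both the loose coupling and the tracing difficulty: the `history` list here is the kind of end-to-end view that has to be built deliberately in a real system.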
Orchestration
An orchestrator explicitly drives steps and compensations.
Pros
- Transparent workflow and observability
- Easier process verification
Risks
- Central complexity hotspot
- Orchestration layer must scale
Decision matrix
| Need | Recommendation | Why |
|---|---|---|
| Full audit trail and replay | Event Sourcing | History of changes is first-class data. |
| Separate read/write SLA | CQRS | Independent optimization and scaling of read/write paths. |
| Distributed transaction without 2PC | Saga | Local transactions + compensating actions. |
| Simple CRUD system with low complexity | Do not force EDA | Operational complexity may exceed business benefit. |
Common mistakes
- Trying to adopt all patterns at once without explicit SLAs and bounded contexts.
- Treating events as facts while publishing technical noise without business meaning.
- Skipping consumer idempotency under at-least-once delivery.
- Ignoring schema versioning and backward compatibility of event contracts.
- Not tracking DLQ size, consumer lag, duplicate rate, and saga completion time.
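For the schema versioning mistake, one mitigation is an upcaster that converts old event versions to the current shape before any handler sees them. The sketch below assumes a hypothetical change in which version 1 stored `amount` in whole units and version 2 stores `amount_cents`; the `version` field and the `upcast` function are illustrative.

```python
CURRENT_VERSION = 2


def upcast(event: dict) -> dict:
    """Bring an event of any known version up to the current schema."""
    version = event.get("version", 1)  # events without a version are treated as v1
    if version == 1:
        # Hypothetical v1 -> v2 change: whole units become cents.
        event = {
            "type": event["type"],
            "version": 2,
            "amount_cents": event["amount"] * 100,
        }
        version = 2
    if version != CURRENT_VERSION:
        raise ValueError(f"unsupported event version: {version}")
    return event


old = {"type": "Deposited", "amount": 3}
new = {"type": "Deposited", "version": 2, "amount_cents": 450}

upcasted_old = upcast(old)  # converted to the v2 shape
upcasted_new = upcast(new)  # already current, returned unchanged
```

Stored events are never rewritten; the conversion happens on read, so consumers only need to understand the latest version.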
Related
Resilience Patterns
DLQ complements retry/backoff and limits cascading failures in consumer pipelines.
Dead Letter Queue (DLQ)
A DLQ is a quarantine for messages that cannot be processed after retries. Its purpose is not to hide failures but to preserve problematic events for safe triage and controlled replay.
When to send to DLQ
When a message exceeds the retry limit, violates the schema contract, or fails consistently because of data issues.
What to store
`messageId`, `attempts`, `lastError`, `originalTopic`, `failedAt`, contract version, and payload reference.
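The routing rule and the stored fields can be combined in a small consumer loop. This is a sketch: `MAX_ATTEMPTS`, `NonRetryableError`, `consume`, and the `contractVersion`/`payloadRef` field names are assumptions, and backoff between retries is omitted for brevity. The camelCase field names follow the list above rather than Python convention.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable

MAX_ATTEMPTS = 3  # hypothetical retry limit


class NonRetryableError(Exception):
    """Schema violation or bad data: retrying cannot help."""


@dataclass(frozen=True)
class DeadLetter:
    messageId: str
    attempts: int
    lastError: str
    originalTopic: str
    failedAt: datetime
    contractVersion: int
    payloadRef: str  # pointer to the payload, not the payload itself


def consume(message: dict, handler: Callable[[dict], None], dlq: list) -> bool:
    """Return True if handled; otherwise append a DeadLetter and return False."""
    attempts = 0
    while True:
        attempts += 1
        try:
            handler(message)
            return True
        except NonRetryableError as exc:
            last_error = f"{type(exc).__name__}: {exc}"
            break  # go straight to the DLQ
        except Exception as exc:
            last_error = f"{type(exc).__name__}: {exc}"
            if attempts >= MAX_ATTEMPTS:
                break  # retry budget exhausted
    dlq.append(DeadLetter(
        messageId=message["messageId"],
        attempts=attempts,
        lastError=last_error,
        originalTopic=message["topic"],
        failedAt=datetime.now(timezone.utc),
        contractVersion=message["contractVersion"],
        payloadRef=message["payloadRef"],
    ))
    return False


def always_fails(message: dict) -> None:
    raise RuntimeError("downstream unavailable")


dlq: list = []
msg = {"messageId": "m-1", "topic": "orders", "contractVersion": 2, "payloadRef": "ref-1"}
handled = consume(msg, always_fails, dlq)  # retried MAX_ATTEMPTS times, then dead-lettered
```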
What to do next
Run triage, fix the root cause, and execute re-drive batches with rate limiting and idempotency checks.
Practical DLQ checklist
- Use a dedicated DLQ per critical flow to avoid mixing domains and priorities.
- Store error reason, retry count, source topic/queue, and payload reference for investigations.
- Separate transient and non-transient failures: not all errors should hit DLQ after the same retry count.
- Set up a re-drive process: manual or automatic message replay after the root cause is fixed.
- Add alerts on DLQ growth rate and define an SLA for poison-message triage.
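A re-drive pass can be sketched as a batched loop with a pause between batches and a processed-ID check before each replay. The `redrive` function, its parameters, and the dict-based dead-letter records are assumptions for illustration; `sleep` is injectable so the pause can be replaced in tests.

```python
import time
from typing import Callable


def redrive(
    dead_letters: list[dict],
    republish: Callable[[str, str], None],
    already_processed: set[str],
    batch_size: int = 100,
    pause_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Replay dead letters in batches; return how many were republished."""
    replayed = 0
    for start in range(0, len(dead_letters), batch_size):
        if start > 0:
            sleep(pause_seconds)  # crude rate limit between batches
        for letter in dead_letters[start:start + batch_size]:
            if letter["messageId"] in already_processed:
                continue  # idempotency check: never replay the same message twice
            republish(letter["originalTopic"], letter["payloadRef"])
            already_processed.add(letter["messageId"])
            replayed += 1
    return replayed


letters = [
    {"messageId": "m-1", "originalTopic": "orders", "payloadRef": "ref-1"},
    {"messageId": "m-2", "originalTopic": "orders", "payloadRef": "ref-2"},
    {"messageId": "m-1", "originalTopic": "orders", "payloadRef": "ref-1"},  # duplicate entry
]
sent: list[tuple[str, str]] = []
pauses: list[float] = []
count = redrive(
    letters,
    republish=lambda topic, ref: sent.append((topic, ref)),
    already_processed=set(),
    batch_size=2,
    pause_seconds=0.5,
    sleep=pauses.append,  # record pauses instead of actually sleeping
)
```

A fixed pause per batch is the simplest possible throttle; a token bucket or broker-side quota would be the usual choice when the downstream capacity is known.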
Mini implementation checklist
Practical approach: start with events for one or two critical processes, add observability and retry rules, and only then scale EDA to the remaining bounded contexts.
