Distributed correctness breaks not only during obvious outages, but also during retries, delayed data visibility, and partially completed operations.
This chapter ties together consistency models, read-your-writes behavior, idempotency keys, consumer-side duplicate suppression, reliable event publication, and Saga compensations into one design frame where the key question is which invariants may be relaxed and which must survive every repeat.
That framing is especially useful in system design interviews because it lets you explain correctness through concrete safeguards instead of hiding behind the slogan of exactly-once processing, which says very little about real failure behavior.
Practical value of this chapter
Consistency Model
State early where strict guarantees are required, where read-your-writes is enough, and where temporary divergence is acceptable.
Safe Retries
Design commands and consumers so that a repeated request or event does not become a second business operation.
Partial Failures
Treat lost responses, duplicate delivery, out-of-order events, and partial completion as distinct correctness scenarios.
Decision Rationale
Explain which invariants remain protected, what consistency costs you accept, and how the system converges after failure.
Theory: CAP Theorem. Every consistency choice is tied to availability, latency, and the cost of coordination.
Data consistency and idempotency matter precisely because failures happen: responses get lost, events arrive again, and a business operation completes in fragments. The real question of this chapter is which consistency level the product truly needs and how to prevent repeated commands from becoming new business operations.
Related chapter: Jepsen and consistency models. A practical look at how distributed databases actually behave under faults.
Consistency Models
Consistency is about when different copies of the data are expected to reflect the same state. In practice the choice of model interacts with idempotency keys and request fingerprints, read-your-writes and session guarantees, quorum design, the transactional outbox, and the handling of partial failures, out-of-order delivery, and redelivery.
Strong consistency
Fits: Payments, balance invariants, and workflows where the cost of error is high.
Cost: You pay with higher latency, more expensive coordination, and stricter behavior during network partitions.
Read-your-writes and session consistency
Fits: Account settings, profile updates, and flows where users must immediately see their own changes.
Cost: You need the right read routing, careful cache behavior, and an explicit freshness policy.
Eventual consistency
Fits: Catalogs, recommendations, analytical views, and asynchronous cross-service integrations.
Cost: Temporary divergence becomes part of the product behavior, so lag handling must be explicit.
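The read routing needed for read-your-writes can be sketched with a session token. This is a minimal in-memory illustration, not a production design: `Store`, `Replica`, `replicate`, and the `min_version` parameter are hypothetical names, and a single primary with one lagging replica is assumed.

```python
class Replica:
    """A lagging copy of the data (hypothetical, in-memory)."""

    def __init__(self):
        self.applied_version = 0
        self.data = {}


class Store:
    """One primary plus one asynchronously updated replica."""

    def __init__(self):
        self.version = 0
        self.primary = {}
        self.replica = Replica()
        self._pending = []  # writes not yet applied to the replica

    def write(self, key, value):
        # The returned version acts as the caller's session token.
        self.version += 1
        self.primary[key] = value
        self._pending.append((self.version, key, value))
        return self.version

    def replicate(self):
        # Simulates replication catching up with the primary.
        for version, key, value in self._pending:
            self.replica.data[key] = value
            self.replica.applied_version = version
        self._pending.clear()

    def read(self, key, min_version=0):
        # Serve from the replica only if it has caught up to the
        # version this session has already written.
        if self.replica.applied_version >= min_version:
            return self.replica.data.get(key), "replica"
        return self.primary.get(key), "primary"
```

A session that passes its last write version as `min_version` is routed to the primary until the replica catches up, while readers without a token accept whatever the replica currently holds.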
Idempotency Patterns
Idempotency key for synchronous APIs
POST commands such as payments, order creation, invoice issuance, or workflow start.
How to implement
- The client sends `Idempotency-Key`, and the server stores the key, request fingerprint, and final response.
- If the same key arrives again with the same request body, the server returns the original result instead of creating a new operation.
- Choose key lifetime by business risk: financial operations often keep keys for 24-72 hours, while short-lived commands need a smaller window.
Risk: If the same key is accepted for a different request body, the conflict gets hidden behind what looks like a safe retry. Reject such a request with an explicit conflict error instead of replaying the stored response.
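The key, fingerprint, and stored-response flow described above can be sketched as follows. This is a minimal single-process illustration under stated assumptions: `IdempotentHandler`, `IdempotencyConflict`, and `fingerprint` are hypothetical names, the store is an in-memory dict, and key expiry and concurrent access are deliberately left out.

```python
import hashlib
import json
from dataclasses import dataclass


class IdempotencyConflict(Exception):
    """The same key was reused with a different request body."""


@dataclass
class StoredResult:
    fingerprint: str
    response: dict


def fingerprint(body: dict) -> str:
    # Canonical JSON so that field order does not change the hash.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotentHandler:
    def __init__(self, operation):
        self._operation = operation  # the real business operation
        self._store = {}             # key -> StoredResult (assumed in-memory)
        self.executions = 0          # how many times the operation really ran

    def handle(self, key: str, body: dict) -> dict:
        fp = fingerprint(body)
        stored = self._store.get(key)
        if stored is not None:
            if stored.fingerprint != fp:
                # Same key, different body: surface the conflict.
                raise IdempotencyConflict(key)
            # Same key, same body: replay the original result.
            return stored.response
        response = self._operation(body)
        self.executions += 1
        self._store[key] = StoredResult(fp, response)
        return response
```

In a real service the lookup and the write would have to be atomic, for example through a unique constraint on the key, because two concurrent retries could otherwise both miss the stored entry.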
What idempotency does not solve by itself
- Idempotency protects against redelivery, but it does not replace concurrency control or business invariants: two distinct requests with different keys can still race to spend the same balance.
- For critical commands, it helps to store not only the processed flag, but also the final response or rejection code so that retries stay deterministic.
- Monitor retry share, rejected-duplicate rate, and the time spent resolving idempotency-key conflicts.
Validation: Testing Distributed Systems. Idempotency should be validated with retries, out-of-order events, and partial-failure scenarios.
Practical Scenarios
Payment API
A payment retried after a timeout can turn into a double charge unless the API treats the repeat as the same intent.
Failure path: Client (timeout + retry) -> Payment API (no idempotency key) -> Database (duplicate charge) -> User (double debit)
Resilient path: Client (idempotency key) -> API (duplicate + uniqueness check) -> Ledger (single transaction) -> API (return prior result)
What happens
- The idempotency key maps every repeat of the same client intent to one business operation.
- Even when the request arrives again, the server returns the original outcome instead of creating a second transaction.
- A uniqueness guard on the key closes the race window between concurrent retries, and a status endpoint lets the client look up the outcome instead of resubmitting.
Risk: Key lifetime and scope must match the real business time window of the operation.
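The uniqueness guard in the resilient path can be sketched with a database constraint. This is a minimal illustration that assumes an in-memory SQLite ledger; the `charges` table, its columns, and the `charge` function are hypothetical.

```python
import sqlite3


def open_ledger() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The primary key on idempotency_key is the uniqueness guard.
    conn.execute(
        "CREATE TABLE charges ("
        " idempotency_key TEXT PRIMARY KEY,"
        " account TEXT NOT NULL,"
        " amount_cents INTEGER NOT NULL)"
    )
    return conn


def charge(conn, key, account, amount_cents):
    """Insert the charge once; return (row, created)."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO charges (idempotency_key, account, amount_cents)"
                " VALUES (?, ?, ?)",
                (key, account, amount_cents),
            )
        created = True
    except sqlite3.IntegrityError:
        # A row with this key already exists: treat it as a retry.
        created = False
    row = conn.execute(
        "SELECT idempotency_key, account, amount_cents"
        " FROM charges WHERE idempotency_key = ?",
        (key,),
    ).fetchone()
    return row, created
```

Letting the database reject the second insert, rather than checking for the key first and inserting afterwards, is what removes the window in which two concurrent retries both see no existing row.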
The scenario should remain correct under retries, redelivery, and out-of-order events.
Practical Checklist
Every critical command has an explicit duplicate-request policy.
Events have stable identity and a clear consumer-side duplicate-suppression strategy.
The chosen consistency model is reflected in the API contract and in product expectations.
Background reconciliation exists to detect and repair divergence.
The team tests retries, redelivery, and out-of-order delivery in integration scenarios.
Common mistake: assuming the platform will deliver exactly once and skipping idempotent design altogether.
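The checklist item on stable event identity and consumer-side duplicate suppression can be sketched as below. This is a minimal in-memory illustration: `Event`, `OrderProjection`, and the per-order `version` field are hypothetical, and a real consumer would record the processed ID and the state change in the same transaction.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    event_id: str   # stable identity, reused on redelivery
    order_id: str
    version: int    # assumed to increase with each change to the order
    status: str


@dataclass
class OrderProjection:
    processed_ids: set = field(default_factory=set)
    state: dict = field(default_factory=dict)  # order_id -> (version, status)

    def apply(self, event: Event) -> bool:
        """Apply the event; return False if it was a duplicate or stale."""
        if event.event_id in self.processed_ids:
            return False  # redelivery of an event already handled
        self.processed_ids.add(event.event_id)
        current = self.state.get(event.order_id)
        if current is not None and current[0] >= event.version:
            return False  # out-of-order: a newer version is already applied
        self.state[event.order_id] = (event.version, event.status)
        return True
```

Tracking event IDs handles redelivery, while the version comparison keeps a late-arriving older event from overwriting newer state.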
References
Related chapters
- CAP Theorem - Why consistency and availability cannot both be guaranteed once the network partitions.
- PACELC Theorem - How the latency-versus-consistency trade-off appears even when there is no incident.
- Event-Driven Architecture - Where retries, partial failures, and state convergence become everyday engineering work.
- Resilience Patterns - Why retries are only safe when the contract is already designed to be idempotent.
- Testing Distributed Systems - How to validate retries, out-of-order events, and partially completed operations before production.
