Workflow orchestration matters once a business process outlives individual requests, services, and even platform restarts.
In real design work, this chapter shows how long-running processes, compensations, state ownership, and durable execution shape a system more deeply than the choice between Temporal, Cadence, or Step Functions.
In interviews and engineering discussions, it helps you compare orchestration and choreography in terms of control visibility, evolution cost, and the risk of hung or duplicated actions.
Practical value of this chapter
Design in practice
Design long-running processes with explicit compensation steps and state ownership.
Decision quality
Compare orchestration and choreography by control visibility and evolution complexity.
Interview articulation
Frame Saga answers through the main process path, failure paths, and recovery rules.
Failure framing
Set timeout and retry limits so workflows do not hang or duplicate side effects.
Primary source
Temporal Workflows
Core model for durable execution and Temporal Workflow semantics.
Workflow orchestration is an architectural layer for coordinating long-running business processes across microservices. It centralizes process state, retry and timeout policies, compensations, and operational control over execution.
When Orchestration Is Actually Needed
The process runs for minutes, hours, or days
When a business process outlives a single HTTP request, you need durable state and safe continuation after failures.
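As a rough illustration of "durable state and safe continuation", the sketch below records each step result in a history and skips already-recorded steps when the process is resumed. The names `StepHistory`, `Step`, and `resumeProcess` are hypothetical, and the history is an in-memory object standing in for a persisted store; real engines implement this far more rigorously.

```typescript
// Hypothetical persisted history: step name -> recorded result.
// An in-memory object stands in for durable storage in this sketch.
type StepHistory = Record<string, string>;
type Step = { name: string; run: () => string };

// Executes steps in order, skipping any step whose result is already recorded.
// Returns the names of the steps that actually ran during this attempt.
function resumeProcess(recorded: StepHistory, steps: Step[]): string[] {
  const executed: string[] = [];
  for (const step of steps) {
    if (Object.prototype.hasOwnProperty.call(recorded, step.name)) {
      continue; // completed before the failure: reuse the recorded result
    }
    recorded[step.name] = step.run(); // recorded only after the step succeeds
    executed.push(step.name);
  }
  return executed;
}
```

If the worker fails partway through, calling `resumeProcess` again with the same history continues from the first unrecorded step instead of repeating completed work.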
Compensations and rollback paths are part of the design
If steps touch multiple services, an orchestrator makes Saga execution explicit: compensations, rollback order, and a transparent action history.
Retry and timeout policies must be consistent
Shared rules for retries, backoff, and deadlines remove duplicated infrastructure logic from individual services.
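One way to picture a shared rule is a single policy object plus a function that derives the delay before each retry. This is a minimal sketch: the `RetryPolicy` shape, the default values, and `nextDelayMs` are illustrative assumptions, not the API of any particular platform.

```typescript
// Hypothetical shared policy; field names and defaults are illustrative only.
interface RetryPolicy {
  initialIntervalMs: number;
  backoffCoefficient: number;
  maximumIntervalMs: number;
  maximumAttempts: number; // total attempts, including the first one
}

const defaultPolicy: RetryPolicy = {
  initialIntervalMs: 1000,
  backoffCoefficient: 2,
  maximumIntervalMs: 30000,
  maximumAttempts: 5,
};

// Returns the delay before the next attempt, given how many attempts have
// already failed, or null when the attempt budget is exhausted.
function nextDelayMs(policy: RetryPolicy, failedAttempts: number): number | null {
  if (failedAttempts >= policy.maximumAttempts) {
    return null;
  }
  const raw =
    policy.initialIntervalMs * Math.pow(policy.backoffCoefficient, failedAttempts - 1);
  return Math.min(raw, policy.maximumIntervalMs); // cap exponential growth
}
```

Services that reuse one policy like this do not each need their own backoff arithmetic, which is the duplication the shared rules are meant to remove.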
Operational control matters
Replay, manual step restart, pause/resume, audit, and workflow-state metrics need to live in one operational plane.
Temporal, Cadence, and Step Functions: Practical Comparison
| Platform | State model | Authoring model | Retries and timeouts | Trade-offs |
|---|---|---|---|---|
| Temporal | Durable execution and event history | Process logic in SDK code (Go/Java/TS/...) | Retry policies for activities and workflows, plus timers | Requires deterministic workflow discipline and a dedicated operating plane. |
| Cadence | Durable execution, architecturally close to Temporal | Process logic in SDK code | Activity retry policies and domain-level controls | More common in existing installations and migration paths. |
| AWS Step Functions | Managed state machine with ASL and visual states | Declarative state machines and AWS integrations | State-level retry and error handling | Strong AWS integration with higher vendor lock-in risk. |
Reference Process With Compensations
A typical order process reserves inventory, charges payment, creates a shipment, and sends confirmation. If a step fails, the compensations for the steps that already completed run in reverse order.
Figure: Reference orchestration process (happy path and Saga compensations in a single flow).
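```typescript
// Activity functions (reserveInventory, chargePayment, ...) and OrderInput are assumed to be defined elsewhere.
export async function OrderWorkflow(input: OrderInput): Promise<void> {
  // Each compensation is registered only after its step succeeds.
  const compensations: Array<() => Promise<void>> = [];
  try {
    const reservation = await reserveInventory(input.orderId, input.items);
    compensations.push(() => releaseInventory(input.orderId));
    await chargePayment(input.orderId, input.amount);
    compensations.push(() => refundPayment(input.orderId));
    await createShipment(input.orderId, reservation.warehouseId);
    await sendConfirmation(input.orderId);
  } catch (error) {
    // Undo only the steps that completed, newest first, then surface the failure.
    for (const compensate of compensations.reverse()) {
      await compensate();
    }
    throw error;
  }
}
```

Execution Contract and Reliability Checklist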
Execution contract
- Every activity is idempotent: re-execution must not corrupt business state.
- Every external call and the overall process have explicit timeouts and deadlines.
- Compensations are business-valid reverse actions, not only technical rollbacks.
- Workflow logic is versioned so running instances can finish under older rules.
- Every workflow state is visible through metrics and tracing.
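The first contract item, idempotent activities, is often approached with an idempotency key derived from the business key. The sketch below assumes a hypothetical in-memory `PaymentLedger`; a real payment activity would persist the key alongside the charge.

```typescript
type ChargeResult = { chargeId: string; amount: number };

// Hypothetical in-memory ledger standing in for a payment service.
class PaymentLedger {
  private readonly charges = new Map<string, ChargeResult>();
  private total = 0;
  private nextId = 1;

  // Idempotent by key: a repeated call with the same key returns the
  // recorded result instead of charging a second time.
  charge(idempotencyKey: string, amount: number): ChargeResult {
    const existing = this.charges.get(idempotencyKey);
    if (existing !== undefined) {
      return existing;
    }
    const result: ChargeResult = { chargeId: `ch_${this.nextId++}`, amount };
    this.charges.set(idempotencyKey, result);
    this.total += amount;
    return result;
  }

  totalCharged(): number {
    return this.total;
  }
}
```

With a key such as `order-42:charge`, a retried activity observes the original charge, so re-execution leaves business state unchanged.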
Reliability checklist
- Every workflow instance has a stable business key, such as `orderId`, and a deduplication policy.
- Workflow code avoids hidden nondeterministic calls (clocks, random values, direct I/O) unless they are moved into activities or wrapped in explicit side-effect primitives.
- Errors are split into retryable and non-retryable classes with different handling policies.
- Manual operations such as resume, terminate, and restart from a failed step are documented as runbooks.
- The orchestration SLO is measured separately: start latency, completion time, and failed-process rate.
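The split between retryable and non-retryable errors can be sketched as a small classifier plus a bounded retry loop. `StepError`, `isRetryable`, and `runStep` are hypothetical names, and the loop is synchronous and omits backoff delays to keep the example short; an orchestrator would normally wait on a durable timer between attempts.

```typescript
// Hypothetical error type that carries its own retry classification.
class StepError extends Error {
  readonly retryable: boolean;
  constructor(message: string, retryable: boolean) {
    super(message);
    this.name = "StepError";
    this.retryable = retryable;
  }
}

// Unclassified errors default to retryable here; maxAttempts still bounds them.
function isRetryable(error: unknown): boolean {
  if (error instanceof StepError) {
    return error.retryable;
  }
  return true;
}

// Runs a step until it succeeds, hits a non-retryable error,
// or exhausts maxAttempts. Delays between attempts are omitted.
function runStep<T>(step: () => T, maxAttempts: number): { value: T; attempts: number } {
  for (let attempt = 1; ; attempt++) {
    try {
      return { value: step(), attempts: attempt };
    } catch (error) {
      if (!isRetryable(error) || attempt >= maxAttempts) {
        throw error;
      }
    }
  }
}
```

A timeout would be raised as `new StepError("timeout", true)` and retried, while something like a declined card would be raised with `retryable` set to `false` and fail on the first attempt.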
Implementation Risks
Mixing business logic with transport details
Keep the process as a coordination layer; move domain decisions and external integration details into separate activity and handler layers.
Implicit compensations
Define compensations next to each step and test them separately with fault injection.
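A minimal sketch of this idea pairs each action with its compensation in one step definition and forces a failure at a chosen step to exercise the rollback path. The names `SagaStep`, `runSaga`, `buildOrderSaga`, and the `cancelShipment` compensation are hypothetical, and plain synchronous functions stand in for activities, which would normally be asynchronous.

```typescript
interface SagaStep {
  name: string;
  action: () => void;
  compensate: () => void;
}

// Runs steps in order; on failure, compensates the completed steps
// in reverse order and rethrows the original error.
function runSaga(steps: SagaStep[]): void {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      completed.push(step);
    } catch (error) {
      for (const done of completed.slice().reverse()) {
        done.compensate();
      }
      throw error;
    }
  }
}

// Fault injection: builds an order saga that fails at the step named failAt.
// Every action and compensation appends its name to the log.
function buildOrderSaga(log: string[], failAt?: string): SagaStep[] {
  const makeStep = (name: string, undo: string): SagaStep => ({
    name,
    action: () => {
      if (name === failAt) {
        throw new Error(`injected failure at ${name}`);
      }
      log.push(name);
    },
    compensate: () => {
      log.push(undo);
    },
  });
  return [
    makeStep("reserveInventory", "releaseInventory"),
    makeStep("chargePayment", "refundPayment"),
    makeStep("createShipment", "cancelShipment"), // hypothetical compensation
  ];
}
```

Running the saga once per possible `failAt` value gives a separate test for each compensation path, which is hard to do when compensations are scattered across catch blocks.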
One giant workflow
Split the flow into subprocesses with clear inputs, outputs, and bounded-context ownership.
Insufficient observability
Publish metrics for step status, retry depth, queue growth, and time to completion.
References
Related chapters
- Interservice communication patterns - Core context for synchronous and asynchronous interaction between services.
- Distributed Transactions: 2PC and 3PC - Why Saga is often more practical than two-phase commit for long-running business processes.
- Event-Driven Architecture: Event Sourcing, CQRS, Saga - How to compare orchestration and choreography in event-driven systems.
- Service Discovery - How process steps find the right service endpoints at runtime.
- Fault Tolerance Patterns: Circuit Breaker, Bulkhead, Retry - Failure-management and graceful-degradation patterns for each workflow step.
