Workflow Orchestration: Temporal, Cadence, Step Functions

Workflow orchestration matters once a business process outlives individual requests, services, and even platform restarts.

For real design work, this chapter shows how long-running processes, compensations, state ownership, and durable execution shape a system more deeply than the choice between Temporal, Cadence, and Step Functions.

For interviews and engineering discussions, it helps you compare orchestration and choreography in terms of control visibility, evolution cost, and the risk of hanging or duplicated actions.

Practical value of this chapter

Design in practice

Design long-running processes with explicit compensation steps and state ownership.

Decision quality

Compare orchestration and choreography by control visibility and evolution complexity.

Interview articulation

Frame Saga answers through the main process path, failure paths, and recovery rules.

Failure framing

Set timeouts and retry limits so workflows do not hang, and keep steps idempotent so retries do not duplicate side effects.

Primary source

Temporal Workflows

Core model for durable execution and Temporal Workflow semantics.

Workflow orchestration is an architectural layer for coordinating long-running business processes across microservices. It centralizes process state, retry and timeout policies, compensations, and operational control over execution.

When Orchestration Is Actually Needed

The process runs for minutes, hours, or days

When a business process outlives a single HTTP request, you need durable state and safe continuation after failures.
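
As a rough illustration of durable state and safe continuation, the sketch below records each completed step in a history log and skips recorded steps when the process resumes after a crash. `runDurably`, `HistoryEntry`, and the step names are hypothetical. A real engine persists the history outside the worker process rather than in an in-memory array.

```typescript
// Hypothetical in-memory stand-in for a persisted event history.
type HistoryEntry = { step: string; result: string };
type Step = { name: string; run: () => string };

// Runs steps in order, reusing recorded results instead of re-executing them.
function runDurably(steps: Step[], history: HistoryEntry[]): string[] {
  const results: string[] = [];
  for (const step of steps) {
    const recorded = history.find((entry) => entry.step === step.name);
    if (recorded) {
      results.push(recorded.result); // replayed from history, no side effect
      continue;
    }
    const result = step.run(); // may throw, simulating a worker crash
    history.push({ step: step.name, result });
    results.push(result);
  }
  return results;
}

// Demo: the second step crashes once, then the process resumes.
const executed: string[] = [];
let crashOnce = true;
const demoSteps: Step[] = [
  { name: "reserve", run: () => { executed.push("reserve"); return "reserved"; } },
  {
    name: "charge",
    run: () => {
      if (crashOnce) { crashOnce = false; throw new Error("worker crashed"); }
      executed.push("charge");
      return "charged";
    },
  },
];
const demoHistory: HistoryEntry[] = [];
let firstAttemptFailed = false;
try {
  runDurably(demoSteps, demoHistory);
} catch {
  firstAttemptFailed = true;
}
const resumed = runDurably(demoSteps, demoHistory);
```

On resume, the `reserve` step is not executed a second time: its result comes from the history, which is the property that lets a process continue safely after a restart.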

Compensations and rollback paths are part of the design

If steps touch multiple services, an orchestrator makes Saga execution explicit: compensations, rollback order, and a transparent action history.

Retry and timeout policies must be consistent

Shared rules for retries, backoff, and deadlines remove duplicated infrastructure logic from individual services.
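
One way to keep these rules in a single place is a shared policy object plus a small helper that computes capped exponential backoff. This is a minimal sketch: the `RetryPolicy` shape and `nextRetryDelayMs` are illustrative names, not the API of any particular engine.

```typescript
// Hypothetical shared policy shape; field names are illustrative.
interface RetryPolicy {
  initialIntervalMs: number;
  backoffCoefficient: number;
  maximumIntervalMs: number;
  maximumAttempts: number; // total attempts, including the first call
}

const defaultRetryPolicy: RetryPolicy = {
  initialIntervalMs: 1000,
  backoffCoefficient: 2,
  maximumIntervalMs: 30000,
  maximumAttempts: 5,
};

// Delay before retry number `retry` (1 = first retry), or null when the
// attempt budget is exhausted.
function nextRetryDelayMs(policy: RetryPolicy, retry: number): number | null {
  if (retry < 1 || retry >= policy.maximumAttempts) return null;
  const raw = policy.initialIntervalMs * Math.pow(policy.backoffCoefficient, retry - 1);
  return Math.min(raw, policy.maximumIntervalMs);
}
```

With five total attempts, the helper allows four retries and then returns `null`, which the caller can treat as the signal to stop retrying and start compensation.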

Operational control matters

Replay, manual step restart, pause/resume, audit, and workflow-state metrics need to live in one operational plane.

Temporal, Cadence, and Step Functions: Practical Comparison

Temporal

State model: Durable execution and event history
Authoring model: Process logic in SDK code (Go/Java/TS/...)
Retries and timeouts: Retry policies for activities and workflows, plus timers
Trade-offs: Requires deterministic workflow discipline and a dedicated operating plane.

Cadence

State model: Durable execution; Temporal began as a fork of Cadence, so the two are architecturally close
Authoring model: Process logic in SDK code
Retries and timeouts: Activity retry policies and domain-level controls
Trade-offs: Encountered mostly in existing installations and in migration projects, less often in new builds.

AWS Step Functions

State model: Managed state machine defined in Amazon States Language (ASL), with a visual view of states
Authoring model: Declarative state machines and AWS integrations
Retries and timeouts: State-level retry and error handling
Trade-offs: Strong AWS integration with higher vendor lock-in risk.
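
To make the declarative style concrete, here is a hedged sketch of a Step Functions definition with state-level retry and error handling, written as a TypeScript object that serializes to ASL JSON. The field names (`StartAt`, `States`, `Retry`, `Catch`, `ErrorEquals`, and so on) come from the Amazon States Language. The state names, timeout values, and ARNs are placeholders invented for this example.

```typescript
// State names and ARNs are placeholders, not a real deployment.
const orderStateMachine = {
  StartAt: "ChargePayment",
  States: {
    ChargePayment: {
      Type: "Task",
      Resource: "arn:aws:lambda:REGION:ACCOUNT_ID:function:chargePayment",
      TimeoutSeconds: 30,
      // Retry timeouts with exponential backoff: 2 s, 4 s, 8 s.
      Retry: [
        { ErrorEquals: ["States.Timeout"], IntervalSeconds: 2, MaxAttempts: 3, BackoffRate: 2.0 },
      ],
      // Any error still unhandled after retries routes to the compensation state.
      Catch: [{ ErrorEquals: ["States.ALL"], Next: "ReleaseInventory" }],
      End: true,
    },
    ReleaseInventory: {
      Type: "Task",
      Resource: "arn:aws:lambda:REGION:ACCOUNT_ID:function:releaseInventory",
      Next: "OrderFailed",
    },
    OrderFailed: { Type: "Fail", Error: "OrderFailed", Cause: "Payment could not be charged" },
  },
};

// The JSON form is what would be supplied as the state machine definition.
const aslJson = JSON.stringify(orderStateMachine, null, 2);
```

Compared with SDK-based workflows, the retry and compensation routing here is data rather than code, which is what makes the visual state view possible.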

Reference Process With Compensations

A typical order process reserves inventory, charges payment, creates a shipment, and sends confirmation. If a step fails, the compensations for the steps that already completed run in reverse order.

Figure: Reference orchestration process. Happy path and Saga compensations in a single flow (legend: successful path, compensations, failure point).

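import { proxyActivities } from '@temporalio/workflow';
// Assumption: activities and the OrderInput type are defined in ./activities.
import type * as activities from './activities';
import type { OrderInput } from './activities';

const { reserveInventory, releaseInventory, chargePayment, refundPayment, createShipment, sendConfirmation } =
  proxyActivities<typeof activities>({ startToCloseTimeout: '1 minute' });

export async function OrderWorkflow(input: OrderInput): Promise<void> {
  // Compensations for completed steps, most recent first.
  const compensations: Array<() => Promise<unknown>> = [];
  try {
    const reservation = await reserveInventory(input.orderId, input.items);
    compensations.unshift(() => releaseInventory(input.orderId));
    await chargePayment(input.orderId, input.amount);
    compensations.unshift(() => refundPayment(input.orderId));
    await createShipment(input.orderId, reservation.warehouseId);
    await sendConfirmation(input.orderId);
  } catch (error) {
    // Undo only what actually completed, in reverse order.
    for (const compensate of compensations) {
      await compensate();
    }
    throw error;
  }
}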

Execution Contract and Reliability Checklist

Execution contract

Every activity is idempotent: re-execution must not corrupt business state.
Every external call and the overall process have explicit timeouts and deadlines.
Compensations are business-valid reverse actions, not only technical rollbacks.
Workflow logic is versioned so running instances can finish under older rules.
Every workflow state is visible through metrics and tracing.
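
The idempotency rule in the contract above can be sketched with an idempotency key checked before the side effect. `PaymentLedger` is a hypothetical in-memory stand-in. A real service would more likely rely on a database table with a unique constraint on the key.

```typescript
// Hypothetical in-memory ledger keyed by idempotency key.
class PaymentLedger {
  private readonly charges = new Map<string, number>();

  // Charges at most once per key; repeated calls return the first result.
  charge(idempotencyKey: string, amount: number): { charged: boolean; amount: number } {
    const existing = this.charges.get(idempotencyKey);
    if (existing !== undefined) {
      return { charged: false, amount: existing };
    }
    this.charges.set(idempotencyKey, amount);
    return { charged: true, amount };
  }

  // Sum of all recorded charges.
  total(): number {
    let sum = 0;
    this.charges.forEach((value) => { sum += value; });
    return sum;
  }
}

// A retried activity reuses the same key, so the customer is charged once.
const ledger = new PaymentLedger();
const firstCharge = ledger.charge("order-42:charge", 100);
const retriedCharge = ledger.charge("order-42:charge", 100);
```

Deriving the key from the business key plus the step name (here `order-42:charge`) keeps it stable across retries of the same workflow instance.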

Reliability checklist

Every workflow instance has a stable business key, such as `orderId`, and a deduplication policy.
Workflow code avoids hidden nondeterministic calls (current time, random values, direct I/O); such work belongs in activities or in explicit side-effect primitives.
Errors are split into retryable and non-retryable classes with different handling policies.
Manual operations such as resume, terminate, and restart from a failed step are documented as runbooks.
The orchestration SLO is measured separately: start latency, completion time, and failed-process rate.
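
The split between retryable and non-retryable errors from the checklist above might look like the following sketch. The failure kinds and the `decide` function are assumptions made for illustration; a real system would map its own error types onto the two classes.

```typescript
// Illustrative failure kinds; a real system defines its own taxonomy.
type FailureKind = "timeout" | "rate_limited" | "validation" | "insufficient_funds";
type Decision = "retry" | "compensate";

const retryableKinds: FailureKind[] = ["timeout", "rate_limited"];

// Transient failures are retried until the attempt budget runs out;
// business failures go straight to compensation.
function decide(kind: FailureKind, attempt: number, maxAttempts: number): Decision {
  const retryable = retryableKinds.indexOf(kind) !== -1;
  if (retryable && attempt < maxAttempts) {
    return "retry";
  }
  return "compensate";
}
```

Retrying a `validation` or `insufficient_funds` failure cannot succeed, so sending it directly to compensation avoids burning the retry budget and delaying the rollback.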

Implementation Risks

Mixing business logic with transport details

Keep the process as a coordination layer; move domain decisions and external integration details into separate activity and handler layers.

Implicit compensations

Define compensations next to each step and test them separately with fault injection.
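
A minimal sketch of that idea: each step carries its own compensation, and a test injects a fault at a chosen step to check the rollback order. The runner is synchronous to keep the example short, and `SagaStep`, `runSaga`, and `orderSteps` are hypothetical names. In a real workflow the steps would be asynchronous activities.

```typescript
// Each step declares its compensation next to its action.
type SagaStep = { name: string; run: () => void; compensate: () => void };

// Runs steps in order; on failure, undoes completed steps in reverse order.
function runSaga(steps: SagaStep[]): { ok: boolean; log: string[] } {
  const log: string[] = [];
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.run();
      completed.push(step);
      log.push(`done:${step.name}`);
    } catch {
      log.push(`failed:${step.name}`);
      for (const done of completed.slice().reverse()) {
        done.compensate();
        log.push(`undo:${done.name}`);
      }
      return { ok: false, log };
    }
  }
  return { ok: true, log };
}

// Fault injection: build the order flow with one step forced to fail.
function orderSteps(failAt: string | null): SagaStep[] {
  const names = ["reserveInventory", "chargePayment", "createShipment"];
  return names.map((name) => ({
    name,
    run: () => { if (name === failAt) throw new Error(`injected fault in ${name}`); },
    compensate: () => { /* no-op in this sketch */ },
  }));
}

const faultRun = runSaga(orderSteps("createShipment"));
const happyRun = runSaga(orderSteps(null));
```

Running the same flow once per step name as `failAt` gives a cheap table-driven test that every failure point triggers exactly the compensations it should.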

One giant workflow

Split the flow into subprocesses with clear inputs, outputs, and bounded-context ownership.

Insufficient observability

Publish metrics for step status, retry depth, queue growth, and time to completion.
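
As a rough sketch of what such metrics could be derived from, the helper below summarizes a batch of finished workflow runs. The `RunRecord` shape and `summarizeRuns` are assumptions for this example. In practice these values would usually be emitted as counters and histograms to a metrics backend rather than computed in process.

```typescript
// Hypothetical record of one finished workflow run.
type RunRecord = { status: "completed" | "failed"; retries: number; durationMs: number };

type RunSummary = { total: number; failedRate: number; maxRetryDepth: number; meanCompletionMs: number };

// Aggregates failed-process rate, deepest retry count, and mean time to completion.
function summarizeRuns(runs: RunRecord[]): RunSummary {
  let failed = 0;
  let maxRetryDepth = 0;
  let completedCount = 0;
  let completedDuration = 0;
  for (const run of runs) {
    maxRetryDepth = Math.max(maxRetryDepth, run.retries);
    if (run.status === "failed") {
      failed += 1;
    } else {
      completedCount += 1;
      completedDuration += run.durationMs;
    }
  }
  return {
    total: runs.length,
    failedRate: runs.length === 0 ? 0 : failed / runs.length,
    maxRetryDepth,
    meanCompletionMs: completedCount === 0 ? 0 : completedDuration / completedCount,
  };
}

const sampleSummary = summarizeRuns([
  { status: "completed", retries: 0, durationMs: 1000 },
  { status: "completed", retries: 2, durationMs: 3000 },
  { status: "failed", retries: 5, durationMs: 500 },
  { status: "completed", retries: 1, durationMs: 2000 },
]);
```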

References

Related chapters

Interservice communication patterns - Core context for synchronous and asynchronous interaction between services.
Distributed Transactions: 2PC and 3PC - Why Saga is often more practical than two-phase commit for long-running business processes.
Event-Driven Architecture: Event Sourcing, CQRS, Saga - How to compare orchestration and choreography in event-driven systems.
Service Discovery - How process steps find the right service endpoints at runtime.
Fault Tolerance Patterns: Circuit Breaker, Bulkhead, Retry - Failure-management and graceful-degradation patterns for each workflow step.