System Design Space

Updated: March 15, 2026 at 7:32 PM

Workflow Orchestration: Temporal, Cadence, Step Functions

medium

How to design long-running business processes in microservices: durable execution, retries/compensation, stateful workflows, and platform trade-offs between Temporal, Cadence, and AWS Step Functions.

This Theme 9 chapter focuses on workflow orchestration, compensation, and idempotent step handling.

In real-world design, this material helps drive decisions using measurable constraints: latency budget, blast radius, contract stability, and integration operating cost.

For system design interviews, it provides a clear narrative: why this approach was chosen, which alternatives were considered, and which operational risks must be made explicit.

Practical value of this chapter

Design in practice

Design long-running processes with explicit compensation steps and state ownership.

Decision quality

Compare orchestration and choreography by control visibility and evolution complexity.

Interview articulation

Frame saga answers via happy path, failure path, and recovery policy.

Failure framing

Set timeout/retry limits so workflows do not hang or duplicate side effects.

Primary source

Temporal Workflows

Core model for durable execution and workflow semantics.


Workflow orchestration is an architectural layer for coordinating long-running business processes across microservices. It centralizes process state, retry/timeout policies, compensation logic, and runtime operational control.

Signals That Orchestration Is Actually Needed

The process runs for minutes, hours, or days

When a business flow outlives a single HTTP request, you need durable state and safe continuation after failures.
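A minimal sketch of safe continuation after a failure, assuming an event-history (journal) model in which each completed step records its result. `Journal`, `DurableContext`, and `step` are hypothetical names, the journal lives in memory, and steps are synchronous for brevity; a real engine would persist the history and replay it inside a worker.

```typescript
// Hypothetical in-memory journal; a real engine would persist this history durably.
type Journal = Map<string, unknown>;

class DurableContext {
  private readonly journal: Journal;

  constructor(journal: Journal) {
    this.journal = journal;
  }

  // Runs `fn` once per step name; on replay, returns the recorded result instead.
  step<T>(name: string, fn: () => T): T {
    if (this.journal.has(name)) {
      return this.journal.get(name) as T;
    }
    const result = fn();
    this.journal.set(name, result); // record the outcome before moving on
    return result;
  }
}

// Example workflow with two steps; the second one can simulate a worker crash.
function orderWorkflow(ctx: DurableContext, calls: string[], crashOnCharge: boolean): string {
  const reservationId = ctx.step("reserve", () => {
    calls.push("reserve");
    return "res-1";
  });
  return ctx.step("charge", () => {
    if (crashOnCharge) throw new Error("worker crashed");
    calls.push("charge");
    return `charged for ${reservationId}`;
  });
}
```

Running `orderWorkflow` again with the same journal after a crash skips the already recorded `reserve` step and continues from `charge`, which is the property that lets a flow outlive a single request or process.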

You need compensations and rollback scenarios

If steps touch multiple services, an orchestrator simplifies Saga execution: explicit compensations, rollback order, and transparent history.

Standardized retry/timeout policies are required

A shared policy for retries, backoff, and deadlines removes duplicated infrastructure logic from each microservice.
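As an illustration, a shared policy can be reduced to a small data structure plus one function that computes the delay before each retry. The field names below are assumptions chosen for this sketch, not the configuration schema of any specific platform.

```typescript
// Hypothetical shared policy shape; field names are illustrative.
interface RetryPolicy {
  initialIntervalMs: number;
  backoffCoefficient: number;
  maxIntervalMs: number;
  maxAttempts: number; // total attempts, including the first one
}

// Delay before retry number `retry` (1 = first retry), capped at maxIntervalMs.
function backoffDelayMs(policy: RetryPolicy, retry: number): number {
  const raw = policy.initialIntervalMs * Math.pow(policy.backoffCoefficient, retry - 1);
  return Math.min(raw, policy.maxIntervalMs);
}

// Full list of delays a step may wait through before giving up.
function retrySchedule(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < policy.maxAttempts; retry++) {
    delays.push(backoffDelayMs(policy, retry));
  }
  return delays;
}

// One policy object can be reused by every step instead of per-service retry loops.
const defaultPolicy: RetryPolicy = {
  initialIntervalMs: 1000,
  backoffCoefficient: 2,
  maxIntervalMs: 10000,
  maxAttempts: 6,
};
```

With `defaultPolicy`, the schedule is 1 s, 2 s, 4 s, 8 s, and then 10 s because the fifth delay (16 s) is capped by `maxIntervalMs`.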

Operational control matters

You need replay, manual step restart, pause/resume, audit, and workflow state metrics in one operational plane.

Temporal, Cadence, Step Functions: Practical Comparison

| Platform | State model | Authoring model | Retry/timeout | Trade-offs |
| --- | --- | --- | --- | --- |
| Temporal | Durable execution + event history | Code-first workflows in SDKs (Go/Java/TS/...) | Retry policies on activity/workflow tasks + timers | Requires deterministic coding discipline and a dedicated ops plane. |
| Cadence | Durable execution (architecturally close to Temporal) | Code-first workflows in SDKs | Activity retry policies + domain-level controls | Often chosen in existing installations and migration paths. |
| AWS Step Functions | Managed state machine (ASL, visual states) | Declarative state machines + AWS integrations | Retry/Catch per state | Strong AWS integration with a higher vendor lock-in risk. |
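To make the declarative "Retry/Catch per state" style concrete, here is a rough sketch of an Amazon States Language fragment, written as a TypeScript object so it can be serialized to JSON. The state names, Lambda ARNs, and numeric values are placeholders invented for this example.

```typescript
// Amazon States Language fragment expressed as a TypeScript object.
// State names and Lambda ARNs are placeholders, not real resources.
const orderStateMachine = {
  StartAt: "CreateShipment",
  States: {
    CreateShipment: {
      Type: "Task",
      Resource: "arn:aws:lambda:REGION:ACCOUNT_ID:function:createShipment",
      TimeoutSeconds: 30,
      // Retry timeouts with exponential backoff: 2 s, 4 s, 8 s.
      Retry: [
        { ErrorEquals: ["States.Timeout"], IntervalSeconds: 2, MaxAttempts: 3, BackoffRate: 2.0 },
      ],
      // Any remaining error routes to the compensation branch.
      Catch: [{ ErrorEquals: ["States.ALL"], Next: "RefundPayment" }],
      End: true,
    },
    RefundPayment: {
      Type: "Task",
      Resource: "arn:aws:lambda:REGION:ACCOUNT_ID:function:refundPayment",
      Next: "ReleaseInventory",
    },
    ReleaseInventory: {
      Type: "Task",
      Resource: "arn:aws:lambda:REGION:ACCOUNT_ID:function:releaseInventory",
      Next: "RolledBack",
    },
    RolledBack: {
      Type: "Fail",
      Error: "OrderRolledBack",
      Cause: "Shipment creation failed; payment refunded and inventory released.",
    },
  },
};

// JSON form that would be supplied as the state machine definition.
const definitionJson = JSON.stringify(orderStateMachine, null, 2);
```

In the code-first platforms the same policy would live in SDK options and ordinary `try`/`catch` blocks; here it is data attached to each state, which is easier to visualize but harder to unit test as regular code.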

Reference Flow With Compensations

Typical order flow: reserve inventory, charge payment, create shipment, send confirmation. If a step fails, compensations run in reverse order.

Reference Orchestration Process

Happy path and Saga compensations in a single visual flow.

Figure: reference orchestration process.
Success path: Order received (workflow started) -> reserveInventory() (reserve items) -> chargePayment() (charge customer) -> createShipment() (prepare shipment) -> sendConfirmation() (notify customer) -> Completed (workflow done).
Compensation path (failure in createShipment): Step failure (shipment creation failed) -> refundPayment() (reverse payment) -> releaseInventory() (release reservation) -> Rolled Back (saga compensated).
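import { proxyActivities } from '@temporalio/workflow';
// Assumption: Temporal TypeScript SDK; the module paths below are illustrative.
import type * as activities from './activities';
import type { OrderInput } from './types';

const { reserveInventory, chargePayment, createShipment, sendConfirmation, refundPayment, releaseInventory } =
  proxyActivities<typeof activities>({ startToCloseTimeout: '1 minute' });

export async function OrderWorkflow(input: OrderInput): Promise<void> {
  // If the reservation itself fails, there is nothing to compensate yet.
  const reservation = await reserveInventory(input.orderId, input.items);

  try {
    await chargePayment(input.orderId, input.amount);
    await createShipment(input.orderId, reservation.warehouseId);
    await sendConfirmation(input.orderId);
  } catch (error) {
    // Compensate in reverse order. refundPayment must be an idempotent no-op
    // when no charge exists, because chargePayment may be the step that failed.
    await refundPayment(input.orderId);
    await releaseInventory(input.orderId);
    throw error;
  }
}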

Execution Contract and Reliability Checklist

Execution contract

  • Every activity is idempotent: re-execution must not corrupt business state.
  • Every external call and the overall workflow has explicit timeout/deadline boundaries.
  • Compensations are business-valid reverse actions, not only technical rollbacks.
  • Workflow logic is versioned so running instances can finish on older behavior safely.
  • Every workflow state is observable through metrics and tracing.
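The idempotency requirement in the contract above can be sketched with an idempotency key and a dedup store. `PaymentActivity` and its in-memory `Map` are hypothetical; a production activity would keep the key in a durable store or pass it to the payment provider.

```typescript
// Hypothetical payment activity made idempotent with a dedup store keyed by idempotency key.
interface ChargeResult {
  chargeId: string;
  amount: number;
}

class PaymentActivity {
  private readonly processed = new Map<string, ChargeResult>();
  private nextId = 1;

  // A retried call with the same key returns the recorded outcome instead of charging again.
  charge(idempotencyKey: string, amount: number): ChargeResult {
    const existing = this.processed.get(idempotencyKey);
    if (existing !== undefined) {
      return existing;
    }
    const result: ChargeResult = { chargeId: `ch-${this.nextId++}`, amount };
    this.processed.set(idempotencyKey, result);
    return result;
  }

  // Number of distinct charges actually performed.
  chargeCount(): number {
    return this.processed.size;
  }
}
```

A natural key is the business identifier plus the step name, for example `order-42:charge`, so that a retry of the same step on the same order maps to the same record.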

Reliability checklist

  • Every workflow has a stable business key (for example, `orderId`) and a dedup policy.
  • Workflow code avoids hidden nondeterministic behavior (clocks, randomness, direct I/O) unless it is wrapped in side-effect primitives or moved into activities.
  • Errors are split into retryable vs non-retryable with distinct handling policies.
  • Manual operations (`resume`, `terminate`, `retry from failed step`) are documented as runbooks.
  • Orchestration SLO is measured separately: start latency, completion latency, failure rate.
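The split between retryable and non-retryable errors from the checklist above could look roughly like this. `NonRetryableError`, `isNonRetryable`, and `runWithRetries` are hypothetical helpers, and the loop omits backoff delays and runs synchronously to stay short.

```typescript
// Hypothetical marker for failures that must not be retried (e.g. a declined card).
class NonRetryableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NonRetryableError";
  }
}

// Classifies by error name so the check does not depend on prototype chains.
function isNonRetryable(error: unknown): boolean {
  return error instanceof Error && error.name === "NonRetryableError";
}

// Retries `fn` up to maxAttempts times; non-retryable errors fail fast.
function runWithRetries<T>(fn: () => T, maxAttempts: number): T {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return fn();
    } catch (error) {
      if (isNonRetryable(error)) {
        throw error; // retrying cannot help, surface immediately
      }
      lastError = error; // transient failure: try again
    }
  }
  throw lastError;
}
```

A transient timeout is retried until the attempt budget runs out, while a business rejection stops after the first attempt and can go straight to compensation.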

Implementation Risks

Mixing business logic with transport details

Keep workflows as a coordination layer; move domain logic and external integration details to activity/handler layers.

Implicit compensations

Define compensations next to each step and test them separately with fault-injection scenarios.
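One possible way to keep each compensation next to its step and exercise it with fault injection is sketched below. `SagaStep`, `runSaga`, and `buildOrderSaga` are hypothetical names, `cancelShipment` is an invented compensation, and the steps are synchronous for brevity. Unlike a single shared `catch` block, this runner compensates only the steps that actually completed.

```typescript
// Each step carries its own compensation, so the pair is defined in one place.
interface SagaStep {
  name: string;
  action: () => void;
  compensate: () => void;
}

// Runs steps in order; on failure, compensates completed steps in reverse order and rethrows.
function runSaga(steps: SagaStep[]): void {
  const completed: SagaStep[] = [];
  try {
    for (const step of steps) {
      step.action();
      completed.push(step);
    }
  } catch (error) {
    for (const step of completed.reverse()) {
      step.compensate();
    }
    throw error;
  }
}

// Fault-injection helper: builds the order saga with one named step forced to fail.
function buildOrderSaga(log: string[], failAt?: string): SagaStep[] {
  const step = (name: string, undo: string): SagaStep => ({
    name,
    action: () => {
      if (name === failAt) throw new Error(`${name} failed`);
      log.push(name);
    },
    compensate: () => {
      log.push(undo);
    },
  });
  return [
    step("reserveInventory", "releaseInventory"),
    step("chargePayment", "refundPayment"),
    step("createShipment", "cancelShipment"), // hypothetical compensation
  ];
}
```

Injecting a failure at each step in turn and asserting on the recorded log gives one test per compensation path, without touching real services.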

One giant workflow

Split flows into subprocesses with clear inputs/outputs and explicit bounded-context ownership.

Insufficient observability

Publish metrics for step status, retry depth, queue lag, and time-to-completion.
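As a rough sketch, several of these metrics can be derived from a per-workflow stream of step events. The `StepEvent` shape and `summarize` function are assumptions made for this example; queue lag is left out because it needs scheduler-side timestamps that this event shape does not carry.

```typescript
// Hypothetical event emitted after every step attempt.
interface StepEvent {
  step: string;
  status: "completed" | "failed";
  attempt: number; // 1 for the first try
  timestampMs: number;
}

interface WorkflowMetrics {
  completedSteps: number;
  failedAttempts: number;
  maxRetryDepth: number; // highest number of retries any step needed
  timeToCompletionMs: number; // span between the first and last event
}

// Aggregates one workflow's events into the metrics that would be published.
function summarize(events: StepEvent[]): WorkflowMetrics {
  let completedSteps = 0;
  let failedAttempts = 0;
  let maxRetryDepth = 0;
  let firstTs = Infinity;
  let lastTs = -Infinity;
  for (const event of events) {
    if (event.status === "completed") {
      completedSteps++;
    } else {
      failedAttempts++;
    }
    maxRetryDepth = Math.max(maxRetryDepth, event.attempt - 1);
    firstTs = Math.min(firstTs, event.timestampMs);
    lastTs = Math.max(lastTs, event.timestampMs);
  }
  const timeToCompletionMs = events.length === 0 ? 0 : lastTs - firstTs;
  return { completedSteps, failedAttempts, maxRetryDepth, timeToCompletionMs };
}
```

Publishing these values per workflow type makes it possible to alert on growing retry depth before it turns into a rising failure rate.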


© 2026 Alexander Polomodov