System Design Space

Updated: March 25, 2026 at 5:53 PM

Interplanetary Distributed Computing System

Difficulty: medium

Classic task: delay-tolerant networking, store-and-forward transport, autonomous nodes, and eventual synchronization.

An interplanetary distributed system breaks the usual internet assumptions: latency is huge, connectivity windows are rare, and partitions are the default rather than the exception.

The case helps design autonomous nodes, store-and-forward messaging, deterministic reconciliation, and scheduling around contact windows and scarce resources.

For interviews and architecture reviews, it is useful because it forces you to revisit every hidden assumption about networking, time, and coordination.

Delay Tolerance

The architecture must handle high latency and asynchronous delivery by default.

Autonomous Nodes

Nodes should continue local decisions when central connectivity is unavailable.

Eventual Sync

Define deterministic reconciliation and conflict-resolution after reconnection.

Resource Constraints

Design for constrained compute, energy, and network budgets in edge environments.
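
The "Eventual Sync" principle above asks for deterministic reconciliation. One way to read that is sketched below: a last-writer-wins merge keyed on a logical counter, with the node ID as a fixed tie-break. The names `Versioned` and `merge` are illustrative and not from the source, and a real design might prefer vector clocks or CRDTs. The property that matters is that every replica reaches the same result regardless of merge order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    counter: int   # logical (Lamport-style) counter, not wall-clock time
    node_id: str   # fixed tie-break so all replicas pick the same winner


def merge(a: dict[str, Versioned], b: dict[str, Versioned]) -> dict[str, Versioned]:
    """Merge two replicas; the result does not depend on argument order."""
    merged = dict(a)
    for key, theirs in b.items():
        ours = merged.get(key)
        if ours is None or (theirs.counter, theirs.node_id) > (ours.counter, ours.node_id):
            merged[key] = theirs
    return merged


# Two replicas updated the same key while disconnected.
earth = {"drill.depth": Versioned("2.0m", 7, "earth")}
rover = {
    "drill.depth": Versioned("2.5m", 7, "rover"),
    "cam.mode": Versioned("hdr", 3, "rover"),
}

# Merging in either direction converges to the same state.
assert merge(earth, rover) == merge(rover, earth)
```

Last-writer-wins silently discards the losing update, so it only suits data classes where that loss is acceptable; safety-critical state usually needs an explicit, reviewed merge rule.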

Hacking SDI

Practice case from chapter 15

Interplanetary Distributed Computing System as an extreme-latency architecture exercise.

Interplanetary Distributed Computing System is an edge-case interview scenario that tests architecture thinking under hard physical constraints. Synchronous patterns are mostly unusable here, so the core design relies on delay-tolerant networking, autonomous nodes, and eventual convergence.

Functional requirements

  • Reliable task and command delivery between nodes under long network delays.
  • Local execution autonomy during complete disconnection from central control.
  • Store-and-forward transport with acknowledgements, retries, and deduplication.
  • Batch state synchronization during intermittent communication windows.
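
The store-and-forward requirement above can be sketched as a relay that keeps custody of a bundle until the next hop acknowledges it. This is a minimal in-memory illustration under assumptions: `RelayNode`, `receive`, and `forward` are hypothetical names, the acknowledgement is modelled as a return value, and a real relay would persist both the pending queue and the dedup set.

```python
from collections import OrderedDict


class RelayNode:
    """Keeps custody of bundles until the next hop acknowledges them."""

    def __init__(self) -> None:
        self.pending: "OrderedDict[str, bytes]" = OrderedDict()  # bundle_id -> payload
        self.seen: set[str] = set()                               # dedup memory

    def receive(self, bundle_id: str, payload: bytes) -> str:
        """Store a bundle once; always ack so the sender can stop retrying."""
        if bundle_id not in self.seen:
            self.seen.add(bundle_id)
            self.pending[bundle_id] = payload
        return bundle_id  # the ack carries the bundle id

    def forward(self, next_hop: "RelayNode", link_up: bool) -> int:
        """During a contact window, push pending bundles; release custody only on ack."""
        if not link_up:
            return 0
        delivered = 0
        for bundle_id, payload in list(self.pending.items()):
            ack = next_hop.receive(bundle_id, payload)
            if ack == bundle_id:
                del self.pending[bundle_id]
                delivered += 1
        return delivered


relay, gateway = RelayNode(), RelayNode()
relay.receive("cmd-1", b"rotate antenna")
relay.receive("cmd-1", b"rotate antenna")            # upstream retry, deduplicated
sent_closed = relay.forward(gateway, link_up=False)  # no window: bundle stays in custody
sent_open = relay.forward(gateway, link_up=True)     # window opens: bundle moves on
```

Because retries and deduplication are both keyed on the bundle ID, the sender can retransmit freely without risking double delivery downstream.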

Non-functional requirements

  • Latency tolerance from minutes to hours across network segments.
  • Resilience to prolonged partitions and channel outages.
  • Graceful degradation: no hard dependency on always-on central connectivity.
  • Strong observability for delayed delivery and after-the-fact debugging.
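
To put the latency requirement in perspective, the one-way signal delay is simply distance divided by the speed of light. The sketch below uses rounded Earth-Mars distances as assumptions; the resulting propagation delay is roughly 3 to 22 minutes one way, and waiting for relay hops and contact windows can plausibly stretch end-to-end delivery toward the "hours" end of the range.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458


def one_way_delay_minutes(distance_km: float) -> float:
    """Propagation delay only; queueing and contact-window waits come on top."""
    return distance_km / SPEED_OF_LIGHT_KM_S / 60


# Rounded Earth-Mars distances (assumed figures for illustration).
closest_km = 55e6
farthest_km = 400e6

best_case = one_way_delay_minutes(closest_km)
worst_case = one_way_delay_minutes(farthest_km)
round_trip_worst = 2 * worst_case  # a request/response pair pays the delay twice
```

Even the best case rules out any protocol that needs an interactive handshake per operation.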

High-Level Architecture

Theory

Distributed Message Queue

Store-and-forward, retry, ordering, and delivery semantics in asynchronous systems.

command bundles -> autonomous edge execution -> sync window reconciliation

This topology separates the dispatch path, autonomous edge execution, and the sync/reconcile loop.

  • Mission Control: global operations
  • Policy Engine: priority + TTL
  • Relay Network: store-and-forward
  • Orbital Gateway: window ingress
  • Edge Cluster: isolated domain
  • Local Planner: task sequencing
  • Execution Workers: idempotent runs
  • Local Event Log: append-only
  • Result Bundler: delta packaging
  • Sync Uplink: window transfer
  • Conflict Resolver: merge rules
  • Archive Store: canonical timeline

The architecture separates dispatch, autonomous execution, and sync/reconcile loops so the system remains operable under long partitions and intermittent communication windows.
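
Because transfers happen only inside contact windows, something has to decide what goes into each window. The following is a hedged sketch of one possible policy: drop bundles that would expire in transit, then fill a byte budget in priority order. `Bundle`, `plan_window`, and the "lower number is more urgent" convention are assumptions for illustration, and the greedy fill is not an optimal packing.

```python
from dataclasses import dataclass


@dataclass
class Bundle:
    bundle_id: str
    priority: int      # lower number = more urgent (assumed convention)
    expires_at: float  # absolute expiry time in seconds
    size_bytes: int


def plan_window(queue: list[Bundle], now: float, transit_s: float,
                budget_bytes: int) -> list[Bundle]:
    """Choose bundles for one contact window.

    Bundles that would expire before arrival are skipped; the rest are taken
    in (priority, earliest expiry) order while they fit the byte budget.
    """
    viable = [b for b in queue if b.expires_at > now + transit_s]
    viable.sort(key=lambda b: (b.priority, b.expires_at))
    chosen: list[Bundle] = []
    used = 0
    for b in viable:
        if used + b.size_bytes <= budget_bytes:
            chosen.append(b)
            used += b.size_bytes
    return chosen


queue = [
    Bundle("telemetry", 2, 10_000, 600),
    Bundle("estop", 0, 5_000, 50),
    Bundle("stale", 1, 900, 100),      # expires before it could arrive
    Bundle("science", 1, 20_000, 500),
]
plan = plan_window(queue, now=0, transit_s=1_200, budget_bytes=700)
```

Here the emergency stop and the science bundle are sent, the stale bundle is never transmitted, and telemetry waits for the next window.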

Write/Read Paths

How command bundles are written and how results/state are read and synchronized under extreme latency.

Write path: the control center builds command bundles and transfers them via a delay-tolerant relay, and the edge persists the commands into its local log.

  • Layer 1, Command Bundle (mission control): Control center prepares command batch with priority, TTL, and safety policy.
  • Layer 2, Policy Gate (validate + sign): Policy engine validates and signs bundle before transfer.
  • Layer 3, Relay Network (store-and-forward): Commands are transferred via delay-tolerant relay with retry and dedup.
  • Layer 4, Edge Queue (local ingest): Orbital/edge gateway receives bundle and puts it into local queue.
  • Layer 5, Local Event Log (durable append): Command is appended to durable local log for autonomous execution and replay.
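
The sign-then-ingest portion of the write path can be illustrated as follows. The source does not say which signature scheme is used; this sketch assumes an HMAC over a canonical JSON body with a pre-shared key, purely because it fits in the standard library. A production system would more likely use asymmetric signatures, and the in-memory list stands in for the durable local log. `sign_bundle`, `ingest`, and `SHARED_KEY` are hypothetical names.

```python
import hashlib
import hmac
import json

SHARED_KEY = b"demo-key-not-for-production"  # hypothetical pre-shared key


def sign_bundle(commands: list[dict], key: bytes = SHARED_KEY) -> dict:
    """Policy gate: serialize commands canonically and attach a signature."""
    body = json.dumps(commands, sort_keys=True).encode()
    sig = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"body": body.decode(), "sig": sig}


def ingest(bundle: dict, log: list[dict], key: bytes = SHARED_KEY) -> bool:
    """Edge side: verify the signature, then append the commands to the local log."""
    expected = hmac.new(key, bundle["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, bundle["sig"]):
        return False  # reject corrupted or tampered bundles
    log.extend(json.loads(bundle["body"]))
    return True


local_log: list[dict] = []
bundle = sign_bundle([{"id": "cmd-1", "op": "deploy_panel"}])
accepted = ingest(bundle, local_log)

# A bundle altered in transit fails verification and never reaches the log.
tampered = dict(bundle, body=bundle["body"].replace("deploy_panel", "jettison_panel"))
rejected = ingest(tampered, local_log)
```

Verifying at the edge matters more than usual here, since a bad command cannot be recalled within one round trip.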

Write path checkpoints

  • Commands should include idempotency key, priority lane, and TTL.
  • Store-and-forward is mandatory because delivery can take minutes to hours.
  • Local append-only log is required for safe replay after failures.
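
The three checkpoints above fit together in a small sketch: commands carry an idempotency key, a priority lane, and an expiry; they are appended to a JSON-lines file; and replay after a restart runs each key at most once while skipping expired commands. The field names (`idempotency_key`, `lane`, `expires_at`, `op`) and the file layout are assumptions for illustration only.

```python
import json
import os
import tempfile


def append(log_path: str, record: dict) -> None:
    """Append one JSON line and force it to disk before returning."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())


def replay(log_path: str, now: float, done: set[str]) -> list[str]:
    """Re-read the log; run each command at most once and skip expired ones."""
    executed = []
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            cmd = json.loads(line)
            key = cmd["idempotency_key"]
            if key in done or cmd["expires_at"] <= now:
                continue
            done.add(key)  # a real node would persist this set as well
            executed.append(cmd["op"])
    return executed


log_path = os.path.join(tempfile.mkdtemp(), "commands.log")
valve = {"idempotency_key": "k1", "lane": "high", "expires_at": 500.0, "op": "open_valve"}
photo = {"idempotency_key": "k2", "lane": "low", "expires_at": 100.0, "op": "take_photo"}
append(log_path, valve)
append(log_path, valve)  # duplicate delivery of the same command
append(log_path, photo)  # already expired by the time of replay

done: set[str] = set()
first_run = replay(log_path, now=200.0, done=done)
second_run = replay(log_path, now=200.0, done=done)  # replay after a crash
```

The duplicate and the expired command are both skipped on the first pass, and the second replay executes nothing, which is what makes recovery after a failure safe.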

What to clarify in the interview

  • Which operations must be online-only versus fully local-capable.
  • Maximum acceptable synchronization lag for each data class.
  • Conflict-resolution policy for concurrent offline updates.
  • Safety-critical workflows and how emergency stop/override works.

Common mistakes

  • Modeling interplanetary communication as normal low-latency RPC.
  • Skipping autonomous local mode for disconnected operation.
  • No explicit merge/conflict policy for delayed bi-directional updates.
  • Ignoring retransmission and bandwidth costs for large payloads.
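
The last mistake is easy to quantify. If a large payload is sent as one unit, any loss forces a full resend over a scarce link; if it is split into numbered chunks, only the missing chunks are retransmitted. The sketch below illustrates the difference with assumed sizes and hypothetical helpers `split` and `missing`; it is not a description of any particular protocol.

```python
def split(payload: bytes, chunk_size: int) -> dict[int, bytes]:
    """Cut a payload into numbered chunks so losses can be repaired individually."""
    return {
        offset // chunk_size: payload[offset:offset + chunk_size]
        for offset in range(0, len(payload), chunk_size)
    }


def missing(total_chunks: int, received: dict[int, bytes]) -> list[int]:
    """Receiver-side report of which chunk numbers still need to be resent."""
    return [i for i in range(total_chunks) if i not in received]


payload = bytes(range(256)) * 40          # 10,240 bytes of sample data
chunks = split(payload, chunk_size=1024)  # 10 chunks

# Simulate a window in which chunks 3 and 7 were lost.
received = {i: c for i, c in chunks.items() if i not in (3, 7)}
need = missing(len(chunks), received)

# Cost of repair: selective resend versus resending everything.
selective_bytes = sum(len(chunks[i]) for i in need)
full_resend_bytes = len(payload)

# After the next window delivers the missing chunks, the payload is whole again.
received.update({i: chunks[i] for i in need})
reassembled = b"".join(received[i] for i in range(len(chunks)))
```

In this example the repair costs 2,048 bytes instead of 10,240, and the saving grows with payload size and loss locality.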

This scenario is rarely asked literally, but it is a strong test of engineering maturity: adapting architecture to hard environment constraints instead of defaulting to cloud-era assumptions.
