System Design Space

Updated: February 21, 2026 at 11:59 PM

Clock Synchronization in Distributed Systems

Time synchronization in practice: physical vs. logical clocks, NTP/PTP, the impact of clock skew, and architectural protection against time drift.

Context

Distributed Systems: Overview

Clock semantics are the foundation for consistency, coordination and observability in distributed systems.

Clock synchronization is not only about “the exact time on the servers”; it is an architectural factor that affects consistency, retry/timeout behavior and even security. The more distributed the system, the higher the cost of wrong assumptions about time.

Why this is important

  • Event ordering and correct replay in event-driven systems.
  • TTL/lease mechanics in cache, lock services and service discovery.
  • Correct deadlines and timeout budgets in RPC/queue processing.
  • Security: token expiration, replay windows and anti-replay checks.
  • Audit and incident investigation, where the sequence of actions matters.

Time models

Physical clocks

Wall-clock time (UTC, kept in sync via NTP or PTP). Needed for business-time and compliance logic, but subject to skew and drift.

Logical clocks

Lamport and vector clocks capture causal (happens-before) order without assumptions about wall-clock accuracy.
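
A minimal sketch of a Lamport clock, to make the happens-before rule concrete. The `LamportClock` class and its method names are illustrative assumptions, not taken from any library.

```python
class LamportClock:
    """Hypothetical Lamport logical clock: a counter that respects happens-before."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event or message send: advance the counter."""
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        """Message receive: move past both the local and the sender's time."""
        self.time = max(self.time, remote_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
sent = a.tick()              # node A sends a message stamped 1
b.tick()                     # unrelated local event on node B, B is now at 1
received = b.receive(sent)   # max(1, 1) + 1 = 2, so the receive is ordered after the send
```

The receive always gets a larger timestamp than the send, whatever the wall clocks on the two nodes say. A vector clock extends the same idea with one counter per node so that concurrent events can also be detected.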

Hybrid logical clocks (HLC)

A combination of physical and logical time: useful for distributed databases and snapshot operations.
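
A rough sketch of how an HLC can combine the two, assuming millisecond wall-clock readings and an injectable time source. The class, method names and the `(physical, counter)` tuple layout are illustrative choices, not a specific database's implementation.

```python
import time
from typing import Callable, Tuple

Timestamp = Tuple[int, int]  # (physical milliseconds, logical counter)


def wall_ms() -> int:
    """Current wall-clock time in milliseconds."""
    return time.time_ns() // 1_000_000


class HybridLogicalClock:
    """Hypothetical HLC: stays close to wall time but never moves backwards."""

    def __init__(self, physical: Callable[[], int] = wall_ms) -> None:
        self.physical = physical
        self.l = 0  # highest physical time seen so far
        self.c = 0  # logical counter that breaks ties within the same l

    def now(self) -> Timestamp:
        """Timestamp a local or send event."""
        prev = self.l
        self.l = max(prev, self.physical())
        self.c = self.c + 1 if self.l == prev else 0
        return (self.l, self.c)

    def update(self, remote: Timestamp) -> Timestamp:
        """Merge a timestamp received from another node."""
        rl, rc = remote
        prev = self.l
        self.l = max(prev, rl, self.physical())
        if self.l == prev and self.l == rl:
            self.c = max(self.c, rc) + 1
        elif self.l == prev:
            self.c += 1
        elif self.l == rl:
            self.c = rc + 1
        else:
            self.c = 0
        return (self.l, self.c)


# A node whose wall clock is stuck at 100 ms still produces increasing timestamps.
clock = HybridLogicalClock(physical=lambda: 100)
first = clock.now()               # (100, 0)
second = clock.now()              # (100, 1)
merged = clock.update((250, 3))   # remote is ahead: (250, 4)
```

Timestamps compare as ordinary tuples, so `first < second < merged` holds even though the local physical clock never advanced.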

Related

Consensus

Leader timeouts and lease-based mechanics depend on correct time behavior.

Synchronization approaches

NTP

When: The default choice for most general-purpose systems.

Restrictions: Accuracy is typically on the order of milliseconds; offset/jitter monitoring and fallback to multiple time sources are required.
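
NTP estimates the client's offset from four timestamps taken during one request/response exchange. A sketch of that standard calculation follows; the function name and the millisecond sample values are illustrative.

```python
from typing import Tuple


def ntp_offset_and_delay(t0: float, t1: float, t2: float, t3: float) -> Tuple[float, float]:
    """Estimate clock offset and round-trip delay from one exchange.

    t0: client send time (client clock)
    t1: server receive time (server clock)
    t2: server send time (server clock)
    t3: client receive time (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Illustrative values in milliseconds: the client is 50 ms behind the server,
# each network leg takes 10 ms and the server spends 2 ms processing.
offset, delay = ntp_offset_and_delay(t0=1000, t1=1060, t2=1062, t3=1022)
# offset == 50.0, delay == 20
```

The estimate is exact only when the two network legs are symmetric; asymmetric paths turn directly into offset error, which is one reason monitoring jitter matters.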

PTP

When: Sub-millisecond accuracy is needed, for example in trading, telecom or industrial control systems.

Restrictions: Requires network and hardware support; more difficult to operate.

Application-level ordering

When: The wall clock is too unreliable for business invariants, so the application uses sequence or causal ordering instead.

Restrictions: You can't rely entirely on timestamps for strict ordering of operations.

Related

Leslie Lamport: Causality, Paxos and Engineering Thinking

Interview on causality, logical clocks, and Lamport's engineering approach to distributed systems.

Design patterns

Use a monotonic clock to measure durations, and the wall clock only for display and business time.
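
A small sketch of this split using Python's standard `time` module. The `Deadline` helper is a hypothetical name introduced for illustration.

```python
import time


class Deadline:
    """Hypothetical timeout budget tracked on the monotonic clock."""

    def __init__(self, timeout_s: float) -> None:
        # time.monotonic() cannot go backwards, so wall-clock adjustments
        # (NTP steps, manual changes) do not shorten or extend the budget.
        self._expires = time.monotonic() + timeout_s

    def remaining(self) -> float:
        return max(0.0, self._expires - time.monotonic())

    def expired(self) -> bool:
        return self.remaining() == 0.0


deadline = Deadline(timeout_s=60.0)
budget_left = deadline.remaining()   # pass this on to downstream calls
created_at = time.time()             # wall clock: only for display or business time
```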

For critical write paths, use a server-assigned timestamp or sequence number.
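
One way this could look in a single-process service is sketched below. `WriteStamper` and the field names are assumptions for illustration; in a real system the sequence would more likely come from the database or the log itself.

```python
import itertools
import threading
import time


class WriteStamper:
    """Hypothetical helper: the server, not the client, decides write order."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._seq = itertools.count(1)

    def stamp(self, payload: dict) -> dict:
        with self._lock:
            seq = next(self._seq)
        # Any client-supplied time stays in the payload for reference only;
        # ordering decisions use "seq".
        return {**payload, "seq": seq, "server_ts": time.time()}


stamper = WriteStamper()
first = stamper.stamp({"op": "debit", "client_ts": 9_999_999_999.0})  # client clock far ahead
second = stamper.stamp({"op": "credit", "client_ts": 0.0})            # client clock far behind
ordered = sorted([second, first], key=lambda w: w["seq"])
```

Despite the wildly different client timestamps, `ordered` lists the debit before the credit, matching the order in which the server accepted the writes.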

Add an uncertainty window when comparing timestamps from different nodes.
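
A sketch of such a comparison, assuming `max_skew` is a known upper bound on the difference between the two nodes' clocks. The names and the three-way result are illustrative.

```python
from enum import Enum


class Order(Enum):
    BEFORE = "before"
    AFTER = "after"
    UNCERTAIN = "uncertain"


def compare_with_uncertainty(ts_a: float, ts_b: float, max_skew: float) -> Order:
    """Order two timestamps from different nodes, or admit that we cannot.

    max_skew is an assumed bound on how far the two clocks can disagree.
    """
    if ts_a + max_skew < ts_b:
        return Order.BEFORE
    if ts_b + max_skew < ts_a:
        return Order.AFTER
    return Order.UNCERTAIN


close = compare_with_uncertainty(10.000, 10.003, max_skew=0.005)   # Order.UNCERTAIN
apart = compare_with_uncertainty(10.000, 10.020, max_skew=0.005)   # Order.BEFORE
```

When the result is `UNCERTAIN`, the caller needs another tie-breaker, such as a sequence number, or has to wait out the uncertainty interval.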

Monitor and alert on clock offset; remove nodes with large drift from the quorum.
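
A sketch of the eligibility check, assuming per-node offsets in milliseconds are already collected by some metrics pipeline. The function names, node names and the 50 ms threshold are hypothetical.

```python
from typing import Dict, List


def quorum_eligible(offsets_ms: Dict[str, float], max_offset_ms: float) -> List[str]:
    """Return the nodes whose absolute clock offset is within the allowed bound."""
    return sorted(node for node, off in offsets_ms.items() if abs(off) <= max_offset_ms)


def has_quorum(eligible: List[str], cluster_size: int) -> bool:
    """A strict majority of the full cluster must remain eligible."""
    return len(eligible) > cluster_size // 2


# Hypothetical measurements: node-c has drifted far from the reference.
offsets = {"node-a": 0.4, "node-b": -1.2, "node-c": 87.0}
eligible = quorum_eligible(offsets, max_offset_ms=50.0)   # ["node-a", "node-b"]
ok = has_quorum(eligible, cluster_size=len(offsets))      # True: 2 of 3
```

Counting the majority against the full cluster size, not only the eligible nodes, keeps a mass drift event from silently shrinking the quorum.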

Don't make security dependent on client time alone.
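
A sketch of token time validation that uses only the server's clock, with a small leeway for skew between issuer and verifier. The function name, parameters and 30-second default are assumptions for illustration.

```python
import time
from typing import Optional


def token_is_valid(
    issued_at: float,
    expires_at: float,
    leeway_s: float = 30.0,
    now: Optional[float] = None,
) -> bool:
    """Check token times against the server's clock; client time is never consulted."""
    server_now = time.time() if now is None else now
    not_from_future = issued_at <= server_now + leeway_s
    not_expired = server_now < expires_at + leeway_s
    return not_from_future and not_expired


valid = token_is_valid(issued_at=1000, expires_at=1300, now=1200)        # True
in_leeway = token_is_valid(issued_at=1000, expires_at=1300, now=1320)    # True, within leeway
expired = token_is_valid(issued_at=1000, expires_at=1300, now=1400)      # False
from_future = token_is_valid(issued_at=2000, expires_at=2300, now=1200)  # False
```

The `now` parameter exists so tests can pin the server time; production callers would leave it unset.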

Practical checklist

  • Time offset/jitter metrics are visible for all production nodes.
  • There is a runbook in case of massive clock drift and time-source failure.
  • Timestamp logic is tested with artificial skew in integration/chaos tests.
  • Services do not use the wall clock to measure SLA timeouts.
  • Critical transactions have an independent ordering mechanism in addition to the wall-clock.
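
Testing with artificial skew is easier when time is an injected dependency. A sketch with a hypothetical `Lease` and `FakeClock` (both names are assumptions) shows the idea:

```python
import time
from typing import Callable


class Lease:
    """Hypothetical lease whose time source can be swapped out in tests."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires = clock() + ttl_s

    def is_held(self) -> bool:
        return self._clock() < self._expires


class FakeClock:
    """Test double: time moves only when the test says so."""

    def __init__(self, start: float = 0.0) -> None:
        self.now = start

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()
lease = Lease(ttl_s=10.0, clock=clock)
clock.advance(9.0)
held_before = lease.is_held()   # True: 9 s into a 10 s lease
clock.advance(2.0)              # simulate a jump past the expiry
held_after = lease.is_held()    # False
```

The same pattern lets an integration or chaos test give each simulated node its own `FakeClock` with a different offset.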

Frequent anti-pattern: using a wall-clock timestamp as the only source of event order.
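
A small illustration of how this goes wrong, using made-up events in which node "b" runs 200 ms behind node "a". The field names and values are hypothetical.

```python
# The delete really happened after the create (seq 2 > seq 1), but node b's
# lagging clock gives it an earlier wall-clock timestamp.
events = [
    {"node": "a", "seq": 1, "wall_ts": 100.250, "action": "create"},
    {"node": "b", "seq": 2, "wall_ts": 100.100, "action": "delete"},
]

by_wall_clock = [e["action"] for e in sorted(events, key=lambda e: e["wall_ts"])]
by_sequence = [e["action"] for e in sorted(events, key=lambda e: e["seq"])]

# by_wall_clock == ["delete", "create"]   (wrong: the delete appears first)
# by_sequence   == ["create", "delete"]   (matches what actually happened)
```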

© 2026 Alexander Polomodov