System Design Space

Updated: February 21, 2026 at 11:59 PM

Leader Election: patterns and implementations

mid

How to design leader election: leases, quorum, failover, split-brain protection and practical implementations on Raft/ZooKeeper/etcd/Kubernetes.

Context

Consensus: Paxos and Raft

Leader election is an application layer on top of consensus/coordination mechanisms.


Leader election is needed when the system requires a single active coordinator. In practice, choosing a leader does not solve the problem by itself: split-brain, fencing, and correct failover semantics must also be addressed.

When you need a leader

  • Single-writer operations (for example, scheduler, allocator, ownership map).
  • Control of background tasks, where only one active coordinator should work.
  • Failover of stateful components and management of leader-only actions.
  • Distributed lock/lease scenarios in the control plane.

Election mechanisms

Lease-based election

The leader holds the lease and regularly renews it. When the lease expires, another candidate can become the leader.

Clock skew and network jitter can cause split-brain without fencing tokens.
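The acquire/renew/expire cycle described above can be sketched as follows. This is a minimal illustration, not a production design: `LeaseStore`, `Lease`, and `try_acquire` are hypothetical names, the store is a single in-memory object standing in for a shared linearizable service, and all candidates are assumed to see the store's clock, which sidesteps exactly the clock-skew problem the text warns about.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Lease:
    holder: str
    expires_at: float
    token: int  # increases every time a new lease is handed out


class LeaseStore:
    """Hypothetical in-memory stand-in for a shared, linearizable store."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.lease: Optional[Lease] = None
        self.next_token = 1

    def try_acquire(self, candidate: str) -> Optional[Lease]:
        """Acquire the lease if it is free or expired, or renew it if already held."""
        now = self.clock()
        current = self.lease
        if current is not None and current.expires_at > now:
            if current.holder != candidate:
                return None  # someone else holds a live lease
            current.expires_at = now + self.ttl  # renewal keeps the same token
            return current
        # Free or expired: hand out a new lease with a fresh token.
        self.lease = Lease(candidate, now + self.ttl, self.next_token)
        self.next_token += 1
        return self.lease
```

The token issued with each new lease is what a downstream resource can use for fencing: a leader that paused past its expiry still believes it holds token N, while the store has already issued N+1.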

Consensus-based election (Raft/Paxos)

The leader is selected through quorum and term/version semantics, which makes this the most reliable path for critical systems.

The cost is implementation complexity and the operational burden of keeping the quorum healthy.

Coordination-service election

Leader election via ZooKeeper/etcd/Consul primitives (ephemeral nodes, compare-and-swap, distributed locks).

The cost is a dependence on the availability of the coordination service and on correctly configured timeouts.

Time

Clock Synchronization

Lease-based leadership without time discipline often leads to split-brain.


Practical implementations

Raft

Election timeout + majority vote + term. The de facto standard for many control plane systems.
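The three ingredients named above (election timeout, majority vote, term) can be sketched in a few lines. This is a simplified illustration only: `Node`, `handle_vote_request`, and `run_election` are hypothetical names, the calls are in-process instead of RPCs, the 150 to 300 ms range is an illustrative default, and the log up-to-date check that real Raft performs before granting a vote is omitted.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


def election_timeout(base_ms: int = 150, spread_ms: int = 150) -> float:
    """Randomized timeout so followers rarely become candidates at the same moment."""
    return base_ms + random.random() * spread_ms


@dataclass
class Node:
    node_id: str
    current_term: int = 0
    voted_for: Optional[str] = None

    def handle_vote_request(self, candidate_id: str, term: int) -> bool:
        """Grant at most one vote per term (log up-to-date check omitted)."""
        if term < self.current_term:
            return False  # stale candidate
        if term > self.current_term:
            self.current_term = term  # a newer term resets the vote
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def run_election(candidate: Node, peers: List[Node]) -> bool:
    """Candidate bumps its term, votes for itself, and needs a strict majority."""
    candidate.current_term += 1
    candidate.voted_for = candidate.node_id
    votes = 1 + sum(
        p.handle_vote_request(candidate.node_id, candidate.current_term)
        for p in peers
    )
    cluster_size = len(peers) + 1
    return votes > cluster_size // 2
```

Because each node grants only one vote per term and a strict majority is required, two candidates cannot both win the same term.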

ZooKeeper

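Ephemeral sequential znode pattern: the candidate whose znode has the lowest sequence number becomes the leader, and each other candidate watches the znode immediately ahead of its own.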
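In the standard ZooKeeper recipe, the candidate whose znode carries the lowest sequence number leads, and every other candidate watches only its immediate predecessor to avoid a thundering herd when the leader disappears. A minimal sketch of that selection logic, assuming the children were already listed from the election path and that names follow a hypothetical `n_<sequence>` pattern (no ZooKeeper client is used here):

```python
from typing import List, Optional, Tuple


def sequence_number(znode: str) -> int:
    """ZooKeeper appends a zero-padded counter to sequential znode names."""
    return int(znode.rsplit("_", 1)[1])


def leader_and_watch(children: List[str], me: str) -> Tuple[str, Optional[str]]:
    """Return (leader, znode to watch); the leader itself watches nothing."""
    ordered = sorted(children, key=sequence_number)
    leader = ordered[0]
    my_index = ordered.index(me)
    predecessor = ordered[my_index - 1] if my_index > 0 else None
    return leader, predecessor


children = ["n_0000000007", "n_0000000003", "n_0000000005"]
leader, watch = leader_and_watch(children, "n_0000000007")
# leader is "n_0000000003"; this candidate watches "n_0000000005"
```

When the watched znode is deleted, the candidate re-lists the children and repeats the check rather than assuming it is now the leader.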

etcd

Leases + Compare-And-Swap + lock API. Often used for leadership in cloud-native systems.
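The compare-and-swap part of this pattern boils down to "create the leader key only if it does not exist yet". The sketch below illustrates that idea with a hypothetical in-memory `KV` class; it does not call the etcd API. In etcd the key would additionally be attached to a lease, so that it disappears when the leader stops sending keep-alives; here `delete` stands in for that expiry.

```python
from typing import Dict, Optional


class KV:
    """Hypothetical in-memory store with a create-if-absent operation."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def create_if_absent(self, key: str, value: str) -> bool:
        """Compare-and-swap: succeed only if the key does not exist yet."""
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key: str) -> None:
        """Stands in for lease expiry or an explicit resign."""
        self._data.pop(key, None)

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


def campaign(kv: KV, key: str, candidate: str) -> bool:
    """Try to become leader; only the first caller for a free key wins."""
    return kv.create_if_absent(key, candidate)
```

A losing candidate would typically watch the key and campaign again once it is removed.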

Kubernetes

Lease objects in the coordination API for controller leader election.

Split-brain protection

Fencing tokens: each new leader receives a monotonically increasing token to protect the downstream write path.

Leader-only operations check term/token before executing side effects.
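The token check has to live in the downstream resource, not in the leader, because a stale leader does not know it is stale. A minimal sketch of such a check, where `FencedStore` and `StaleTokenError` are hypothetical names and the store is a plain in-memory dictionary:

```python
class StaleTokenError(Exception):
    """Raised when a write carries a token older than one already seen."""


class FencedStore:
    """Hypothetical downstream resource that enforces fencing on every write."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.data: dict = {}

    def write(self, token: int, key: str, value: str) -> None:
        # Reject writers whose token is older than the newest one observed.
        if token < self.highest_token:
            raise StaleTokenError(f"token {token} < {self.highest_token}")
        self.highest_token = token
        self.data[key] = value
```

Once the new leader has written with token 2, a delayed write from the old leader with token 1 is rejected instead of silently overwriting newer state.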

Stale leader detection: heartbeat + session expiry + fast revoke.

Read-only/degraded mode when quorum is lost instead of unsafe dual-writer behavior.
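One way to express the degraded-mode rule is a small function that maps quorum status to an operating mode. The names `Mode`, `has_quorum`, and `select_mode` are hypothetical, and how a node counts reachable peers (heartbeats, session state) is left out of this sketch.

```python
from enum import Enum


class Mode(Enum):
    READ_WRITE = "read-write"
    READ_ONLY = "read-only"


def has_quorum(reachable: int, cluster_size: int) -> bool:
    """reachable counts this node plus the peers it can currently reach."""
    return reachable > cluster_size // 2


def select_mode(reachable: int, cluster_size: int) -> Mode:
    """Accept writes only while a strict majority of the cluster is reachable."""
    if has_quorum(reachable, cluster_size):
        return Mode.READ_WRITE
    return Mode.READ_ONLY
```

In a five-node cluster split 3/2, only the three-node side keeps accepting writes; in a four-node cluster split 2/2, neither side does.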

Practical checklist

  • The system explicitly defines leader-only and follower-safe operations.
  • There is a guaranteed way to prevent split-brain side effects (fencing/version checks).
  • Election timeouts are tuned to real network latency and GC pause characteristics.
  • There are tests for partition, delayed packets, process pause/restart.
  • Failover is checked regularly through game day scenarios.

A frequent anti-pattern: there is an election, but no fencing - and the system still does dual writes.


© 2026 Alexander Polomodov