System Design Space

Updated: March 25, 2026 at 3:00 AM

Leader Election: patterns and implementations

medium

How to design leader election: leases, quorum, failover, split-brain protection and practical implementations on Raft/ZooKeeper/etcd/Kubernetes.

Leader election becomes a real architecture problem the moment a system needs not just an active instance, but exactly one coordinator it can actually trust.

In real operations, this chapter helps choose between leases, quorum-based approaches, and coordination systems with explicit awareness of failover time, flapping sensitivity, and the cost of false promotion.

In interviews and design reviews, it is especially useful when you can speak clearly about stale leaders, split brain, and dual-writer conflicts instead of hiding them behind the word election.

Practical value of this chapter

Design in practice

Guides election strategy choice by workload shape and failover-time requirements.

Decision quality

Helps tune lease/heartbeat parameters to reduce flapping and false failovers.

Interview articulation

Supports single-leader vs multi-leader reasoning with consistency and ops-cost trade-offs.

Risk and trade-offs

Makes split-brain, stale-leader, and dual-writer risks explicit.

Context

Consensus: Paxos and Raft

Leader election is an application layer on top of consensus/coordination mechanisms.

Leader election is needed when the system requires a single active coordinator. In practice, electing a leader does not solve the problem by itself: split-brain protection, fencing, and correct failover semantics must be addressed as well.

When you need a leader

  • Single-writer operations (for example, scheduler, allocator, ownership map).
  • Control of background tasks, where only one active coordinator should work.
  • Failover of stateful components and management of leader-only actions.
  • Distributed lock/lease scenarios in the control plane.

Election mechanisms

Lease-based election

The leader holds the lease and regularly renews it. When the lease expires, another candidate can become the leader.

Risk: without fencing tokens, clock skew and network jitter can leave two nodes each believing they hold the lease, which is split-brain.
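
The lease lifecycle described above can be sketched as follows. This is a minimal in-memory model, not a real coordination client: `LeaseStore`, `try_acquire`, and the explicit `now` argument are hypothetical names chosen for illustration, and time is passed in so that expiry is judged by the store's clock rather than the candidate's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str
    expires_at: float  # store-side time, in seconds
    token: int         # fencing token, bumped on every new leadership epoch


class LeaseStore:
    """Hypothetical in-memory stand-in for a coordination store."""

    def __init__(self) -> None:
        self._lease: Optional[Lease] = None
        self._next_token = 1

    def try_acquire(self, candidate: str, now: float, ttl: float) -> Optional[int]:
        """Grant or renew the lease. Returns the fencing token, or None if refused."""
        lease = self._lease
        if lease is not None and now < lease.expires_at:
            if lease.holder != candidate:
                return None  # another node holds a live lease
            lease.expires_at = now + ttl  # renewal keeps the same token
            return lease.token
        # No lease, or it expired: start a new epoch, even for the same holder.
        token = self._next_token
        self._next_token += 1
        self._lease = Lease(candidate, now + ttl, token)
        return token
```

A holder that misses its renewal window loses the lease, and whoever acquires it next gets a strictly larger token. The old holder may not notice immediately, which is why the token has to be checked on the write path as well.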

Consensus-based election (Raft/Paxos)

The leader is selected through quorum and term/version semantics. This is the most reliable option for critical systems.

Trade-off: implementation complexity and the operational burden of keeping a quorum healthy.
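
To make "quorum and term semantics" concrete, here is a simplified sketch of Raft-style vote granting. It is an illustration under assumptions, not a complete Raft implementation: the `Voter` class and its method names are hypothetical, and persistence, RPC transport, and timers are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Voter:
    current_term: int = 0
    voted_for: Optional[str] = None
    last_log_term: int = 0
    last_log_index: int = 0

    def handle_vote_request(self, term: int, candidate: str,
                            cand_last_log_term: int, cand_last_log_index: int) -> bool:
        """Grant at most one vote per term, and only to an up-to-date candidate."""
        if term < self.current_term:
            return False  # request from a stale term
        if term > self.current_term:
            # A newer term resets the vote cast in the older one.
            self.current_term = term
            self.voted_for = None
        # Candidate's log must be at least as up to date as ours.
        log_ok = (cand_last_log_term, cand_last_log_index) >= (
            self.last_log_term, self.last_log_index)
        if log_ok and self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False


def wins_election(votes_granted: int, cluster_size: int) -> bool:
    """A candidate needs a strict majority of the full cluster."""
    return votes_granted > cluster_size // 2
```

Because each voter grants one vote per term and two majorities always overlap, at most one candidate can win a given term.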

Coordination-service election

Leader election via ZooKeeper/etcd/Consul primitives (ephemeral nodes, compare-and-swap, distributed locks).

Trade-off: the election depends on the availability of the coordination service and on correctly configured timeouts.

Time

Clock Synchronization

Lease-based leadership without time discipline often leads to split-brain.

Practical implementations

Raft

Election timeout + majority vote + term. The de facto standard for many control plane systems.

ZooKeeper

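Ephemeral sequential znode pattern: the candidate whose znode has the lowest sequence number (the earliest created) becomes the leader, and every other candidate watches its immediate predecessor.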
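
The selection rule of the ephemeral sequential recipe can be sketched as a pure function over the list of child znodes. In the standard recipe the znode with the lowest sequence number leads and each other candidate watches the znode just ahead of it. The function names and the `n_` prefix are illustrative assumptions; ZooKeeper appends a zero-padded 10-digit counter to sequential znode names, which is what the parsing relies on. No ZooKeeper client is involved here.

```python
from typing import List, Optional, Tuple


def sequence_of(znode: str) -> int:
    """Extract the 10-digit counter that ZooKeeper appends to sequential znodes."""
    return int(znode[-10:])


def leader_and_watch(children: List[str], me: str) -> Tuple[str, Optional[str]]:
    """Return (current leader, znode this candidate should watch or None if leading)."""
    ordered = sorted(children, key=sequence_of)
    leader = ordered[0]  # lowest sequence number wins
    idx = ordered.index(me)
    # Watching only the predecessor avoids a thundering herd when the leader dies.
    watch = ordered[idx - 1] if idx > 0 else None
    return leader, watch
```

For example, with children `n_0000000007`, `n_0000000003`, and `n_0000000005`, the candidate at `n_0000000007` watches `n_0000000005`, not the leader.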

etcd

Leases + Compare-And-Swap + lock API. Often used for leadership in cloud-native systems.
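
How a lease and a compare-and-swap combine into an election can be shown with a toy model. `TinyKV` and its methods are hypothetical and are not the etcd client API; the sketch only mirrors the idea of creating the leader key if and only if it does not exist yet, attaching it to a lease, and using the creation revision as a monotonically increasing epoch.

```python
from typing import Dict, Optional, Tuple


class TinyKV:
    """Hypothetical in-memory model of a revisioned key-value store."""

    def __init__(self) -> None:
        self._rev = 0
        # key -> (value, create_revision, lease_id)
        self._data: Dict[str, Tuple[str, int, int]] = {}

    def put_if_absent(self, key: str, value: str, lease_id: int) -> Optional[int]:
        """Compare-and-swap: succeed only if the key does not exist yet."""
        if key in self._data:
            return None
        self._rev += 1
        self._data[key] = (value, self._rev, lease_id)
        return self._rev  # create revision, usable as a fencing token

    def expire_lease(self, lease_id: int) -> None:
        """Delete every key attached to an expired lease."""
        for key in [k for k, (_, _, l) in self._data.items() if l == lease_id]:
            del self._data[key]
            self._rev += 1  # deletions advance the revision too

    def leader(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        return entry[0] if entry else None
```

When the leader's lease expires, its key disappears and the next successful `put_if_absent` returns a larger revision than any previous leader saw.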

Kubernetes

Lease objects in the coordination.k8s.io API group, used for controller leader election.

Split-brain protection

Fencing tokens: each new leader receives a monotonically increasing token to protect the downstream write path.

Leader-only operations check term/token before executing side effects.
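
The token check on the downstream write path can be sketched like this. `FencedStore` is a hypothetical resource that remembers the highest token it has seen; the sketch assumes tokens are issued in strictly increasing order by the election mechanism.

```python
from typing import Optional


class FencedStore:
    """Hypothetical downstream resource that rejects writes from stale leaders."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.value: Optional[str] = None

    def write(self, token: int, value: str) -> bool:
        """Apply the write only if the token is not older than one already seen."""
        if token < self.highest_token:
            return False  # stale leader: refuse the side effect
        self.highest_token = token
        self.value = value
        return True
```

A paused leader that wakes up holding token 1 after a successor has written with token 2 is rejected by the store itself, regardless of what the old leader believes about its lease.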

Stale leader detection: heartbeat + session expiry + fast revoke.

Read-only/degraded mode when quorum is lost instead of unsafe dual-writer behavior.
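
The degraded-mode rule can be expressed as a small decision function. The names `Mode` and `mode_for` are illustrative assumptions; the only logic taken as given is that writes require a strict majority of the configured cluster.

```python
from enum import Enum


class Mode(Enum):
    READ_WRITE = "read_write"
    READ_ONLY = "read_only"


def mode_for(reachable_members: int, cluster_size: int) -> Mode:
    """Pick the serving mode; reachable_members includes this node."""
    has_quorum = reachable_members > cluster_size // 2
    return Mode.READ_WRITE if has_quorum else Mode.READ_ONLY
```

Note that reads served from a minority partition may be stale, so whether read-only mode is acceptable depends on the consistency the callers expect.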

Practical checklist

  • The system explicitly defines leader-only and follower-safe operations.
  • There is a guaranteed way to prevent split-brain side effects (fencing/version checks).
  • Election timeouts are consistent with measured network latency and GC pause characteristics.
  • There are tests for partition, delayed packets, process pause/restart.
  • Failover is checked regularly through game day scenarios.
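
The timeout item in the checklist can be turned into a simple sanity check. The rule of thumb below is an assumption made for illustration, not a standard formula: it asks that at least two renew attempts fit into one lease and that the lease still has slack after subtracting one renew interval, a round trip, the worst observed process pause, and clock drift. The function and parameter names are hypothetical.

```python
from typing import List


def lease_timing_problems(lease_ttl: float, renew_interval: float,
                          max_pause: float, max_rtt: float,
                          max_clock_drift: float) -> List[str]:
    """Return human-readable problems with the timing parameters (all in seconds)."""
    problems: List[str] = []
    if renew_interval * 2 > lease_ttl:
        problems.append("fewer than two renew attempts fit in one lease")
    # Slack left if a renewal is sent as late as possible and everything is slow.
    margin = lease_ttl - renew_interval - max_rtt - max_pause - max_clock_drift
    if margin <= 0:
        problems.append("a single slow renewal can expire the lease")
    return problems
```

Feeding it measured values (for example p99.9 GC pause and cross-zone round-trip time) makes it easier to spot configurations that will flap under normal load.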

A frequent anti-pattern: there is an election but no fencing, so a stale leader can keep writing alongside the new one and the system still performs dual writes.
