👑 Leader Election Patterns
Overview
- Leader handles coordination, sequencing, or writes; followers replicate or execute delegated work (see the sketch after this list).
- System must detect failures, choose a new leader, and resume service quickly.
- Eliminates ambiguity but introduces a logical single point of control.
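
A minimal sketch of that role split, in Go. The `Node` type, `Role` enum, and `HandleWrite` method are illustrative names rather than a specific library's API: the point is only the asymmetry that the leader sequences writes while followers redirect callers to it.

```go
package main

import "fmt"

type Role int

const (
	Follower Role = iota
	Candidate // campaigning after a suspected leader failure
	Leader
)

// Node knows its own role and where it believes the current leader lives.
type Node struct {
	ID         string
	Role       Role
	LeaderAddr string
}

// HandleWrite captures the asymmetry: the leader sequences writes, while a
// follower refuses and redirects, so there is exactly one point of control.
func (n *Node) HandleWrite(entry string) error {
	if n.Role != Leader {
		return fmt.Errorf("node %s is not leader; retry against %s", n.ID, n.LeaderAddr)
	}
	// Leader path: append the entry to the ordered log and replicate to
	// followers (replication omitted from this sketch).
	return nil
}

func main() {
	follower := Node{ID: "b", Role: Follower, LeaderAddr: "10.0.0.1:7000"}
	fmt.Println(follower.HandleWrite("set x=1")) // redirected to the leader
}
```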
When to Apply
- Write serialization: databases or logs that require ordered commits.
- Task orchestration: a scheduler coordinating workers (e.g., MapReduce master); see the sketch after this list.
- Cluster membership: services needing one spokesperson for external clients.
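
For the task-orchestration case, a rough sketch under assumed names (`Scheduler`, `Task`, `Dispatch`): only the elected scheduler hands out work, so a standby copy of the same process never produces duplicate assignments.

```go
package main

import "fmt"

type Task struct{ ID int }

// Scheduler dispatches work only while it holds leadership; a standby
// instance refuses, so each task is handed out exactly once.
type Scheduler struct {
	isLeader bool
	pending  []Task
	workers  []chan Task
	next     int // round-robin cursor over workers
}

func (s *Scheduler) Dispatch() error {
	if !s.isLeader {
		return fmt.Errorf("standby scheduler: not dispatching")
	}
	for _, t := range s.pending {
		s.workers[s.next%len(s.workers)] <- t
		s.next++
	}
	s.pending = nil
	return nil
}

func main() {
	w := make(chan Task, 4) // buffered so the demo needs no consumer goroutine
	leader := &Scheduler{isLeader: true, pending: []Task{{ID: 1}, {ID: 2}}, workers: []chan Task{w}}
	standby := &Scheduler{isLeader: false, pending: []Task{{ID: 1}, {ID: 2}}, workers: []chan Task{w}}
	fmt.Println(leader.Dispatch(), standby.Dispatch(), len(w)) // <nil>, standby error, 2
}
```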
Core Algorithm Options
| Approach | How It Works | Strengths | Trade-offs |
|---|---|---|---|
| Bully | Highest-ID node wins; others concede (sketched after this table) | Simple, no extra services | O(n²) messaging; sensitive to churn |
| Paxos | Consensus on proposals via majority voting | Proven safety, tolerant to failures | Hard to implement; latency overhead |
| Raft | Log replication with randomized elections | Easier mental model; widely adopted | Requires persistent logs; leader bottleneck |
| ZooKeeper/etcd (ZAB/Raft) | External quorum service grants leadership via ephemeral nodes | Battle-tested, provides watches | Needs dedicated cluster; adds dependency |
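
A minimal sketch of the Bully rule from the table (highest reachable ID wins). The `reachable` probe is an assumed stand-in for real failure detection such as heartbeats or RPC timeouts.

```go
package main

import "fmt"

// bullyElect returns the largest ID among nodes that answer a liveness probe.
// In the full protocol every node probes only the IDs above its own and
// concedes if any respond, which is where the O(n²) message cost comes from.
func bullyElect(nodeIDs []int, reachable func(id int) bool) (leader int, ok bool) {
	for _, id := range nodeIDs {
		if reachable(id) && (!ok || id > leader) {
			leader, ok = id, true
		}
	}
	return leader, ok
}

func main() {
	alive := map[int]bool{1: true, 2: true, 5: false} // node 5, the old leader, is down
	leader, ok := bullyElect([]int{1, 2, 5}, func(id int) bool { return alive[id] })
	fmt.Println(leader, ok) // 2 true: the highest reachable ID takes over
}
```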
Election & Failover Flow
- Detect: followers miss heartbeats or a leadership lease expires (see the timeout sketch after this list).
- Nominate: eligible nodes campaign using algorithm rules.
- Vote/Agree: majority consensus or deterministic winner.
- Promote: new leader replays logs, announces leadership.
- Recover: old leader steps down when it regains connectivity.
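
The detect-and-nominate steps could be sketched as a follower loop with a randomized election timeout, loosely in the style of Raft. The function names and the 150–300 ms range are assumptions for illustration, not prescribed values.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// followerLoop re-arms a randomized election timer on every heartbeat; if the
// timer fires first, the node assumes the leader failed, bumps its term, and
// campaigns. Randomizing the timeout reduces the chance of split votes.
func followerLoop(heartbeats <-chan struct{}, term int, startCampaign func(term int)) {
	for {
		timeout := 150*time.Millisecond + time.Duration(rand.Intn(150))*time.Millisecond
		select {
		case <-heartbeats:
			// Leader is alive: loop and reset the randomized timer.
		case <-time.After(timeout):
			term++              // Nominate: move to a new term/epoch...
			startCampaign(term) // ...and ask peers for votes (not shown).
			return
		}
	}
}

func main() {
	hb := make(chan struct{}, 1)
	hb <- struct{}{} // one heartbeat, then silence, simulating a dead leader
	followerLoop(hb, 3, func(term int) { fmt.Println("campaigning in term", term) })
}
```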
Operational Considerations
- Tune election timeouts to balance prompt failover against false positives.
- Maintain durable state (term/epoch, log index) across restarts; a persistence sketch follows this list.
- Emit metrics on election frequency, log lag, and leadership duration.
- Run chaos drills (kill leader, isolate network) to validate recovery.
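
One possible sketch of that durable state: persist term, vote, and log index with an atomic rename so a restarted node cannot vote twice in the same term. The file name and JSON format are assumptions; a production implementation would also fsync before trusting the write.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// persistentState is the minimum that must survive a restart: the term/epoch,
// who this node voted for in that term, and how far its log reaches.
type persistentState struct {
	CurrentTerm int    `json:"current_term"`
	VotedFor    string `json:"voted_for"`
	LastLogIdx  int64  `json:"last_log_index"`
}

// save writes to a temp file and renames it into place so a crash mid-write
// never leaves a corrupt state file behind.
func save(path string, st persistentState) error {
	data, err := json.Marshal(st)
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o600); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

func load(path string) (persistentState, error) {
	var st persistentState
	data, err := os.ReadFile(path)
	if err != nil {
		return st, err // missing file: caller starts fresh at term 0
	}
	err = json.Unmarshal(data, &st)
	return st, err
}

func main() {
	_ = save("election-state.json", persistentState{CurrentTerm: 7, VotedFor: "node-2", LastLogIdx: 120})
	st, _ := load("election-state.json")
	fmt.Println(st.CurrentTerm, st.VotedFor) // 7 node-2, even after a restart
}
```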
Example Pairings
- Primary/replica databases: Raft or Paxos to elect a single writer.
- Distributed locks: ephemeral znodes (ZooKeeper) or leased keys (etcd) acting as leadership leases; see the sketch after this list.
- Partitioned systems: one leader per shard to scale out horizontally.
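
As a concrete pairing for lease-based leadership, the sketch below uses etcd's `concurrency` package (the etcd analogue of ZooKeeper ephemeral znodes). The endpoint, TTL, and election prefix are placeholder values.

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// The session's lease is the "ephemeral" part: if this process dies or is
	// partitioned, the lease expires and leadership is released automatically.
	session, err := concurrency.NewSession(cli, concurrency.WithTTL(10))
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	election := concurrency.NewElection(session, "/leader-election/demo")

	// Campaign blocks until this node becomes leader (or ctx is cancelled).
	ctx := context.Background()
	if err := election.Campaign(ctx, "node-1"); err != nil {
		log.Fatal(err)
	}
	log.Println("acting as leader")

	// Do leader-only work here; stop if the lease expires underneath us.
	<-session.Done()
	log.Println("lease lost; stepping down")
}
```

On graceful shutdown, calling `Resign` lets a successor campaign immediately instead of waiting for the TTL to lapse.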
Pros & Cons Summary
- ✅ Simplifies coordination, enforces ordering, supports strong consistency.
- ❌ Leader can become throughput bottleneck; election overhead adds latency; misconfigured failover risks downtime.