👑 Leader Election Patterns
Overview
- Leader handles coordination, sequencing, or writes; followers replicate or execute delegated work (see the sketch after this list).
- System must detect failures, choose a new leader, and resume service quickly.
- Eliminates ambiguity but introduces a logical single point of control.
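
A minimal sketch of that role split, in Go. The `Node` type, `Role` enum, and `HandleWrite` method are illustrative names rather than a specific library's API: the point is only the asymmetry that the leader sequences writes while followers redirect callers to it.

```go
package main

import "fmt"

type Role int

const (
	Follower Role = iota
	Candidate // campaigning after a suspected leader failure
	Leader
)

// Node knows its own role and where it believes the current leader lives.
type Node struct {
	ID         string
	Role       Role
	LeaderAddr string
}

// HandleWrite captures the asymmetry: the leader sequences writes, while a
// follower refuses and redirects, so there is exactly one point of control.
func (n *Node) HandleWrite(entry string) error {
	if n.Role != Leader {
		return fmt.Errorf("node %s is not leader; retry against %s", n.ID, n.LeaderAddr)
	}
	// Leader path: append the entry to the ordered log and replicate to
	// followers (replication omitted from this sketch).
	return nil
}

func main() {
	follower := Node{ID: "b", Role: Follower, LeaderAddr: "10.0.0.1:7000"}
	fmt.Println(follower.HandleWrite("set x=1")) // redirected to the leader
}
```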
When to Apply
- Write serialization: databases or logs that require ordered commits.
- Task orchestration: a scheduler coordinating workers (e.g., MapReduce master); see the sketch after this list.
- Cluster membership: services needing one spokesperson for external clients.
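
For the task-orchestration case, a rough sketch under assumed names (`Scheduler`, `Task`, `Dispatch`): only the elected scheduler hands out work, so a standby copy of the same process never produces duplicate assignments.

```go
package main

import "fmt"

type Task struct{ ID int }

// Scheduler dispatches work only while it holds leadership; a standby
// instance refuses, so each task is handed out exactly once.
type Scheduler struct {
	isLeader bool
	pending  []Task
	workers  []chan Task
	next     int // round-robin cursor over workers
}

func (s *Scheduler) Dispatch() error {
	if !s.isLeader {
		return fmt.Errorf("standby scheduler: not dispatching")
	}
	for _, t := range s.pending {
		s.workers[s.next%len(s.workers)] <- t
		s.next++
	}
	s.pending = nil
	return nil
}

func main() {
	w := make(chan Task, 4) // buffered so the demo needs no consumer goroutine
	leader := &Scheduler{isLeader: true, pending: []Task{{ID: 1}, {ID: 2}}, workers: []chan Task{w}}
	standby := &Scheduler{isLeader: false, pending: []Task{{ID: 1}, {ID: 2}}, workers: []chan Task{w}}
	fmt.Println(leader.Dispatch(), standby.Dispatch(), len(w)) // <nil>, standby error, 2
}
```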
Core Algorithm Options
| Approach | How It Works | Strengths | Trade-offs |
|---|---|---|---|
| Bully | Highest-ID node wins; others concede (sketched after this table) | Simple, no extra services | O(n²) messaging; sensitive to churn |
| Paxos | Consensus on proposals via majority voting | Proven safety, tolerant to failures | Hard to implement; latency overhead |
| Raft | Log replication with randomized elections | Easier mental model; widely adopted | Requires persistent logs; leader bottleneck |
| ZooKeeper/etcd (ZAB/Raft) | External quorum service grants leadership via ephemeral nodes | Battle-tested, provides watches | Needs dedicated cluster; adds dependency |
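
A minimal sketch of the Bully rule from the table (highest reachable ID wins). The `reachable` probe is an assumed stand-in for real failure detection such as heartbeats or RPC timeouts.

```go
package main

import "fmt"

// bullyElect returns the largest ID among nodes that answer a liveness probe.
// In the full protocol every node probes only the IDs above its own and
// concedes if any respond, which is where the O(n²) message cost comes from.
func bullyElect(nodeIDs []int, reachable func(id int) bool) (leader int, ok bool) {
	for _, id := range nodeIDs {
		if reachable(id) && (!ok || id > leader) {
			leader, ok = id, true
		}
	}
	return leader, ok
}

func main() {
	alive := map[int]bool{1: true, 2: true, 5: false} // node 5, the old leader, is down
	leader, ok := bullyElect([]int{1, 2, 5}, func(id int) bool { return alive[id] })
	fmt.Println(leader, ok) // 2 true: the highest reachable ID takes over
}
```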
Election & Failover Flow
- Detect: followers miss heartbeats or a leadership lease expires (see the timeout sketch after this list).
- Nominate: eligible nodes campaign using algorithm rules.
- Vote/Agree: majority consensus or deterministic winner.
- Promote: new leader replays logs, announces leadership.
- Recover: old leader steps down when it regains connectivity.
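
The detect-and-nominate steps could be sketched as a follower loop with a randomized election timeout, loosely in the style of Raft. The function names and the 150–300 ms range are assumptions for illustration, not prescribed values.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// followerLoop re-arms a randomized election timer on every heartbeat; if the
// timer fires first, the node assumes the leader failed, bumps its term, and
// campaigns. Randomizing the timeout reduces the chance of split votes.
func followerLoop(heartbeats <-chan struct{}, term int, startCampaign func(term int)) {
	for {
		timeout := 150*time.Millisecond + time.Duration(rand.Intn(150))*time.Millisecond
		select {
		case <-heartbeats:
			// Leader is alive: loop and reset the randomized timer.
		case <-time.After(timeout):
			term++              // Nominate: move to a new term/epoch...
			startCampaign(term) // ...and ask peers for votes (not shown).
			return
		}
	}
}

func main() {
	hb := make(chan struct{}, 1)
	hb <- struct{}{} // one heartbeat, then silence, simulating a dead leader
	followerLoop(hb, 3, func(term int) { fmt.Println("campaigning in term", term) })
}
```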
Operational Considerations
- Tune election timeouts to balance prompt failover against false positives.
- Maintain durable state (term/epoch, log index) across restarts; a persistence sketch follows this list.
- Emit metrics on election frequency, log lag, and leadership duration.
- Run chaos drills (kill leader, isolate network) to validate recovery.
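
One possible sketch of that durable state: persist term, vote, and log index with an atomic rename so a restarted node cannot vote twice in the same term. The file name and JSON format are assumptions; a production implementation would also fsync before trusting the write.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// persistentState is the minimum that must survive a restart: the term/epoch,
// who this node voted for in that term, and how far its log reaches.
type persistentState struct {
	CurrentTerm int    `json:"current_term"`
	VotedFor    string `json:"voted_for"`
	LastLogIdx  int64  `json:"last_log_index"`
}

// save writes to a temp file and renames it into place so a crash mid-write
// never leaves a corrupt state file behind.
func save(path string, st persistentState) error {
	data, err := json.Marshal(st)
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o600); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

func load(path string) (persistentState, error) {
	var st persistentState
	data, err := os.ReadFile(path)
	if err != nil {
		return st, err // missing file: caller starts fresh at term 0
	}
	err = json.Unmarshal(data, &st)
	return st, err
}

func main() {
	_ = save("election-state.json", persistentState{CurrentTerm: 7, VotedFor: "node-2", LastLogIdx: 120})
	st, _ := load("election-state.json")
	fmt.Println(st.CurrentTerm, st.VotedFor) // 7 node-2, even after a restart
}
```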
Example Pairings
- Primary/replica databases: Raft or Paxos to elect a single writer.
- Distributed locks: ephemeral znodes (ZooKeeper) or leased keys (etcd) acting as leadership leases; see the sketch after this list.
- Partitioned systems: one leader per shard to scale out horizontally.
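
As a concrete pairing for lease-based leadership, the sketch below uses etcd's `concurrency` package (the etcd analogue of ZooKeeper ephemeral znodes). The endpoint, TTL, and election prefix are placeholder values.

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// The session's lease is the "ephemeral" part: if this process dies or is
	// partitioned, the lease expires and leadership is released automatically.
	session, err := concurrency.NewSession(cli, concurrency.WithTTL(10))
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	election := concurrency.NewElection(session, "/leader-election/demo")

	// Campaign blocks until this node becomes leader (or ctx is cancelled).
	ctx := context.Background()
	if err := election.Campaign(ctx, "node-1"); err != nil {
		log.Fatal(err)
	}
	log.Println("acting as leader")

	// Do leader-only work here; stop if the lease expires underneath us.
	<-session.Done()
	log.Println("lease lost; stepping down")
}
```

On graceful shutdown, calling `Resign` lets a successor campaign immediately instead of waiting for the TTL to lapse.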
Pros & Cons Summary
- ✅ Simplifies coordination, enforces ordering, supports strong consistency.
- ❌ Leader can become throughput bottleneck; election overhead adds latency; misconfigured failover risks downtime.