Dev Dump

Database Replication

Database Replication Overview

Why replicate:

  • Availability: Survive server or region failures; fail over to replicas.
  • Read Scalability: Spread read traffic across replicas.
  • Latency: Serve users from geographically close replicas.
  • Backups/Analytics: Offload heavy queries to secondary nodes.

Key terms:

  • Leader (Primary): Accepts writes; propagates changes to replicas.
  • Follower (Replica): Applies writes from leader; typically read-only.
  • Replication Lag: Delay between leader commit and follower applying it.
| Strategy | Commit Timing | Pros | Cons |
| --- | --- | --- | --- |
| Synchronous | Wait for replicas to acknowledge before commit | Strong consistency | Higher latency; writes block if replicas fail |
| Asynchronous | Leader commits immediately; followers update later | Low write latency | Risk of data loss if leader fails before replicas apply changes |

Sync vs Async Replication

Synchronous Replication:

  • Client sends write to Primary
  • Primary writes locally and forwards to all secondaries
  • Primary waits for acknowledgment from all secondaries
  • Only then responds to client with “write complete”
  • Trade-off: Slower but guarantees data is on all replicas

Asynchronous Replication:

  • Client sends write to Primary
  • Primary writes locally and immediately responds “write complete”
  • Primary forwards writes to secondaries in the background
  • Secondaries acknowledge independently
  • Trade-off: Faster but risks data loss if primary fails before replication
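The two write paths above can be contrasted in a minimal Python sketch. The `Replica` and `Primary` classes are illustrative, not a real database API; the synchronous path blocks on every follower, while the asynchronous path hands replication off to background threads.

```python
import threading
import time

class Replica:
    """Illustrative follower: applies writes after a simulated delay."""
    def __init__(self, name, delay=0.0):
        self.name = name
        self.delay = delay
        self.data = {}

    def apply(self, key, value):
        time.sleep(self.delay)       # simulated network + apply latency
        self.data[key] = value
        return True                  # acknowledgment

class Primary:
    """Illustrative leader supporting both commit strategies."""
    def __init__(self, replicas):
        self.replicas = replicas
        self.data = {}

    def write_sync(self, key, value):
        # Synchronous: block until every replica has acknowledged.
        self.data[key] = value
        for r in self.replicas:
            r.apply(key, value)
        return "write complete"      # data is guaranteed on all replicas

    def write_async(self, key, value):
        # Asynchronous: respond immediately, replicate in the background.
        self.data[key] = value
        for r in self.replicas:
            threading.Thread(target=r.apply, args=(key, value)).start()
        return "write complete"      # replicas may still lag behind
```

Note that `write_async` returns before any follower has acknowledged; the gap between that return and the last `apply` finishing is exactly the replication lag described above.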

Semi-Synchronous Replication:

  • A hybrid approach where the leader waits for at least one replica to acknowledge
  • Balances between consistency and performance
  • Common in MySQL with semi-sync replication plugin
Full vs Partial Replication

  • Full replication: Every node stores the complete dataset. High availability; higher storage costs.
  • Partial replication: Each replica stores only a subset. Saves space, but queries may need to touch multiple replicas.
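The semi-synchronous "wait for at least one ack" behavior can be sketched with standard-library concurrency primitives. `make_replica` and `write_semi_sync` are hypothetical helpers, and replica stores are plain dicts; the point is that the commit blocks on the first acknowledgment only, while slower replicas continue in the background.

```python
import concurrent.futures
import time

def make_replica(store, delay):
    """Return an illustrative apply() function for one follower."""
    def apply(key, value):
        time.sleep(delay)            # simulated network/apply latency
        store[key] = value
        return True                  # acknowledgment
    return apply

def write_semi_sync(primary, replica_applies, key, value, timeout=1.0):
    """Commit after the leader write plus at least ONE replica ack."""
    primary[key] = value
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(replica_applies))
    futures = [pool.submit(apply, key, value) for apply in replica_applies]
    # Block only until the first follower acknowledges.
    done, _ = concurrent.futures.wait(
        futures, timeout=timeout,
        return_when=concurrent.futures.FIRST_COMPLETED)
    pool.shutdown(wait=False)        # remaining replicas keep applying
    if not done:
        raise TimeoutError("no replica acknowledged in time")
    return "write complete"
```

This is the balance point: the write survives the loss of the leader (one replica already has it), yet latency is bounded by the fastest follower rather than the slowest.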

Single-Leader (Master-Slave / Active-Passive)


  • Writes go to the leader; followers replicate changes synchronously or asynchronously.
  • Promote a follower on leader failure.
  • Good for read-heavy workloads.
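The routing pattern behind these bullets can be sketched as a small cluster wrapper. `SingleLeaderCluster` is an illustrative name, and the stores are plain dicts standing in for database nodes: writes always hit the leader, reads round-robin across followers, and `promote` models failover.

```python
import itertools

class SingleLeaderCluster:
    """Illustrative single-leader router: writes go to the leader,
    reads are spread round-robin across follower replicas."""
    def __init__(self, leader, followers):
        self.leader = leader           # dict acting as leader storage
        self.followers = followers     # dicts acting as read replicas
        self._rr = itertools.cycle(followers)

    def write(self, key, value):
        self.leader[key] = value
        for f in self.followers:       # simplified synchronous fan-out
            f[key] = value

    def read(self, key):
        # Spread read traffic across replicas (why this topology
        # suits read-heavy workloads).
        return next(self._rr).get(key)

    def promote(self):
        # On leader failure, promote a follower to be the new leader.
        self.leader = self.followers.pop(0)
        self._rr = itertools.cycle(self.followers or [self.leader])
```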

Multi-Leader (Master-Master / Active-Active)


  • Multiple nodes accept writes; replicate changes among leaders.
  • Useful for multi-region write workloads.
  • Requires conflict detection and resolution.

(Diagram: two leaders accept conflicting writes to the same key.)

In this scenario:

  1. User 1 writes Y to Leader 1, where value was Z
  2. User 2 simultaneously writes X to Leader 2, where value was also Z
  3. Both leaders complete their writes locally
  4. When replicating to each other, conflicts occur:
    • Leader 1 tries to write Y but finds X already there
    • Leader 2 tries to write X but finds Y already there
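One common (if lossy) resolution policy for the conflict above is last-write-wins: each write carries a timestamp, and when the leaders exchange changes, both deterministically keep the newer one. This is a hedged sketch, not a specific database's implementation; the `(value, timestamp, leader_id)` tuple is an illustrative schema.

```python
def last_write_wins(a, b):
    """Resolve a multi-leader conflict deterministically: keep the
    write with the newest timestamp, breaking ties by leader id.
    Each write is a (value, timestamp, leader_id) tuple."""
    return max(a, b, key=lambda w: (w[1], w[2]))

# Both leaders started from Z; Leader 1 wrote Y at t=100,
# Leader 2 wrote X at t=101. Applying the same rule on both sides
# makes the leaders converge on one value -- at the cost of silently
# discarding the losing write.
w1 = ("Y", 100, "leader-1")
w2 = ("X", 101, "leader-2")
winner = last_write_wins(w1, w2)   # same result in either order
```

Because the rule is symmetric, it does not matter which leader applies it first; both converge on the same value. The silent loss of the losing write is why multi-leader systems often prefer richer policies (version vectors, application-level merges).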
Leaderless Replication

  • Clients write to multiple replicas directly.
  • Systems like DynamoDB and Cassandra rely on quorum reads/writes.
  • Highly available, but conflict resolution and consistency tuning are complex.
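The quorum idea can be sketched as follows. `QuorumStore` is illustrative, not DynamoDB's or Cassandra's API: with N replicas, a write succeeds after W acknowledgments and a read consults R replicas, and choosing R + W > N guarantees the read set overlaps the write set. For simplicity the sketch uses a fixed replica prefix; real systems pick replica subsets dynamically and handle node failures.

```python
class QuorumStore:
    """Leaderless sketch: N replicas, write quorum W, read quorum R."""
    def __init__(self, n=3, w=2, r=2):
        assert r + w > n, "quorums must overlap for consistency"
        self.replicas = [dict() for _ in range(n)]
        self.w, self.r = w, r

    def write(self, key, value, version):
        acks = 0
        for rep in self.replicas:
            rep[key] = (value, version)   # real systems tolerate failures here
            acks += 1
            if acks >= self.w:
                break                     # stop once W replicas acknowledge
        return acks >= self.w

    def read(self, key):
        # Consult R replicas; return the highest-versioned value seen.
        seen = [rep[key] for rep in self.replicas[: self.r] if key in rep]
        if not seen:
            return None
        return max(seen, key=lambda vv: vv[1])[0]
```

Because only W of N replicas are guaranteed current, the version attached to each value is what lets the reader pick the newest copy out of its R responses.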
Challenges

  • Consistency vs. Availability: CAP theorem applies; async replication + network partitions lead to eventual consistency.
  • Operational Overhead: Monitor lag, handle failovers, keep configs in sync.
  • Write Conflicts: Multi-leader setups need robust reconciliation policies.
Best Practices

  • Monitor replication lag and alert on thresholds.
  • Automate failover; run regular drills.
  • Co-locate replicas close to users, but account for legal/compliance requirements.
  • Document conflict resolution rules for multi-leader deployments.
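The lag-monitoring practice above reduces to comparing each follower's applied log position against the leader's commit position. A minimal sketch, assuming positions are simple integers (standing in for LSNs or binlog offsets) and `lagging_followers` is a hypothetical helper:

```python
def lagging_followers(leader_pos, follower_positions, max_lag):
    """Return the followers whose applied log position trails the
    leader's commit position by more than max_lag, with their lag.
    Positions are illustrative integers (e.g. LSNs or binlog offsets)."""
    return {name: leader_pos - pos
            for name, pos in follower_positions.items()
            if leader_pos - pos > max_lag}
```

An alerting system would run a check like this periodically and page when the returned map is non-empty.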
Summary

  • Purpose: Replication boosts resilience and read performance at the cost of coordination complexity.
  • Single-leader: Best for read-heavy apps with simple consistency needs.
  • Multi-leader: Best for geo-distributed writes; requires conflict resolution.
  • Leaderless: High availability with eventual consistency; complex tuning.
  • See Distributed Transaction Handling for coordinating writes across services.