Database Replication
In modern distributed systems, relying on a single database server is a recipe for disaster. Database replication is the process of keeping multiple copies of the same data across different servers or geographical locations.
A typical setup where a Primary database handles all writes and replicates data to Read-Only replicas.
Why Replicate Data?
Section titled “Why Replicate Data?”Replication solves several core engineering challenges:
- High Availability (HA): If one server crashes or a data center loses power, other nodes contain a full copy of the data and can take over instantly.
- Read Scalability: By routing read-heavy operations (like loading user profiles) to replicas, you significantly reduce the load on the primary database.
- Reduced Latency: You can place replicas geographically closer to your users (e.g., an EU replica for European users) to speed up read queries.
- Analytics & Backups: Complex, long-running analytics queries can be run on a dedicated replica without slowing down the primary transactional database.
Core Terminology
Section titled “Core Terminology”Before diving into architectures, it’s crucial to understand the roles within a replication cluster:
Leader (Primary / Master): The designated node that accepts all write operations from the application. It dictates the order of operations and propagates them to followers.
Follower (Replica / Secondary / Slave): Nodes that receive data changes from the Leader. They apply these changes to their local storage and usually only accept read queries.
Replication Lag: The time delay between a write committing on the Leader and that same write being applied on a Follower. In a healthy system, this is usually milliseconds, but it can spike under heavy load.
Commit Strategies: Sync vs Async
Section titled “Commit Strategies: Sync vs Async”When a user writes data, how does the leader ensure the followers get it? This introduces the trade-off between Synchronous and Asynchronous replication.
Timeline comparing the round-trips required for synchronous versus asynchronous replication.
| Strategy | Commit Timing | Pros | Cons |
|---|---|---|---|
| Synchronous | Leader waits for replicas to acknowledge the write before responding to the client. | Strong consistency. Replicas are guaranteed to have the latest data. | Higher latency. The leader blocks if a replica is slow or network fails. |
| Asynchronous | Leader commits locally, replies to client immediately, and updates replicas in the background. | Low latency. Extremely fast writes, resilient to slow followers. | Data loss risk. If the leader crashes before replicating, recent writes are permanently lost. |
Synchronous, semi-synchronous, and asynchronous replication at a glance.
| Mode | Consistency / durability | Write latency | Data loss risk (leader crash) |
|---|---|---|---|
| Synchronous | Highest | Highest | Lowest |
| Semi-synchronous | Medium–high | Medium | Low–medium |
| Asynchronous | Eventual across replicas | Lowest | Highest |
Replication Architectures
Section titled “Replication Architectures”The way nodes communicate defines the replication architecture. There are three primary patterns used in the industry today.
1. Single-Leader Architecture
Section titled “1. Single-Leader Architecture”
One leader accepts writes and ships an ordered replication log; followers apply changes and may serve reads.
This is the most common and easiest to understand. All writes go through one central node. The leader produces an ordered replication log (for example binlog or WAL); followers receive or pull that log and apply updates in order—serving read traffic when your consistency requirements allow.
- Use Case: Read-heavy applications like blogs, news sites, or standard web platforms.
- Failover: If the leader crashes, the system must perform a failover. It selects the most up-to-date follower and promotes it to be the new leader. Clients are then re-routed to the new leader.
In single-leader setups, when the leader dies, a follower is promoted to take its place.
Replica failure
Section titled “Replica failure”- Replica reconnects to the leader.
- It requests missing log entries from its last known position.
- It catches up from checkpoint or binlog/WAL offset.
Leader failure
Section titled “Leader failure”- Detect loss of leader (heartbeats, health checks, timeouts).
- Promote the most up-to-date replica (avoiding a stale promotion that loses acknowledged writes).
- Re-point other replicas and application writers to the new leader.
- Fence the old leader (STONITH, revoke credentials, load balancer drain) to prevent split-brain and dual writers.
Risks to plan for: promoting a lagging replica (data loss), split-brain if two nodes believe they are primary, and long outages if failover is manual or poorly automated.
Adding new replicas safely
Section titled “Adding new replicas safely”- Take a consistent snapshot from the leader (or a replica caught up to a known log position).
- Restore that snapshot on the new node.
- Start replication from the exact log position matching the snapshot.
- Serve reads from the new replica only once lag is within your SLO.
Operational metrics
Section titled “Operational metrics”Track continuously: replication lag (time and bytes), apply throughput, failover duration, leader write latency, and breaches of replica freshness SLOs.
Single-leader practical guidance
Section titled “Single-leader practical guidance”- Send strongly consistent reads to the leader when the business logic cannot tolerate stale data.
- Send stale-tolerant reads to replicas to offload the primary.
- Automate failover with fencing or lease checks; rehearse planned and unplanned failovers.
- Treat backups and PITR as separate from replication—they solve different failure classes.
2. Multi-Leader Architecture
Section titled “2. Multi-Leader Architecture”
Multiple leaders handle traffic independently, typically across different regions, and sync with each other.
In a multi-leader setup, multiple nodes accept writes simultaneously and continuously replicate changes among themselves.
- Use Case: Global applications where users in the US write to a US leader, and users in the EU write to an EU leader, ensuring low write latency globally. It also excels for offline operation support (like collaborative document editing).
3. Leaderless Architecture (Quorum)
Section titled “3. Leaderless Architecture (Quorum)”
Leaderless systems rely on reading and writing to multiple nodes (a quorum) to ensure consistency.
Pioneered by Amazon’s Dynamo system (and used in Cassandra and Riak), leaderless architectures abandon the concept of a primary node. Clients send write and read requests to multiple nodes in parallel.
How It Works: The Quorum Formula
Section titled “How It Works: The Quorum Formula”In a leaderless system, three parameters govern consistency:
| Parameter | Description |
|---|---|
| N | Total number of replicas storing each piece of data |
| W | Write quorum — minimum nodes that must acknowledge a write |
| R | Read quorum — minimum nodes queried for a read |
The Key Rule: If W + R > N, you guarantee strong consistency because at least one node in the read set must have the latest write. This is called quorum-based replication.
Example: N=3, W=2, R=2
Section titled “Example: N=3, W=2, R=2”Write Flow:
- Client sends a write request to a coordinator node
- Coordinator forwards the write to all 3 replicas
- As soon as 2 replicas acknowledge → write succeeds
- The third replica catches up asynchronously (anti-entropy)
Read Flow:
- Client sends a read request to the coordinator
- Coordinator queries 2 replicas in parallel
- If values differ → resolve conflict (e.g., pick latest timestamp)
- Return the most recent version
Since W + R = 2 + 2 = 4 > 3, at least one overlapping node is guaranteed to have the latest data.
Tuning Quorums for Different Needs
Section titled “Tuning Quorums for Different Needs”| Configuration | Trade-off |
|---|---|
W=N, R=1 | Writes are slow (all nodes), reads are fast |
W=1, R=N | Writes are fast, reads are slow (query all) |
W=1, R=1 | Maximum speed, but no consistency guarantee |
Use Case: Extremely high-availability systems where downtime is unacceptable, such as shopping carts, session stores, or massive event logging.
Summary (TL;DR)
Section titled “Summary (TL;DR)”| Architecture | Best For | Trade-off |
|---|---|---|
| Single-Leader | Read-heavy workloads (blogs, web apps) | Simple, but requires failover mechanisms |
| Multi-Leader | Global write performance, offline support | Conflict resolution complexity |
| Leaderless | Maximum availability (carts, sessions) | Eventual consistency, tunable via quorum |
Key Takeaway: Choose your replication strategy based on your consistency vs. availability requirements. Use W + R > N in leaderless systems to guarantee overlap between reads and writes.