Distributed Transaction Handling
In a distributed system, ensuring data consistency across multiple, independent services is one of the hardest engineering challenges.
A transaction is a group of operations executed as a single, indivisible unit. The golden rule of transactions is “All or Nothing”. If transferring money involves debiting Account A and crediting Account B, both must succeed, or both must fail.
Monolithic vs Microservice Transactions
Section titled “Monolithic vs Microservice Transactions”The complexity of transaction management directly correlates with your system’s architecture.
Monolithic Architecture
Section titled “Monolithic Architecture”
In a monolith, all modules share a single database, allowing the use of standard ACID transactions.
In a monolith, you rely on the database’s native transaction capabilities. The database guarantees ACID properties (Atomicity, Consistency, Isolation, Durability). If a step fails, you simply issue a ROLLBACK command, and the database magically undoes everything.
Microservices Architecture
Section titled “Microservices Architecture”
In a microservices architecture, operations span multiple independent databases.
In microservices, the “Order,” “Payment,” and “Inventory” logic live in different applications, each with its own database. A single business transaction now spans across the network.
A Distributed Transaction is a transaction that spans multiple physical systems or services, requiring coordination to ensure that all participating databases either commit their changes or roll them back together.
Since you cannot wrap a single START TRANSACTION and COMMIT block around three different REST APIs, you need distributed transaction patterns.
Pattern 1: Two-Phase Commit (2PC)
Section titled “Pattern 1: Two-Phase Commit (2PC)”Two-Phase Commit (2PC) is an algorithm that ensures all services agree to commit or roll back changes simultaneously. It relies on a central Transaction Coordinator.
The Two Phases
Section titled “The Two Phases”
The Coordinator ensures all services are ready in Phase 1 before telling them to commit in Phase 2.
- Phase 1: Prepare Phase
- The coordinator asks all participating services: “Are you ready to commit this transaction?”
- Each service checks its constraints, locks the necessary rows in its database, and replies “Ready” or “Failed”.
- Phase 2: Commit Phase
- If all services replied “Ready,” the coordinator sends a “Commit” command.
- If any service replied “Failed” (or timed out), the coordinator sends a “Rollback” command.
Rollback Scenario
Section titled “Rollback Scenario”
If any service fails during the Prepare phase, the Coordinator aborts the entire transaction.
Pattern 2: The SAGA Pattern
Section titled “Pattern 2: The SAGA Pattern”Because 2PC scales poorly in cloud environments, modern microservices use the SAGA Pattern.
Instead of one massive, distributed transaction, a SAGA breaks the work into a series of local transactions.
A sequence of local transactions. If a later step fails, explicit compensating transactions are triggered.
How it works:
- Service A does its work, commits to its database, and triggers Service B.
- Service B does its work, commits, and triggers Service C.
- If Service C fails, the system cannot issue a standard database
ROLLBACKfor A and B (they already committed). Instead, it must execute Compensating Transactions (e.g., Service B issues a refund, Service A marks the order as cancelled).
This relies on Eventual Consistency—the system might be in a partially completed state for a few seconds before the SAGA finishes or compensates.
SAGA Implementations
Section titled “SAGA Implementations”There are two primary ways to coordinate a SAGA.
1. Choreography (Event-Driven)
Section titled “1. Choreography (Event-Driven)”
Services react to events on a message broker without a central controller.
In choreography, there is no central brain. Services publish domain events to a Message Broker (like Kafka or RabbitMQ), and other services listen and react.
| Pros | Cons |
|---|---|
| No single point of failure. | Very hard to track the state of a transaction. |
| Highly decoupled and scalable. | Complex debugging; requires robust distributed tracing. |
2. Orchestration (Command-Driven)
Section titled “2. Orchestration (Command-Driven)”
*An orchestrator explicitly commands services and manages the transaction state machine.*x
In orchestration, a central service (the Orchestrator) manages the workflow. It sends commands to services, waits for their replies, and decides the next step (or triggers compensations).
| Pros | Cons |
|---|---|
| Clear visibility into the transaction state. | The orchestrator is a single point of failure/bottleneck. |
| Easy to understand and debug the workflow. | Couples services slightly more to the orchestrator. |
2PC vs SAGA Comparison
Section titled “2PC vs SAGA Comparison”| Aspect | 2PC | SAGA |
|---|---|---|
| Consistency | Strong (ACID across all nodes) | Eventual (Base) |
| Performance | Slower (Locks resources, blocking) | Faster (Non-blocking local commits) |
| Failure Handling | Standard Database Rollback | Application-level Compensating Transactions |
| Best For | Short transactions, internal networks, single-vendor databases. | Long-running workflows, public cloud, diverse microservices. |
Summary (TL;DR)
Section titled “Summary (TL;DR)”- Monoliths rely on standard ACID database transactions.
- Microservices require distributed transaction patterns because data is split across multiple databases.
- 2PC guarantees strong consistency but blocks resources and scales poorly.
- SAGA breaks work into local transactions and uses compensating actions to undo work, favoring high availability and eventual consistency.