Last Updated: January 15, 2026
In a monolithic application, handling transactions is straightforward. You start a transaction, perform multiple database operations, and either commit everything or roll it all back. The database guarantees ACID properties: atomicity, consistency, isolation, and durability. If anything fails, the whole operation is undone as if it never happened.
Distributed systems break this simplicity.
When your application is split across multiple services, each with its own database, a single business operation might span several services. Placing an order might involve the Order Service creating an order record, the Inventory Service reserving items, and the Payment Service charging the customer. Each service manages its own database. There is no single transaction that can wrap all these operations together.
This creates a fundamental problem: how do you ensure that either all services complete their part successfully, or none of them do? If the Payment Service charges the customer but the Inventory Service fails to reserve items, you have an inconsistent state where money was taken but no products are allocated.
In this chapter, you will learn:
Understanding the problem deeply is essential before exploring solutions. The constraints of distributed systems shape every pattern we will discuss in the following chapters.
In a single database, transactions are elegant and reliable.
The database guarantees ACID properties:
| Property | Guarantee |
|---|---|
| Atomicity | All operations succeed or all fail together |
| Consistency | Database moves from one valid state to another |
| Isolation | Concurrent transactions do not interfere |
| Durability | Committed changes survive crashes |
This works because the database has complete control over all the data involved. It can lock rows, track changes, and coordinate the commit or rollback of the entire transaction.
When operations span multiple services and databases, we lose the properties that make local transactions reliable.
Each service has its own database. There is no single transaction manager that controls all three databases. Each database only knows about its own local transaction.
Consider what happens when placing an order:
If any step fails after previous steps have committed, we have a problem. The Order Service has already committed its transaction. The Inventory Service has already committed its reservation. There is no way to roll them back.
A natural question: why not put everything in one database? Then we could use normal transactions.
Reasons against a shared database:
| Reason | Explanation |
|---|---|
| Independent scaling | Services have different load patterns |
| Technology freedom | Different services need different databases |
| Team autonomy | Teams own their data and schema |
| Fault isolation | One database failure should not affect all services |
| Performance | Distributed load, no single bottleneck |
The benefits of microservices come from service independence. Sharing a database undermines that independence. We accept the complexity of distributed transactions to gain the benefits of service autonomy.
Distributed systems have more failure modes than local systems. Any component can fail at any time, and the failures are often partial.
When network connectivity is lost between services, Order Service cannot tell if Payment Service is down or just unreachable. Did the payment succeed? Is it still processing? Should we retry?
Two services succeeded and committed their changes. One failed. The system is in an inconsistent state: we have an order and reserved inventory, but no payment.
One of the hardest problems in distributed systems: what does a timeout mean?
| Scenario | What Happened | Our State |
|---|---|---|
| Request never arrived | Payment not processed | Safe to retry |
| Request arrived, processing failed | Payment not processed | Safe to retry |
| Request succeeded, response lost | Payment processed | Retry would double-charge |
Without additional mechanisms, we cannot distinguish these cases. This is why idempotency becomes critical in distributed systems.
When a service crashes mid-operation, its local transaction rolls back automatically. But the calling service does not know what happened.
In local transactions, we get strong consistency. In distributed systems, we often have to choose between consistency and availability.
All nodes see the same data at the same time. After a write completes, all readers see the new value.
Trade-off: Higher latency, reduced availability during partitions.
Updates propagate to all nodes eventually, but there may be a delay. During the delay, different nodes may return different values.
Trade-off: Lower latency, higher availability, but temporary inconsistency.
The CAP theorem states that during a network partition, a distributed system must choose between consistency and availability.
Network partitions will happen. The question is how your system behaves when they do. Distributed transaction patterns represent different choices along this spectrum.
Inconsistent state is not just a theoretical problem. It causes real business issues.
Several factors make distributed transactions fundamentally difficult.
Each service has its own clock, and clocks drift. We cannot rely on timestamps to determine event ordering across services.
Services cannot share variables or locks. All coordination must happen through network messages, which can be lost, delayed, or duplicated.
Each service can fail independently. A service might be up for some callers and down for others. It might process some requests and fail on others.
This thought experiment proves that two parties cannot reach certainty about agreement through an unreliable channel. This fundamental limitation affects all distributed coordination.
Given these challenges, what properties should a distributed transaction solution provide?
| Property | Description |
|---|---|
| Atomicity | All services commit or all roll back |
| Consistency | System moves between valid states |
| Isolation | Concurrent operations do not corrupt data |
| Durability | Committed changes persist |
| Failure handling | Graceful recovery from partial failures |
| Performance | Acceptable latency and throughput |
| Availability | System remains operational during failures |
The challenge: we cannot have all properties perfectly. Every solution involves trade-offs. The patterns in the following chapters represent different points in this trade-off space.
We will explore several approaches to distributed transactions, each with different trade-offs:
| Pattern | Consistency | Availability | Complexity | Use Case |
|---|---|---|---|---|
| Two-Phase Commit | Strong | Lower | Medium | Tightly coupled systems |
| Three-Phase Commit | Strong | Better | High | Critical consistency needs |
| Saga | Eventual | High | Medium | Microservices |
| Outbox | Eventual | High | Low | Event-driven systems |
Distributed transactions are fundamentally harder than local transactions:
The patterns in the following chapters address these challenges in different ways. Two-Phase Commit provides strong consistency through coordination. Three-Phase Commit improves on 2PC's availability limitations. Sagas embrace eventual consistency with compensating transactions. The Outbox Pattern ensures reliable event publishing.
In the next chapter, we will explore Two-Phase Commit, the classic protocol for achieving atomicity across distributed systems.