AlgoMaster Logo

The Problem with Distributed Transactions

Last Updated: January 15, 2026

Ashish

Ashish Pratap Singh

In a monolithic application, handling transactions is straightforward. You start a transaction, perform multiple database operations, and either commit everything or roll it all back. The database guarantees ACID properties: atomicity, consistency, isolation, and durability. If anything fails, the whole operation is undone as if it never happened.

Distributed systems break this simplicity.

When your application is split across multiple services, each with its own database, a single business operation might span several services. Placing an order might involve the Order Service creating an order record, the Inventory Service reserving items, and the Payment Service charging the customer. Each service manages its own database. There is no single transaction that can wrap all these operations together.

This creates a fundamental problem: how do you ensure that either all services complete their part successfully, or none of them do? If the Payment Service charges the customer but the Inventory Service fails to reserve items, you have an inconsistent state where money was taken but no products are allocated.

In this chapter, you will learn:

  • Why traditional transactions do not work across services
  • The consistency challenges in distributed systems
  • Common failure scenarios and their consequences
  • Why this problem is harder than it appears

Understanding the problem deeply is essential before exploring solutions. The constraints of distributed systems shape every pattern we will discuss in the following chapters.

The Simplicity of Local Transactions

In a single database, transactions are elegant and reliable.

The database guarantees ACID properties:

PropertyGuarantee
AtomicityAll operations succeed or all fail together
ConsistencyDatabase moves from one valid state to another
IsolationConcurrent transactions do not interfere
DurabilityCommitted changes survive crashes

This works because the database has complete control over all the data involved. It can lock rows, track changes, and coordinate the commit or rollback of the entire transaction.

Why Distributed Transactions Are Different

When operations span multiple services and databases, we lose the properties that make local transactions reliable.

Each service has its own database. There is no single transaction manager that controls all three databases. Each database only knows about its own local transaction.

The Fundamental Challenge

Consider what happens when placing an order:

If any step fails after previous steps have committed, we have a problem. The Order Service has already committed its transaction. The Inventory Service has already committed its reservation. There is no way to roll them back.

Why We Cannot Just Use One Database

A natural question: why not put everything in one database? Then we could use normal transactions.

Reasons against a shared database:

ReasonExplanation
Independent scalingServices have different load patterns
Technology freedomDifferent services need different databases
Team autonomyTeams own their data and schema
Fault isolationOne database failure should not affect all services
PerformanceDistributed load, no single bottleneck

The benefits of microservices come from service independence. Sharing a database undermines that independence. We accept the complexity of distributed transactions to gain the benefits of service autonomy.

Failure Scenarios

Distributed systems have more failure modes than local systems. Any component can fail at any time, and the failures are often partial.

Network Partitions

When network connectivity is lost between services, Order Service cannot tell if Payment Service is down or just unreachable. Did the payment succeed? Is it still processing? Should we retry?

Partial Failures

Two services succeeded and committed their changes. One failed. The system is in an inconsistent state: we have an order and reserved inventory, but no payment.

Timeout Ambiguity

One of the hardest problems in distributed systems: what does a timeout mean?

ScenarioWhat HappenedOur State
Request never arrivedPayment not processedSafe to retry
Request arrived, processing failedPayment not processedSafe to retry
Request succeeded, response lostPayment processedRetry would double-charge

Without additional mechanisms, we cannot distinguish these cases. This is why idempotency becomes critical in distributed systems.

Service Crashes

When a service crashes mid-operation, its local transaction rolls back automatically. But the calling service does not know what happened.

Consistency Models in Distributed Systems

In local transactions, we get strong consistency. In distributed systems, we often have to choose between consistency and availability.

Strong Consistency

All nodes see the same data at the same time. After a write completes, all readers see the new value.

Trade-off: Higher latency, reduced availability during partitions.

Eventual Consistency

Updates propagate to all nodes eventually, but there may be a delay. During the delay, different nodes may return different values.

Trade-off: Lower latency, higher availability, but temporary inconsistency.

The CAP Theorem Connection

The CAP theorem states that during a network partition, a distributed system must choose between consistency and availability.

Network partitions will happen. The question is how your system behaves when they do. Distributed transaction patterns represent different choices along this spectrum.

Real-World Consequences of Inconsistency

Inconsistent state is not just a theoretical problem. It causes real business issues.

Double Charging

Lost Orders

Orphaned Reservations

Financial Discrepancies

Why This Problem Is Hard

Several factors make distributed transactions fundamentally difficult.

No Global Clock

Each service has its own clock, and clocks drift. We cannot rely on timestamps to determine event ordering across services.

No Shared Memory

Services cannot share variables or locks. All coordination must happen through network messages, which can be lost, delayed, or duplicated.

Independent Failures

Each service can fail independently. A service might be up for some callers and down for others. It might process some requests and fail on others.

The Two Generals Problem

This thought experiment proves that two parties cannot reach certainty about agreement through an unreliable channel. This fundamental limitation affects all distributed coordination.

What We Need from a Solution

Given these challenges, what properties should a distributed transaction solution provide?

PropertyDescription
AtomicityAll services commit or all roll back
ConsistencySystem moves between valid states
IsolationConcurrent operations do not corrupt data
DurabilityCommitted changes persist
Failure handlingGraceful recovery from partial failures
PerformanceAcceptable latency and throughput
AvailabilitySystem remains operational during failures

The challenge: we cannot have all properties perfectly. Every solution involves trade-offs. The patterns in the following chapters represent different points in this trade-off space.

Preview of Solutions

We will explore several approaches to distributed transactions, each with different trade-offs:

PatternConsistencyAvailabilityComplexityUse Case
Two-Phase CommitStrongLowerMediumTightly coupled systems
Three-Phase CommitStrongBetterHighCritical consistency needs
SagaEventualHighMediumMicroservices
OutboxEventualHighLowEvent-driven systems

Summary

Distributed transactions are fundamentally harder than local transactions:

  • Local transactions benefit from a single transaction manager that controls all data, providing ACID guarantees automatically.
  • Distributed systems have no single authority. Each service owns its data, and there is no built-in way to coordinate commits across services.
  • Failures are partial. Some services might succeed while others fail, leaving the system in an inconsistent state.
  • Networks are unreliable. Messages can be lost, delayed, or duplicated. Timeouts are ambiguous.
  • Clocks are unreliable. We cannot use timestamps to definitively order events across services.
  • Trade-offs are unavoidable. The CAP theorem means we must choose between consistency and availability during partitions.

The patterns in the following chapters address these challenges in different ways. Two-Phase Commit provides strong consistency through coordination. Three-Phase Commit improves on 2PC's availability limitations. Sagas embrace eventual consistency with compensating transactions. The Outbox Pattern ensures reliable event publishing.

In the next chapter, we will explore Two-Phase Commit, the classic protocol for achieving atomicity across distributed systems.