Skip to content

2PC vs Saga — When to Use Which#

The core difference in one line#

2PC   →  all services lock and wait together — true atomicity, but blocks under failure
Saga  →  each service commits independently — eventual consistency, never blocks

What "locks" means in 2PC — clearing the misconception#

2PC does not require a shared codebase or shared database. The services are completely independent — separate code, separate deployments, separate databases.

The locks are local to each service's own database:

Payment Service (its own DB)   → locks user row locally
Inventory Service (its own DB) → locks item row locally
Order Service (its own DB)     → locks order row locally

The problem is not that the locks are shared — it's that they are held simultaneously across all services for the entire duration of both phases. While those locks are held, no other transaction in any of those databases can touch those rows.

At low throughput this is fine. At thousands of orders per second, every service's database is blocked waiting for the coordinator. Latency spikes, queues build up, the system slows to a crawl.


Why 2PC breaks across organisations#

2PC requires you to control all participants at the infrastructure level — your services must be able to hold locks inside their databases on your instruction.

Your services (same company):
✓  you own the databases
✓  you can tell them to hold locks and wait
✓  they support XA/2PC protocol

External bank (HDFC, SBI):
✗  you don't own their databases
✗  they will never let you hold locks inside their systems
✗  no external 2PC protocol is exposed

This is why GPay and PhonePay use Saga — not because the code is in separate repos, but because HDFC and SBI are independent organisations whose databases you cannot control.


Real example — UPI payment#

A UPI payment flows through:

GPay → NPCI → Bank A (debit) → NPCI → Bank B (credit)

For 2PC, NPCI would need to hold a lock on your HDFC account AND the recipient's SBI account simultaneously. HDFC and SBI will never allow this. They're independent banks with their own systems.

So NPCI uses Saga instead. If Bank B's credit fails:

Step 1: Debit Bank A        →  committed locally ✓
Step 2: Credit Bank B       →  fails ✗
Step 3: Compensate          →  reverse the debit on Bank A (refund)

The user sees "payment pending" for a few seconds. That brief inconsistency is acceptable. What's not acceptable is permanent money loss — and Saga prevents that via compensation.

The one place 2PC lives in payments is inside a single bank's own database — debiting and crediting two internal accounts is a local ACID transaction, no distributed coordination needed.


Full comparison#

2PC Saga
Atomicity True atomicity — all or nothing Eventual consistency — briefly inconsistent mid-saga
Locks Local locks in each DB, held across all phases No locks — each service commits locally and moves on
Services Independent codebases, but you must control all DBs Independent codebases, no DB control required
Works across organisations No — you can't hold locks in external systems Yes — compensating transactions work across any API
Coordinator failure Participants freeze holding locks — blocking protocol No coordinator (choreography) or fault-tolerant orchestrator
Failure handling Automatic rollback coordinated by coordinator Compensating transactions run in reverse
Consistency during failure Never inconsistent — all or nothing Briefly inconsistent while compensating transactions run
Latency High — two network round trips + locks held throughout Low — async, each service moves on immediately
Throughput Low — local locks create contention at high traffic High — no locks, no waiting across services
Complexity Simpler protocol, fragile under coordinator failure Each service needs idempotency + compensating transactions
Observability Coordinator knows the full state Choreography: hard. Orchestration: easy.

The decision rule#

Use 2PC when: - You need true atomicity — not eventual consistency - You own and control all participating services and their databases - The system has low throughput (not thousands of transactions per second) - You're using a distributed SQL database that already supports it (Google Spanner, CockroachDB) - Inconsistency even for a millisecond is unacceptable (financial ledger, stock trade)

Use Saga when: - Any participant is an external organisation or third-party API - You need high availability and can tolerate brief inconsistency - The system is microservice-based with separate databases per service - Throughput is high — local locks across services would be a bottleneck - The business can handle compensation (refund, cancel, restock) instead of true rollback - Most e-commerce, ride-hailing, food delivery, booking, payment systems


Choreography vs Orchestration#

Choreography Orchestration
Control Decentralised — services react to events Centralised — orchestrator drives each step
Debugging Hard — flow spread across services and Kafka topics Easy — full saga state in one place
Coupling Loose — services don't know about each other Tighter — services coupled to orchestrator
Single point of failure None Orchestrator (mitigated by fault-tolerance + DB persistence)
Best for Simple flows, small number of steps Complex flows, many steps, where observability matters

Interview framing

"I'd use Saga over 2PC here — 2PC holds local locks in each service's database across two network round trips, which kills throughput at scale. More importantly, if any participant is an external system like a bank or third-party API, 2PC simply can't work — you can't hold locks inside systems you don't control. Saga gives us eventual consistency with compensating transactions, works across any service boundary, and never blocks. For the implementation I'd use orchestration — the full saga state in one place makes debugging and monitoring straightforward."