CP vs AP — When to Choose Each#
The Core Question#
Which is worse for your system — wrong data or no response?
Wrong data worse → CP (stop serving, stay consistent)
No response worse → AP (serve stale, stay available)
CP Systems — Consistency over Availability#
CP systems stop serving requests during a partition rather than risk returning stale or wrong data.
Behavior during partition:
Partition happens → node can't reach quorum
→ stops accepting reads and writes
→ returns error: "service temporarily unavailable"
→ waits for partition to heal
→ resumes when quorum restored
Why CP is the right choice:
When wrong data causes more damage than being unavailable.
Leader election (ZooKeeper):
Stale data → two nodes think they're leader → split-brain → data corruption
Being down → services can't get leader info → temporarily unavailable
→ Down is recoverable. Split-brain is catastrophic.
Payment processing:
Stale balance → user charges against non-existent funds → financial loss
Being down → user sees "try again" → calls support
→ Failed payment is recoverable. Wrong payment destroys trust.
Distributed locks (etcd):
Stale lock info → two processes enter critical section → race condition
Being down → processes wait → slightly delayed
→ Wait is fine. Race condition corrupts data.
Real-world CP systems:
| System | Use Case | Why CP |
|---|---|---|
| ZooKeeper | Leader election, distributed locks | Wrong leader info = split-brain |
| etcd | Kubernetes control plane | Wrong config = cluster corruption |
| HBase | Consistent reads at scale | Strong consistency required |
| Google Spanner | Global financial DB | Money = linearizability required |
AP Systems — Availability over Consistency#
AP systems serve potentially stale data during a partition rather than refusing requests.
Behavior during partition:
Partition happens → node isolated
→ keeps serving requests
→ returns data it currently has (may be stale)
→ syncs with other nodes when partition heals
→ eventually all nodes converge to same state
Why AP is the right choice:
When being unavailable causes more damage than slightly stale data.
Instagram feed:
Stale like count → user sees 1000 likes instead of 1002 → nobody notices
Being down → users leave → engagement drops → revenue lost
→ Slight staleness is fine. Downtime is expensive.
Shopping cart (Amazon):
Stale cart → item added by spouse not visible yet → minor annoyance
Being down → user can't shop → lost sale
→ Amazon famously chose AP for carts.
DNS:
Stale DNS record → user hits old server briefly → minor delay
DNS down → entire internet breaks
→ Availability is everything for DNS.
Cassandra (social data):
Stale post count, follower count → off by a few → acceptable
Down → 2 billion users can't use WhatsApp → unacceptable
Real-world AP systems:
| System | Use Case | Why AP |
|---|---|---|
| Cassandra | Social data, time-series | Availability > consistency |
| DynamoDB | Shopping carts, sessions | Availability critical |
| CouchDB | Offline-first apps | Must work without connectivity |
| DNS | Name resolution | Global availability non-negotiable |
The Same System, Different Choices#
The same database can behave as CP or AP depending on configuration
Cassandra tunable consistency:
Write/Read with ALL → CP behavior (all replicas must agree)
Write/Read with QUORUM → balanced
Write/Read with ONE → AP behavior (fastest, most available, potentially stale)
DynamoDB:
Eventually consistent reads → AP (default, lower latency)
Strongly consistent reads → CP (higher latency, guaranteed fresh)
You pick per-operation based on what that specific data requires.
The Decision Framework#
Step 1: What consistency does this data need?
Financial, locks, leader election → linearizability → CP
Social, analytics, preferences → eventual/causal → AP
Step 2: What happens if the system is unavailable?
Users lose money, critical infra breaks → tolerate unavailability (CP)
Users see stale feed, minor annoyance → tolerate staleness (AP)
Step 3: Which is worse — wrong data or no response?
Wrong data worse → CP
No response worse → AP
Step 4: State the choice and justify it
"This system needs CP because stale [X] would cause [Y]"
"This system needs AP because unavailability would cause [Y]"
System → CP/AP Quick Reference#
| System | Choice | Reason |
|---|---|---|
| Payment processing | CP | Wrong balance = financial loss |
| Bank transfer | CP | Money cannot be lost or duplicated |
| Leader election | CP | Stale = split-brain |
| Distributed locks | CP | Stale = race condition |
| Social feed | AP | Slight staleness acceptable |
| Shopping cart | AP | Availability > perfect consistency |
| Chat messages | AP | Availability for billions of users |
| DNS | AP | Global availability non-negotiable |
| Leaderboard | AP | Off by a few = fine |
| Hotel booking | CP | Double booking = serious problem |