The Problem — Concurrent Writes to the Same Key#
Two clients write to the same key at the exact same time:
Client 1: put("user:123", "Alice") → goes to Node B, Node C, Node D
Client 2: put("user:123", "Bob") → goes to Node B, Node C, Node D
Both writes get W=2 acks. Both succeed. But the order in which each node receives them is not guaranteed — network latency is unpredictable. Client 1's request might arrive at Node B before Client 2's, yet arrive at Node C after it.
Node B receives: Alice first, then Bob → stores "Bob"
Node C receives: Bob first, then Alice → stores "Alice"
Node D receives: Alice first, then Bob → stores "Bob"
Final state:
Node B: "Bob"
Node C: "Alice"
Node D: "Bob"
The replicas disagree. A quorum read (R=2) that hits Node B and Node C gets two different values. We need a deterministic rule that every node agrees on, regardless of network timing — so that no matter which nodes a read hits, the same value wins.
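The divergence above can be reproduced in a few lines. This is a minimal sketch, not the store's actual code: each replica is modeled as simply storing the last value it receives, so delivery order alone decides each node's final state.

```python
# Hypothetical replica model: a node keeps only the last value delivered to it.
class Node:
    def __init__(self, name):
        self.name = name
        self.value = None

    def receive(self, value):
        self.value = value  # last delivery wins locally

nodes = {n: Node(n) for n in ("B", "C", "D")}

# Network timing differs per node: B and D see Alice then Bob,
# while C sees Bob then Alice.
delivery_order = {
    "B": ["Alice", "Bob"],
    "C": ["Bob", "Alice"],
    "D": ["Alice", "Bob"],
}
for n, order in delivery_order.items():
    for v in order:
        nodes[n].receive(v)

print({n: node.value for n, node in nodes.items()})
# → {'B': 'Bob', 'C': 'Alice', 'D': 'Bob'}
```

Same two writes, three different delivery orders, permanently divergent replicas. Nothing in this model ever reconciles them.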
Last Writer Wins (LWW) — The Simple Solution#
Every write carries a timestamp. When two values exist for the same key, the one with the higher timestamp wins. We've been using this throughout the system already — read repair compares timestamps, anti-entropy compares timestamps, compaction keeps the higher timestamp.
Client 1: put("user:123", "Alice", timestamp=1001)
Client 2: put("user:123", "Bob", timestamp=1002)
Node B: has both → keeps "Bob" (1002 > 1001)
Node C: has both → keeps "Bob" (1002 > 1001)
Node D: has both → keeps "Bob" (1002 > 1001)
Every node independently picks the same winner. No coordination needed. Deterministic. This is what Cassandra and DynamoDB use as their default conflict resolution strategy.
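The resolution rule is small enough to write out. Here is a sketch of an LWW comparator (a hypothetical helper, not the store's real API), where every value is carried as a `(value, timestamp)` pair; ties on timestamp are broken by comparing the values themselves, so nodes still agree deterministically even when two writes carry the same timestamp.

```python
# Sketch of LWW conflict resolution over (value, timestamp) pairs.
def lww_resolve(a, b):
    """Return the winning (value, timestamp) pair.

    Higher timestamp wins; equal timestamps fall back to comparing
    the values so every node breaks the tie identically.
    """
    return max(a, b, key=lambda vt: (vt[1], vt[0]))

alice = ("Alice", 1001)
bob = ("Bob", 1002)

# Every node applies the same pure function and picks the same winner:
print(lww_resolve(alice, bob))  # → ('Bob', 1002)
print(lww_resolve(bob, alice))  # → ('Bob', 1002)  (argument order is irrelevant)
```

Because the function is commutative and deterministic, it can run during read repair, anti-entropy, or compaction and always converge on the same answer.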
The Clock Drift Problem#
LWW relies on timestamps, but who assigns the timestamp? Two options:
- Client assigns it — each client stamps its own write. But clients run on different machines with different clocks.
- Coordinator assigns it — the coordinator node stamps the write when it receives it. But two different coordinators also have different clocks.
Either way, clocks across machines drift. A machine's clock might be milliseconds or even seconds ahead of another. This means LWW doesn't guarantee "the actually-last write wins" — it guarantees "the write with the higher timestamp wins." There's a difference:
Real time: Client 1 writes at 3:00:00.000
Client 2 writes at 3:00:00.001 ← actually later
Client 1's clock: 3:00:00.500 ← clock is 500ms ahead
Client 2's clock: 3:00:00.001 ← clock is accurate
LWW picks Client 1 (3:00:00.500 > 3:00:00.001)
→ Client 1 wins even though Client 2 was actually later
→ Client 2's write is silently dropped
The user behind Client 2 thinks their update succeeded — they got W=2 acks. But the value was quietly overwritten by Client 1's stale write because Client 1's clock was ahead. The write is silently lost.
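The gap between "actually last" and "highest timestamp" is easy to see with the numbers from the example above. This sketch (illustrative values only, measured in milliseconds past 3:00:00) tracks both the true arrival order and the timestamp each client's skewed clock stamped on its write:

```python
# (client, value, real_time_ms, stamped_ms) — real_time_ms is what a perfect
# global clock would have recorded; stamped_ms is what the client's own
# (possibly skewed) clock actually put on the write.
writes = [
    ("Client 1", "v1", 0, 500),  # clock runs 500ms ahead
    ("Client 2", "v2", 1, 1),    # accurate clock, actually later
]

actually_last = max(writes, key=lambda w: w[2])  # ordered by real time
lww_winner = max(writes, key=lambda w: w[3])     # ordered by stamped time

print(actually_last[0])  # → Client 2  (the true last writer)
print(lww_winner[0])     # → Client 1  (what LWW keeps)
```

No node ever sees `real_time_ms`; only the stamped timestamps travel with the writes, which is exactly why the skew is invisible to the system.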
Is Silent Data Loss Acceptable?#
For most KV store use cases, LWW is acceptable. If two clients are writing to the same key at the exact same millisecond, there's no meaningful "correct" ordering — it's a race condition in the application logic. Either value is equally valid. LWW just picks one deterministically.
But there are use cases where losing a concurrent write is not acceptable. Consider a shopping cart on Amazon:
Phone: put("cart:user123", ["shoes"]) timestamp=1001
Laptop: put("cart:user123", ["shirt"]) timestamp=1002
LWW picks ["shirt"] (1002 > 1001)
→ "shoes" is silently dropped
→ User expects both items in cart, but only "shirt" is there
The user didn't intend to overwrite shoes with shirt — they intended to add shirt to the cart. Both writes are valid and should be merged, not overwritten. LWW can't do this — it always picks one winner and discards the other.
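What the cart case wants is a merge, not a winner. As a sketch of the alternative (assuming the value is modeled as a set of items; this is not something the store described so far supports), conflict resolution becomes a set union instead of a timestamp comparison:

```python
# Merge-based resolution for a cart modeled as a set of items:
# combine both concurrent versions instead of discarding one.
def merge_carts(a, b):
    """Union of two concurrent cart versions; no write is dropped."""
    return sorted(set(a) | set(b))

phone = ["shoes"]   # written at timestamp=1001
laptop = ["shirt"]  # written at timestamp=1002

print(merge_carts(phone, laptop))  # → ['shirt', 'shoes']
```

Note the trade-off: union works for "add item" but cannot express "remove item" (a deleted item would reappear after a merge), which hints at why merge semantics are usually application-specific.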
This is where the question becomes: should our KV store solve this, or is it the application's problem?