Cache Stampede (Thundering Herd)#
A hot key's TTL expires. Thousands of concurrent requests all get a cache miss simultaneously. All of them query the DB at once.
How it happens#
Homepage query cached with TTL = 60s
→ 50,000 requests/second hitting this key
→ T=60s: TTL expires
→ all 50,000 requests for that second get a cache miss
→ all 50,000 query the DB simultaneously
→ DB gets 50,000 queries instead of its usual 1
→ DB collapses under the load ✗
The problem isn't the cache miss itself. It's the thundering herd — thousands of identical DB queries happening in parallel because no one coordinated who should actually fetch the data.
Fix 1 — Refresh-Ahead#
Don't let the key expire in the first place. Detect that expiry is approaching and refresh proactively.
TTL = 60s, refresh threshold = 45s
T=45s → background job detects key has 15s remaining and is hot
→ fetches fresh data from DB → updates cache
T=60s → key would expire, but it's already fresh ✓
→ users never see a miss → DB never sees the spike ✓
Best for predictable hot keys. Requires knowing which keys are worth monitoring.
Fix 2 — Mutex with Double-Checked Locking#
When a key expires, only let one request fetch from DB. All others wait, then check the cache again.
T=60s: key expires → 50,000 requests get cache miss
→ all try to acquire a lock for this key
→ one request wins the lock
→ 49,999 wait
Winner:
→ fetches from DB
→ writes to cache
→ releases lock
Waiters (after lock released):
→ check cache again ← this is the double-check
→ cache hit ✓ → return immediately
→ DB gets exactly 1 query total ✓
The double-check is critical:
def get(key):
value = cache.get(key)
if value: return value # first check (no lock needed)
lock.acquire(key)
try:
value = cache.get(key) # second check (inside lock)
if value: return value # someone else already fetched it
value = db.fetch(key)
cache.set(key, value, ttl=60)
return value
finally:
lock.release(key)
Without the second check inside the lock, waiters would fetch from DB too — serialised instead of eliminated. With it, DB gets exactly one query regardless of how many concurrent waiters.
Fix 3 — Probabilistic Early Expiry#
Each request near TTL expiry has a random chance of triggering a proactive refresh. No lock needed, no coordination.
TTL = 60s, key has 10s remaining
Each request flips a weighted coin:
20% chance → this request refreshes the cache proactively
80% chance → serve current value, do nothing
As TTL approaches zero, the probability increases
→ cache gets refreshed naturally before it expires
→ no thundering herd, no locks, no background jobs
Elegant and simple. The randomness ensures the refresh is distributed across requests rather than hitting all at once.
Which fix to use#
Refresh-Ahead → predictable hot keys, background job available
Mutex + double-check → unpredictable spikes, need coordination
Probabilistic expiry → simple, works without any extra infrastructure
In practice, most production systems combine: Refresh-Ahead for known hot keys + mutex for everything else.