Cold Start#
The cache is completely empty. Fresh deployment, Redis restart, new region. Every request is a cache miss. The DB sees 100% of production traffic instead of its usual 5%.
How it happens#
Scenario 1: Fresh deployment
New Redis instance spun up → 0 keys
Traffic switches to new instance
→ every request → cache miss → hits DB
→ DB absorbs full production traffic
→ DB that normally handles 5% of reads now handles 100% → collapses
Scenario 2: Redis restart (OOM kill, upgrade)
All keys lost from memory
Same result
Scenario 3: New region launch
Cache in new region is empty
First users in that region → all cache misses
→ DB in that region (or cross-region DB) hammered
How cold start differs from stampede#
Same symptom (DB hammered), completely different cause:
Cache Stampede → one popular key expires → one burst of misses → recovers quickly
Cold Start → entire cache is empty → every key misses → sustained DB load
until cache gradually fills up (minutes)
Cold start is sustained. A stampede is a spike. Cold start can last minutes; a stampede resolves in seconds once one request repopulates the key.
Fix — Cache Warming#
Pre-populate the cache before opening traffic.
Before switching traffic to new cache instance:
Step 1 → analyse yesterday's access logs → identify top-N most-requested keys
Step 2 → fetch those values from DB
Step 3 → write them all into the new cache
Step 4 → open traffic
First real request → cache hit ✓ (key was pre-loaded)
DB → never sees the cold-start spike
How to identify what to warm:
Access log analysis → replay yesterday's requests, top keys by frequency
Traffic shadowing → mirror production reads to new instance before cutover
Popularity tracking → maintain a sorted set of most-accessed keys in Redis,
use this list to seed new instances
Netflix approach: pre-warm caches for new regions with the top trending content before the region goes live. The first user in São Paulo should hit a warm cache, not a cold DB in Virginia.
Interview framing
"On deploy I'd warm the cache before switching traffic — fetch the top-N keys from the DB based on yesterday's access patterns and load them into the new instance. This prevents the cold-start DB spike that would otherwise occur when every initial request misses."