Why Single Node Doesn't Scale#

Your Redis instance is running fine. 10 million users later, what breaks?

Two fundamental problems#

Problem 1 — Memory limit:

10 million users × 10KB per user profile = 100GB of cache data

A single Redis node typically has 32-64GB of RAM
→ you physically cannot fit 100GB on one server
→ cache hit rate drops as keys get evicted to make space
→ more cache misses → more DB queries → DB becomes the bottleneck again

Problem 2 — SPOF (Single Point of Failure):

Redis node goes down (hardware failure, OOM kill, deployment)
→ entire cache gone instantly
→ every request → cache miss
→ all traffic hits DB simultaneously
→ DB collapses under the load ✗

A cache that improves performance 95% of the time but causes outages when it fails is worse than no cache at all — it trains your system to depend on it, then fails catastrophically.

The solution — distribute the cache#

Split the cache across multiple nodes. Each node holds a fraction of the total keyspace.

Single node (fails at 64GB):
  Node A → all 100GB of keys → memory exhausted ✗

Distributed (scales horizontally):
  Node A → keys 1-25M   → 25GB ✓
  Node B → keys 25M-50M → 25GB ✓
  Node C → keys 50M-75M → 25GB ✓
  Node D → keys 75M-100M→ 25GB ✓

New problem: a request comes in for user:123:profile. Which node do you go to?

This is the key routing problem — solved by consistent hashing (next file).