Fundamentals
What Is Caching#
Don't recompute or re-fetch something you've already computed or fetched. Store the result somewhere fast and serve it from there next time.
Instagram feed = 20+ DB queries = ~200ms minimum
With caching:
First request → hits DB, builds feed, stores result in cache
Every request after → feed read from cache in ~1ms
→ User sees feed in ~80ms total (the cache lookup is only ~1ms of that), instead of 200ms+
Same idea as Dynamic Programming memoization — store expensive results so you don't recompute them. The difference is scale: DP lives in one function, caching lives across requests, servers, and users.
The Cache Hierarchy#
Three layers — each faster but smaller and less shared than the next.
flowchart LR
A["🟢 Local In-Process<br/>(Guava, Caffeine)<br/>~nanoseconds<br/>no network hop<br/>per-server only"]
B["🟡 Distributed Cache<br/>(Redis, Memcached)<br/>~1ms<br/>one network hop<br/>shared across all servers"]
C["🔴 Database<br/>(disk)<br/>~10ms+<br/>network + disk I/O<br/>source of truth"]
A -->|slower| B
B -->|slower| C
style A fill:#d4edda,stroke:#28a745,color:#000
style B fill:#fff3cd,stroke:#ffc107,color:#000
style C fill:#f8d7da,stroke:#dc3545,color:#000
What To Cache#
Ask one question: "If this data is 500ms stale, does anything break?"
flowchart LR
subgraph DO["✅ Cache It"]
A["Expensive to compute<br/>feed ranking, search results"]
B["Read frequently<br/>user profiles, session tokens"]
C["Okay to be slightly stale<br/>like counts, follower counts"]
D["Static or rarely changes<br/>images, JS/CSS, config"]
end
subgraph DONT["❌ Don't Cache It"]
E["Real-time data<br/>stock prices, live inventory<br/><i>stale = wrong price / double booking</i>"]
F["Highly sensitive data<br/>passwords, payment details<br/><i>cache is typically less secure than the DB</i>"]
G["One-time use data<br/>OTPs, single-use tokens<br/><i>not worth the memory</i>"]
end
style DO fill:#d4edda,stroke:#28a745,color:#000
style DONT fill:#f8d7da,stroke:#dc3545,color:#000
style A fill:#c3e6cb,stroke:#28a745,color:#000
style B fill:#c3e6cb,stroke:#28a745,color:#000
style C fill:#c3e6cb,stroke:#28a745,color:#000
style D fill:#c3e6cb,stroke:#28a745,color:#000
style E fill:#f5c6cb,stroke:#dc3545,color:#000
style F fill:#f5c6cb,stroke:#dc3545,color:#000
style G fill:#f5c6cb,stroke:#dc3545,color:#000
Caching is primarily a read optimization: the same data is served many times from cache instead of the DB. Writes can be cached too, but that comes with trade-offs.
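One way to encode the "how stale is acceptable?" question is a time-to-live (TTL) on each entry. The sketch below is an illustration under stated assumptions, not a production cache: `TTLCache`, its `loader` callback, and the injectable `clock` are hypothetical names, and there is no eviction or thread safety.

```python
import time
from typing import Any, Callable

class TTLCache:
    """Serve a cached value until it is older than ttl_seconds, then reload it."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so behaviour can be checked without sleeping
        self._store: dict[str, tuple[float, Any]] = {}

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]            # still within the staleness budget
        value = loader()               # expired or missing: reload from source
        self._store[key] = (now, value)
        return value
```

With `ttl_seconds=0.5`, a like count can be up to 500ms old, which matches the question above; data where that would break something should not go through such a cache.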
Local vs Distributed Cache#
Local In-Process Cache#
Each app server caches data in its own memory.
Request 1 → Server 1 → cache miss → fetches DB → stores locally
Request 2 → Server 1 → cache hit ✓
Request 3 → Server 47 → cache miss → fetches DB again ✗
The problem:
User updates bio
→ Server 1 cache updated
→ Servers 2–100 still serve stale bio
→ inconsistent across servers
Use for: static config, feature flags, rarely changing data that's safe to be per-server.
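The stale-bio problem can be reproduced in a few lines. This is a toy simulation: `AppServer`, the `db` dict, and the key name are hypothetical, and each "server" is just an object with its own dict.

```python
# Stand-in for the database (assumption: a plain dict).
db = {"user:1:bio": "old bio"}

class AppServer:
    def __init__(self, name: str):
        self.name = name
        self.local_cache: dict[str, str] = {}  # private to this server

    def get_bio(self, key: str) -> str:
        if key not in self.local_cache:        # miss: fetch from DB, keep locally
            self.local_cache[key] = db[key]
        return self.local_cache[key]

    def update_bio(self, key: str, value: str) -> None:
        db[key] = value
        self.local_cache[key] = value          # only THIS server's cache is refreshed

server_1 = AppServer("server-1")
server_47 = AppServer("server-47")

# Both servers warm their caches, then the user updates the bio via server 1.
server_1.get_bio("user:1:bio")
server_47.get_bio("user:1:bio")
server_1.update_bio("user:1:bio", "new bio")
```

After the update, `server_1.get_bio(...)` returns the new bio while `server_47.get_bio(...)` keeps returning the old one until its entry is evicted or expires.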
Distributed Cache (Redis / Memcached)#
One shared cache, all servers read and write to it.
Request 1 → any server → cache miss → fetches DB → stores in Redis
Request 2 → any server → cache hit ✓ (served from Redis)
Request 3 → any server → cache hit ✓
User updates bio → invalidate one key in Redis → all servers see fresh data immediately
Use for: shared user data, sessions, feed results, anything that must be consistent across servers.
Two-Level Caching (L1 + L2)#
Best of both worlds — used by Instagram, Twitter, and most large-scale systems.
Request comes in
→ Check local cache (L1) — nanoseconds
→ hit → return immediately
→ miss → check Redis (L2) — ~1ms
→ hit → store in L1, return
→ miss → hit DB → store in Redis (L2) + local (L1) → return
L1 (local) → nanoseconds, per-server, inconsistency risk
L2 (Redis) → ~1ms, shared, consistent
DB → ~10ms+, source of truth
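The lookup order above can be sketched as follows. `TwoLevelCache` is a hypothetical class: a per-instance dict plays L1, a dict shared between instances plays L2 (standing in for Redis), and `load_from_db` is a caller-supplied function. The returned label only exists to show which layer answered.

```python
from typing import Callable

class TwoLevelCache:
    def __init__(self, l2: dict, load_from_db: Callable[[str], str]):
        self.l1: dict[str, str] = {}   # local, per-server
        self.l2 = l2                   # shared across servers (Redis stand-in)
        self.load_from_db = load_from_db

    def get(self, key: str) -> tuple[str, str]:
        if key in self.l1:                  # L1 hit: return immediately
            return self.l1[key], "L1"
        if key in self.l2:                  # L2 hit: copy into L1, return
            value = self.l2[key]
            self.l1[key] = value
            return value, "L2"
        value = self.load_from_db(key)      # full miss: hit the DB
        self.l2[key] = value                # store in L2 ...
        self.l1[key] = value                # ... and in L1
        return value, "DB"

shared_l2: dict[str, str] = {}
loader = lambda key: f"value-for-{key}"     # hypothetical DB read
server_1 = TwoLevelCache(shared_l2, loader)
server_2 = TwoLevelCache(shared_l2, loader)
```

A first read on `server_1` goes to the DB, a repeat read on `server_1` is an L1 hit, and a first read on `server_2` is an L2 hit. Note that this sketch has no invalidation, so L1 entries keep the inconsistency risk described above.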
The Trade-off Summary#
Local cache → fastest, but inconsistent across servers
Distributed cache → consistent, ~1ms overhead, shared
Two-level → fast + mostly consistent (L1 can still be briefly stale), more complexity
CDN → for static assets, geographically distributed
Hit ratio matters. Every request pays the cost of checking the cache, and every miss pays the DB query on top of that, so a low hit ratio adds overhead and complexity for little gain. As a rule of thumb, target a hit ratio above 90% before caching is worth the complexity.
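To see how the hit ratio drives latency, a simple model helps: every request pays the cache lookup, and misses additionally pay the DB. The sketch below uses the ~1ms and ~10ms figures from this page as default assumptions; `hit_ratio` and `expected_latency_ms` are hypothetical helper names, and real systems will have different numbers.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of lookups served from the cache."""
    total = hits + misses
    return hits / total if total else 0.0

def expected_latency_ms(ratio: float,
                        cache_ms: float = 1.0,
                        db_ms: float = 10.0) -> float:
    """Average latency per request under a simple model:
    every request checks the cache; a miss also pays the DB query."""
    return cache_ms + (1.0 - ratio) * db_ms
```

Under these assumptions a 90% hit ratio averages roughly 2ms per request and a 50% hit ratio averages 6ms, compared with ~10ms for going straight to the DB. The latency gain shrinks as the hit ratio drops, while the cost of running and invalidating the cache stays the same.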