Interview Cheatsheet — Scalability#

When the interviewer says "now scale it to 10x traffic" — what do you actually say?

You don't just say "add more servers." You walk through where it breaks, what you scale, and what constraints that creates.

The Mental Model — Bottleneck → Fix → Next Bottleneck#

Scalability is never one decision. It's a sequence:

10x traffic hits
  ↓
App servers CPU saturated → add servers (horizontal scaling)
  ↓
Database now the bottleneck → read replicas, caching, sharding
  ↓
Network bandwidth saturated → CDN for static content, compression
  ↓
New bottleneck revealed → repeat

Always present it as a chain, not a single fix.

The Four Questions to Ask First#

Before designing anything, ask these during requirements:

"What's the expected traffic?" — orders of magnitude matter (100 RPS vs 100,000 RPS require very different designs)
"What's the traffic pattern?" — steady, daily peaks, unpredictable spikes, scheduled events
"What's the read/write ratio?" — read-heavy systems scale differently than write-heavy ones
"Are there SLA requirements?" — latency targets constrain what you can do (sharding adds latency)

What to Actually Say — The Script#

Phase 1 — App tier#

"The first bottleneck under load is the app servers. I'd scale horizontally — add servers behind a load balancer. Round robin for stateless services, least connections if request durations vary significantly. App servers must be stateless — all session data in Redis, all persistent state in the database. Stateless servers are disposable and interchangeable."

Phase 2 — Database#

"Once the app tier scales out, the database becomes the bottleneck. For reads, I'd add read replicas and put a cache (Redis) in front of the database — most reads hit the cache, reducing DB load by 80–90%. For writes, if a single primary can't keep up, I'd shard — partition data horizontally across multiple databases by user ID or geographic region."

Phase 3 — Network / static content#

"For media, images, and static assets, I'd push them to a CDN. The CDN serves content from edge nodes close to users — reduces latency and removes that traffic from our origin servers entirely."

The Three Bottlenecks — One-Line Each#

Bottleneck	Symptom	Fix
CPU (app servers)	High CPU, slow response times	Horizontal scaling + load balancer
Database	Slow queries, connection pool exhaustion	Read replicas + caching + sharding
Network	High bandwidth costs, slow static loads	CDN for assets, compression

Vertical vs Horizontal — Know When to Use Each#

	Vertical (bigger server)	Horizontal (more servers)
Use when	Quick fix, stateful systems, early stage	Production scaling, stateless services
Limit	Hardware ceiling — can't go beyond the biggest machine	Practically unlimited
Cost	Expensive at the top end	Linear cost scaling
Risk	Single point of failure	Redundant by nature

"I'd use vertical scaling as a quick fix early on or for stateful systems like databases. Horizontal scaling is the answer for the app tier — it's effectively unlimited and provides redundancy for free."

Statelessness — The Prerequisite#

Horizontal scaling only works cleanly with stateless servers. Always say this:

"Before horizontal scaling works, servers must be stateless — no session data in memory. Sessions in Redis, state in the database. Then any server can handle any request, servers are interchangeable, and auto-scaling works cleanly."

Auto-Scaling — What to Mention#

"I'd configure auto-scaling on the app tier — CPU above 70% for one minute triggers scale-out. Pre-baked AMIs get boot time under 90 seconds. For known peaks like daily traffic spikes, predictive scaling pre-scales 15 minutes early so cold start is off the critical path. Scale-in is conservative — CPU below 30% for 15 minutes — with connection draining to prevent in-flight request errors."

The Full Scalability Checklist#

[ ] Asked about traffic volume and pattern before designing
[ ] Identified which tier is the first bottleneck (app vs DB vs network)
[ ] Horizontal scaling on app tier — servers are stateless
[ ] Load balancer in front of app servers — named the algorithm
[ ] Sessions → Redis, state → database
[ ] Cache (Redis) in front of database for reads
[ ] Read replicas for read-heavy workloads
[ ] Sharding only if write volume exceeds single primary
[ ] CDN for static assets and media
[ ] Auto-scaling with appropriate metrics, AMIs, warm pools
[ ] Presented as a bottleneck chain, not a single answer

Quick Reference#

Bottleneck chain:
  App servers → DB → Network

App tier fix:
  Stateless servers + load balancer + auto-scaling

DB fix (reads):
  Redis cache → read replicas → sharding (last resort)

DB fix (writes):
  Vertical first → then sharding by user ID / region

Network fix:
  CDN for static assets + media

Auto-scaling trigger:
  CPU > 70% for 1 min → scale out
  CPU < 30% for 15 min → drain + scale in

Statelessness rule:
  Sessions → Redis
  State    → Database
  Servers  → disposable, interchangeable