Scalability

Your system works fine with 100 users. What happens when 1 million show up?

That's scalability — the ability to handle growing load without breaking or requiring a complete redesign.

The two ways to scale

When your server can't handle the load anymore, you have exactly two options:

Vertical Scaling — make the machine bigger

Upgrade your existing server. More CPU, more RAM, faster disk. Same machine, more power.

Simple — no code changes, no architecture changes
Has a hard ceiling — the biggest server money can buy still has limits
Single point of failure — one big machine is still one machine
Expensive — high-end hardware costs disproportionately more

Horizontal Scaling — add more machines

Run many average servers in parallel with a load balancer distributing traffic between them.

No hard ceiling: keep adding servers as load grows
No single point of failure: if one server dies, the others keep serving
Cheaper: on commodity hardware, cost grows roughly linearly with capacity
Complex: you now run a distributed system, with all of its problems
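To make "a load balancer distributing traffic" concrete, here is a minimal round-robin sketch. `Server`, `RoundRobinBalancer`, and the server names are hypothetical. Real load balancers (NGINX, HAProxy, cloud load balancers) add health checks and other policies such as least-connections, which this omits. Because the servers keep no per-user state, any server can take any request, so scaling out is just adding one more to the pool.

```python
class Server:
    """A hypothetical stateless app server: its reply depends only on the request."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Only used to show how requests were spread; not per-user request state.
        self.handled = 0

    def handle(self, request: str) -> str:
        self.handled += 1
        return f"{self.name} handled {request}"


class RoundRobinBalancer:
    """Hands each incoming request to the next server in rotation."""

    def __init__(self, servers: list[Server]) -> None:
        self.servers = list(servers)
        self._next = 0

    def add(self, server: Server) -> None:
        # Horizontal scaling: a new server simply joins the rotation.
        self.servers.append(server)

    def route(self, request: str) -> str:
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        return server.handle(request)


lb = RoundRobinBalancer([Server("app-1"), Server("app-2"), Server("app-3")])
for i in range(9):
    lb.route(f"req-{i}")
print([s.handled for s in lb.servers])  # [3, 3, 3]

lb.add(Server("app-4"))  # scale out: no change to the servers themselves
for i in range(9, 17):
    lb.route(f"req-{i}")
print([s.handled for s in lb.servers])  # [5, 5, 5, 2]
```

The new server starts with fewer requests only because it joined late. From that point on, every server receives an equal share.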

The industry default is horizontal scaling

Vertical scaling buys you time. Horizontal scaling is the real answer.

What makes horizontal scaling hard: state

If your servers are stateless — no memory between requests — horizontal scaling is trivial. Add more servers, put them behind a load balancer, done. Any server can handle any request.

The problem starts when servers are stateful, meaning they remember things between requests. If user A's session is stored in Server 1's memory and the next request goes to Server 2, Server 2 has no idea who they are.

The core challenge of scalability: move state out of your servers and into dedicated systems (databases, caches, message queues) so the servers themselves stay stateless and horizontally scalable.
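The stateful/stateless difference fits in a few lines. This is a hypothetical in-process simulation: `SessionStore` is a plain dict standing in for a dedicated shared store such as Redis or a database. In a real deployment, every server would reach that store over the network.

```python
class SessionStore:
    """Hypothetical stand-in for a shared external store (in-memory dict here)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class StatefulServer:
    """Keeps sessions in its own memory: only this instance knows who logged in."""

    def __init__(self):
        self.sessions = {}

    def login(self, session_id, user):
        self.sessions[session_id] = user

    def whoami(self, session_id):
        return self.sessions.get(session_id)


class StatelessServer:
    """Remembers nothing between requests; every lookup goes to the shared store."""

    def __init__(self, store):
        self.store = store

    def login(self, session_id, user):
        self.store.set(session_id, user)

    def whoami(self, session_id):
        return self.store.get(session_id)


# Stateful: the login lands on server 1, the next request lands on server 2.
s1, s2 = StatefulServer(), StatefulServer()
s1.login("abc123", "user-a")
print(s2.whoami("abc123"))  # None: server 2 has never seen this session

# Stateless: both servers share one external store, so either can answer.
store = SessionStore()
t1, t2 = StatelessServer(store), StatelessServer(store)
t1.login("abc123", "user-a")
print(t2.whoami("abc123"))  # user-a
```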

The three bottlenecks you'll hit

As traffic grows, bottlenecks tend to appear in a fairly predictable order:

1. CPU / app servers: too many requests, threads exhausted. Fix: add more app servers horizontally.

2. Database: the app servers scaled fine, but now they're all hammering one database, and the DB becomes the bottleneck. Fix: read replicas, caching, sharding.

3. Network / bandwidth: data volume grows so large that network throughput becomes the limit. Fix: CDNs, compression, smarter data transfer.
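Caching is often the first fix for the database bottleneck. Below is a minimal cache-aside sketch, assuming a plain dict as the cache and a hypothetical `Database` class that counts the queries reaching it. Repeated reads of the same key are served from the cache, so only the first read per key touches the database. The sketch deliberately leaves out expiry and invalidation, which any real cache over changing data needs.

```python
class Database:
    """Hypothetical primary database; counts how many queries actually reach it."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def get_user(self, user_id):
        self.queries += 1
        return self.rows.get(user_id)


class CacheAside:
    """Check the cache first; on a miss, read from the DB and fill the cache."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def get_user(self, user_id):
        if user_id in self.cache:
            return self.cache[user_id]      # hit: the database is not touched
        value = self.db.get_user(user_id)   # miss: exactly one database query
        self.cache[user_id] = value         # no TTL here; real caches need one
        return value


db = Database({1: "alice", 2: "bob"})
app = CacheAside(db)
for _ in range(1000):
    app.get_user(1)
    app.get_user(2)
print(db.queries)  # 2: of 2,000 reads, only the first per key hit the database
```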

Each bottleneck requires a different solution. Scaling is not one decision — it's a sequence of decisions as each layer hits its limit.

Scaling is not just about traffic volume

Load comes in different shapes:

Load type	Example	Scaling approach
More users, read-heavy	Social media, news feed	Read replicas, caching, CDN
Write-heavy	Logging, metrics, financial transactions	Sharding, message queues
Larger data	File hosting, video streaming	Object storage, CDN
Spiky traffic	Ticket sales, product launches	Auto-scaling, pre-warming

The scaling strategy depends on which type of load you're dealing with. A system handling 1M steady reads scales very differently from one handling 100k sudden writes.
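For spiky traffic, auto-scaling usually follows a proportional rule. This sketch mirrors the formula documented for the Kubernetes Horizontal Pod Autoscaler, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). It leaves out the tolerance and stabilization behavior the real controller adds. The function name, the choice of requests/sec as the metric, and all numbers are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_load: float, target_load: float,
                     min_replicas: int = 1, max_replicas: int = 100) -> int:
    """Resize the pool so that per-replica load returns to the target.

    current_load and target_load are per-replica averages of the same metric,
    e.g. requests/sec per server (an assumption for this sketch).
    """
    raw = math.ceil(current_replicas * current_load / target_load)
    # Clamp: a spike cannot request unbounded capacity, and a lull never drops to zero.
    return max(min_replicas, min(max_replicas, raw))


TARGET_RPS = 200  # hypothetical: each server is comfortable at ~200 requests/sec

print(desired_replicas(4, 200, TARGET_RPS))     # 4   steady traffic, no change
print(desired_replicas(4, 900, TARGET_RPS))     # 18  ticket sale opens
print(desired_replicas(4, 30_000, TARGET_RPS))  # 100 capped at max_replicas
print(desired_replicas(4, 10, TARGET_RPS))      # 1   overnight lull, floored at min_replicas
```

The rule only reacts after load has already risen, and new servers take time to boot. For that reason, known spikes such as a scheduled ticket sale are often handled by pre-warming, which means raising the replica count before the event.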

The scalability moment in every interview

After you draw the initial architecture, the interviewer will usually ask:

"Now what if this needs to handle 10x the traffic?"

Walk through it systematically:

1. Which component breaks first?
2. What do you do about it?
3. What breaks next?

Scalability is a sequence of bottleneck → fix → next bottleneck

Never say "I'd scale the whole system." Always identify the specific bottleneck first.
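The bottleneck → fix → next bottleneck loop can be rehearsed with back-of-the-envelope numbers. In this sketch, every capacity and traffic share is made up for illustration, and `Component` and `first_bottleneck` are hypothetical names. Utilization is the load reaching a component divided by what it can sustain, and anything above 1.0 is over capacity.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    capacity_rps: float  # hypothetical max requests/sec this component sustains
    load_share: float    # fraction of incoming traffic that reaches it


def first_bottleneck(components: list[Component], traffic_rps: float):
    """Return (component, utilization) for the most overloaded component."""
    def utilization(c: Component) -> float:
        return c.load_share * traffic_rps / c.capacity_rps

    worst = max(components, key=utilization)
    return worst, utilization(worst)


system = [
    Component("app servers (3 x 1,000 rps)", 3_000, 1.0),
    Component("primary database", 2_500, 0.4),
    Component("network egress", 50_000, 1.0),
]
traffic = 10 * 1_000  # "10x the traffic" on a hypothetical 1,000 rps today

worst, u = first_bottleneck(system, traffic)
print(worst.name, round(u, 2))  # app servers (3 x 1,000 rps) 3.33

# Fix: scale the app tier out to 12 servers, then ask what breaks next.
system[0] = Component("app servers (12 x 1,000 rps)", 12_000, 1.0)
worst, u = first_bottleneck(system, traffic)
print(worst.name, round(u, 2))  # primary database 1.6
```

Once the app tier is scaled out, the database becomes the next thing to fix. This matches the bottleneck-by-bottleneck walkthrough described above.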