Interview Cheatsheet — Auto-Scaling#

When does auto-scaling come up in an interview and what do you actually say?

It comes up at two moments — when you discuss handling variable load, and when the interviewer asks "what happens when traffic spikes?"


Moment 1 — Requirements Phase#

When discussing non-functional requirements, ask:

"What does the traffic pattern look like? Is it steady, does it have daily peaks, or are there unpredictable spikes?"

The answer shapes your scaling strategy:

| Traffic Pattern | Strategy |
| --- | --- |
| Steady, predictable | Fixed capacity or light auto-scaling |
| Daily peaks (9am, weekends) | Predictive scaling — pre-scale before known peaks |
| Unpredictable spikes (viral content, flash sales) | Reactive auto-scaling + warm pool for fast response |
| Scheduled events (product launch, show release) | Manual pre-scaling + predictive scaling |

Moment 2 — "What Happens When Traffic Spikes?"#

Walk through the full picture:

"I'd configure auto-scaling on the app server tier — CPU above 70% for one minute triggers scale-out, adds servers from a warm pool for near-instant capacity. New servers boot from pre-baked AMIs — everything pre-installed, ready in 90 seconds. For known traffic patterns like daily peaks, I'd add predictive scaling rules that pre-scale 15 minutes before the expected spike so cold start is off the critical path entirely."
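The asymmetric triggers in that answer (aggressive scale-out on one minute of high CPU, conservative scale-in on fifteen minutes of low CPU) can be sketched as a small evaluator. This is an illustrative sketch, not a cloud provider's API; it assumes one CPU sample per second, and the class and constant names are made up for this example.

```python
from collections import deque

# Illustrative thresholds from this cheatsheet, assuming 1 sample/second.
SCALE_OUT_CPU, SCALE_OUT_WINDOW = 70.0, 60        # > 70% for 1 minute
SCALE_IN_CPU, SCALE_IN_WINDOW = 30.0, 15 * 60    # < 30% for 15 minutes

class TriggerEvaluator:
    """Decide scale-out / scale-in / hold from a stream of CPU samples."""

    def __init__(self):
        # Keep only as much history as the longest (scale-in) window needs.
        self.samples = deque(maxlen=SCALE_IN_WINDOW)

    def record(self, cpu_percent: float) -> str:
        """Record one CPU sample and return the scaling decision."""
        self.samples.append(cpu_percent)
        recent = list(self.samples)
        # Aggressive scale-out: every sample in the last minute is high.
        if len(recent) >= SCALE_OUT_WINDOW and all(
            s > SCALE_OUT_CPU for s in recent[-SCALE_OUT_WINDOW:]
        ):
            return "scale_out"
        # Conservative scale-in: every sample in the full 15 minutes is low.
        if len(recent) >= SCALE_IN_WINDOW and all(
            s < SCALE_IN_CPU for s in recent
        ):
            return "scale_in"
        return "hold"
```

The asymmetry is deliberate: a missed scale-out costs you an outage, a missed scale-in only costs money, so the scale-in window is an order of magnitude longer.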

Then address scale-in:

"For scale-in — when CPU drops below 30% for 15 consecutive minutes — I'd drain connections first. The load balancer stops sending new requests to terminating servers, existing in-flight requests complete, then the server is removed. No user-facing errors from the scale-in."
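The drain sequence above (stop new traffic, wait for in-flight requests, then terminate) can be sketched as a short routine. The `Server` object and `lb_deregister` callback are hypothetical stand-ins for your load balancer's deregistration call, not a real cloud API.

```python
import time

class Server:
    """Minimal stand-in for an app server behind a load balancer."""
    def __init__(self):
        self.active_connections = 0
        self.accepting = True

def drain_and_terminate(server: Server, lb_deregister,
                        timeout_s: float = 300, poll_s: float = 0.01) -> bool:
    """Drain a server before removal. Returns True if it drained cleanly."""
    # 1. Stop new requests: deregister from the load balancer.
    server.accepting = False
    lb_deregister(server)
    # 2. Let in-flight requests complete, up to a timeout.
    deadline = time.monotonic() + timeout_s
    while server.active_connections > 0 and time.monotonic() < deadline:
        time.sleep(poll_s)
    # 3. Terminate when 0 active connections, or give up at the timeout.
    return server.active_connections == 0
```

The timeout matters: without it, one stuck long-poll connection keeps a server you are paying for alive indefinitely.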


The Statelessness Point — Say It Out Loud#

Auto-scaling only works if servers are stateless. Always mention this:

"For auto-scaling to work correctly, app servers must be stateless — no session data in memory. Sessions live in Redis, all persistent state in the database. Servers are interchangeable and disposable — any server can handle any request."

This shows you understand the architectural constraint, not just the scaling mechanism.
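The statelessness rule can be made concrete with a tiny sketch. In production the shared store would be Redis; here an in-memory dict stands in so the example is self-contained, and the function and key names are illustrative.

```python
import json

class SessionStore:
    """Shared session store; every app server talks to the same one.
    In production this would be Redis, not a local dict."""

    def __init__(self):
        self._data = {}

    def save(self, session_id: str, session: dict) -> None:
        # Serialize, as Redis would store strings/bytes rather than objects.
        self._data[session_id] = json.dumps(session)

    def load(self, session_id: str) -> dict:
        raw = self._data.get(session_id)
        return json.loads(raw) if raw else {}

def handle_request(store: SessionStore, session_id: str) -> dict:
    """Any server can run this: no session state lives in server memory."""
    session = store.load(session_id)                  # read from shared store
    session["requests"] = session.get("requests", 0) + 1
    store.save(session_id, session)                   # write back immediately
    return session
```

Because every request reads and writes the shared store, a request can land on a freshly scaled-out server, or survive its previous server being drained away, without the user noticing.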


The Full Auto-Scaling Checklist#

  • [ ] Asked about traffic patterns before designing
  • [ ] Specified which tier scales (app servers — not databases, not load balancers)
  • [ ] Named the metric triggering scale-out (CPU, queue depth, custom)
  • [ ] Mentioned asymmetry — aggressive scale-out, conservative scale-in
  • [ ] Addressed cold start — AMIs, warm pools, or predictive scaling
  • [ ] Mentioned connection draining for scale-in
  • [ ] Confirmed servers are stateless

Quick Reference#

Scale out trigger:  CPU > 70% for 1 min  → add servers immediately
Scale in trigger:   CPU < 30% for 15 min → drain + remove slowly

Cold start fixes:
  Pre-baked AMI   → 7 min boot → 90 seconds
  Warm pool       → 90 seconds → 5 seconds
  Predictive      → cold start happens before spike, not during

Connection draining:
  1. Stop new requests to terminating server
  2. Let in-flight requests complete
  3. Terminate when 0 active connections (or timeout)

Stateless requirement:
  Sessions    → Redis
  State       → Database
  Servers     → disposable, interchangeable