Skip to content

Auto-Scaling — Overview#

Running peak capacity 24/7 wastes money. Running minimum capacity causes outages. Auto-scaling is the answer.

Auto-scaling automatically adds servers when load increases and removes them when it drops. This folder covers how the feedback loop works, what metrics trigger scaling, how servers are safely removed without killing in-flight requests, and how to solve the cold start problem so new servers are ready before traffic arrives.


Files in this folder#

File Topic
01-Auto-Scaling.md What it is, feedback loop, reactive vs predictive, metrics, scale out vs in asymmetry
02-Connection-Draining.md How LB safely removes servers, in-flight requests, drain timeout tradeoff
03-Cold-Start.md Pre-baked AMIs, warm pools, predictive scaling as cold start solution
04-Interview-Cheatsheet.md How to bring auto-scaling into a design interview