Cold Start Problem#
Auto-scaling triggered. A new server is booting. Traffic is already spiking. The server won't be ready for 7 minutes. What do you do?
Three solutions — each attacking the problem at a different point.
What Cold Start Is#
When a new server boots from scratch, it's not immediately useful. It has to go through a full startup sequence:
1. Start blank VM                           30 seconds
2. OS boot                                  20 seconds
3. Install runtime (Java, Node, Python)     60 seconds
4. Install application dependencies         90 seconds
5. Pull latest application code from Git    30 seconds
6. Run startup checks                       20 seconds
7. Start application process                30 seconds
8. JVM warmup / cache population            60 seconds
9. Pass health checks (3 consecutive)       30 seconds
──────────────────────────────────────────────────────
Total: ~6-7 minutes before serving traffic
During those 6-7 minutes, the new server is useless. Existing servers remain overloaded. Users experience slowness.
For a reactive spike — traffic doubles in 30 seconds — 7 minutes is an eternity.
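The startup sequence above is strictly sequential, so total time to readiness is just the sum of the steps. A minimal sketch, using the illustrative durations from the list (not measurements):

```python
# Illustrative cold-boot timing budget. Durations (seconds) are the
# example figures from the startup sequence above, not real measurements.
COLD_BOOT_STEPS = {
    "start blank VM": 30,
    "OS boot": 20,
    "install runtime": 60,
    "install application dependencies": 90,
    "pull code from Git": 30,
    "run startup checks": 20,
    "start application process": 30,
    "JVM warmup / cache population": 60,
    "pass health checks (3 consecutive)": 30,
}

def total_boot_seconds(steps: dict) -> int:
    """Sum the sequential step durations."""
    return sum(steps.values())

print(total_boot_seconds(COLD_BOOT_STEPS))  # 370 seconds, ~6.2 minutes
```

Every solution below attacks some subset of these terms: pre-baked AMIs remove the install steps, warm pools move the whole sum off the critical path, predictive scaling moves it earlier in time.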
Solution 1 — Pre-baked AMIs#
What an AMI is: An AMI (Amazon Machine Image) is a complete frozen snapshot of a server's disk — OS, runtime, dependencies, application code — all captured at a specific point in time. Booting from an AMI means starting with everything already installed.
Normal boot vs AMI boot:
| Step | Normal boot | AMI boot |
|---|---|---|
| Start VM | 30s (blank VM) | 30s (from AMI snapshot) |
| Install Java | 60s | skipped — already installed |
| Install dependencies | 90s | skipped — already there |
| Pull code from Git | 30s | skipped — already baked in |
| Start application | 30s | 15s |
| JVM warmup | 60s | 30s |
| Pass health checks | 30s | 15s |
| **Total** | ~7 minutes | ~90 seconds |
Everything that was installed and configured is frozen into the image. You skip the entire setup phase.
How AMIs fit into a deployment pipeline:
Engineer merges new code
↓
CI/CD pipeline runs tests
↓
Pipeline bakes new AMI (installs new code into the image)
↓
Auto-scaling group updated: "use new AMI for all future servers"
↓
Next time scaling triggers → new server boots from new AMI → 90 seconds to ready
Old servers running old code stay running until they're drained and replaced. Zero downtime deployment.
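The key property of this pipeline is that changing the AMI only affects future boots; running servers are untouched until drained. A toy model of that behavior (the class and method names here are hypothetical, not an AWS API):

```python
from dataclasses import dataclass, field

@dataclass
class Server:
    ami_id: str
    draining: bool = False

@dataclass
class AutoScalingGroup:
    """Toy model: new servers boot from the current AMI; old ones keep running."""
    ami_id: str
    servers: list = field(default_factory=list)

    def set_ami(self, new_ami: str) -> None:
        # Only affects *future* boots; running servers are untouched.
        self.ami_id = new_ami

    def scale_out(self) -> Server:
        server = Server(ami_id=self.ami_id)
        self.servers.append(server)
        return server

    def drain_old(self) -> None:
        # Mark servers on stale AMIs for connection draining and replacement.
        for s in self.servers:
            if s.ami_id != self.ami_id:
                s.draining = True

asg = AutoScalingGroup(ami_id="ami-old")
asg.scale_out()              # running on ami-old
asg.set_ami("ami-new")       # pipeline bakes and publishes a new image
new = asg.scale_out()        # next scale-out boots from ami-new
asg.drain_old()              # old servers drained; zero downtime
```

In AWS terms the real equivalent is updating the launch template of the auto-scaling group; the sketch only shows the ordering that makes the deployment zero-downtime.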
Solution 2 — Warm Pools#
Even 90 seconds is too slow for a sudden spike. Warm pools eliminate the remaining wait entirely.
The concept: Keep a small set of servers already booted and idle — not registered with the load balancer, not serving traffic, just sitting there having already completed startup. When scaling triggers, pull from the pool instead of booting fresh.
Normal state:
Active servers: 10 (serving traffic, registered with LB)
Warm pool: 3 (booted, idle, NOT registered with LB)
Traffic spike hits — scaling triggers:
Option A (without warm pool): boot new server → 90 seconds
Option B (with warm pool): pull from warm pool → register with LB → 5 seconds
After pulling from warm pool:
Active servers: 11 (new server immediately serving traffic)
Warm pool: 2 (one slot freed — trigger to replenish in background)
Background (off critical path):
Boot new server → add to warm pool
Active servers: 11
Warm pool: 3 (replenished, ready for next spike)
The cold start still happens — it just happens in the background, not during a spike.
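The pull-from-pool-then-replenish flow can be sketched as a small model (timings are the illustrative 5s / 90s figures from above; the class is hypothetical, not a cloud API):

```python
import collections

WARM_START_S = 5    # register an already-booted server with the LB
COLD_BOOT_S = 90    # AMI boot when the pool is exhausted

class WarmPool:
    """Toy warm-pool model: pull a pre-booted server if one is available."""
    def __init__(self, size: int):
        self.idle = collections.deque(f"warm-{i}" for i in range(size))
        self.pending_replenish = 0

    def acquire(self):
        if self.idle:
            server = self.idle.popleft()
            self.pending_replenish += 1      # replenish off the critical path
            return server, WARM_START_S
        return "cold-boot", COLD_BOOT_S      # pool exhausted: back to cold start

    def replenish(self) -> None:
        # Background task: the cold start happens here, not during the spike.
        while self.pending_replenish:
            self.idle.append("warm-new")
            self.pending_replenish -= 1

pool = WarmPool(size=3)
server, wait = pool.acquire()
print(server, wait)   # a warm server, ready in ~5 seconds
```

Note that `acquire` never blocks on a boot: the spike either gets a warm server instantly or degrades to a normal AMI boot, and `replenish` pays the cold-start cost later.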
Cost tradeoff:
Warm pool servers are idle but not free. You pay for them doing nothing.
The question is: what does 90 seconds of overloaded servers cost you in lost revenue and user experience? For most production systems, an idle-pool bill on the order of a few hundred dollars a year — say $216/year for a handful of small instances — is trivial compared to the damage of a slow spike response.
Warm pool sizing:
Too small → spike exhausts pool, back to cold booting
Too large → paying for idle capacity you rarely use
Rule of thumb: size the warm pool to cover your typical sudden spike. If traffic usually jumps 20% in a burst, keep enough warm servers to absorb that 20%.
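The rule of thumb is a one-liner, rounding up so a fractional server becomes a whole one (the function name and the 20% figure are illustrative):

```python
import math

def warm_pool_size(active_servers: int, spike_fraction: float) -> int:
    """Size the pool to absorb a typical sudden burst (rule of thumb above)."""
    return math.ceil(active_servers * spike_fraction)

# 10 active servers, bursts of ~20% → keep 2 servers warm
print(warm_pool_size(10, 0.20))
```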
Solution 3 — Predictive Scaling#
The cleanest solution: don't let cold start happen during a spike at all. Start servers before the spike arrives.
How it works: Auto-scaling analyses historical traffic patterns and pre-scales based on predictions:
Historical data: traffic jumps 5x every weekday at 9:00am
Predictive rule:
8:45am → add 40 servers (cold start happens now — 7 minutes off peak)
8:52am → all 40 servers warmed up, registered with LB, health checks passing
9:00am → traffic arrives → 50 servers already waiting, no spike felt
6:30pm → traffic drops → scale back in with connection draining
Cold start still takes 7 minutes. But it happens at 8:45am when load is low — not at 9:00am when users are hammering the system.
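The scheduling arithmetic is simple: start servers one boot-time (plus a safety margin) before the predicted peak. A sketch, with the 7-minute boot from the breakdown above and a hypothetical 8-minute buffer chosen so the trigger lands at 8:45am:

```python
from datetime import datetime, timedelta

BOOT_TIME = timedelta(minutes=7)   # cold start, per the breakdown above
BUFFER = timedelta(minutes=8)      # margin so health checks pass well before peak

def prescale_at(predicted_peak: datetime) -> datetime:
    """When to start servers so they're ready before a predicted peak."""
    return predicted_peak - BOOT_TIME - BUFFER

peak = datetime(2024, 1, 8, 9, 0)           # Monday 9:00am traffic jump
print(prescale_at(peak).strftime("%H:%M"))  # 08:45
```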
Netflix — show release scaling:
New season drops: Friday 3:00am
Netflix's schedule:
Thursday 8:00pm → predictive scaling adds servers across all regions
Thursday 8:00pm-11:00pm → cold starts complete, JVMs warmed up, caches populated
Friday 3:00am → show drops, millions start streaming simultaneously
Infrastructure already at full capacity — zero spike felt
By the time users arrive, servers aren't just booted — they're warm. The JVM has been running for hours, cache hit ratios are high, everything is optimised.
The Three Solutions Compared#
| Solution | Reduces boot time to | Cost | Best for |
|---|---|---|---|
| Pre-baked AMI | ~90 seconds | Low — just CI/CD pipeline change | All systems — should always do this |
| Warm pool | ~5 seconds | Medium — pay for idle servers | Systems with unpredictable sudden spikes |
| Predictive scaling | 0 seconds during spike | Low — servers run only when needed | Systems with predictable traffic patterns |
In practice — use all three together:
Pre-baked AMI makes every boot faster (baseline improvement). Warm pool handles sudden spikes. Predictive scaling handles known patterns like daily peaks or scheduled events. They're complementary, not alternatives.
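Under the illustrative timings used throughout (warm pull ≈ 5s, AMI boot ≈ 90s), the combined decision logic might be sketched like this (the function is a hypothetical model, not a real autoscaler):

```python
def time_to_capacity(warm_pool: int, servers_needed: int,
                     prescaled: bool) -> int:
    """Seconds until the needed capacity is serving traffic, combining the
    three techniques. Timings are the illustrative figures from above."""
    if prescaled:
        return 0       # predictive scaling: capacity was already brought up
    if servers_needed <= warm_pool:
        return 5       # warm pool absorbs the whole burst
    return 90          # overflow falls back to fast AMI-baked boots

print(time_to_capacity(warm_pool=3, servers_needed=2, prescaled=False))  # 5
```

The fallback chain mirrors the comparison table: predictive scaling when the spike is known, warm pool when it isn't, and pre-baked AMIs capping the worst case at ~90 seconds instead of ~7 minutes.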
In an interview, cold start is a common follow-up question after auto-scaling:
"For cold start, I'd use pre-baked AMIs to get boot time under 90 seconds, a warm pool sized for typical spike magnitude for instant response, and predictive scaling for known traffic patterns like daily peaks or planned launches."