Bulkhead
The Real World Analogy#
Named after ship design — ships have bulkheads (walls) separating compartments. If one compartment floods, the walls contain the water. The ship keeps floating.
Same principle in software — isolate failures so one component can't sink everything else.
The Problem Without Bulkheads#
Your app has 100 threads — shared across all services
Normal operation:
Payment → 20 threads in use
Recommendations → 15 threads in use
Notifications → 10 threads in use
Available → 55 threads free
Payment goes slow (30s per request):
10 users hit Payment → 10 threads stuck waiting
20 users hit Payment → 20 threads stuck
50 users hit Payment → 50 threads stuck
100 users hit Payment → ALL 100 threads stuck
User requests Recommendations → no threads available → fails
User requests Notifications → no threads available → fails
Payment brought down the entire system
Cascading Failure
One slow service starved all thread resources. Recommendations and Notifications never had a problem — they were killed by association.
The Fix — Bulkhead#
Assign each service its own isolated resource pool:
Payment → 20 dedicated threads
Recommendations → 20 dedicated threads
Notifications → 20 dedicated threads
General pool → 40 threads
Payment goes slow:
Its 20 threads fill up
New Payment requests → fail fast (no thread available)
Recommendations → 20 threads untouched → fully operational
Notifications → 20 threads untouched → fully operational
Failure contained
One compartment floods. The ship keeps sailing.
Bulkhead Beyond Thread Pools#
The same pattern applies to any shared resource:
| Resource | Without Bulkhead | With Bulkhead |
|---|---|---|
| Thread pools | One service starves all threads | Each service has dedicated threads |
| Connection pools | One service exhausts DB connections | Each service has connection limit |
| Memory | One service causes OOM, kills process | Memory limits per service/container |
| CPU | One service pegs CPU, starves others | CPU limits per container (Docker/K8s) |
In Kubernetes
Resource limits (requests and limits in pod spec) are bulkheads at the infrastructure level — each pod gets guaranteed CPU and memory, preventing one pod from starving others.
Bulkhead + Graceful Degradation Together#
Payment thread pool exhausted (bulkhead triggered)
↓
New Payment requests fail fast
↓
Graceful degradation kicks in
↓
Show "Payment temporarily unavailable, try again in a moment"
↓
Recommendations and Notifications: completely unaffected
Bulkhead contains the failure. Graceful degradation handles it for the user.
Interview framing
"I'd use bulkheads to isolate thread pools per downstream service — if Payment goes slow, it exhausts its own pool and fails fast rather than starving Recommendations and Notifications. Combined with graceful degradation, users see a payment error while the rest of the app works normally."