Performance Metrics: SDE-3 Interview Questions
These questions test deep understanding of failure modes, tail latency math, and architecture-level thinking. SDE-3 candidates are expected to reason about edge cases and make principled tradeoff decisions.
Your P99 latency is fine at 180ms but your P99.9 is 4 seconds. What could cause this and would you fix it?
Answer
A healthy P99 next to a blown P99.9 means fewer than 1 in 100 requests are affected, but at least 1 in 1000 is extremely slow. Common causes:
| Cause | How it creates extreme outliers |
|---|---|
| GC pauses | Java stop-the-world GC can pause for 2-4 seconds. Rare but affects every in-flight request during the pause |
| Cold start | First request to a new server hits JVM warmup, empty cache, unwarmed connection pool |
| DB lock contention | Two transactions fighting over the same row — one waits. Usually milliseconds but can be seconds under high load |
| Retry storms | Downstream timeout → retry → retry also times out → request waits 3× timeout duration |
| TCP retransmission | Rare packet loss → if the retransmission timeout has to fire (typical for a lost SYN), the request stalls for roughly 1-3 seconds |
How you catch it:
1. Distributed tracing (Jaeger / Zipkin)
→ traces every request end-to-end
→ find the exact 1-in-1000 slow requests
→ see which hop was slow
2. P99.9 dashboards
→ must explicitly track it — won't show in P99 or average
→ alert when P99.9 > threshold
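To see why P99.9 has to be tracked explicitly, here is a minimal sketch using a nearest-rank percentile. The latency values are made up for illustration (they are not measurements), chosen so that 15 of 10,000 requests hit a 4-second stall.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in ms: 10,000 requests, 15 of which stall for 4 seconds.
latencies = [50] * 9_000 + [120] * 885 + [180] * 100 + [4_000] * 15

print(f"mean: {sum(latencies) / len(latencies):.1f} ms")
for p in (50, 99, 99.9):
    print(f"P{p}: {percentile(latencies, p)} ms")
```

With this data the mean, P50, and P99 all look healthy; only the P99.9 line exposes the 4-second stalls.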
Would you fix it?
Depends on the system — this is a business decision, not just a technical one.
| System | Fix it? | Reason |
|---|---|---|
| Payment | Yes — immediately | 4 seconds on a payment is unacceptable even for 1 in 1000 users |
| Social feed | Probably not | Engineering cost exceeds user experience benefit |
| Medical records | Yes | Trust and reliability are non-negotiable |
| Background job | No | No user waiting on it |
You have three services called in parallel. Their P99 latencies are 100ms, 150ms, and 200ms. What is the P99 of the combined response?
Answer
When services run in parallel, the combined response waits for the slowest one.
Service A: P99 = 100ms ──────────┐
Service B: P99 = 150ms ───────────────┐
Service C: P99 = 200ms ────────────────────┐
↓
Must wait for ALL three
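Step 1, the naive answer: 200ms (dominated by the slowest call).
The combined response can't be faster than the slowest service.
Step 2, the actual math (SDE-3 level):
Assume the three latencies are independent, and call a service "fast" when it answers within its own P99. The combined request is slow if any one of the three is slow: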
P(A fast) = 0.99
P(B fast) = 0.99
P(C fast) = 0.99
P(all three fast) = 0.99 × 0.99 × 0.99 = 0.9703
P(at least one slow) = 1 - 0.9703 = 0.0297 ≈ 3%
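About 3% of combined requests hit at least one service's slow tail, so three healthy per-service P99s only guarantee the combined response at roughly the P97 level.
What does that mean in time?
- At 200ms the combined system sits somewhere between P97 and P99: C alone misses 200ms 1% of the time, and any misses from A and B come on top of that.
- The combined P99 therefore lies above C's P99. If the three tails behave similarly, it lands near C's P99.7, which could be 300-400ms or more.
- The combined P99 is never better than the worst individual service's P99, and is usually worse.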
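The arithmetic can be checked in a few lines. This is a sketch under the same assumption as the math here, namely that the three services' slow tails are independent; the function names are illustrative.

```python
def prob_any_slow(per_service_fast: float, n: int) -> float:
    """Probability that at least one of n independent calls lands in its slow tail."""
    return 1 - per_service_fast ** n

def per_service_quantile_needed(target: float, n: int) -> float:
    """Quantile each of n independent, similar services must meet so all n are fast `target` of the time."""
    return target ** (1 / n)

print(round(prob_any_slow(0.99, 3), 4))                # 0.0297
print(round(per_service_quantile_needed(0.99, 3), 4))  # 0.9967
```

The second number is where the "P99.7" figure comes from: for the combined call to be fast 99% of the time, each of three similar services has to be fast about 99.67% of the time.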
The general rule:
More parallel calls = more coins being flipped
More coins = higher chance at least one lands on slow tail
Combined P99 degrades with every parallel dependency added
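A small simulation makes the rule concrete. It is only a sketch: the lognormal latency model, its parameters, and the seed are arbitrary assumptions, and the backends are treated as identical and independent.

```python
import math
import random

random.seed(42)
REQUESTS = 20_000
MAX_FANOUT = 10

def p99(samples):
    """Nearest-rank P99 using integer arithmetic."""
    ordered = sorted(samples)
    return ordered[len(ordered) * 99 // 100 - 1]

# One row per request, one column per backend call.
# Hypothetical model: lognormal latency with a 50 ms median.
rows = [
    [random.lognormvariate(math.log(50), 0.6) for _ in range(MAX_FANOUT)]
    for _ in range(REQUESTS)
]

def fanout_p99(n):
    """P99 of a request that waits for the slowest of its first n backend calls."""
    return p99([max(row[:n]) for row in rows])

for n in (1, 3, 10):
    print(f"fan-out {n:>2}: P99 = {fanout_p99(n):.0f} ms")
```

Each backend has exactly the same latency distribution, yet the end-to-end P99 can only stay flat or grow as the fan-out widens, because every request waits for its slowest call.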
What to do about it:
1. Set aggressive timeouts + fallbacks
→ if C doesn't respond in 210ms → return partial result
→ don't make user wait for full tail
2. Track end-to-end P99 separately from per-service P99
→ per-service metrics will look healthy while end-to-end degrades
3. Hedge requests (advanced)
→ send duplicate request to a second instance after 190ms
→ use whichever responds first
→ trades bandwidth for latency reduction
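Items 1 and 3 can be sketched with `asyncio`. This is an illustration under stated assumptions, not a production client: `fake_service` and its delays are stand-ins for real network calls, and the helper names `call_with_fallback` and `hedged` are hypothetical. The 210ms and 190ms thresholds are the ones used in the list above.

```python
import asyncio

async def call_with_fallback(coro, timeout_s, fallback):
    """Await coro; if it exceeds timeout_s, cancel it and return the fallback value."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        return fallback

async def hedged(make_call, hedge_after_s):
    """Start one call; if it is still running after hedge_after_s, start a
    second one and return whichever finishes first."""
    primary = asyncio.create_task(make_call())
    done, _ = await asyncio.wait({primary}, timeout=hedge_after_s)
    if done:
        return primary.result()
    backup = asyncio.create_task(make_call())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # stop paying for the loser
    return done.pop().result()

async def fake_service(name, delay_s):
    """Stand-in for a network call (hypothetical)."""
    await asyncio.sleep(delay_s)
    return name

async def demo():
    fast = await call_with_fallback(fake_service("C", 0.01), 0.21, None)
    slow = await call_with_fallback(fake_service("C", 1.0), 0.21, None)
    # First attempt stalls for 1 s; the hedge sent after 190 ms answers in 10 ms.
    attempts = iter([("primary", 1.0), ("backup", 0.01)])
    winner = await hedged(lambda: fake_service(*next(attempts)), 0.19)
    return fast, slow, winner

print(asyncio.run(demo()))
```

In the demo the slow call to C is cut off at 210ms and replaced by the fallback (`None`, standing in for a partial result), and the hedged call returns the backup's answer instead of waiting a full second for the primary. Hedging only makes sense for idempotent requests, since the same call may execute twice.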
Interview framing
"With three parallel calls the combined response waits for the slowest. But it's worse than just 200ms — the probability of hitting any slow tail compounds across all three services. At P99 level, each service has a 1% slow chance — combined that's ~3% slow, meaning combined P99 is around C's P99.7 which could be significantly higher than 200ms. I'd set timeouts with fallbacks and track end-to-end P99 separately."
A senior engineer says "we should optimize for P50 latency, not P99 — most users are fine." How do you respond?
Answer
Challenge the premise first.
P50 is the median: half of all requests are slower than it, and it says nothing about how much slower. A good P50 therefore cannot support the claim that "most users are fine."
The full argument:
1. Challenge the framing: "A P50 target only constrains the faster half of requests. At 10M requests/day, that leaves 5M requests a day with no latency guarantee at all. That is not most users being fine."
2. The users hitting the slow tail are often your worst-off users:
- Users with the most complex data (heaviest queries)
- Users on slower networks
- Users on older devices
Optimizing for P50 actively ignores the users who are already disadvantaged.
3. The danger of P50 optimization: Engineers can make P50 look great by fixing the fast common path while leaving the slow tail completely untouched. Metrics look great. Users suffer.
4. When P50 could be acceptable:
- Internal batch jobs (no human waiting)
- Background async processing
- Data pipeline jobs
- Any non-user-facing system
5. The floor for user-facing systems:
| Target | Use case |
|---|---|
| P99.9 | Payments, medical, financial transactions |
| P99 | User-facing APIs, search, feed loading |
| P95 | Nice-to-have features, non-critical paths |
| P90 | Absolute minimum for anything user-facing |
| P50 | Batch jobs, background processing only |
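Point 3 above (P50 can look great while the tail is untouched) is easy to demonstrate. The numbers below are invented for illustration: 90% of requests take the fast path, 10% take a slow path, and the "optimization" speeds up only the fast path.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in ms.
before = [100] * 90 + [2_000] * 10
# "Optimization": the fast path drops from 100 ms to 40 ms; the slow path is unchanged.
after = [40 if ms == 100 else ms for ms in before]

for label, data in (("before", before), ("after", after)):
    print(label, "P50 =", percentile(data, 50), "ms,", "P99 =", percentile(data, 99), "ms")
# before P50 = 100 ms, P99 = 2000 ms
# after P50 = 40 ms, P99 = 2000 ms
```

The dashboard shows a 60% P50 improvement, while the 10% of requests that were already waiting two seconds see no change at all.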
Interview framing
"I'd push back on P50: it is the median, so it tells us nothing about the slower half of requests. For user-facing systems I'd target P99 on critical paths and P95 on non-critical ones, with P90 as the absolute floor. Optimizing for P50 is only appropriate for background jobs where no user is waiting on the result."