
Interview Cheatsheet: Latency, Throughput, Bandwidth & Percentiles

The interviewer says "design WhatsApp". What's the first thing you do?

You don't draw boxes. You figure out what kind of system this is by assessing all four metrics first.


Why This Matters

Without this step, everything you say is vague:

- "The system should be fast": fast by what measure? For whom? At what scale?
- "We need high throughput": how high? How do you know?

Percentiles are what make the other three metrics concrete and defensible. Without them you're throwing buzzwords. With them you have real numbers that justify every architecture decision you make.


The Three Questions to Ask First

For every system the interviewer gives you, run through these three questions before drawing anything:


Question 1: Is this latency sensitive?

Ask yourself: is there a human waiting for a response in real time?

| Answer | What it means |
|---|---|
| Yes: user is waiting | Latency is critical. Optimize hard for it. |
| No: background job, batch process | Latency is not the concern. Focus elsewhere. |

Examples:

- WhatsApp message delivery → yes, someone is staring at the screen waiting → latency critical
- YouTube video upload → no, you click upload and go make tea → latency not critical
- Google Search → yes, user is waiting for results → latency critical
- Nightly analytics report → no, runs at 2am → latency irrelevant


Question 2: What's the throughput demand?

Ask yourself: how many requests per second at peak?

Do a quick back-of-the-envelope calculation:

Daily active users × actions per user per day = total daily requests
Total daily requests / 86,400 seconds = average RPS
Average RPS × 3 to 5 = peak RPS

That number tells you whether one server is enough or whether you need horizontal scaling, sharding, and load balancers.

| Peak RPS | What it implies |
|---|---|
| < 1,000 | Single server might handle it |
| 1,000 – 10,000 | Multiple servers, load balancer needed |
| 10,000 – 100,000 | Caching, read replicas, horizontal scaling |
| 100,000+ | Sharding, distributed architecture |
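The formula and the tier table above can be sketched in a few lines of Python. This is a rough illustration, not a sizing tool: the function names, the default peak multiplier of 5, and the sample product (10M daily active users, 20 actions each) are assumptions made up for the example.

```python
SECONDS_PER_DAY = 86_400

def estimate_rps(daily_active_users, actions_per_user_per_day, peak_multiplier=5):
    """Return (average_rps, peak_rps) using the back-of-the-envelope formula."""
    total_daily_requests = daily_active_users * actions_per_user_per_day
    average_rps = total_daily_requests / SECONDS_PER_DAY
    return average_rps, average_rps * peak_multiplier

def scaling_tier(peak_rps):
    """Map peak RPS onto the rough tiers from the table above."""
    if peak_rps < 1_000:
        return "single server might handle it"
    if peak_rps < 10_000:
        return "multiple servers, load balancer needed"
    if peak_rps < 100_000:
        return "caching, read replicas, horizontal scaling"
    return "sharding, distributed architecture"

# Hypothetical product: 10M daily active users, 20 actions each per day.
avg, peak = estimate_rps(10_000_000, 20)
print(f"average ~{avg:,.0f} RPS, peak ~{peak:,.0f} RPS -> {scaling_tier(peak)}")
# average ~2,315 RPS, peak ~11,574 RPS -> caching, read replicas, horizontal scaling
```

The tier boundaries are rules of thumb, so treat the returned label as a conversation starter rather than a verdict.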

Example — WhatsApp:

2 billion users, each sends 10 messages/day
= 20 billion messages/day
= 20B / 86,400 ≈ 230,000 messages/second average
× 5 for peak ≈ 1,150,000, call it ~1,000,000 RPS

Assuming one server handles ~1,000 RPS, you need ~1,000 servers at minimum. This immediately tells you the architecture must be distributed.
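The same arithmetic without the rounding, as a quick sanity check. The helper name `servers_needed` is made up for this sketch, and the 1,000 RPS per-server capacity is the assumption used in the example above, not a measured figure.

```python
import math

def servers_needed(peak_rps, rps_per_server=1_000):
    """Minimum server count if each server sustains rps_per_server (no headroom)."""
    return math.ceil(peak_rps / rps_per_server)

messages_per_day = 2_000_000_000 * 10   # 2B users x 10 messages each
average_rps = messages_per_day / 86_400  # ~231,481
peak_rps = average_rps * 5               # ~1,157,407

print(f"average ~{average_rps:,.0f} RPS, peak ~{peak_rps:,.0f} RPS")
print(f"servers at 1,000 RPS each: {servers_needed(peak_rps):,}")
# servers at 1,000 RPS each: 1,158
```

Rounding to "about 1M RPS and about 1,000 servers" is fine in an interview; the order of magnitude is what drives the architecture decision.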


Question 3: Is bandwidth a concern?

Ask yourself: is this system moving large amounts of data per request?

| Data per request | Bandwidth concern? |
|---|---|
| Text, small JSON (< 1 KB) | No: throughput and latency dominate |
| Images, documents (100 KB – 10 MB) | Yes: bandwidth starts to matter |
| Video, large files (10 MB+) | Critical: bandwidth is the primary bottleneck |

The same system can have different bottlenecks for different features

- WhatsApp text messages → latency and throughput problem, bandwidth is fine
- WhatsApp media (photos, videos) → now bandwidth is a serious concern

Design each feature path separately.
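To see why the two paths diverge, multiply request rate by payload size. The sketch below does that; the request rates and payload sizes are invented for illustration (they are not WhatsApp's real numbers), and `bandwidth_gbps` is a hypothetical helper.

```python
def bandwidth_gbps(requests_per_second, bytes_per_request):
    """Aggregate bandwidth in gigabits per second (1 Gb = 1e9 bits)."""
    return requests_per_second * bytes_per_request * 8 / 1e9

# Assumed figures, chosen only to show the contrast between feature paths.
text_path = bandwidth_gbps(1_000_000, 500)         # 1M msgs/s, ~500 bytes each
media_path = bandwidth_gbps(100_000, 2_000_000)    # 100K uploads/s, ~2 MB each

print(f"text path:  {text_path:,.0f} Gbps")   # text path:  4 Gbps
print(f"media path: {media_path:,.0f} Gbps")  # media path: 1,600 Gbps
```

Under these assumptions the media path carries ten times fewer requests but hundreds of times more data, which is why it gets its own design (CDN, compression) instead of sharing the text path.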


Step 4: Attach Percentiles to Make It Concrete

Now take your three answers and put real numbers on them using percentiles.

Without percentiles → "low latency"
With percentiles → "P99 latency under 200ms"

That's the difference between a vague statement and a design target.

| Metric | How to express it with percentiles |
|---|---|
| Latency | "P99 latency under X ms": 99% of requests must complete within X ms |
| Throughput | "System must handle Y RPS at P95": traffic stays under Y RPS 95% of the time, but you still design for peak |
| Bandwidth | "P95 of uploads of an N MB file complete within Z seconds" |
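A small sketch of what a percentile target means in practice. It uses the nearest-rank definition, which is one of several common ways to compute percentiles; monitoring tools may interpolate and give slightly different values. The latency samples are made up.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# 100 invented request latencies in ms: mostly fast, a few slow, one very slow.
latencies_ms = [20] * 90 + [150] * 9 + [900]

mean = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)

print(f"mean={mean} ms, P50={p50} ms, P99={p99} ms")  # mean=40.5 ms, P50=20 ms, P99=150 ms
print("meets 'P99 < 200ms' target:", p99 < 200)       # True
```

The mean (40.5 ms) describes no real request here, while P50 and P99 tell you what the typical and the near-worst user actually experience. That is why targets are stated as percentiles rather than averages.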

Worked Examples

WhatsApp

| Metric | Assessment | Target |
|---|---|---|
| Latency | Human waiting for message: critical | P99 < 200ms for message delivery |
| Throughput | 1M RPS at peak | P95 sustained, design for 1M RPS peak |
| Bandwidth | Text only = tiny. Media = large | Text: not a concern. Media: CDN + compression needed |

YouTube

| Metric | Assessment | Target |
|---|---|---|
| Latency | Upload: not critical. Playback start: critical | P99 < 3s for video playback start |
| Throughput | 500 hours of video uploaded per minute, billions of views | P95 read throughput must handle 10M+ concurrent viewers |
| Bandwidth | Video is massive: critical | CDN mandatory, adaptive bitrate streaming (serve lower quality on slow connections) |

Google Search

| Metric | Assessment | Target |
|---|---|---|
| Latency | User waiting for results: extremely critical | P99 < 500ms, P50 < 100ms |
| Throughput | 100,000+ searches per second globally | Distributed across thousands of servers |
| Bandwidth | Text results: small. Images: moderate | CDN for image results |

Dropbox

| Metric | Assessment | Target |
|---|---|---|
| Latency | Sync in background: not critical | P95 < 5s for small file sync |
| Throughput | Millions of sync events per day | Moderate: not the main concern |
| Bandwidth | Files can be large: critical | Chunked uploads, delta sync, compression |

The Full Mental Checklist

Every time you get a system design question, run this before touching the diagram:

  • [ ] Is a human waiting for a response? → sets your latency target → pick P99 or P99.9
  • [ ] How many requests per second at peak? → sets your throughput target → drives scaling decisions
  • [ ] How large is the data per request? → sets your bandwidth concern → drives CDN, compression, chunking decisions
  • [ ] State all three as concrete numbers with percentiles before designing

What a strong hire says vs what a weak hire says

❌ Weak: "We need low latency and high throughput"

✅ Strong: "This is a real-time chat system. P99 message delivery must be under 200ms since users are actively waiting. At 500M DAU sending 20 messages per day we're looking at ~115K RPS average and ~500K RPS peak — that rules out a single server architecture immediately. Messages are text so bandwidth is not a concern, but media sharing will need a CDN."

Same knowledge. Completely different impression.