
Bandwidth vs Latency vs Throughput#

These three keep coming up together — what exactly is each one measuring?

They sound related, but they are three different rulers, each measuring something different.


The Three Rulers#

| Metric | What it measures | Unit |
| --- | --- | --- |
| Latency | Time for one round trip | milliseconds (ms) |
| Throughput | Requests processed per second | RPS / QPS |
| Bandwidth | Bits of data transferred per second | Mbps / Gbps |
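To make the three rulers concrete, here is a back-of-envelope sketch with made-up numbers (the request counts and payload sizes are illustrative assumptions, not real measurements). Note that latency is observed per request, while throughput and bandwidth are derived from totals over a time window:

```python
# Hypothetical measurements over a 60-second window.
round_trip_ms = 45            # latency: time for ONE request; data size is irrelevant
requests = 120_000            # requests completed in the window
window_s = 60                 # length of the measurement window, in seconds
bytes_per_request = 2_048     # average payload size per request

# Throughput counts requests per second.
throughput_rps = requests / window_s

# Bandwidth counts bits per second: throughput x payload size x 8 bits/byte.
bandwidth_mbps = throughput_rps * bytes_per_request * 8 / 1_000_000

print(f"latency:    {round_trip_ms} ms")        # 45 ms
print(f"throughput: {throughput_rps:.0f} RPS")  # 2000 RPS
print(f"bandwidth:  {bandwidth_mbps:.1f} Mbps") # 32.8 Mbps
```

The same window of traffic yields all three numbers, but each comes from a different input: one request's timing, the request count, and the byte count.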

The Critical Distinctions#

Latency doesn't care about data size#

Whether your request carries 1 byte or 1 MB, latency only measures the round-trip time. A ping to a server carries essentially 0 bytes of data, but it still has latency. Data size is irrelevant.

Throughput counts requests, not data#

A system processing 10,000 requests per second has high throughput, even if each request is just a tiny 10-byte ping. Throughput is about how many requests, not how much data.

Bandwidth counts data, not requests#

A system serving one user downloading a 4K movie is transferring a massive amount of data per second: high bandwidth usage. But its throughput is just one request. Bandwidth is about how much data, not how many requests.


Where the confusion comes from#

Throughput and bandwidth both feel like "capacity" metrics — and they are, just for different things.

| Scenario | Throughput | Bandwidth |
| --- | --- | --- |
| Millions of tiny API pings | Very high | Very low |
| One user downloading a 4K movie | Very low | Very high |
| Millions of users streaming HD video | Very high | Very high |
| One user sending a chat message | Very low | Very low |

You can have high throughput with low bandwidth, and vice versa.

The two metrics are independent. Always check both separately.
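The independence falls out of the arithmetic: bandwidth is throughput multiplied by payload size, so either factor can dominate. A quick sketch (all request rates and payload sizes are illustrative assumptions):

```python
def bandwidth_mbps(rps, bytes_per_request):
    """Bandwidth follows from throughput x payload size x 8 bits/byte."""
    return rps * bytes_per_request * 8 / 1_000_000

# High throughput, low bandwidth: 100,000 tiny 10-byte pings per second.
print(bandwidth_mbps(100_000, 10))     # 8.0 Mbps from a huge request rate

# Low throughput, high bandwidth: one 3 MB movie-chunk request per second.
print(bandwidth_mbps(1, 3_000_000))    # 24.0 Mbps from a single request
```

One request per second moves three times the data of a hundred thousand pings, which is exactly why the two numbers must be checked separately.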


A single analogy that covers all three#

Imagine a highway full of trucks delivering packages:

  • Latency — how long does one truck take to travel from warehouse to destination and back?
  • Throughput — how many trucks complete their delivery per hour?
  • Bandwidth — how many kilograms of packages are delivered per hour in total?

A truck's travel time (latency) doesn't change based on how many other trucks are on the road. More trucks per hour = higher throughput. Bigger packages per truck = higher bandwidth usage.


How to apply this in every case study#

For every system you design, ask all three questions:

1. Latency — is the response time for a single request acceptable?
   • Chat message → must be under 100 ms, user is waiting
   • Batch report → minutes are fine, nobody is waiting

2. Throughput — can the system handle the volume of requests?
   • Social media feed → millions of users refreshing simultaneously
   • Internal admin dashboard → a few hundred users, no problem

3. Bandwidth — is the data volume manageable?
   • Video streaming → massive, need CDN and compression
   • Text-based chat → tiny, bandwidth is not the concern
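To see why question 3 comes out so differently for video and chat, a rough data-volume estimate helps. All the numbers here (stream bitrate, message size, user count, everyone active at once) are illustrative assumptions:

```python
users = 1_000_000                # assumed concurrent users

# Video: a typical HD stream runs around 5 Mbps per viewer.
video_mbps_per_user = 5
video_total_gbps = users * video_mbps_per_user / 1_000   # aggregate, in Gbps

# Chat: a short text message is on the order of 200 bytes.
chat_bytes_per_msg = 200
# Worst case: every user sends one message in the same second.
chat_total_mbps = users * chat_bytes_per_msg * 8 / 1_000_000

print(f"video: {video_total_gbps:.0f} Gbps")  # thousands of Gbps -> CDN territory
print(f"chat:  {chat_total_mbps:.0f} Mbps")   # a fraction of one Gbps
```

Even under a pessimistic everyone-at-once assumption, chat traffic is orders of magnitude below video, which is why bandwidth dominates one design and barely registers in the other.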

The three questions to ask for every case study#

Before designing anything, identify which of these is the bottleneck:

  • Latency — is response time for a single request too slow? (e.g. chat message delay)
  • Throughput — can the system handle the volume of requests? (e.g. 100K users hitting the API simultaneously)
  • Bandwidth — is the volume of data being transferred too large for the pipe? (e.g. serving 4K video to millions)

Each has completely different solutions. A bandwidth problem is not solved by adding more servers. A throughput problem is not solved by moving servers closer to users. Identify the bottleneck first, then design.

Each bottleneck has a different solution#

  • Latency problem → move data closer to users (cache, CDN, edge servers)
  • Throughput problem → add more servers, more threads, horizontal scaling
  • Bandwidth problem → compress data, upgrade network pipes, use CDN for large files

Solving the wrong bottleneck wastes time and money. Identify which one is failing first.
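The identify-first workflow can be sketched as a small triage helper. The function name, thresholds (100 ms latency budget, 80% of capacity), and remedy strings are all illustrative assumptions for this sketch, not a real diagnostic tool:

```python
def diagnose(p99_latency_ms, rps, capacity_rps, mbps, pipe_mbps):
    """Map each measured symptom to the class of fix described above."""
    problems = []
    if p99_latency_ms > 100:                 # assumed latency budget: 100 ms
        problems.append("latency: move data closer (cache, CDN, edge)")
    if rps > 0.8 * capacity_rps:             # nearing request-handling capacity
        problems.append("throughput: scale horizontally (more servers/threads)")
    if mbps > 0.8 * pipe_mbps:               # nearing network pipe capacity
        problems.append("bandwidth: compress, upgrade pipes, CDN for large files")
    return problems or ["no bottleneck detected"]

# Slow responses AND near request capacity, but the pipe has plenty of headroom:
print(diagnose(p99_latency_ms=250, rps=9_000, capacity_rps=10_000,
               mbps=400, pipe_mbps=1_000))
```

The point of separating the checks is the same as the prose above: each condition is tested independently, and each failure maps to its own remedy.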