Latency#
How long does one request take from start to finish?
That duration is latency.
The Restaurant Analogy#
You sit down and order pasta. The clock starts.
The waiter walks to the kitchen, the chef cooks it, the waiter walks back, the plate lands on your table. The clock stops.
That total time = latency.
It doesn't matter how many other tables are being served. Latency is only about your one request.
Where does the time actually go?#
A request doesn't teleport. Time is spent at every step:
| Step | What's happening | How slow? |
|---|---|---|
| Network | Data physically travelling over wires between client and server | 0.5ms same datacenter, 150ms US→Europe |
| Queuing | Your request sitting in a waiting line before a thread picks it up | Depends on traffic |
| Compute | The server doing actual work — business logic, calculations | Usually fast, microseconds |
| I/O | Reading from a database or disk | 150µs SSD, 10ms HDD — the killer |
I/O is almost always the dominant cost
This is the one that surprises beginners. The server doing calculations is fast. Waiting for data from disk is what kills latency.
Why is a database call so slow?#
Your server has two places it can get data from:
RAM — memory that lives inside the server itself. Accessing it is nearly instant because it's physically right there on the same chip. Like grabbing something off your desk.
Disk — where the DB permanently stores its data. This is a completely separate physical device. Every time your server needs data from the DB, it has to: 1. Send a request to the disk — "give me this data" 2. Wait for the disk to physically locate it 3. Transfer it back
That round trip is slow. And the reason databases live on disk — not RAM — is that disk survives a server restart. RAM gets wiped the moment the power goes off. You can't store your users' data there permanently.
The numbers every engineer must memorize
| Where data lives | Time to access | Comparison to RAM |
|---|---|---|
| RAM | ~100 nanoseconds | baseline |
| SSD (disk) | ~150 microseconds | 1,500x slower |
| HDD (spinning disk) | ~10 milliseconds | 100,000x slower |
| Network (same datacenter) | ~0.5 milliseconds | 5,000x slower |
| Network (US → Europe) | ~150 milliseconds | 1,500,000x slower |
These numbers are not trivia. They are the reason every architecture decision exists.
- SSD is 1,500x slower than RAM → this is why caching exists
- Cross-region network is 1.5M x slower than RAM → this is why CDNs exist
This single table explains more about system design than almost anything else. Every time someone proposes a solution, trace it back to one of these numbers.
High latency vs Low latency#
Not every system needs to be fast. But you need to know your target.
- Low latency systems — Google Search (~200ms), stock trading (~1ms), games (<50ms)
- High latency is acceptable — batch report generation (minutes), video transcoding (hours)
In an interview — never say "the system should be fast"
Always give a specific number. Example: "latency should be under 100ms"