Measuring Latency#
What to measure#
Latency = time from when the node receives the request to when it returns the ID. This is pure in-memory work — no disk, no network to another service. Should be sub-millisecond in normal operation.
Track as a histogram, not an average. Averages hide tail latency.
Per-node histograms#
Run separate histograms per node. If one node's p99 is 20ms while others are 0.5ms, that node has a problem — bad hardware clock, NTP issues, or resource contention. A single aggregated histogram would hide this.
Clock skew contribution to latency#
When the clock moves backwards, the node waits — this wait time shows up as latency on those specific requests. A sudden spike in p99 latency on one node, combined with clock skew wait events, points directly to NTP correction as the cause.
Track clock skew wait duration as a separate metric so you can correlate with latency spikes.
At our scale#
At 1M IDs/second sustained across ~3 nodes, each node handles ~333K requests/second. The histogram collects 333K data points per second per node. Use reservoir sampling or HDR histograms to keep memory usage bounded while maintaining accuracy at the tail.