UUID: The 128-Bit Alternative

What is UUID?

UUID (Universally Unique Identifier) is a 128-bit ID that can be generated by any server, anywhere, with no coordination. No central service, no machine registration, no shared state. Just generate and use.

A typical UUID looks like this:

550e8400-e29b-41d4-a716-446655440000

That's 32 hex characters + 4 dashes = 36 characters when displayed as a string. But the underlying data is 128 bits = 16 bytes.
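
As a quick check of those sizes, here is a minimal Python sketch using the standard-library uuid module. It parses the example shown above; the variable names are illustrative only.

```python
import uuid

# Parse the example UUID shown above.
u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

text_form = str(u)      # canonical string: 32 hex digits + 4 dashes
binary_form = u.bytes   # raw value as stored in a binary column

print(len(text_form))         # 36 characters
print(len(binary_form))       # 16 bytes
print(len(binary_form) * 8)   # 128 bits
print(u.version)              # 4, read from the version nibble
```

Storing the string form rather than the binary form more than doubles the per-ID cost, so binary or native UUID column types are usually the better fit where the database offers them.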

Does random guarantee uniqueness?

The first instinct is to doubt it — if it's random, can't two servers generate the same UUID?

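In theory, yes. In practice, no. A UUID has 128 bits, so the full ID space is:

2^128 = 340,282,366,920,938,463,463,374,607,431,768,211,456 values
      ≈ 340 undecillion (3.4 × 10^38)

A random (v4) UUID spends 6 of those bits on version and variant metadata, which leaves 2^122 ≈ 5.3 × 10^36 random values. By the birthday bound, the chance of at least one collision among n random IDs is roughly n² / (2 × 2^122). For the 400 trillion IDs in our 10-year estimate, that is about 1.5 × 10^-8, or roughly 1 in 66 million. This is why a duplicate UUID is treated as impossible for most engineering purposes, and why UUIDs are so widely used in production systems.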
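
To put a number on "improbable", the birthday bound can be evaluated directly. The sketch below is an approximation under two assumptions that are mine rather than the text's: IDs are version-4 UUIDs with 122 uniformly random bits, and the ID count is the 400 trillion figure from the storage estimate in this section. The function name is hypothetical.

```python
import math


def collision_probability(n_ids: int, random_bits: int = 122) -> float:
    """Approximate chance of at least one duplicate among n_ids random IDs.

    Uses the birthday approximation p = 1 - exp(-n(n-1) / (2 * space)).
    """
    space = 2 ** random_bits
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(-(n_ids * (n_ids - 1)) / (2 * space))


ten_year_ids = 400 * 10**12   # 400 trillion IDs (assumed workload)
p = collision_probability(ten_year_ids)
print(f"{p:.2e}")             # about 1.5e-08
```

The probability grows with the square of the ID count, so ten times as many IDs makes a collision about a hundred times more likely. It remains tiny at this scale, but it is never exactly zero.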

Uniqueness is probabilistic, not mathematical

UUID uniqueness relies on the random space being so large that collisions are practically impossible. For most systems this is fine. For systems where a collision would cause data corruption (payments, ledgers), a structurally guaranteed approach — where uniqueness is enforced by design, not probability — is safer.

Why UUID fails our FR: not time-sortable

UUID is random. There is no timestamp component in the most widely used version (v4). Sorting a list of UUIDs gives you nothing meaningful — you get random ordering, not chronological ordering.

This violates our FR directly: IDs must be sortable by creation time.

If you use UUID as your primary key and want chronological ordering, you have to store a separate created_at timestamp column, add an index on it, and query by that. That's extra storage, extra index, extra complexity — problems we'll solve with a better approach later.
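
The loss of ordering is easy to observe. This small Python sketch (standard-library uuid only, helper name hypothetical) generates version-4 UUIDs one after another and checks whether sorting by ID reproduces the order in which they were created.

```python
import uuid


def sorted_matches_creation_order(count: int = 1000) -> bool:
    """Generate v4 UUIDs in sequence and compare ID order to creation order."""
    created = [uuid.uuid4() for _ in range(count)]   # creation order
    by_id = sorted(created)                          # what ORDER BY id would return
    return by_id == created


# For 1000 random IDs this is False except with negligible probability.
print(sorted_matches_creation_order())
```

With random IDs, the only way to recover creation order is a separate timestamp stored next to each ID, which is the extra column and index described above.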

Storage cost

UUID is 128 bits = 16 bytes in binary form, exactly twice the 8 bytes of a 64-bit ID. Stored as a 36-character string, it costs more still.

From our estimation: 400 trillion IDs over 10 years at 8 bytes = 3.2 PB. With UUID:

400 trillion × 16 bytes = 6.4 PB

Every table that uses UUID as a primary key pays double. Every index on that column pays double. Every foreign key reference pays double. At 400 trillion IDs, the difference is 3.2 PB of extra storage across all calling systems.
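
The arithmetic can be checked in a few lines. This sketch uses decimal units (1 PB = 10^15 bytes) and counts only the raw ID bytes, ignoring index and row overhead. The constant and function names are illustrative.

```python
ID_COUNT = 400 * 10**12   # 400 trillion IDs, the figure used above
PB = 10**15               # bytes per petabyte (decimal units)


def storage_pb(bytes_per_id: int, id_count: int = ID_COUNT) -> float:
    """Raw storage for the IDs alone, in petabytes."""
    return id_count * bytes_per_id / PB


print(storage_pb(8))                     # 3.2  (64-bit ID)
print(storage_pb(16))                    # 6.4  (binary UUID)
print(storage_pb(16) - storage_pb(8))    # 3.2  extra for UUID
```

Each index or foreign key that repeats the ID multiplies these totals again, so the gap in a real schema is larger than the raw figure.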

The page split problem

Databases store indexes as B+ trees. Each node in the tree is a fixed-size page (typically 8–16 KB) that holds a range of IDs.

With sequential IDs:

Each new ID is larger than all previous ones. It always goes to the rightmost page. Pages fill up completely before a new page is needed.

Page 1: [1001, 1002, 1003, 1004]  ← full, never touched again
Page 2: [1005, 1006, 1007, 1008]  ← full, never touched again
Page 3: [1009, 1010, ___,  ___]   ← all new inserts land here

New ID 1011 → goes to Page 3. No disruption to existing pages.

With random IDs (UUID):

Each new ID can land anywhere in the tree, because leaf pages hold sorted, non-overlapping key ranges and the key's value decides which page it belongs to. When the target page is already full, the database must split it.

Page 1: [1021, 2045, 3012, 4521]  ← full
Page 2: [5043, 5678, 6789, 7891]  ← full
Page 3: [8901, 9234, ___,  ___]   ← partially full

New ID 6000 arrives → must go inside Page 2 between 5678 and 6789. Page 2 is full → page split:

Page 2a: [5043, 5678, 6000, ___]   ← one slot empty
Page 2b: [6789, 7891, ___,  ___]   ← now half empty, wasted space

Both halves are now partly empty, which is fragmentation. Because every new UUID lands at a random position, any full page in the tree can be the next one to split, and the pages left behind stay only partly filled.

At 10M inserts/second with UUID, you're constantly triggering page splits across the entire tree — more disk I/O, more fragmentation, slower writes that compound over time. Sequential IDs only ever append to the rightmost page — splits are rare and predictable.
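
The difference can be reproduced with a toy model. The sketch below is not how any real database implements its B+ tree: it keeps only the leaf level as a list of sorted pages with 4 keys each (matching the diagrams), splits a full page in half on a mid-range insert, and starts a fresh page on a right-edge append. All names are hypothetical.

```python
import bisect
import random

PAGE_CAPACITY = 4  # keys per leaf page, as in the toy diagrams above


def simulate_inserts(keys):
    """Insert keys into toy leaf pages; return (splits, average fill)."""
    pages = [[]]      # leaf pages in key order, each a sorted list
    separators = []   # separators[i] is the lowest key of pages[i + 1]
    splits = 0

    for key in keys:
        idx = bisect.bisect_right(separators, key)   # page covering this key
        page = pages[idx]

        if len(page) < PAGE_CAPACITY:
            bisect.insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            # Append at the right edge: open a fresh page, move nothing.
            pages.append([key])
            separators.append(key)
        else:
            # Full page inside the key range: split it into two halves.
            bisect.insort(page, key)
            mid = len(page) // 2
            pages[idx:idx + 1] = [page[:mid], page[mid:]]
            separators.insert(idx, page[mid])
            splits += 1

    fill = sum(len(p) for p in pages) / (len(pages) * PAGE_CAPACITY)
    return splits, fill


n = 20_000
sequential = simulate_inserts(range(n))
scattered = simulate_inserts(random.Random(42).sample(range(10**9), n))

print("sequential:", sequential)   # (0, 1.0): no splits, every page full
print("random:    ", scattered)    # thousands of splits, pages partly empty
```

In this model the sequential run never splits a page, while the random run splits thousands of times and leaves its pages well short of full. Real engines add buffering, fill factors, and smarter split policies, so treat the numbers as an illustration of the trend only.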

UUID versions: the evolution

UUID isn't one thing — it has multiple versions that evolved to fix earlier problems.

UUIDv1: timestamp + MAC address

Includes a 60-bit timestamp (100-nanosecond intervals since 15 October 1582) and the machine's MAC (network card) address. It has a time component, but the timestamp is split into three fields and the low-order 32 bits come first, so the leading bits of the ID are the ones that change fastest and UUIDs are still not sortable by time. It also exposes the machine's MAC address in every ID, which is a privacy problem.
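
To see why the version-1 layout defeats sorting, the sketch below builds two version-1 UUIDs by hand from timestamps one tick apart. The helper make_v1, the timestamps, and the MAC value are all made up for illustration; only uuid.UUID and its fields, time, and node attributes come from the standard library.

```python
import uuid


def make_v1(timestamp: int, node: int, clock_seq: int = 0) -> uuid.UUID:
    """Build a version-1 UUID from a 60-bit timestamp and a 48-bit node."""
    time_low = timestamp & 0xFFFFFFFF                         # low 32 bits come first
    time_mid = (timestamp >> 32) & 0xFFFF                     # middle 16 bits
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000   # top 12 bits + version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80   # variant bits 10
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


FAKE_MAC = 0x001122334455            # made-up node value
T = 0x1EC0000FFFFFFFF                # arbitrary 60-bit timestamp

earlier = make_v1(T, FAKE_MAC)
later = make_v1(T + 1, FAKE_MAC)     # one 100 ns tick later

print(earlier)                       # ffffffff-0000-11ec-8000-001122334455
print(later)                         # 00000000-0001-11ec-8000-001122334455
print(earlier.time < later.time)     # True: created in this order
print(str(earlier) < str(later))     # False: sorts in the opposite order
print(hex(later.node))               # 0x1122334455: the node is readable by anyone
```

The later UUID sorts first because the fastest-changing timestamp bits sit at the front of the ID, and the node field can be read back from any ID without special access.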

UUIDv4: purely random

The most widely used version. It has 122 bits of randomness (the other 6 bits are reserved for version/variant metadata) and no timestamp at all. It has all the problems described above: not sortable, page splits, double storage.

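UUIDv7: timestamp first

The newest of the three, drafted in the early 2020s and standardized in RFC 9562 (May 2024). It places a 48-bit Unix timestamp in milliseconds in the most significant bits, followed by random bits. This makes UUIDs time-sortable down to the millisecond, and because new IDs land at or near the right edge of the index, it largely avoids the page split problem. Still 128 bits (double the storage of a 64-bit ID), but otherwise the most capable UUID version.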
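
The timestamp-first layout can be sketched by hand. The function below is a hypothetical helper, not a library API: it packs a 48-bit millisecond timestamp, the version and variant bits, and 74 random bits into a uuid.UUID, following the field order described above. It does not add a counter for IDs created in the same millisecond, so ordering within one millisecond is left to chance.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a version-7-style UUID: 48-bit ms timestamp, then random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000

    value = (unix_ms & ((1 << 48) - 1)) << 80          # timestamp in the top 48 bits
    value |= int.from_bytes(os.urandom(10), "big")     # 80 random bits below it

    value = (value & ~(0xF << 76)) | (0x7 << 76)       # overwrite 4 bits: version 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)       # overwrite 2 bits: variant 10
    return uuid.UUID(int=value)


first = uuid7_sketch(1_700_000_000_000)
second = uuid7_sketch(1_700_000_000_001)   # one millisecond later

print(first.version)                 # 7
print(first < second)                # True: ID order follows creation time
print(str(first) < str(second))      # True: the string form sorts the same way
```

Because the timestamp occupies the leading bits, both the binary and the string form sort chronologically, which is what lets an index on the ID double as a creation-time index.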

UUID versions at a glance

Version	Time-sortable	Storage	Uniqueness guarantee	Notes
UUIDv1	❌ scrambled timestamp	16 bytes	relies on unique MAC + clock	Exposes MAC address
UUIDv4	❌ random	16 bytes	probabilistic	Most widely used
UUIDv7	✅ timestamp first	16 bytes	probabilistic	Best UUID version, still 2x the storage of a 64-bit ID

The full comparison against other approaches is in 07-Comparison.md.