
Raw IDs

The starting point

The simplest possible approach: use whatever ID the database assigns to the row and hand it directly to the user as the short code. No generation logic, no extra storage, no complexity.


The approach

When a user submits a long URL, the app server inserts a row into the database. The database assigns an auto-incrementing integer ID — 1, 2, 3, 4... The app server takes that ID, appends it to the domain, and returns it.

User submits → https://very-long-url.com/with/path
DB inserts row → assigned ID = 4821903
Short URL returned → bit.ly/4821903

Dead simple. No hashing, no encoding, no collision checks. The ID is the short code.
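
The flow above can be sketched in a few lines. This is a minimal illustration, not a production design: it assumes an in-memory SQLite database, and the table name `urls` and the helpers `shorten` and `resolve` are hypothetical names chosen for the example.

```python
import sqlite3

DOMAIN = "bit.ly"  # domain taken from the example above


def create_table(conn):
    # The database owns ID generation: an auto-incrementing integer primary key.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "long_url TEXT NOT NULL)"
    )


def shorten(conn, long_url):
    # Insert the row, then use the ID the database assigned as the short code.
    cur = conn.execute("INSERT INTO urls (long_url) VALUES (?)", (long_url,))
    conn.commit()
    return f"{DOMAIN}/{cur.lastrowid}"


def resolve(conn, short_code):
    # Redirect lookup: the short code is just the primary key.
    row = conn.execute(
        "SELECT long_url FROM urls WHERE id = ?", (int(short_code),)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_table(conn)
first = shorten(conn, "https://very-long-url.com/with/path")
second = shorten(conn, "https://example.com/another/long/path")
```

On a fresh table the two calls return consecutive codes, which is exactly the property the next sections pick apart.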


Why it seems fine at first

  • Zero extra logic: the DB handles ID generation
  • No storage overhead: the short code is just the ID you already have
  • Guaranteed unique: primary keys are unique within a single database

Problem 1: You expose your internals

The IDs are sequential. If a competitor sees bit.ly/4821903 today and bit.ly/4821940 tomorrow, they know you created 37 URLs in that time window. Your growth rate, your traffic patterns, your scale — all visible from the short codes themselves.

Sequential IDs also make enumeration trivial: anyone can walk your entire URL space by incrementing the number. Every URL ever created is now publicly discoverable.
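
To make both leaks concrete, here is a small sketch of what an outside observer can do with nothing but the short codes. The function names `urls_created_between` and `candidate_urls` are hypothetical, and the count assumes IDs are handed out without gaps.

```python
def urls_created_between(earlier_code, later_code):
    # With sequential IDs, the difference between two observed codes
    # is the number of URLs created in between (assuming no gaps).
    return int(later_code) - int(earlier_code)


def candidate_urls(domain, start_id, count):
    # Enumeration: every neighbouring ID is a valid guess for another short URL.
    return [f"{domain}/{i}" for i in range(start_id, start_id + count)]


created = urls_created_between("4821903", "4821940")
probe = candidate_urls("bit.ly", 4821903, 3)
```

Here `created` is 37, matching the example above, and `probe` lists the next short URLs an attacker would try.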


Problem 2: Auto-increment breaks under sharding

From the estimation, we know this system needs to store 250TB over 10 years. That cannot fit on a single machine. The database will need to be sharded — data split across multiple machines.

Auto-increment IDs are generated by the database engine on a single machine. The moment you split across multiple DB nodes, each node starts its own counter:

Shard 1 → generates ID 1, 2, 3, 4...
Shard 2 → generates ID 1, 2, 3, 4...

bit.ly/1 now points to two different long URLs ✗

There is no coordination between shards. Uniqueness is broken.
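
The collision can be reproduced with a toy simulation. This is only a sketch: the `Shard` class below is a hypothetical stand-in for a database node with its own independent auto-increment counter, and the example URLs are made up.

```python
import itertools


class Shard:
    """Toy stand-in for one database node with its own auto-increment counter."""

    def __init__(self, name):
        self.name = name
        self._counter = itertools.count(1)  # each shard starts counting at 1
        self.rows = {}

    def insert(self, long_url):
        new_id = next(self._counter)
        self.rows[new_id] = long_url
        return new_id


shard_1 = Shard("shard-1")
shard_2 = Shard("shard-2")

# Two different long URLs land on two different shards.
id_a = shard_1.insert("https://site-a.example/page")
id_b = shard_2.insert("https://site-b.example/page")

# Same ID, different targets: the short code is no longer unique.
collision = id_a == id_b and shard_1.rows[id_a] != shard_2.rows[id_b]
```

Both inserts receive ID 1, so `collision` is true: the same short code now maps to two different long URLs.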


Why this fails

Raw IDs expose internals via enumeration and break the moment you shard. Both problems are fundamental — not fixable by patching. We need a different approach entirely.