Skip to content

Sharding

Sharding is the right solution to the right problem. Before committing to it, verify whether the problem actually exists.


Why sharding was on the table#

In the estimation phase, we calculated storage requirements assuming all paste data — content and metadata — lives in Postgres:

Total pastes (10 years):  3.65B
Per paste:                ~10KB (content) + ~100 bytes (metadata) ≈ 10KB
Raw storage:              3.65B × 10KB = 36.5TB
With replication (3×):    ~110TB
With index overhead:      ~150TB

A single Postgres machine handles ~10TB practically. At 150TB, sharding was listed as a hard requirement in the NFR.

The NFR file stated: sharding required — 150TB exceeds single-machine limit.


Rejecting sharding on traffic grounds first#

Before even touching storage, check whether read or write QPS demands sharding.

Peak write QPS:  30 writes/sec
Peak read QPS:   3,000 reads/sec

Single Postgres primary:   10,000–50,000 reads/sec capacity
Single Postgres primary:   1,000–5,000 writes/sec capacity

QPS thresholds from back-of-envelope reference:

Read QPS > 10k   → caching required
Read QPS > 100k  → DB sharding required

Write QPS > 1k   → write batching or async queue
Write QPS > 10k  → shard primaries

Our peak read QPS is 3,000 — well under the 10k threshold where even caching becomes necessary (and we're adding caching anyway for latency, not throughput). Our peak write QPS is 30 — orders of magnitude below the 1k threshold.

Traffic alone does not require sharding. A single Postgres primary handles our load with significant headroom.


Rejecting sharding on storage grounds — the S3 correction#

The original 150TB estimate was wrong because it assumed paste content lives in Postgres. But in our DB design, we offload all content to S3 and store only a pointer in Postgres.

What actually lives in Postgres per paste:

short_code:    ~8 bytes
user_id:       8 bytes  (bigint)
content_hash:  32 bytes (SHA-256)
s3_url:        ~100 bytes
created_at:    8 bytes
expires_at:    8 bytes
deleted_at:    8 bytes (nullable)
ref_count:     4 bytes

Total per row: ~180 bytes → round to ~200 bytes

Recalculating Postgres storage:

Total pastes (10 years):   3.65B rows
Per row (metadata only):   ~200 bytes
Raw storage:               3.65B × 200 bytes = 730GB ≈ 0.73TB
With replication (3×):     0.73 × 3 = ~2.2TB
With index overhead (1.3×): ~2.9TB

→ ~3TB in Postgres

A single Postgres machine handles 10TB comfortably. 3TB is well within single-machine territory.

S3 stores the content:

3.65B pastes × 10KB = 36.5TB of content in S3

S3 is object storage designed for petabytes. 36.5TB is routine. No sharding, no special configuration — S3 handles this natively.


The verdict#

                   Original estimate   Corrected estimate
Postgres storage:  150TB               ~3TB
Sharding needed?   Yes                 No

Why the difference:
  Original assumed content (10KB) in Postgres
  Corrected moves content to S3, Postgres stores only metadata (~200 bytes)
  Storage drops by 50×

The decision to offload content to S3 didn't just reduce latency and improve DB performance — it eliminated the sharding requirement entirely. Architecture decisions compound: one good call (S3 for blobs) removed a significant operational burden (shard key selection, consistent hashing, cross-shard queries).

Sharding remains the right answer if Postgres storage approaches 10TB. At ~3TB with 10 years of data, we have headroom and no need to introduce the operational complexity of a sharded setup.


Interview framing

"We initially flagged sharding as required — 150TB exceeds the single-machine limit. But that assumed content lives in Postgres. Once we moved content to S3, Postgres only stores metadata at ~200 bytes per row. 3.65B rows × 200 bytes = ~3TB with replication — well under the 10TB single-machine limit. Traffic doesn't require sharding either: 3k peak reads/sec and 30 peak writes/sec are handled by a single primary with headroom. Sharding is off the table."