Cross-Shard Joins

Before sharding, JOINs were free — all your tables were on one server. After sharding, related data might be on different servers. The database can't JOIN across servers. Now what?

Why JOINs break across shards#

Before sharding — users and tweets on the same server, JOIN is straightforward:

Single Server:
┌─────────────────────────────────┐
│  Users table                    │
│  user_id │ username             │
│  1       │ alice                │
│  2       │ bob                  │
│                                 │
│  Tweets table                   │
│  tweet_id │ user_id │ text      │
│  101      │ 1       │ "hello"   │
│  102      │ 2       │ "world"   │
└─────────────────────────────────┘

SELECT users.username, tweets.text
FROM tweets JOIN users ON tweets.user_id = users.user_id
→ both tables on same server → easy ✓

After sharding by user_id:

Shard 1 (user_id 1-500M):          Shard 2 (user_id 500M-1B):
┌──────────────────────┐           ┌──────────────────────┐
│  alice (user_id 1)   │           │  charlie (600M)       │
│  her tweets          │           │  his tweets           │
└──────────────────────┘           └──────────────────────┘

Alice retweets charlie. Alice is on Shard 1, charlie is on Shard 2:

Tweet on Shard 1:
  tweet_id │ user_id │ retweet_of_user_id
  103      │ 1       │ 600M   ← charlie is on Shard 2!

SELECT users.username, tweets.text
FROM tweets JOIN users ON tweets.user_id = users.user_id
→ alice's tweets on Shard 1, charlie's user data on Shard 2
→ database engine cannot JOIN across two different servers ✗

The database engine can only JOIN tables on the same server. It has no concept of reaching out to another server mid-query.

Fix 1 — Application-level JOIN#

Your app server queries each shard separately and joins the results in memory:

App server:
  Step 1 → query Shard 1 → fetch alice's tweets (including retweet_of_user_id 600M)
  Step 2 → query Shard 2 → fetch charlie's username and profile
  Step 3 → join results in memory on app server

Database JOIN    → optimised, uses indexes, runs inside one server — fast
Application JOIN → two network round trips + join in application memory → more latency ✗

Works but is slower and more complex. The app must now understand shard topology, which bleeds infrastructure concerns into your business logic.

Fix 2 — Co-location (better)#

Design your shard key and data model so related data always lands on the same shard. You eliminate the cross-shard join at the design stage rather than patching it at query time.

For Twitter, shard by user_id and store the user's profile, tweets, and follows all on the same shard:

Shard 1:
  alice → her profile
        → all her tweets
        → all her follows/followers

Shard 2:
  charlie → his profile
           → all his tweets
           → all his follows/followers

Alice's feed query only needs her own data — it stays on Shard 1 entirely. Cross-shard only happens when viewing charlie's profile from alice's session — handled in the app layer for that specific case.

Co-location rule:
  Store data that is queried together → on the same shard
  The primary entity (user) and all its owned entities (tweets, follows)
  → same shard key → same shard

Co-location is a design decision, not an afterthought

You make this choice when you design the data model, before you write any queries. Sharding an existing relational schema that wasn't designed for co-location is painful — cross-shard joins will be everywhere and fixing them requires restructuring the schema.

The general rule#

Avoid cross-shard joins by:
  1. Co-locating related data under the same shard key (preferred)
  2. Denormalising — store a copy of the data you'd JOIN on the same shard
  3. Application-level join for unavoidable cases (two network hops, more latency)

Denormalisation in this context means intentionally duplicating data to avoid the join:

Instead of:
  tweets table: tweet_id | user_id | text
  → JOIN users table to get username

Denormalise:
  tweets table: tweet_id | user_id | username | text
  → username stored directly on the tweet row
  → no JOIN needed, username always available on the same shard
  → downside: if username changes, must update all tweet rows