Skip to content

The Tree

The core idea

When a bucket hash differs, you don't compare all the rows inside it — you split it into sub-buckets and hash those. Keep splitting recursively until the differing section is small enough to sync directly. This tree of hashes is the Merkle Tree. You find the exact differing rows by exchanging a handful of hashes at each level.


Recursive splitting#

You know Bucket 3 is different. But it still has 250 million rows — too many to compare directly. So you split Bucket 3 into 4 sub-buckets and hash each one:

Bucket 3 splits into:
  Bucket 3.1: rows 500M – 562M  → hash e721
  Bucket 3.2: rows 562M – 625M  → hash 9a43
  Bucket 3.3: rows 625M – 687M  → hash c819
  Bucket 3.4: rows 687M – 750M  → hash 4f02

Compare again:

            Replica1   Replica2
Bucket 3.1:  e721   ==  e721  ✓ same
Bucket 3.2:  9a43   ≠   1b77  ✗ different
Bucket 3.3:  c819   ==  c819  ✓ same
Bucket 3.4:  4f02   ==  4f02  ✓ same

Narrowed from 250 million rows down to 62 million — by exchanging 4 more hashes.

Keep splitting the differing bucket at each level until the bucket is small enough to sync directly.


The full tree structure#

graph TD
    R[Root Hash]
    B1[Bucket 1 ✓]
    B2[Bucket 2 ✓]
    B3[Bucket 3 ✗]
    B4[Bucket 4 ✓]
    B31[3.1 ✓]
    B32[3.2 ✗]
    B33[3.3 ✓]
    B34[3.4 ✓]
    B321[3.2.1 ✓]
    B322[3.2.2 ✗]
    B323[3.2.3 ✓]
    B324[3.2.4 ✓]

    R --> B1
    R --> B2
    R --> B3
    R --> B4
    B3 --> B31
    B3 --> B32
    B3 --> B33
    B3 --> B34
    B32 --> B321
    B32 --> B322
    B32 --> B323
    B32 --> B324

Each node is a hash of its children. The root hash is a hash of all four top-level bucket hashes. If the root hash matches between two replicas — the entire dataset is identical. If it differs — you drill down level by level to find exactly where.


How efficient is this?#

Starting from 1 billion rows, splitting into 4 buckets at each level:

Level 1:  1,000,000,000 rows  → 4 hash comparisons
Level 2:    250,000,000 rows  → 4 hash comparisons
Level 3:     62,500,000 rows  → 4 hash comparisons
Level 4:     15,625,000 rows  → 4 hash comparisons
Level 5:      3,906,250 rows  → 4 hash comparisons
Level 6:        976,562 rows  → 4 hash comparisons
Level 7:        244,140 rows  → 4 hash comparisons
Level 8:         61,035 rows  → 4 hash comparisons
Level 9:         15,258 rows  → 4 hash comparisons
Level 10:         3,814 rows  → sync directly ✓

40 hash comparisons to locate the differing rows out of 1 billion. You only transfer actual row data for the final small bucket — everything else is tiny hash exchanges.

Compare that to the naive approach of sending all 1 billion rows over the network.