06

Concepts

Distributed Systems

What changes when your system runs on more than one machine. The problems, the algorithms, and the guarantees that make distributed computing tractable.

Why Distributed Systems Are Hard

The Two Generals Problem and network partitions. The fundamental uncertainty that makes distributed computing different from everything that came before.

Two Generals · Network Uncertainty · Partial Failure · Split Brain

Consistent Hashing

How to distribute data across nodes so that adding or removing a node moves only about 1/N of the keys. The algorithm behind Cassandra, DynamoDB, and many CDNs.

Hash Ring · Virtual Nodes · Minimal Disruption · Hotspots
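
To make the "minimal disruption" property concrete, here is a minimal sketch of a hash ring with virtual nodes. The class name `HashRing`, the `vnodes` parameter, and the use of SHA-256 are illustrative assumptions, not a description of how Cassandra or DynamoDB implement it.

```python
import bisect
import hashlib


def _position(key: str) -> int:
    # Stable 64-bit ring position. Python's built-in hash() is randomized
    # per process, so it is not suitable here.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical hash ring; each node owns `vnodes` points on the ring."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Place several virtual nodes per physical node to even out load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_position(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def lookup(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node at or after the key's position, wrapping around.
        i = bisect.bisect_left(self._ring, (_position(key),))
        return self._ring[i % len(self._ring)][1]
```

When a node is added to a ring built this way, every key that changes owner moves to the new node; keys owned by the other nodes stay where they were.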

Replication Strategies

Leader-follower, multi-leader, leaderless. The tradeoffs between write availability, consistency, and the complexity of conflict resolution.

Leader-Follower · Multi-Leader · Leaderless · Quorum

Idempotency

Making operations safe to retry. The pattern that turns at-least-once delivery into exactly-once behavior without a coordination service.

Idempotency Keys · Safe Retries · Deduplication · At-Least-Once
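
A minimal sketch of the idempotency-key pattern, assuming a single process and an in-memory dictionary. The `PaymentService` class and its `charge` method are hypothetical names; a real service would keep the key-to-result map in durable storage and make the check and the write atomic.

```python
class PaymentService:
    """Hypothetical service that deduplicates retries by idempotency key."""

    def __init__(self):
        self._results = {}  # idempotency key -> response already returned
        self.charges = []   # side effects actually performed

    def charge(self, idempotency_key: str, account: str, amount: int) -> dict:
        # A retry with a key we have seen returns the stored response
        # and performs no new side effect.
        if idempotency_key in self._results:
            return self._results[idempotency_key]

        self.charges.append((account, amount))
        result = {"status": "charged", "charge_id": len(self.charges)}
        self._results[idempotency_key] = result
        return result
```

The client generates the key once per logical operation and reuses it on every retry, so a message delivered more than once still charges the account only once.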

Delivery Guarantees

At-most-once, at-least-once, exactly-once at the distributed systems level — how these guarantees compose across network hops and node failures.

At-Most-Once · At-Least-Once · Exactly-Once · ACK Semantics

Distributed Transactions

The problem of atomic commits across multiple nodes. 2PC, Saga choreography, Saga orchestration — and when each one is the right answer.

2PC · Saga · Choreography · Orchestration

Consensus

Raft, Paxos, and ZooKeeper's Zab protocol. How distributed nodes agree on a single value despite failures, and how that agreement underpins leader election and log replication.

Raft · Paxos · Leader Election · Log Replication

Distributed Clocks

Clock drift, NTP, Lamport clocks, vector clocks, and TrueTime. Why wall clocks can't order events in a distributed system and what to use instead.

Clock Drift · Lamport Clocks · Vector Clocks · TrueTime
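
As one illustration of ordering events without wall clocks, here is a minimal sketch of a Lamport clock. The class and method names are illustrative assumptions. The rule it encodes is the standard one: increment on every local event or send, and on receive take the maximum of the local and received timestamps, then increment.

```python
class LamportClock:
    """Logical clock: a counter that respects the happened-before relation."""

    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        # Call for a local event or just before sending a message;
        # the returned value is the timestamp to attach.
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        # Jump past the sender's timestamp so the receive event is
        # ordered after the send event.
        self.time = max(self.time, remote_time) + 1
        return self.time
```

If event A happened before event B, A's timestamp is smaller than B's. The converse does not hold: a smaller timestamp alone does not prove causality, which is the gap vector clocks close.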

CRDTs

Conflict-free Replicated Data Types. Data structures that merge concurrent updates automatically and deterministically, so replicas converge with no locks, no consensus round, and no manual conflict resolution.

G-Counter · Operational Transform · OT vs CRDT · Convergence
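
A minimal sketch of a G-Counter (grow-only counter), the simplest CRDT named above. The class layout is an illustrative assumption. Each replica increments only its own slot, and merge takes the per-node maximum, which makes merging commutative, associative, and idempotent.

```python
class GCounter:
    """Grow-only counter: one monotonically increasing slot per replica."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = {}  # node id -> increments observed from that node

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-node maximum: applying the same merge twice changes nothing,
        # and the order in which merges arrive does not matter.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)
```

Two replicas that increment independently and then exchange state in either order end up with the same value.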

Failure Detection

Heartbeats, gossip protocol, and the Phi Accrual Failure Detector. How nodes decide a peer is dead without being certain — and the cost of getting it wrong.

Heartbeats · Gossip Protocol · Phi Accrual · False Positives

Merkle Trees

Hash trees for detecting data inconsistency across replicas. How Cassandra and Dynamo-style stores use Merkle trees to run anti-entropy repair efficiently.

Hash Tree · Anti-Entropy · Replica Sync · Bucket Hashing

Coordination Services

etcd, leases, TTL, fencing tokens, lock vs job tracking. The primitives that let distributed services elect leaders and claim exclusive work safely.

etcd · Leases · Fencing Tokens · Distributed Lock