Cassandra Internals: Overview

Cassandra is a distributed wide-column (column-family) store built for high write throughput and predictable low-latency reads, provided every query supplies the partition key. Most of its internal design decisions follow from that constraint.
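To make the partition-key constraint concrete, here is a minimal sketch of how a key could map to replica nodes on a hash ring. It is an illustration under stated assumptions, not Cassandra's implementation: `Ring`, `token`, and `replicas` are hypothetical names, MD5 stands in for the Murmur3 hash used by Cassandra's default partitioner, and each node owns a single token (no virtual nodes).

```python
import bisect
import hashlib


def token(partition_key: str) -> int:
    # Stand-in hash (assumption): Cassandra's default partitioner uses Murmur3.
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Toy consistent-hash ring: one token per node, replicas placed clockwise."""

    def __init__(self, nodes, rf=3):
        self.rf = rf
        # Sort nodes by their position on the ring.
        self.entries = sorted((token(name), name) for name in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key):
        # The first node whose token is >= the key's token owns the key;
        # the next rf - 1 nodes clockwise hold the other copies.
        start = bisect.bisect_left(self.tokens, token(partition_key))
        count = min(self.rf, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"], rf=3)
owners = ring.replicas("user:42")  # three distinct nodes, always the same for this key
```

Given only the partition key, any node can compute the same replica list without consulting the others, which is why a query that supplies the key can be routed directly.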


Files in this folder

| File | Topic |
| --- | --- |
| 01-Ring-Architecture.md | How Cassandra distributes data across nodes using consistent hashing |
| 02-Write-Path.md | What happens inside a node when a write arrives: CommitLog, MemTable, SSTable, compaction |
| 03-Read-Path.md | How Cassandra reads efficiently: Bloom filters, SSTable merge, coordinator routing |
| 04-Replication-Consistency.md | Replication factor, consistency levels, and the R+W>N formula |
| 05-Interview-Cheatsheet.md | Quick reference for revision and interviews |
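The R+W>N formula covered in 04-Replication-Consistency.md comes down to simple arithmetic: if the number of replicas read (R) plus the number of replicas that acknowledge a write (W) exceeds the replication factor (N), every read overlaps at least one replica that has the latest write. A small sketch, with hypothetical helper names `overlaps` and `quorum`:

```python
def overlaps(r: int, w: int, n: int) -> bool:
    # True when any read set of size r must intersect any write set of size w.
    return r + w > n


def quorum(n: int) -> int:
    # A quorum is a majority of the replicas.
    return n // 2 + 1


n = 3  # replication factor (RF=3)
q = quorum(n)
checks = {
    "QUORUM write + QUORUM read": overlaps(q, q, n),
    "ONE write + ONE read": overlaps(1, 1, n),
    "ALL write + ONE read": overlaps(1, n, n),
}
```

With RF=3, quorum reads and writes overlap (2 + 2 > 3), while ONE/ONE does not (1 + 1 is not greater than 3), so a read at ONE may return stale data.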

The mental model

Client Write
     ├──→ Coordinator Node (routes to the replica nodes via the ring)
     │         │ (on each replica)
     │         ├──→ CommitLog (disk, append-only, durability)
     │         └──→ MemTable (memory, sorted, fast)
     │                   │ (when full)
     │                   ↓
     │               SSTable (disk, sorted, immutable)
     │                   │ (over time)
     │                   ↓
     │               Compaction (merge SSTables, keep latest version)
     └──→ Replicated to N nodes (RF=3 means 3 copies)

Client Read
     ├──→ Coordinator routes to replica nodes
     ├──→ Bloom Filter per SSTable → skip files that don't have the key
     └──→ Read MemTable + remaining SSTables → merge → return latest
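The two flows above can be sketched as a single toy node. This is a simplified illustration, not Cassandra's code: `Node`, `SSTable`, and `BloomFilter` are hypothetical classes, a logical counter stands in for write timestamps, the "disk" structures are plain Python objects, and deletes (tombstones) are left out.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = [False] * size

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class SSTable:
    """Immutable, key-sorted snapshot of a flushed MemTable."""

    def __init__(self, rows):
        self.rows = dict(sorted(rows.items()))  # key -> (timestamp, value)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


class Node:
    def __init__(self, memtable_limit=2):
        self.commit_log = []  # stands in for the append-only log on disk
        self.memtable = {}
        self.sstables = []
        self.memtable_limit = memtable_limit
        self.clock = 0  # logical timestamp (assumption)

    def write(self, key, value):
        self.clock += 1
        self.commit_log.append((self.clock, key, value))  # durability first
        self.memtable[key] = (self.clock, value)
        if len(self.memtable) >= self.memtable_limit:  # "when full"
            self.flush()

    def flush(self):
        self.sstables.append(SSTable(self.memtable))
        self.memtable = {}
        self.commit_log.clear()  # flushed writes no longer need replay

    def compact(self):
        # Merge all SSTables, keeping only the newest version of each key.
        merged = {}
        for table in self.sstables:
            for key, cell in table.rows.items():
                if key not in merged or cell[0] > merged[key][0]:
                    merged[key] = cell
        self.sstables = [SSTable(merged)] if merged else []

    def read(self, key):
        candidates = []
        if key in self.memtable:
            candidates.append(self.memtable[key])
        for table in self.sstables:
            # The Bloom filter lets us skip tables that cannot hold the key.
            if table.bloom.might_contain(key) and key in table.rows:
                candidates.append(table.rows[key])
        if not candidates:
            return None
        return max(candidates, key=lambda cell: cell[0])[1]  # latest wins


node = Node(memtable_limit=2)
for key, value in [("a", "v1"), ("b", "v1"), ("a", "v2"), ("c", "v1"), ("b", "v2")]:
    node.write(key, value)
latest_a = node.read("a")  # merged from two SSTables
latest_b = node.read("b")  # newest version is still in the MemTable
```

Calling `node.compact()` afterwards collapses the two SSTables into one while reads keep returning the same values, which is the "keep latest version" step in the write flow.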