Cassandra Internals — Overview#

Cassandra is a distributed column-family store built for massive write throughput and predictable low-latency reads — as long as you know your partition key. Every internal design decision flows from that constraint.

Files in this folder#

File	Topic
01-Ring-Architecture.md	How Cassandra distributes data across nodes using consistent hashing
02-Write-Path.md	What happens inside a node when a write arrives — CommitLog, MemTable, SSTable, compaction
03-Read-Path.md	How Cassandra reads efficiently — Bloom Filters, SSTable merge, coordinator routing
04-Replication-Consistency.md	Replication factor, consistency levels, and the R+W>N formula
05-Interview-Cheatsheet.md	Quick-reference for revision and interviews

The mental model#

Client Write
     │
     ├──→ Coordinator Node (routes to correct node via ring)
     │         │
     │         ├──→ CommitLog (disk, append-only, durability)
     │         └──→ MemTable (memory, sorted, fast)
     │                   │ (when full)
     │                   ↓
     │               SSTable (disk, sorted, immutable)
     │                   │ (over time)
     │                   ↓
     │               Compaction (merge SSTables, keep latest version)
     │
     └──→ Replicated to N nodes (RF=3 means 3 copies)

Client Read
     │
     ├──→ Coordinator routes to replica nodes
     ├──→ Bloom Filter per SSTable → skip files that don't have the key
     └──→ Read MemTable + remaining SSTables → merge → return latest