04

Concepts

Storage & Databases

How data is stored, indexed, replicated, and queried at scale. The layer every system design eventually has to justify.

Fundamentals
Why not files? How databases actually store data: row vs column orientation, heap files, and the storage engine basics that explain everything built on top of them.
Row Storage · Column Storage · Heap Files · Storage Engine
ACID
Atomicity, consistency, isolation, durability. Transaction isolation levels, distributed transactions, 2PC, and Saga: everything about keeping data correct under concurrency and partial failure.
Isolation Levels · 2PC · Saga · ACID vs BASE
SQL
Relational model, normalisation, denormalisation tradeoffs, joins, and views. The fundamentals before you decide SQL isn't enough.
Normalisation · Denormalisation · Joins · Views
Indexing
Hash indexes, B+ Trees, LSM Trees, and geospatial indexing. The data structures that trade write cost for read speed (or, in the case of LSM Trees, read cost for write speed), and when each one wins.
B+ Tree · LSM Tree · Hash Index · Geospatial
Replication
Sync vs async replication, replication lag, failover, and multi-primary. How you keep multiple copies of data consistent without destroying write throughput.
Sync vs Async · Replication Lag · Failover · Multi-Primary
Sharding
Shard keys, sharding strategies, consistent hashing, cross-shard joins, resharding, and over-sharding. How you split a database that's too big for one node.
Shard Key · Consistent Hashing · Resharding · Cross-Shard Joins
MVCC
Multi-Version Concurrency Control. How databases serve reads without blocking writes by keeping multiple versions of the same row alive simultaneously.
MVCC · Snapshot Isolation · Version Chains · Garbage Collection
CDC
Change Data Capture and the outbox pattern. How you stream database changes to downstream systems without dual-write bugs.
CDC · Outbox Pattern · Debezium · Log Tailing
Pagination
Offset vs cursor pagination. Why offset pagination degrades on deep pages, where the database must read and discard every skipped row, and how cursor-based pagination avoids that by seeking straight to the next page through an index.
Offset Pagination · Cursor Pagination · Keyset Pagination · Deep Pages
Connection Pooling
The real cost of a database connection and how connection pools amortise it. What happens when the pool is exhausted and how to size it correctly.
Connection Cost · Pool Sizing · Pool Exhaustion · PgBouncer
Read / Write Splitting
Routing reads to replicas and writes to the primary. The replication lag problem that follows, and when read/write splitting actually helps and when it hurts.
Read Replicas · Replication Lag · Stale Reads · Routing
Database Types
Key-value, document, column-family, search engines, graph, blob storage, NewSQL, OLTP vs OLAP. Every database type, when it wins, and when it doesn't.
Redis · Cassandra · MongoDB · Elasticsearch
Choosing the Right DB
A decision framework and cheatsheet for picking the right database given your access patterns, consistency needs, and scale requirements.
Decision Framework · Access Patterns · DB Cheatsheet · Tradeoffs
Data Modeling
Entities, relationships, access patterns, and red flags. How to model data for a real system — with a worked Instagram schema as the example.
ER Modeling · Access Patterns · Instagram Schema · Red Flags