Data Model
DynamoDB is AWS's fully managed wide-column store. No servers to provision, no indexes to tune manually, no replication to configure. You define a partition key, optionally a sort key, and DynamoDB handles everything else — sharding, replication across 3 availability zones, scaling up and down automatically.
AWS markets DynamoDB as a "key-value store" because the simplest use case is: give it a key, get back an item. But that label is misleading. A partition key can return multiple items. A sort key lets you query ranges of items within a partition. You can get back one item, many items, or a sorted slice — that's wide-column behavior, not pure key-value.
True key-value (Redis): one key → exactly one value, always. DynamoDB: partition key → one partition containing many rows, sorted by sort key. Much richer.
DynamoDB has two keys: a partition key that routes your data to the right server, and an optional sort key that orders your data within that server. Everything else — joins, flexible queries, schema — is your problem to solve upfront in how you model your data.
The problem DynamoDB solves
You're building Instagram. Users post photos, like posts, watch stories. At 500M users you're generating billions of writes per day — every like, every view, every scroll event.
A single SQL server with a B+ tree index handles reads fine — O(log n) lookups are fast enough. But billions of writes per day overwhelm a single machine. You need to shard. That is exactly what DynamoDB offers: sharding built into the database itself. You don't manage it — you just define a partition key and DynamoDB handles the rest.
Partition key — which server
When you write a row, DynamoDB runs the partition key through a hash function. The hash output determines which physical server (partition) stores that row.
user_id: 42 → hash(42) = 7823... → Server 3
user_id: 891 → hash(891) = 2341... → Server 1
user_id: 205 → hash(205) = 9102... → Server 7
When you read, DynamoDB hashes the partition key again, goes directly to that server. No broadcasting, no scanning all servers.
Query: give me user 42's data
→ hash(42) → Server 3
→ done. O(1) regardless of how many servers exist.
This is consistent hashing — the same mechanism used in distributed caches and manual sharding setups, except DynamoDB manages it automatically.
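The routing idea can be sketched in a few lines. This is a toy consistent-hash ring for illustration (the class and server names are hypothetical; DynamoDB's internal implementation is not public): each server is hashed onto a ring at several points, and a partition key routes to the first server clockwise from its own hash.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable hash so routing is deterministic across processes.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Toy consistent-hash ring: partition key -> server."""

    def __init__(self, servers):
        # Place each server at several points on the ring ("virtual nodes")
        # so keys spread evenly across servers.
        self._ring = sorted(
            (_hash(f"{s}#{i}"), s) for s in servers for i in range(8)
        )
        self._points = [point for point, _ in self._ring]

    def route(self, partition_key: str) -> str:
        # First server clockwise from the key's hash; wrap around at the end.
        i = bisect.bisect(self._points, _hash(partition_key)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["server-1", "server-3", "server-7"])
ring.route("user_id:42")  # same key always lands on the same server
```

The payoff of the ring over naive `hash(key) % N`: removing or adding a server only moves the keys that lived on it, instead of reshuffling almost everything.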
Sort key — order within a partition
The sort key is optional. When present, all rows with the same partition key are stored together on the same server, physically sorted by the sort key.
Table: likes
Partition key: user_id
Sort key: created_at
Server 3 (user_id = 42):
→ like: post_1, created_at: 2024-01-01
→ like: post_5, created_at: 2024-01-03
→ like: post_9, created_at: 2024-01-07
All of user 42's likes live on one server, in chronological order. A single read fetches them all — or a range of them — without touching any other server.
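To make that concrete, here's a toy partition (a hypothetical structure, not DynamoDB's internals): items are kept physically sorted by sort key, and a range read is a binary search to the start followed by one sequential slice.

```python
import bisect

class Partition:
    """Toy partition: items physically sorted by sort key."""

    def __init__(self):
        self._keys = []    # sort keys, kept in order
        self._items = []   # items, parallel to _keys

    def put(self, sort_key, item):
        i = bisect.bisect(self._keys, sort_key)
        self._keys.insert(i, sort_key)
        self._items.insert(i, item)

    def query_range(self, lo, hi):
        # Binary search to the start, then one sequential read.
        # No other partition (server) is touched.
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_right(self._keys, hi)
        return self._items[start:end]

user_42 = Partition()
user_42.put("2024-01-01", {"like": "post_1"})
user_42.put("2024-01-07", {"like": "post_9"})
user_42.put("2024-01-03", {"like": "post_5"})

user_42.query_range("2024-01-01", "2024-01-31")
# → all three likes, in chronological order
```

Note that ISO-8601 timestamps sort correctly as plain strings, which is why `created_at` works as a sort key without any date parsing.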
How data is stored underneath — LSM Tree
DynamoDB stores data within each partition using an LSM Tree (Log-Structured Merge Tree). Writes go to an in-memory buffer first, then flush to disk in sorted order — extremely fast writes, no random disk seeks.
See 06-Storage-and-Databases/04-Indexing/04-LSM-Tree.md for the full explanation.
This is why DynamoDB handles write-heavy workloads so well — the same reason Cassandra does. The LSM tree underneath absorbs write bursts without the penalty of in-place B+ tree updates.
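The write path can be sketched as a toy memtable-and-flush model (hypothetical, for illustration; not DynamoDB's actual storage engine): writes land in an in-memory buffer, and full buffers are written out as immutable sorted runs.

```python
class TinyLSM:
    """Toy LSM tree: writes hit a memtable, flushes produce sorted runs."""

    def __init__(self, memtable_limit=3):
        self.memtable = {}   # in-memory buffer: key -> value
        self.runs = []       # on-"disk" sorted runs, newest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # O(1) in-memory write, no random disk seek.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # One sequential write of the whole buffer, sorted by key.
        self.runs.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Check the memtable first, then runs newest-to-oldest,
        # so the most recent write always wins.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            for k, v in run:
                if k == key:
                    return v
        return None
```

A real engine adds compaction (merging runs) and Bloom filters to keep reads cheap; the core trade — sequential-write speed in exchange for read-side merging — is the same.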
The full picture
Write: user_id=42, post_id=9, created_at=2024-01-07
→ hash(42) → Server 3
→ stored in LSM tree, sorted by created_at within user 42's partition
Read: give me all likes by user 42 in January
→ hash(42) → Server 3
→ range scan on created_at within that partition
→ one server, one fast sequential read
The partition key decides WHERE. The sort key decides ORDER within that where. Everything is designed around these two keys — there is no query planner figuring things out at runtime.
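The two-key model above can be put together in one toy table (hypothetical structure, for illustration only; routing here is simplified to a stable hash mod server count rather than a full consistent-hash ring): the partition key decides which server's map holds the rows, and the sort key keeps each partition's rows ordered for range scans.

```python
import bisect
import hashlib

class TinyTable:
    """Toy two-key table: partition key -> server, sort key -> order."""

    def __init__(self, num_servers=8):
        # server index -> {partition_key -> [(sort_key, item), ...] sorted}
        self.servers = [{} for _ in range(num_servers)]

    def _server(self, pk):
        # Simplified routing: stable hash mod server count.
        h = int(hashlib.md5(str(pk).encode()).hexdigest(), 16)
        return self.servers[h % len(self.servers)]

    def put(self, pk, sk, item):
        rows = self._server(pk).setdefault(pk, [])
        keys = [k for k, _ in rows]
        rows.insert(bisect.bisect(keys, sk), (sk, item))  # keep sorted

    def query(self, pk, lo, hi):
        # One server, one binary search, one sequential read.
        rows = self._server(pk).get(pk, [])
        keys = [k for k, _ in rows]
        start = bisect.bisect_left(keys, lo)
        end = bisect.bisect_right(keys, hi)
        return [item for _, item in rows[start:end]]

likes = TinyTable()
likes.put(42, "2024-01-01", "post_1")
likes.put(42, "2024-01-07", "post_9")
likes.put(42, "2024-01-03", "post_5")
likes.put(891, "2024-01-02", "post_2")  # lands on a different partition

likes.query(42, "2024-01-01", "2024-01-31")
# → ["post_1", "post_5", "post_9"]
```

Every capability in the sketch maps to a DynamoDB operation you pay for up front in data modeling: `put` is a `PutItem`, `query` is a `Query` with a key condition on the sort key, and anything that can't be expressed through these two keys means a full scan or a secondary index.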