Interview Cheatsheet

The mental model

MongoDB = flexible JSON documents + indexes on any field (including arrays/nested)
        + replica sets + write concern + mongos sharding

Embedding vs Referencing

	Embed	Reference
When	Bounded, always fetched together	Unbounded, fetched independently
Example	Product specs, images, sizes	Comments, likes, orders
Reads	One fetch, fast	Multiple round trips
Updates	Modify the parent document (field-level operators such as $set and $push)	Update target document only
Limit	16MB document cap	No limit

Rule: bounded + co-fetched → embed
      unbounded + independent → reference
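
The rule can be pictured with plain Python dictionaries standing in for documents. This is a minimal sketch, not driver code: the field names and the `comments_for` helper are hypothetical, and the list scan stands in for a second query against a separate collection.

```python
# Hypothetical documents; field names are illustrative only.

# Embed: specs, images and sizes are bounded and always read with the product.
product = {
    "_id": "p1",
    "name": "Trail Shoe",
    "specs": {"weight_g": 280, "drop_mm": 6},
    "images": ["front.jpg", "side.jpg"],
    "sizes": [40, 41, 42],
}

# Reference: comments grow without bound, so each one is its own document
# that points back to the product by id.
comments = [
    {"_id": "c1", "product_id": "p1", "text": "Great grip"},
    {"_id": "c2", "product_id": "p1", "text": "Runs small"},
    {"_id": "c3", "product_id": "p2", "text": "Too heavy"},
]


def comments_for(product_id, all_comments):
    """Stand-in for the second round trip: fetch documents by referenced id."""
    return [c for c in all_comments if c["product_id"] == product_id]


# One read returns everything embedded in the product.
print(product["specs"], product["images"])
# Referenced data needs a separate lookup.
print([c["text"] for c in comments_for("p1", comments)])
```

The embedded fields arrive with the product in a single read, while the comments cost an extra lookup but can grow without touching the product document.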

Write Concern

Level	Guarantee	Cost	Use for
w:0	None — fire and forget	Fastest	Metrics, logs
w:1	Primary confirmed	Fast	Non-critical writes
w:majority	Quorum confirmed	Slower	Orders, payments, critical data
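
To make the levels concrete, the sketch below computes how many members must acknowledge a write before the client hears back. It is a simplified model under stated assumptions: `required_acks` is a hypothetical helper, and it ignores arbiters, journaling and timeouts.

```python
def required_acks(w, voting_members):
    """Number of replica set members that must acknowledge a write.

    Simplified model: every voting member is assumed to be data-bearing,
    and journaling and timeouts are ignored.
    """
    if w == "majority":
        return voting_members // 2 + 1
    if isinstance(w, int) and 0 <= w <= voting_members:
        return w
    raise ValueError(f"unsupported write concern: {w!r}")


for level in (0, 1, "majority"):
    print(level, required_acks(level, voting_members=3))
```

With three members, w:majority waits for two acknowledgments, which is why it costs more latency than w:1 but survives the loss of the primary.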

Indexes

Regular index    →  flat field, same as SQL B-tree
Multikey index   →  array field, one index entry per array element
Nested index     →  dot notation "experience.years", reaches inside objects
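
The three index types differ mainly in which keys a single document contributes. The sketch below is a conceptual model rather than how the server stores indexes. The helpers `get_path` and `index_entries` are hypothetical, and paths that pass through arrays of sub-documents are not handled.

```python
def get_path(doc, path):
    """Resolve a dot-notation path such as "experience.years"."""
    value = doc
    for part in path.split("."):
        value = value[part]
    return value


def index_entries(doc, path):
    """Return the (key, _id) pairs one document contributes to an index on `path`.

    An array value yields one entry per element (multikey);
    any other value yields a single entry.
    """
    value = get_path(doc, path)
    keys = value if isinstance(value, list) else [value]
    return [(key, doc["_id"]) for key in keys]


doc = {
    "_id": 1,
    "name": "Ana",
    "skills": ["go", "sql"],
    "experience": {"years": 7},
}

print(index_entries(doc, "name"))              # regular: one entry
print(index_entries(doc, "skills"))            # multikey: one entry per element
print(index_entries(doc, "experience.years"))  # nested: dot notation
```

A multikey index grows with array length, so large arrays multiply index size and write cost.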

Replication and Sharding

Replica set  →  1 primary + N secondaries, async replication, auto failover
mongos       →  transparent query router, app sees one endpoint
Shard key    →  ranged or hashed partitioning into chunks; each shard is a replica set
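
Routing by a hashed shard key can be sketched as hashing the key into a fixed space, splitting that space into contiguous chunks, and assigning each chunk to a shard. This is a toy model under stated assumptions: the hash function, the equal-sized chunks and the round-robin assignment are illustrative choices, not MongoDB's actual implementation, and all names are hypothetical.

```python
import hashlib
from bisect import bisect_right

HASH_SPACE = 2 ** 64


def hash_key(value):
    """Map a shard key value to a 64-bit integer (illustrative hash only)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_chunks(shards, chunks_per_shard=2):
    """Split the hash space into equal contiguous chunks, assigned round-robin."""
    total = len(shards) * chunks_per_shard
    lower_bounds = [i * HASH_SPACE // total for i in range(total)]
    owners = [shards[i % len(shards)] for i in range(total)]
    return lower_bounds, owners


def route(value, lower_bounds, owners):
    """Conceptual router step: find the chunk whose range contains the hash."""
    idx = bisect_right(lower_bounds, hash_key(value)) - 1
    return owners[idx]


bounds, owners = build_chunks(["shardA", "shardB", "shardC"])
for user_id in ("u1", "u2", "u3"):
    print(user_id, "->", route(user_id, bounds, owners))
```

Because chunks, not individual keys, are assigned to shards, rebalancing means moving whole chunks between shards while the router's chunk map is updated.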

Limitations

Weak join support           →  $lookup only (aggregation); denormalize or use multiple round trips
No foreign keys             →  schema validation is opt-in ($jsonSchema); application enforces referential integrity
16MB document limit         →  unbounded arrays must be referenced
Denormalization cost        →  update propagation is your problem
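
The denormalization cost is easiest to see when a copied field changes. The sketch below uses in-memory dictionaries in place of collections. The data and the `rename_user` helper are hypothetical, and the loop stands in for a multi-document update such as PyMongo's `update_many`.

```python
users = {"u1": {"_id": "u1", "name": "Ana"}}

# author_name is copied into each post so a post renders from a single read.
posts = [
    {"_id": "p1", "author_id": "u1", "author_name": "Ana", "title": "Intro"},
    {"_id": "p2", "author_id": "u1", "author_name": "Ana", "title": "Part 2"},
    {"_id": "p3", "author_id": "u2", "author_name": "Bo", "title": "Other"},
]


def rename_user(user_id, new_name):
    """Update the source of truth, then every denormalized copy."""
    users[user_id]["name"] = new_name
    touched = 0
    for post in posts:
        if post["author_id"] == user_id:
            post["author_name"] = new_name
            touched += 1
    return touched


# Prints how many posts had to be rewritten for one logical change.
print(rename_user("u1", "Ana M."))
```

One logical change fans out into as many writes as there are copies, and until the last copy is updated, readers can see stale names.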

Use cases

✓  Product catalogs (variable specs per category)
✓  User profiles (variable fields per user type)
✓  CMS / blog content (flexible content blocks)
✓  Event data with variable payload structure

✗  Financial transactions (needs strict constraints)
✗  Relational data with complex joins
✗  Write-heavy time-series at extreme scale (use Cassandra)

Interview framing

"I'd use MongoDB when the data has variable structure and access patterns are document-centric: product catalogs, user profiles, CMS content. I'd embed bounded, co-fetched data like specs and images, and reference unbounded data like comments. I'd set write concern to w:majority for critical writes and w:1 for high-throughput, non-critical events. The key limitation is weak join support ($lookup exists, but it is no substitute for relational joins), so you design around that with intentional denormalization upfront."