Interview Cheatsheet

One-Line Definition#

Info

A NoSQL database where entities are nodes and relationships are edges stored with direct disk pointers — making multi-hop relationship traversal O(1) per hop instead of O(n) per SQL join.

no#

Why Not SQL?#

SQL joins on relationships:
  1 hop  → 1 table scan   ✅ fast
  2 hops → 2 table scans  ✅ ok
  3 hops → 3 table scans  ⚠️ slow at scale
  5 hops → 5 table scans  ❌ minutes on billion-row tables

Each join scans the entire relationships table again. Cost = hops × table size.

Graph DB: each hop follows a disk pointer. Cost = O(1) per hop, regardless of database size.

Data Model#

Node  → entity       (User, Product, Account, City)
Edge  → relationship (FRIENDS_WITH, BOUGHT, BORN_IN, USES_PHONE)
Both can have properties {key: value}

Important

A "friend" is not a separate node type — it's a User node connected by a FRIENDS_WITH edge. Relationships live on edges, not in separate tables.

Cypher Cheatsheet#

-- 1 hop: direct friends
MATCH (u:User {id:1})-[:FRIENDS_WITH]->(friend)
RETURN friend

-- 2 hops: friends of friends
MATCH (u:User {id:1})-[:FRIENDS_WITH]->(f1)-[:FRIENDS_WITH]->(fof)
RETURN fof

-- Variable length: up to 3 hops
MATCH (u:User {id:1})-[:FRIENDS_WITH*1..3]->(other)
RETURN other

-- Shortest path
MATCH path = shortestPath((a:User {name:"Alice"})-[:FRIENDS_WITH*]-(b:User {name:"Bob"}))
RETURN path

Use Cases#

Use case	Why graph?
Social graph (LinkedIn, Twitter)	Friends-of-friends, shortest path between users
Fraud detection (PayPal, Uber)	Multi-hop shared identifier patterns across accounts
Recommendations (Amazon)	Users who bought X also bought Y — edge traversal
Knowledge graph (Google)	Entity relationships — Obama → born in → Honolulu

Deletion Rule#

Must delete edges before deleting a node

Neo4j refuses to delete a node with existing edges — prevents dangling pointers. Always delete relationships first, then the node.

Scale Reality Check#

Neo4j (off-the-shelf)     → fraud detection, social graphs, recommendations
                            millions to low billions of nodes, supports clustering

Google Knowledge Graph    → custom-built, billions of entities, planet scale
                            not achievable with any off-the-shelf tool

When NOT to Use#

Graph DB is bad at bulk scans

"Give me all users over 30" — use SQL. Graph DBs optimize for traversal, not attribute-based filtering across many nodes.

The signal: - Query starts at one node, follows edges → Graph DB - Query scans a collection by attribute → SQL