Skip to content

Graph Data Model

What is a Graph Database?

A NoSQL database where data is stored as nodes (entities) and edges (relationships between entities). Optimized for traversing relationships — following connections from one entity to another across multiple hops.

The Three Building Blocks#

A graph database has three things: nodes, edges, and properties. That's it.

Nodes#

A node is an entity. In a social network, every user is a node. In a fraud detection system, every account, phone number, and device is a node. Nodes have a label (their type) and properties (their data).

(User {id: 1, name: "Alice", age: 28})
(User {id: 2, name: "Bob", age: 32})
(Phone {number: "+1-555-1234"})
(Device {id: "iPhone-XYZ-789"})

The parentheses () represent nodes — this is Cypher notation and it's deliberately shaped like a circle to match the graph diagram convention.

Edges#

An edge is a relationship between two nodes. Edges are directional (they point from one node to another) and have a type that describes the relationship.

(Alice) ──[FRIENDS_WITH]──▶ (Bob)
(Alice) ──[BOUGHT]──▶ (iPhone 15)
(Barack Obama) ──[BORN_IN]──▶ (Honolulu)

A friend is also a User node

There is no separate "Friend" node type. Both Alice and Bob are User nodes. The friendship is expressed by the FRIENDS_WITH edge between them. The relationship lives on the edge, not in a separate table.

Properties#

Both nodes and edges can carry properties — key-value pairs of data:

Node property:  (User {name: "Alice", joined: 2019})
Edge property:  (Alice)-[:FRIENDS_WITH {since: 2021}]->(Bob)

The edge property {since: 2021} tells you when they became friends — data that lives on the relationship itself. In SQL, this would require extra columns in the junction table. In a graph DB, it's a natural part of the edge.


Index-Free Adjacency — Why Traversal Is Fast#

This is the core architectural difference from SQL.

In SQL, finding Bob's friends means scanning the friendships table for rows where user_id = Bob's id. Even with an index, you're doing a B+Tree lookup every single hop.

In a graph database, each node stores direct disk pointers to its edges:

Node (Alice) stored at disk block 4821
  data: {name: "Alice", age: 28}
  edge pointer ──▶ Edge (FRIENDS_WITH) at block 7203
                        start: block 4821  (Alice)
                        end:   block 9034  (Bob)
                        ──▶ Node (Bob) at block 9034
                                data: {name: "Bob"}
                                edge pointer ──▶ ...

Following Alice's friendship to Bob is just: 1. Read Alice's node — get edge pointer (block 7203) 2. Read that edge — get Bob's node pointer (block 9034) 3. Read Bob's node — done

No table scan. No index lookup. Just reading disk blocks by their stored addresses — like following a linked list.

Why "index-free adjacency"?

In SQL, to find a node's neighbors you need to query an index ("find all rows where user_id = X"). In a graph DB, the node itself already knows where its neighbors are — the adjacency information is stored directly on the node. No index needed.

The cost of each hop is O(1) — constant, regardless of how many total nodes exist in the database. A 5-hop traversal on a graph with 1 billion nodes costs the same per hop as on a graph with 1000 nodes.

Compare to SQL: each join's cost grows with the size of the table being joined.


Deletion — The Edge Cleanup Requirement#

Because nodes store pointers to edges, and edges store pointers to nodes, deleting a node carelessly would leave dangling pointers.

If you delete Bob's node without removing the FRIENDS_WITH edge, Alice's edge pointer now points to a block that no longer contains a valid node.

Neo4j prevents this by refusing to delete a node that still has edges. You must delete all edges first, then the node:

-- Delete all of Bob's relationships first
MATCH (b:User {id: 2})-[r]-()
DELETE r

-- Then delete Bob
MATCH (b:User {id: 2})
DELETE b

This makes deletions more expensive than SQL (where DELETE FROM users WHERE id = 2 with CASCADE handles everything). The trade-off is acceptable because graph databases are read-heavy — the fast traversal is worth the slower deletes.