Last Updated: January 12, 2026
Every database type we have explored so far treats relationships as secondary.
Relational databases store relationships in foreign keys that must be joined at query time. Document databases embed related data or store references that require multiple queries. Wide-column databases denormalize relationships into the data model itself.
Graph databases flip this paradigm. Relationships become first-class citizens, stored and indexed just like the data itself.
Consider LinkedIn's "People You May Know" feature. To find potential connections, you need to traverse the social network: find friends of friends who are not already your friends, filter by shared interests or employers, and rank by connection strength.
In a relational database, each "hop" in the graph requires a JOIN operation. For a query that traverses 3-4 hops across millions of users, you are looking at joining billions of row pairs. Performance degrades exponentially with depth.
In a graph database, these traversals are what the system is optimized for. Relationships are stored as direct pointers between nodes. Traversing from one node to its neighbors is a pointer lookup, not a table scan.
The query "friends of friends of Alice" executes in milliseconds, regardless of how many total users exist in the system.
The property graph model is the most common way to represent data in graph databases. It consists of three elements: nodes, relationships, and properties.
Nodes represent entities in your domain. They are similar to rows in a relational table or documents in a document database. Each node:
Labels like "Person," "Company," and "Skill" categorize nodes. A node can have multiple labels, allowing flexible categorization.
Relationships connect nodes. Unlike foreign keys in relational databases, relationships are stored as explicit connections between nodes. Each relationship:
Note that relationships have their own properties. The FRIENDS_WITH relationship has a "since" property recording when the friendship started. The KNOWS relationship has a "level" property indicating proficiency.
Both nodes and relationships can have properties, which are key-value pairs storing attributes:
| Element | Property Key | Property Value |
|---|---|---|
| Person node | name | "Alice" |
| Person node | age | 28 |
| WORKS_AT relationship | since | 2022 |
| WORKS_AT relationship | role | "Engineer" |
Properties can be strings, numbers, booleans, dates, or arrays of these types.
The property graph model closely matches how we naturally think about many domains:
When your domain is naturally a graph, modeling it as a graph is intuitive. You draw nodes and edges on a whiteboard, and that becomes your database schema.
Graph databases use specialized storage and indexing structures to make relationship traversal fast.
The key architectural principle is index-free adjacency: each node directly stores pointers to its adjacent nodes. No index lookup is required to find neighbors.
Compare this to relational databases:
| Operation | Relational (with index) | Graph |
|---|---|---|
| Find Alice's friends | Scan index, retrieve rows, join | Follow pointers from Alice |
| Complexity | O(log n) per hop | O(1) per hop |
| 3-hop traversal | O(log n)³ minimum | O(neighbors at each hop) |
The difference becomes dramatic for deep traversals. In a relational database, each hop requires at least an index lookup. In a graph database, each hop is a pointer dereference.
Relationships are stored as first-class objects with their own storage:
Each relationship record contains:
This linked-list structure allows traversing all relationships of a node without indexes.
Graph database performance is measured differently than relational databases:
| Metric | Meaning |
|---|---|
| Traversals per second | How many hops per second |
| Latency per hop | Time to move from node to neighbor |
| Total query time | Depends on graph structure, not data size |
The crucial insight: query time depends on the portion of the graph touched, not the total size. Finding friends of friends for a user with 100 friends touches ~10,000 nodes regardless of whether the total graph has 1 million or 1 billion nodes.
Graph databases use specialized query languages designed for expressing graph patterns and traversals.
Cypher is a declarative pattern-matching language. You describe the pattern you want to find, and the database figures out how to find it.
Basic pattern matching:
The ASCII-art syntax is intentional: (node)-[:RELATIONSHIP]->(node) visually represents the graph pattern.
Creating data:
Aggregations:
Variable-length paths:
Gremlin is an imperative traversal language. You describe step-by-step how to traverse the graph.
Gremlin reads like a pipeline: start with vertices (V), filter by property, traverse out, deduplicate, extract values.
SPARQL is used for RDF (Resource Description Framework) graphs, common in semantic web and knowledge graphs.
| Aspect | Cypher | Gremlin | SPARQL |
|---|---|---|---|
| Style | Declarative, pattern-based | Imperative, step-based | Declarative, triple-based |
| Learning curve | Moderate | Steep | Moderate |
| Graph model | Property graph | Property graph | RDF triples |
| Primary database | Neo4j | JanusGraph, Neptune | GraphDB, Virtuoso |
| Readability | High (ASCII art patterns) | Moderate | Moderate |
Graph databases excel in domains where relationships are the primary focus of queries.
The classic graph use case. Users are nodes, friendships are relationships.
Typical queries:
Find items similar to what a user has liked, or items that similar users have liked.
Fraud often involves networks of connected accounts sharing devices, addresses, or payment methods.
Model entities and their relationships for semantic search and question answering.
Model permissions as a graph to answer "Can user X access resource Y?"
The most popular graph database, with a mature ecosystem and the Cypher query language.
AWS's managed graph database service supporting both property graphs and RDF.
Open-source, distributed graph database designed for large-scale graphs.
Multi-model database supporting graphs, documents, and key-value in one system.
| Feature | Neo4j | Neptune | JanusGraph | ArangoDB |
|---|---|---|---|---|
| Query language | Cypher | Gremlin/SPARQL | Gremlin | AQL |
| ACID | Full | Full | Limited | Full |
| Managed option | Aura | Yes (AWS) | No | ArangoDB Cloud |
| Horizontal scale | Enterprise | Yes | Yes | Yes |
| Multi-model | No | No | No | Yes |
Query performance depends heavily on traversal depth and the number of relationships per node:
| Depth | Nodes Touched (avg 100 friends) | Relational (worst case) | Graph |
|---|---|---|---|
| 1 | 100 | 100 rows | 100 hops |
| 2 | 10,000 | 10,000+ rows, joins | 10,000 hops |
| 3 | 1,000,000 | Millions of rows | 1M hops |
| 4 | 100,000,000 | Usually impractical | 100M hops |
Graph databases handle depth 2-3 traversals easily. Depth 4+ requires careful consideration of filtering and limits.
Supernodes are nodes with an unusually high number of relationships (e.g., a celebrity with millions of followers). They can cause performance problems:
Mitigation strategies:
While relationships are traversed without indexes, finding starting nodes requires indexes:
Always index properties used in initial node lookups (the starting point of traversals).
Scaling graph databases is challenging because graphs are inherently connected. Partitioning a graph means cutting some relationships, which hurts performance for queries that cross partition boundaries.
The simplest approach: use a bigger server. Neo4j can handle graphs with billions of nodes on a single large machine.
| Metric | Single Large Server |
|---|---|
| Nodes | Billions |
| Relationships | Tens of billions |
| RAM | Up to several TB |
| Storage | Up to PB with SSDs |
Scale read throughput by replicating the graph:
Partitioning a graph is hard because:
Approaches:
When sharding is acceptable:
Graph databases are the right choice when:
Graph databases may not fit when:
Graph databases treat relationships as first-class citizens:
| Aspect | Graph Database Approach |
|---|---|
| Data model | Nodes, relationships, and properties |
| Storage | Index-free adjacency, relationships stored as pointers |
| Queries | Pattern matching and traversal (Cypher, Gremlin) |
| Strength | Multi-hop relationship queries |
| Scaling | Challenging due to graph connectivity |
The next chapter explores time-series databases, which are optimized for a completely different access pattern: data that is indexed by time and queried primarily with time-range filters and aggregations.