Last Updated: May 3, 2026
A query that filters on an unindexed column forces the database to read every row in the table. On a table with 10 million rows, that means 10 million comparisons just to find a handful of matches.
Indexes solve this by giving the database a shortcut to find the rows it needs without scanning the entire table.
Insert these rows to follow along:
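The exact setup statements belong to the earlier chapters. As a minimal sketch, assuming a schema with the columns referenced later in this chapter (user_id, track_id, country, started_at; the stream_id key and column types are assumptions):

```sql
-- Assumed schema; the course's actual definition may differ
CREATE TABLE streams (
    stream_id  SERIAL PRIMARY KEY,
    user_id    INT,
    track_id   INT,
    country    VARCHAR(2),
    started_at TIMESTAMP
);

-- Representative sample rows (values illustrative)
INSERT INTO streams (user_id, track_id, country, started_at) VALUES
    (1, 101, 'US', '2024-01-15 08:30:00'),
    (2, 102, 'GB', '2024-02-03 14:10:00'),
    (3, 101, 'DE', '2024-02-20 19:45:00');
-- ...continue in the same pattern until the table holds 15 rows
```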
The streams table has 15 rows across multiple users, tracks, and countries. That is enough to illustrate how indexes work, though the real benefit of indexes becomes clear at thousands or millions of rows.
An index is a separate data structure that the database maintains alongside a table. It stores a sorted copy of one or more column values, paired with pointers back to the full rows in the table. Think of it like the index at the back of a textbook: instead of reading every page to find where "B-Tree" is discussed, you look up "B-Tree" in the index and jump straight to the right page.
Without an index, the database has one option: read every row in the table and check whether it matches the filter. With an index on the filtered column, the database can locate the matching rows directly.
The following diagram shows the difference:
The sequential scan touches every row regardless of how many match. The index lookup goes directly to the matching entries. On a table with millions of rows, that is the difference between a query that takes seconds and one that takes milliseconds.
When you run a query like this:
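A representative query (the users table and the exact column list are assumed from the surrounding discussion):

```sql
SELECT * FROM users WHERE country = 'US';
```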
The database performs a sequential scan (also called a full table scan). It starts at the first row, checks if country = 'US', moves to the second row, checks again, and repeats until it has processed every row in the table.
For the 10-row users table above, this is perfectly fine. The database reads 10 rows and returns 3 matches. But consider what happens when the table grows. If users had 5 million rows and only 200 matched the filter, the database would still read all 5 million rows to find those 200. The work scales with the total number of rows, not the number of matches.
This is where indexes change the equation. By creating an index on the country column, the database can skip straight to the rows where country = 'US' without reading the rest of the table. The work scales with the number of matching rows instead.
B-Tree (balanced tree) is the default index type in PostgreSQL, MySQL, SQL Server, and other popular relational databases.
When you create an index without specifying a type, you get a B-Tree.
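In PostgreSQL, for example, these two statements are equivalent (the index and column names are illustrative):

```sql
-- Omitting the type gives you a B-Tree by default
CREATE INDEX idx_users_country ON users (country);
-- Equivalent, with the type spelled out explicitly:
-- CREATE INDEX idx_users_country ON users USING btree (country);
```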
A B-Tree organizes values into a hierarchy of nodes. The root node sits at the top and contains a few key values that divide the data into ranges. Internal nodes further subdivide those ranges. Leaf nodes at the bottom contain the actual indexed values along with pointers to the corresponding rows in the table.
The tree stays balanced, meaning every leaf node is the same distance from the root. This guarantees that looking up any value requires the same number of steps, regardless of where in the range it falls.
To find all users where country = 'GB', the database starts at the root node and follows the path:
1. Root node: DE | JP. Since 'GB' falls between 'DE' and 'JP', follow the middle pointer.
2. Internal node: DE | GB. Since 'GB' matches, follow the pointer for 'GB'.
3. Leaf node: GB → rows 3, 6. Fetch those rows from the table.

Three steps instead of scanning every row. On a real B-Tree with millions of entries, the tree might be 3-4 levels deep, meaning any lookup touches only 3-4 pages of data, not millions.
B-Trees handle range queries naturally because the leaf nodes are linked in order. To find all users who signed up after 2024-01-01:
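In SQL, assuming a signup_date column on users:

```sql
SELECT * FROM users WHERE signup_date > '2024-01-01';
```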
The database uses the B-Tree index on signup_date to find the first entry after 2024-01-01, then walks the linked leaf nodes forward until it passes the end of the range. It does not need to scan the entire index, just the portion that matches.
This is why B-Trees support all of these operations efficiently:
| Operation | Example | How the B-Tree Helps |
|---|---|---|
| Equality | WHERE country = 'US' | Navigate tree to exact value |
| Range | WHERE signup_date > '2024-01-01' | Find start point, scan leaf nodes forward |
| BETWEEN | WHERE user_id BETWEEN 3 AND 7 | Find lower bound, scan to upper bound |
| ORDER BY | ORDER BY signup_date | Read leaf nodes in order (already sorted) |
| MIN / MAX | SELECT MIN(signup_date) | Jump to first or last leaf node |
| Prefix match | WHERE username LIKE 'ali%' | Navigate to 'ali', scan forward while prefix matches |
One operation B-Trees cannot help with is a suffix or infix search like WHERE username LIKE '%dev'. The tree is sorted by the beginning of the string, so there is no shortcut to find values ending with 'dev'. That requires a sequential scan (or a specialized index type like GIN with trigram support, covered in the next chapter).
Because B-Tree leaf nodes are already in sorted order, the database can sometimes skip the sort step entirely when the query has an ORDER BY that matches the index order.
If an index exists on signup_date, the database can read the index leaf nodes in order and return the results without a separate sort operation. This matters on large tables where sorting millions of rows is expensive.
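A sketch of such a query, using the column names from the earlier examples:

```sql
-- With an index on signup_date, results can stream straight
-- from the index leaf nodes with no separate sort step
SELECT username, signup_date
FROM users
ORDER BY signup_date;
```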
Hash indexes use a hash function to map each value to a fixed-size bucket. The database computes the hash of the lookup value and jumps directly to the right bucket.
For a query like WHERE user_id = 4, the database:
1. Computes the hash of the value 4.
2. Jumps to the bucket that hash maps to.
3. Follows the pointers stored in that bucket to the rows where user_id = 4.

This is an O(1) operation on average, compared to O(log n) for a B-Tree lookup. For pure equality checks on high-cardinality columns (many distinct values), hash indexes can be slightly faster.
Hash indexes have significant restrictions compared to B-Trees:
| Capability | B-Tree | Hash |
|---|---|---|
| Equality (=) | Yes | Yes |
| Range (>, <, BETWEEN) | Yes | No |
| ORDER BY | Yes | No |
| MIN / MAX | Yes | No |
| Prefix LIKE ('ali%') | Yes | No |
| NULL handling | Yes | Varies by database |
Because values are hashed, the original ordering is lost. The database cannot walk through hash buckets in sorted order, so range queries, sorting, and prefix matching all require a full table scan regardless of the hash index.
In practice, B-Tree indexes handle equality lookups well enough that hash indexes are rarely worth the trade-off. B-Trees give up a small amount of equality lookup speed in exchange for supporting every other operation. Most database administrators default to B-Tree for almost everything.
Hash indexes make sense in narrow scenarios: a column that is only ever filtered with =, never sorted or ranged, and where the marginal speed improvement matters. In PostgreSQL, hash indexes were not crash-safe until version 10 and are still not as widely used or tested as B-Trees.
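In PostgreSQL, a hash index is created by naming the type explicitly (the column choice here is illustrative):

```sql
CREATE INDEX idx_users_email_hash ON users USING hash (email);
```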
Indexes are not free. Every index you create comes with costs that affect writes, storage, and maintenance.
An index is a separate data structure stored on disk alongside the table. A B-Tree index on a single integer column typically uses about 20-30% of the space the table data occupies. Text columns produce larger indexes because the keys are bigger. An index on a VARCHAR(100) column takes more space than an index on an INT column.
On a table with 10 million rows, a single B-Tree index might use 200-400 MB of disk. Five indexes on the same table could add 1-2 GB of storage overhead.
Every time you insert, update, or delete a row, the database must also update every index on that table. If a table has 5 indexes, a single INSERT triggers 6 write operations: one for the table data and one for each index.
Updates are even worse when they modify an indexed column. The old entry must be removed from the index and a new one inserted. This is why heavily indexed tables can see significantly slower write performance.
B-Tree indexes can become fragmented over time as rows are inserted and deleted. Leaf nodes develop gaps where deleted entries used to be. PostgreSQL handles this with autovacuum, but on very write-heavy tables, index bloat can still cause performance degradation. Periodic REINDEX operations can reclaim wasted space.
An index is only useful if it narrows down the search to a small fraction of the table. This property is called selectivity.
A query with high selectivity matches a small percentage of rows. Indexes work well here because the database reads the index, finds a few row pointers, and fetches just those rows.
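A typical high-selectivity query (the specific id value is illustrative):

```sql
SELECT * FROM users WHERE user_id = 7;
```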
The user_id column has a unique value per row, so this query matches exactly one row. An index on user_id (which a PRIMARY KEY already provides) turns this into a fast lookup regardless of table size.
A query with low selectivity matches a large percentage of rows. In this case, an index can actually be slower than a sequential scan.
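A typical low-selectivity query; the plan column name is an assumption based on the free-tier example that follows:

```sql
SELECT * FROM users WHERE plan = 'free';
```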
If 40% of users are on the free tier, the database would need to look up 40% of the rows through the index, then fetch each one from the table. The random I/O pattern of jumping between index entries and table pages is slower than just reading the table sequentially from start to finish.
The database's query planner knows this. It tracks statistics about column value distributions and decides whether to use an index or a sequential scan based on the estimated selectivity. You can see this decision by running EXPLAIN (covered in a later chapter).
There is no exact percentage where indexes stop being useful, as it depends on factors like table size, row width, and disk speed. As a rough guide:
| Selectivity | Matching Rows | Index Useful? |
|---|---|---|
| Very high | < 1% of rows | Almost always yes |
| High | 1-10% of rows | Usually yes |
| Medium | 10-25% of rows | Depends on table size and row width |
| Low | > 25% of rows | Usually no, sequential scan is faster |
The query planner makes this decision automatically. You do not need to force index usage. Instead, focus on creating indexes on columns with high selectivity that appear frequently in WHERE clauses.
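Creating an index is a single statement. A sketch, using the streams table (the column choice is illustrative):

```sql
CREATE INDEX idx_streams_country ON streams (country);
```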
The naming convention idx_{table}_{column} is not required but is widely used. A clear name makes it easier to identify what an index covers when reviewing performance.
A unique index enforces that no two rows can have the same value in the indexed column. Primary keys automatically create a unique index.
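A sketch, assuming an email column on the users table:

```sql
CREATE UNIQUE INDEX idx_users_email ON users (email);
```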
This both speeds up lookups on email and prevents duplicate email addresses. Attempting to insert a duplicate value will fail with a constraint violation error.
Removing an index that is no longer needed frees up storage and speeds up writes:
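For example (index name illustrative):

```sql
DROP INDEX idx_streams_country;
```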
In PostgreSQL, DROP INDEX takes an exclusive lock on the table by default, which blocks other queries. For production tables, use DROP INDEX CONCURRENTLY to avoid blocking:
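For example (note that CONCURRENTLY cannot run inside a transaction block):

```sql
DROP INDEX CONCURRENTLY idx_streams_country;
```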
To see what indexes already exist on a table:
| Database | Command |
|---|---|
| PostgreSQL | SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'streams'; |
| MySQL | SHOW INDEX FROM streams; |
| SQL Server | sp_helpindex 'streams'; |
Checking existing indexes before creating new ones avoids duplicate indexes, which waste storage and slow down writes without providing any query benefit.
Primary key columns automatically get an index. You never need to create one manually.
Foreign key columns do not automatically get an index in most databases (MySQL's InnoDB is an exception, as it requires an index on foreign key columns and creates one if missing). Since foreign keys are frequently used in JOIN conditions, creating indexes on foreign key columns is one of the highest-impact optimizations you can make.
More indexes are not always better. Each index slows down writes and uses disk space. A table with 15 columns does not need 15 indexes. Focus on columns that appear in WHERE clauses, JOIN conditions, and ORDER BY clauses of your most frequent queries.
This is one of the most common ways to accidentally disable an index:
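For example, filtering on a function of started_at (the year value is illustrative):

```sql
-- An index on started_at cannot be used for this filter
SELECT * FROM streams
WHERE EXTRACT(YEAR FROM started_at) = 2024;
```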
What goes wrong here?
The index is built on the raw started_at values. When you wrap the column in a function like EXTRACT, the database cannot match the transformed values against the index. It falls back to a sequential scan.
Fix: Rewrite the condition to use the column directly:
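Expressed as a range on the raw column (dates chosen to match the year-based filter above):

```sql
SELECT * FROM streams
WHERE started_at >= '2024-01-01'
  AND started_at <  '2025-01-01';
```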
Now the database can use the B-Tree index on started_at to find the range.
Foreign key constraints enforce referential integrity, but they do not create indexes automatically (except in MySQL InnoDB). Without an index on the foreign key column, queries that join tables or delete parent rows perform sequential scans on the child table.
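A join that benefits from such an index might look like this (the tracks table and its title column are assumptions based on the schema implied earlier):

```sql
-- Without an index on streams.track_id, each track forces
-- a sequential scan of the streams table
SELECT t.title, COUNT(*) AS play_count
FROM tracks t
JOIN streams s ON s.track_id = t.track_id
GROUP BY t.title;
```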
Adding CREATE INDEX idx_streams_track_id ON streams (track_id) allows the database to look up matching streams for each track using the index instead of scanning.