AlgoMaster Logo

Introduction to Joins

Last Updated: May 3, 2026

8 min read

Most of the queries we have worked with so far has pulled data from a single table. That works fine when everything you need lives in one place, but real databases spread data across many tables. The streams table knows which user streamed which track, but it only stores IDs. The actual username lives in users, and the track title lives in tracks.

To answer "what did alice_m listen to?", you need to combine rows from multiple tables. That's what joins do.

Sample Data

Insert the following rows to follow along:

SQL

Note that user 6 (frank_j) has no streams in the sample data. This matters when we compare INNER JOIN (which would exclude frank_j) with LEFT JOIN (which would keep him with NULLs for the stream columns).

Why Joins Exist

Normalization is why databases split data across tables: to avoid duplication and keep data consistent. A track title is stored once in tracks, not repeated in every row of streams. An artist's name is stored once in artists, not copied into every album and track.

This design is efficient for storage and updates, but it means that answering even simple questions requires pulling data from multiple tables. Consider these questions:

  • "What tracks did alice_m stream?" requires streams + users + tracks
  • "Which artist released the album 'Nocturnal'?" requires albums + artists
  • "How many streams came from premium users?" requires streams + users

The columns you need are scattered across tables, connected by foreign keys. Joins follow those foreign key connections to reassemble the data.

Here is how the StreamFlow tables connect to each other:

The arrows show foreign key relationships. streams references both users (via user_id) and tracks (via track_id). tracks references albums (via album_id), and albums references artists (via artist_id). A join follows one or more of these links to combine rows from different tables.

Join Terminology

Before writing any join queries, here are the terms you will see throughout this section of the course:

Left Table and Right Table

In a join, the table that appears before the JOIN keyword is the left table, and the table after it is the right table. The names come from their position in the SQL statement:

SQL

This distinction matters because some join types (LEFT JOIN, RIGHT JOIN) treat the two tables asymmetrically. INNER JOIN treats them the same, so the order does not affect the result.

Join Condition (ON Clause)

The ON clause specifies how rows from the two tables should be matched. It is almost always an equality condition on a foreign key:

SQL

This tells the database: "for each row in streams, find the row in users where the user_id values match." The ON clause can also include multiple conditions joined with AND, which later chapters will cover.

Join Result

The result of a join is a new set of rows. Each result row contains columns from both tables. If streams has 6 columns and users has 8 columns, the joined result has 14 columns (before any SELECT filtering).

In practice, you rarely SELECT * from a join. You pick the specific columns you need, using table aliases to avoid ambiguity when both tables share a column name (like country in this case).

How Joins Work Conceptually

The easiest way to understand joins is to think of them as a two-step process:

  1. Cartesian product: Combine every row from the left table with every row from the right table. If the left table has 10 rows and the right table has 6 rows, this produces 60 combinations.
  2. Filter: Keep only the combinations where the ON condition is true. Discard the rest.

This is not how databases actually execute joins (they use much more efficient algorithms), but it is the correct mental model for understanding what a join produces.

Let's trace through a small example. Suppose we join streams with users on user_id. Here is a simplified version with just a few rows from each table:

streams (3 rows):

stream_iduser_idtrack_id
112
214
325

users (2 rows):

user_idusername
1alice_m
2bob_jones

Step 1: Cartesian product (3 x 2 = 6 combinations):

stream_ids.user_idtrack_idu.user_idusernameMatch?
1121alice_mYes
1122bob_jonesNo
2141alice_mYes
2142bob_jonesNo
3251alice_mNo
3252bob_jonesYes

Step 2: Keep only rows where s.user_id = u.user_id:

stream_iduser_idtrack_idusername
112alice_m
214alice_m
325bob_jones

Three rows survive. Each stream row is now paired with the username of the user who created it. The Cartesian product created 6 combinations, and the ON filter kept only the 3 where the user IDs matched.

This "combine everything, then filter" model explains why a wrong or missing ON condition is dangerous. Without an ON clause, nothing gets filtered, and you end up with the full Cartesian product. With 10,000 streams and 5,000 users, that is 50 million rows.

Your First Join

Here is a real query that joins streams with users to show who streamed what:

SQL

A few things to notice about the syntax:

  • JOIN by itself means INNER JOIN. The two are interchangeable, but most people write just JOIN.
  • Table aliases (s for streams, u for users) keep the query readable. Without them, you would write streams.user_id and users.user_id everywhere.
  • The ON clause uses the foreign key relationship: streams.user_id references users.user_id.
  • Columns that exist in only one table (like username or track_id) don't strictly need the alias prefix, but it's good practice to always qualify column names in joins. It makes the query unambiguous and easier to read.

This query returns one row for every stream that has a matching user. Since all streams in our sample data have valid user IDs (1 through 5), every stream appears in the result.

But what if a user has no streams? User 6 (frank_j) did not stream anything in our sample data. With INNER JOIN, frank_j simply does not appear in the results. There is no stream row to match against, so there is no output row. Whether that's the behavior you want depends on the question you are answering.

Join Types Overview

SQL defines several join types, each differing in how they handle rows that have no match in the other table. Here is a quick overview. Each type gets its own chapter with detailed examples, so this is just the high-level picture.

INNER JOIN

Returns only rows that have a match in both tables. If a stream has no matching user, or a user has no matching stream, that row is excluded from the result.

SQL

This is the default join type and the most commonly used.

LEFT JOIN

Returns all rows from the left table, plus matching rows from the right table. If a left row has no match, the right-side columns are filled with NULL.

SQL

Here, frank_j would appear in the result with stream_id = NULL because he has no streams.

RIGHT JOIN

The mirror of LEFT JOIN. Returns all rows from the right table, plus matching rows from the left table. In practice, you can always rewrite a RIGHT JOIN as a LEFT JOIN by swapping the table order, so RIGHT JOIN is less commonly used.

FULL OUTER JOIN

Returns all rows from both tables. Rows with a match are combined. Rows without a match appear with NULLs on the unmatched side. This is useful for data reconciliation, where you want to find mismatches between two tables.

CROSS JOIN

Returns the Cartesian product of both tables, with no ON condition. Every row from the left table is combined with every row from the right table. This is rarely what you want for regular queries, but it has specific use cases like generating date grids or creating all possible combinations.

SELF JOIN

Joins a table to itself. This is not a separate syntax, just an INNER JOIN or LEFT JOIN where both sides reference the same table with different aliases. It is useful for comparing rows within a table, like finding artists who share the same country or following hierarchical relationships (an artist's manager is also an artist).

The following diagram shows the key difference between join types, specifically how each one handles matched and unmatched rows:

And here is a comparison table summarizing the differences:

Join TypeLeft Unmatched RowsRight Unmatched RowsTypical Use Case
INNER JOINExcludedExcludedStandard queries where you only want matched data
LEFT JOINIncluded (with NULLs)ExcludedFinding missing records, optional relationships
RIGHT JOINExcludedIncluded (with NULLs)Rarely used, rewrite as LEFT JOIN
FULL OUTER JOINIncluded (with NULLs)Included (with NULLs)Data reconciliation, comparing two datasets
CROSS JOINN/A (no matching)N/A (no matching)Generating combinations, report grids

Choosing the Right Join Type

When writing a join query, the first decision is which join type to use. This usually comes down to one question: what should happen to rows that don't have a match?

Here are some concrete examples from StreamFlow:

  • "Show each stream with the track title." Every stream references a valid track, so INNER JOIN works. You only want streams that have matching tracks.
  • "Show all users with their stream count, including users who never streamed." You need all users regardless of whether they have streams. That is a LEFT JOIN from users to streams.
  • "Find all artists who have never been streamed." You need all artists, then check for the absence of streams. That is a LEFT JOIN from artists through albums and tracks to streams, filtering where the stream columns are NULL.
  • "Generate a report with every genre and every month, showing stream counts including zeros." You need every combination of genre and month, even when no streams exist. That is a CROSS JOIN between genres and months, followed by a LEFT JOIN to streams.

When you are unsure, start with INNER JOIN. It is the most restrictive (no unmatched rows sneak in), so the results are easier to reason about. If the results are missing rows you expected, switch to LEFT JOIN and check whether the missing rows lack matches.

Joining More Than Two Tables

Real queries often need data from three or more tables. To show "username, track title, and artist name" for each stream, you need to chain through four tables: streamstracksalbumsartists.

The syntax chains additional JOIN clauses after the first one:

SQL

Each JOIN connects to a table already in the query. The second JOIN connects tracks to streams (via track_id). The third connects albums to tracks (via album_id). The fourth connects artists to albums (via artist_id).

The order of joins does not change the result for INNER JOIN. You could join users last instead of first and get the same rows. But writing them in the order that follows the logical chain (streams → tracks → albums → artists) makes the query easier to read.

Here is the join chain visualized:

One thing to watch with multi-table joins: if any single join in the chain has no match for a given row, that row disappears from the result entirely (with INNER JOIN). Stream 6 references track_id = 9, which is not in our tracks table. That stream will not appear in the result because the streamstracks join finds no match. If you need to keep it, you would use a LEFT JOIN for that step in the chain.