Last Updated: May 3, 2026
Most of the queries we have worked with so far has pulled data from a single table. That works fine when everything you need lives in one place, but real databases spread data across many tables. The streams table knows which user streamed which track, but it only stores IDs. The actual username lives in users, and the track title lives in tracks.
To answer "what did alice_m listen to?", you need to combine rows from multiple tables. That's what joins do.
Insert the following rows to follow along:
Note that user 6 (frank_j) has no streams in the sample data. This matters when we compare INNER JOIN (which would exclude frank_j) with LEFT JOIN (which would keep him with NULLs for the stream columns).
Normalization is why databases split data across tables: to avoid duplication and keep data consistent. A track title is stored once in tracks, not repeated in every row of streams. An artist's name is stored once in artists, not copied into every album and track.
This design is efficient for storage and updates, but it means that answering even simple questions requires pulling data from multiple tables. Consider these questions:
streams + users + tracksalbums + artistsstreams + usersThe columns you need are scattered across tables, connected by foreign keys. Joins follow those foreign key connections to reassemble the data.
Here is how the StreamFlow tables connect to each other:
The arrows show foreign key relationships. streams references both users (via user_id) and tracks (via track_id). tracks references albums (via album_id), and albums references artists (via artist_id). A join follows one or more of these links to combine rows from different tables.
Before writing any join queries, here are the terms you will see throughout this section of the course:
In a join, the table that appears before the JOIN keyword is the left table, and the table after it is the right table. The names come from their position in the SQL statement:
This distinction matters because some join types (LEFT JOIN, RIGHT JOIN) treat the two tables asymmetrically. INNER JOIN treats them the same, so the order does not affect the result.
The ON clause specifies how rows from the two tables should be matched. It is almost always an equality condition on a foreign key:
This tells the database: "for each row in streams, find the row in users where the user_id values match." The ON clause can also include multiple conditions joined with AND, which later chapters will cover.
The result of a join is a new set of rows. Each result row contains columns from both tables. If streams has 6 columns and users has 8 columns, the joined result has 14 columns (before any SELECT filtering).
In practice, you rarely SELECT * from a join. You pick the specific columns you need, using table aliases to avoid ambiguity when both tables share a column name (like country in this case).
The easiest way to understand joins is to think of them as a two-step process:
This is not how databases actually execute joins (they use much more efficient algorithms), but it is the correct mental model for understanding what a join produces.
Let's trace through a small example. Suppose we join streams with users on user_id. Here is a simplified version with just a few rows from each table:
streams (3 rows):
| stream_id | user_id | track_id |
|---|---|---|
| 1 | 1 | 2 |
| 2 | 1 | 4 |
| 3 | 2 | 5 |
users (2 rows):
| user_id | username |
|---|---|
| 1 | alice_m |
| 2 | bob_jones |
Step 1: Cartesian product (3 x 2 = 6 combinations):
| stream_id | s.user_id | track_id | u.user_id | username | Match? |
|---|---|---|---|---|---|
| 1 | 1 | 2 | 1 | alice_m | Yes |
| 1 | 1 | 2 | 2 | bob_jones | No |
| 2 | 1 | 4 | 1 | alice_m | Yes |
| 2 | 1 | 4 | 2 | bob_jones | No |
| 3 | 2 | 5 | 1 | alice_m | No |
| 3 | 2 | 5 | 2 | bob_jones | Yes |
Step 2: Keep only rows where s.user_id = u.user_id:
| stream_id | user_id | track_id | username |
|---|---|---|---|
| 1 | 1 | 2 | alice_m |
| 2 | 1 | 4 | alice_m |
| 3 | 2 | 5 | bob_jones |
Three rows survive. Each stream row is now paired with the username of the user who created it. The Cartesian product created 6 combinations, and the ON filter kept only the 3 where the user IDs matched.
This "combine everything, then filter" model explains why a wrong or missing ON condition is dangerous. Without an ON clause, nothing gets filtered, and you end up with the full Cartesian product. With 10,000 streams and 5,000 users, that is 50 million rows.
Here is a real query that joins streams with users to show who streamed what:
A few things to notice about the syntax:
JOIN by itself means INNER JOIN. The two are interchangeable, but most people write just JOIN.s for streams, u for users) keep the query readable. Without them, you would write streams.user_id and users.user_id everywhere.streams.user_id references users.user_id.username or track_id) don't strictly need the alias prefix, but it's good practice to always qualify column names in joins. It makes the query unambiguous and easier to read.This query returns one row for every stream that has a matching user. Since all streams in our sample data have valid user IDs (1 through 5), every stream appears in the result.
But what if a user has no streams? User 6 (frank_j) did not stream anything in our sample data. With INNER JOIN, frank_j simply does not appear in the results. There is no stream row to match against, so there is no output row. Whether that's the behavior you want depends on the question you are answering.
SQL defines several join types, each differing in how they handle rows that have no match in the other table. Here is a quick overview. Each type gets its own chapter with detailed examples, so this is just the high-level picture.
Returns only rows that have a match in both tables. If a stream has no matching user, or a user has no matching stream, that row is excluded from the result.
This is the default join type and the most commonly used.
Returns all rows from the left table, plus matching rows from the right table. If a left row has no match, the right-side columns are filled with NULL.
Here, frank_j would appear in the result with stream_id = NULL because he has no streams.
The mirror of LEFT JOIN. Returns all rows from the right table, plus matching rows from the left table. In practice, you can always rewrite a RIGHT JOIN as a LEFT JOIN by swapping the table order, so RIGHT JOIN is less commonly used.
Returns all rows from both tables. Rows with a match are combined. Rows without a match appear with NULLs on the unmatched side. This is useful for data reconciliation, where you want to find mismatches between two tables.
Returns the Cartesian product of both tables, with no ON condition. Every row from the left table is combined with every row from the right table. This is rarely what you want for regular queries, but it has specific use cases like generating date grids or creating all possible combinations.
Joins a table to itself. This is not a separate syntax, just an INNER JOIN or LEFT JOIN where both sides reference the same table with different aliases. It is useful for comparing rows within a table, like finding artists who share the same country or following hierarchical relationships (an artist's manager is also an artist).
The following diagram shows the key difference between join types, specifically how each one handles matched and unmatched rows:
And here is a comparison table summarizing the differences:
| Join Type | Left Unmatched Rows | Right Unmatched Rows | Typical Use Case |
|---|---|---|---|
| INNER JOIN | Excluded | Excluded | Standard queries where you only want matched data |
| LEFT JOIN | Included (with NULLs) | Excluded | Finding missing records, optional relationships |
| RIGHT JOIN | Excluded | Included (with NULLs) | Rarely used, rewrite as LEFT JOIN |
| FULL OUTER JOIN | Included (with NULLs) | Included (with NULLs) | Data reconciliation, comparing two datasets |
| CROSS JOIN | N/A (no matching) | N/A (no matching) | Generating combinations, report grids |
When writing a join query, the first decision is which join type to use. This usually comes down to one question: what should happen to rows that don't have a match?
Here are some concrete examples from StreamFlow:
users to streams.artists through albums and tracks to streams, filtering where the stream columns are NULL.When you are unsure, start with INNER JOIN. It is the most restrictive (no unmatched rows sneak in), so the results are easier to reason about. If the results are missing rows you expected, switch to LEFT JOIN and check whether the missing rows lack matches.
Real queries often need data from three or more tables. To show "username, track title, and artist name" for each stream, you need to chain through four tables: streams → tracks → albums → artists.
The syntax chains additional JOIN clauses after the first one:
Each JOIN connects to a table already in the query. The second JOIN connects tracks to streams (via track_id). The third connects albums to tracks (via album_id). The fourth connects artists to albums (via artist_id).
The order of joins does not change the result for INNER JOIN. You could join users last instead of first and get the same rows. But writing them in the order that follows the logical chain (streams → tracks → albums → artists) makes the query easier to read.
Here is the join chain visualized:
One thing to watch with multi-table joins: if any single join in the chain has no match for a given row, that row disappears from the result entirely (with INNER JOIN). Stream 6 references track_id = 9, which is not in our tracks table. That stream will not appear in the result because the streams → tracks join finds no match. If you need to keep it, you would use a LEFT JOIN for that step in the chain.