Last Updated: May 3, 2026
SQL doesn't allow aggregate functions in a WHERE clause. You can't write WHERE duration_seconds > AVG(duration_seconds) because WHERE filters individual rows before any aggregation happens.
Subqueries solve this by letting you compute a value (or a whole result set) in one query and use it inside another.
Insert the following rows to follow along:
A subquery is a SELECT statement nested inside another SQL statement, enclosed in parentheses. The database executes the inner query first, produces a result, and then the outer query uses that result as if it were a constant value, a list of values, or a temporary table.
Here is the simplest example. We want tracks that are longer than the average track duration:
The inner query (SELECT AVG(duration_seconds) FROM tracks) computes a single number: the average duration across all 10 tracks. With our sample data, that average is 236.7 seconds. The outer query then becomes equivalent to WHERE duration_seconds > 236.7, filtering down to the tracks that exceed that threshold.
The following diagram shows how the database processes this:
The inner query runs independently. It doesn't know anything about the outer query, and the outer query doesn't care how the value was produced. It just receives 236.7 and uses it.
This two-phase execution is the mental model for non-correlated subqueries, which are the most common type. Later in this chapter, we'll look at correlated subqueries, where the inner query references the outer query and runs once per row instead of once total.
Subqueries are categorized by what they return. The type determines where you can use them and what operators work with them.
A scalar subquery returns exactly one value: one row, one column. This makes it usable anywhere a single value is expected, including in comparisons, arithmetic expressions, and the SELECT list.
The "tracks above average duration" query from the previous section used a scalar subquery. Here are more examples of increasing complexity.
Using a scalar subquery to find tracks shorter than the shortest album's longest track might sound convoluted, but it shows the flexibility. Start simpler. What is the duration of the longest track on the platform?
That returns 420 (Drop Zone). Now suppose you want every track that is at least half as long as the longest track. You could hard-code 210, but the value changes as new tracks are added. A scalar subquery keeps it dynamic:
The subquery returns 420, the division produces 210, and the WHERE clause filters to tracks with duration 210 seconds or more. You can use scalar subqueries in arithmetic just like any other value.
Scalar subqueries also work in the SELECT list. If you want each album's title alongside the platform-wide average track duration for comparison:
Every row in the result gets the same value for platform_avg because the subquery doesn't reference the outer query. It computes the average once, and that value appears in every row. This is useful when you want to compare individual values against a global benchmark.
A scalar subquery must return exactly one row. If it returns more than one row, the database raises an error. If it returns zero rows, the result is NULL.
A row subquery returns a single row with multiple columns. These are less common in practice, but some databases support comparing a tuple of columns against a row subquery.
For example, finding the artist who released the earliest album:
This is technically a scalar subquery because it returns one column. A true row subquery would return multiple columns and use tuple comparison syntax:
This finds albums that share both the same artist and genre as Ocean Drive. The inner query returns one row with two columns, and the outer query compares both columns at once.
Row subqueries require the inner query to return exactly one row. PostgreSQL, MySQL, and SQL Server all support this syntax, though the exact behavior varies. In practice, scalar and table subqueries cover the vast majority of use cases.
A table subquery returns a full result set: multiple rows and potentially multiple columns. You use these in the FROM clause (as derived tables) or with operators like IN that accept a list of values.
Here is a table subquery used as a derived table. We want to find users who have streamed more than 3 tracks:
The inner query computes the stream count per user. The outer query treats that result as a table named user_streams and filters it. This two-step approach is useful when you need to filter on an aggregated value without using HAVING, or when you want to join the aggregated result to another table.
Table subqueries are also used with IN to test membership:
This finds all tracks belonging to albums in the Pop genre (genre_id = 1). The inner query returns a list of album IDs, and the outer query checks each track's album_id against that list. The next chapter covers IN, EXISTS, and related operators in depth.
Subqueries can appear in four places within a SQL statement: WHERE, SELECT, FROM, and HAVING. Each placement serves a different purpose.
This is the most common placement. A subquery in WHERE computes a value (or set of values) that the outer query filters against.
We've already seen the basic pattern: find tracks above the average duration. Here is a slightly more complex version. Which users have streamed a track from the album "Sakura"?
This query has three levels. The innermost subquery finds the album_id for Sakura. The middle subquery uses that album_id to find all track_ids on that album. The outermost query finds users who streamed any of those tracks.
Nesting subqueries three levels deep is valid SQL, but readability drops fast. When a query reaches this level of complexity, a CTE (Common Table Expressions) is usually cleaner.
You can also use a scalar subquery with comparison operators other than =. For example, finding albums released before the average release date:
PostgreSQL can compute AVG on dates by treating them as numbers internally. This returns albums with earlier-than-average release dates.
A subquery in the SELECT list adds a computed column to every row. It must be a scalar subquery, returning exactly one value per row.
For each track, show its duration alongside the average duration of all tracks on the same album:
Notice that the inner query references t.album_id from the outer query. This makes it a correlated subquery: it runs once for each row in the outer query, computing the album average for that specific track's album.
A non-correlated subquery in SELECT is simpler. If you want every user's username alongside the total number of tracks on the platform:
The subquery runs once, returns 10, and that value appears in every row. This is useful for computing ratios or percentages. For example, what percentage of all streams does each user account for?
The scalar subquery (SELECT COUNT(*) FROM streams) computes the total stream count (15). The outer query divides each user's count by that total to get a percentage.
A subquery in the FROM clause creates a temporary result set that the outer query treats like a table. This temporary result is called a derived table. It must have an alias, otherwise the database has no name to reference it by.
Derived tables are useful when you need to aggregate data in one step and then filter or join in a second step. We want the average number of tracks per playlist:
The inner query counts tracks per playlist. The outer query averages those counts. You can't do this in a single query without a subquery (or CTE), because AVG(COUNT(*)) is not valid. You need to aggregate first, then aggregate again.
Derived tables can also be joined to other tables. Suppose you want each user's total listening time alongside their username, but only users who listened for more than 500 seconds total:
The derived table listening computes per-user totals. The outer query joins it to users to get usernames and filters to users above the 500-second threshold.
A more involved example: find the genre with the highest total number of streams.
The derived table chains streams through tracks to albums to get the genre_id, groups by genre, and counts streams. The outer query joins to genres for the name and picks the top result.
HAVING filters groups after aggregation. A scalar subquery in HAVING lets you compare group-level values against a computed threshold.
Find artists whose total stream count is above the average streams per artist:
The HAVING clause uses a nested subquery. The innermost query counts streams per artist. The middle query (the derived table per_artist) provides those counts as rows. The outer scalar subquery averages them. The result is the average streams per artist (15 streams / 5 artists = 3.0). Artists with more than 3 streams appear in the final result.
HAVING subqueries follow the same rules as WHERE subqueries: scalar subqueries for comparisons, table subqueries for IN.
The subqueries we've written fall into two categories based on whether the inner query depends on the outer query.
A non-correlated subquery runs independently. It produces the same result regardless of which outer row is being processed. The database executes it once, caches the result, and uses that result for every outer row.
The inner query computes one number (236.7) without referencing anything from the outer query.
A correlated subquery references a column from the outer query. This means the inner query depends on the current outer row, so the database conceptually re-executes it for each row the outer query processes.
The inner query now includes WHERE t2.album_id = t.album_id, which references t.album_id from the outer query. For each track, the database computes the average duration of tracks on the same album, then checks if the current track exceeds it.
The following diagram contrasts the two execution models:
Correlated subqueries are powerful for "compare each row to its group" problems, but the row-by-row execution model can be expensive on large datasets.
Subqueries, joins, and CTEs can often solve the same problem. Choosing between them is mostly about readability and maintainability. The database optimizer frequently rewrites one form into another internally, so performance is often equivalent.
Here's the same problem solved three ways: find tracks longer than the average duration, showing the track title and the average for reference.
Subquery in WHERE (no average in output):
Clean and direct. But if you also want to show the average in the output, you need a second subquery in SELECT or a different approach.
Join to a derived table:
The derived table computes the average once. The join condition filters tracks above it. The average is available in the output. This is slightly more verbose, but it avoids repeating the subquery.
CTE (preview of a later chapter):
The CTE names the intermediate result, making the intent clear. If you needed the average in multiple places within the same query, the CTE avoids computing it twice.
The following table provides a starting point for choosing between them:
| Situation | Good fit | Why |
|---|---|---|
| Filter against a single computed value | Subquery in WHERE | Concise, self-contained |
| Same computed result needed multiple times | CTE | Name it once, reference many times |
| Add a computed column per row | Scalar subquery in SELECT | Clean for a single extra column |
| Multi-step transformation (aggregate, then filter, then join) | CTE or derived table | Named stages improve readability |
| Testing set membership (is X in this list?) | Subquery with IN or EXISTS | Natural phrasing for existence checks |
| Combining columns from related tables | JOIN | That's what joins are designed for |
These are guidelines, not rules. The optimizer in PostgreSQL, MySQL, and SQL Server is good at recognizing equivalent patterns and choosing an efficient execution plan regardless of which form you write. Favor the form that makes the query easiest to read and maintain.