Last Updated: May 3, 2026
The previous chapter introduced subqueries and briefly showed IN for testing set membership. But "does a related row exist?" is one of the most common questions in SQL, and there are several ways to ask it, each with different behavior around NULLs, different performance characteristics, and different readability trade-offs. This chapter covers all of them.
Insert the following rows to follow along:
The previous chapter showed a quick example of IN with a subquery. Here we'll look at it properly.
IN tests whether a value matches any value in a list. That list can come from a hardcoded set or from a subquery. With a subquery, IN becomes a way to filter rows from one table based on data in another.
Example: Which tracks belong to albums in the Electronic genre (genre_id = 2)?
The subquery returns the album IDs for Nocturnal (2) and Bass Drop (5). The outer query finds tracks whose album_id is in that set: Afterglow, Shadows, Drop Zone, and Pulse.
IN works naturally when the subquery returns a single column. The database compares each outer row's value against the entire list returned by the inner query.
You can also negate it.
Example: Which tracks are NOT on albums in the Electronic genre?
This returns the six tracks from Pop, J-Pop, and House albums. NOT IN seems straightforward, but it has a serious gotcha involving NULLs that we'll cover later in this chapter.
IN subqueries combine well with other WHERE conditions.
Example: Find premium users who have streamed at least one track from a Pop album:
The subquery joins streams through tracks to albums and filters to Pop albums (genre_id = 1). The outer query restricts to premium users whose user_id appears in that result. alice_m and carol_uk both match.
Some databases support comparing a tuple of columns against a subquery that returns matching columns. This is useful when a single column isn't enough to identify the relationship.
Find all playlist entries where the track was added on the same day it was streamed by the playlist's creator:
Multi-column IN is supported in PostgreSQL and MySQL. SQL Server does not support it directly, so you'd need to rewrite the condition using EXISTS or multiple AND conditions.
| Feature | PostgreSQL | MySQL | SQL Server |
|---|---|---|---|
| Single-column IN subquery | Yes | Yes | Yes |
| Multi-column IN subquery | Yes | Yes | No |
EXISTS takes a different approach than IN. Instead of comparing values, it asks a yes-or-no question: "Does the subquery return at least one row?" If the subquery produces any rows at all, EXISTS returns true. If it produces zero rows, EXISTS returns false.
The subquery inside EXISTS is almost always correlated, meaning it references a column from the outer query. This makes EXISTS check the condition for each row individually.
Example: Which users have streamed at least one track?
For each user in the outer query, the database runs the inner query with that user's user_id. If any stream row matches, EXISTS returns true and the user appears in the result. frank_j has no streams, so he's excluded.
A few things to note about this query:
SELECT 1 inside the EXISTS subquery is conventional but arbitrary. You could write SELECT *, SELECT 42, or SELECT s.stream_id. The database doesn't care what the subquery selects, only whether it returns at least one row. SELECT 1 is a hint to the reader that we're checking existence, not fetching values.
EXISTS stops as soon as it finds a match. If alice_m has 100 streams, the database doesn't scan all 100. It finds the first one, returns true, and moves on. This short-circuit behavior makes EXISTS efficient when matching rows are common.
In relational algebra, EXISTS implements a semi-join. A regular join between users and streams would produce one output row per stream (alice_m would appear 4 times because she has 4 streams). EXISTS produces each user at most once, no matter how many matching rows exist in the subquery. That's the "semi" part: it checks for existence without multiplying rows.
This same query written as a JOIN would need DISTINCT to get the same result:
Both produce the same output, but EXISTS expresses the intent more clearly: "give me users who have at least one stream." The join version says "combine users with their streams" and then deduplicates, which is a roundabout way to ask the same question.
The subquery inside EXISTS can be as complex as needed. Find users who have streamed a track by a verified artist:
The subquery joins streams through to artists, filtering to verified ones. For each outer user, it checks whether any of their streams lead to a verified artist. The correlation happens through s.user_id = u.user_id.
You can also use multiple EXISTS in the same WHERE clause. Find users who have streamed tracks from both Pop (genre_id = 1) and Electronic (genre_id = 2) genres:
Each EXISTS clause is independent. The first checks for Pop streams, the second for Electronic streams. Only users who satisfy both appear in the result. alice_m has streamed Pop tracks (Tidal from Ocean Drive) and Electronic tracks (Shadows from Nocturnal), so she qualifies.
NOT EXISTS is the inverse: it returns true when the subquery returns zero rows. This implements an anti-join, finding rows in one table that have no matching rows in another.
Which users have never streamed anything?
For each user, the database checks whether any streams exist. If the subquery returns nothing, NOT EXISTS is true and the user appears. With our sample data, only frank_j has no streams.
Anti-joins are everywhere in real queries: customers who haven't purchased, artists with no followers, tracks not on any playlist. NOT EXISTS handles all of these cleanly.
This requires joining through multiple tables because streams connect to artists through tracks and albums:
The subquery checks whether any stream leads back to the current artist. Neon Dreams has no streams in our data (nobody streamed Neon Skyline or Downtown... wait, bob_jones streamed Neon Skyline and Downtown). Let's trace through: every artist in our sample data has at least one stream. All five artists have tracks that appear in the streams table, so this query returns an empty result.
That's actually a useful outcome. If you expected "unstreamed artists" but got none, you know your data is more connected than you assumed. In production, with thousands of artists, you'd almost certainly find some.
The playlist_tracks table has entries for tracks 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, but let's check: tracks 1, 3, 8 are on playlist 1. Tracks 5, 9, 4 are on playlist 2. Tracks 6, 10 are on playlist 3. Tracks 7, 3, 4 are on playlist 4. Tracks 9, 2 are on playlist 5. Every track appears on at least one playlist, so this query also returns an empty result with our sample data.
The pattern itself is what matters. In a real dataset with thousands of tracks and selective playlists, NOT EXISTS would surface orphaned content that nobody has curated.
Another way to write an anti-join uses LEFT JOIN with a NULL check:
This returns the same result as the NOT EXISTS version: users with no matching streams. The LEFT JOIN keeps all users, and the WHERE clause filters to those where the join produced no match (stream_id is NULL because no stream row exists).
Both approaches work. NOT EXISTS tends to read more naturally for existence checks ("give me users where no stream exists"), while LEFT JOIN + IS NULL can be more familiar to people who think in terms of joins. Performance is typically identical because the optimizer recognizes both patterns and uses the same execution plan.
NOT IN looks like the simplest way to write an anti-join, but it has a dangerous edge case involving NULLs that can silently return wrong results.
Start with a query that works correctly. Find users who haven't created any playlists:
The subquery returns user IDs 1, 1, 2, 3, 5 (the creators of the five playlists). NOT IN filters to users whose ID is not in that list. dave_kr (4) and frank_j (6) appear.
This works because creator_user_id has no NULL values. Now consider a different query. Find artists whose country is not in the list of user countries:
You might expect this to return Neon Dreams (CA) and Aiko (JP), since no users have those countries. But this query returns zero rows.
The reason: emma_dev and frank_j have country values of 'US' and 'GB', but what about DJ Pulse? DJ Pulse has country = NULL. Wait, the issue is on the users side. Let me be precise.
Actually, let's look at the user countries. All six users have non-NULL country values: US, US, GB, KR, US, GB. So the subquery returns {'US', 'US', 'GB', 'KR', 'US', 'GB'}. No NULLs. That query would work fine.
The trap appears when the subquery's result contains NULL. Let's use the artists.country column instead. DJ Pulse has country = NULL.
The subquery returns {'US', 'GB', 'CA', 'JP', NULL}. Now NOT IN must check: is u.country not equal to every value in that list? For alice_m (country = 'US'), the check becomes:
For dave_kr (country = 'KR'):
That final comparison with NULL produces UNKNOWN, not TRUE. NOT IN requires all comparisons to be TRUE. Since one is UNKNOWN, the entire NOT IN evaluates to UNKNOWN, and the row is excluded. This happens for every row, because every value compared to NULL produces UNKNOWN. The query returns zero rows.
This is SQL's three-valued logic at work. Any comparison with NULL yields UNKNOWN, and NOT IN is semantically equivalent to AND-ing together a series of != checks. One UNKNOWN in a chain of ANDs makes the whole thing UNKNOWN.
NOT EXISTS doesn't have this problem. Rewrite the same query:
For dave_kr, the subquery looks for artists where ar.country = 'KR'. No artist has country 'KR', so the subquery returns zero rows. NOT EXISTS returns true, and dave_kr appears in the result. The NULL in DJ Pulse's country doesn't interfere because NULL = 'KR' is UNKNOWN, which is not true, so that row simply doesn't match the subquery's WHERE clause. The subquery returns zero qualifying rows, and NOT EXISTS correctly returns true.
| Operator | NULL in subquery result | Behavior |
|---|---|---|
| IN | No effect | Matches correctly |
| NOT IN | Poisons entire result | Returns zero rows when you expect matches |
| EXISTS | No effect | Checks for row existence, ignores NULL values |
| NOT EXISTS | No effect | Checks for row absence, ignores NULL values |
The rule is simple: prefer NOT EXISTS over NOT IN whenever the subquery column could contain NULLs. NOT EXISTS is never wrong in cases where NOT IN silently breaks.
If you prefer NOT IN for readability, filter out NULLs explicitly:
Adding WHERE ar.country IS NOT NULL removes the NULL from the list, and NOT IN works correctly again. But this requires you to remember the filter every time, and forgetting it produces no error. NOT EXISTS is the safer default.
IN checks whether a value equals any value in a list. ANY and ALL generalize this to work with other comparison operators: >, <, >=, <=, !=.
value operator ANY (subquery) returns true if the comparison is true for at least one row from the subquery. SOME is a synonym for ANY; they're interchangeable.
IN (subquery) is equivalent to = ANY (subquery). They do the same thing.
Find tracks that are longer than at least one track from the album Nocturnal (album_id = 2):
Nocturnal has two tracks: Afterglow (198s) and Shadows (312s). > ANY means "greater than the smallest value in the list," so this returns tracks longer than 198 seconds. That's every track except Sunrise (142), Petals (90), Downtown (165), Pulse (180), and Afterglow (198) itself. Basically, any track over 198 seconds qualifies.
value operator ALL (subquery) returns true only if the comparison is true for every row from the subquery.
Find tracks that are longer than every track from Nocturnal:
Now the track must be longer than both 198 and 312 seconds. Only Bloom (348) and Drop Zone (420) qualify.
> ALL (subquery) is equivalent to > (SELECT MAX(...) FROM ...), and > ANY (subquery) is equivalent to > (SELECT MIN(...) FROM ...). The explicit MAX/MIN versions are often clearer, but ANY and ALL are useful when you want to compare with a more complex subquery without wrapping it in another aggregate.
There's a subtle edge case: if the subquery returns zero rows, ALL returns true for every comparison. This is vacuously true, the same logic as "all unicorns can fly" being true because there are no unicorns to contradict it.
No album has album_id 999, so the subquery returns zero rows. > ALL (empty set) is true, so every track is returned. This is rarely what you'd want, so be careful when ALL's subquery might be empty.
Like NOT IN, ANY and ALL can produce unexpected results when the subquery contains NULLs. The behavior follows the same three-valued logic:
| Expression | NULL in subquery | Result |
|---|---|---|
value = ANY (...) | Ignored if other matches exist | True if any non-NULL matches |
value = ALL (...) | UNKNOWN if no non-NULL contradicts | Can return UNKNOWN |
value > ANY (...) | True if greater than any non-NULL | Works if at least one match |
value > ALL (...) | UNKNOWN if greater than all non-NULLs | Can return UNKNOWN |
In practice, ANY handles NULLs gracefully in most cases because it only needs one true comparison. ALL is more fragile because a single UNKNOWN comparison can make the entire expression UNKNOWN. If you're using ALL, make sure the subquery column doesn't contain NULLs, or filter them out.
| Expression | Equivalent To | Meaning |
|---|---|---|
x IN (subquery) | x = ANY (subquery) | Equals at least one |
x NOT IN (subquery) | x != ALL (subquery) | Not equal to any |
x > ANY (subquery) | x > MIN(subquery) | Greater than at least one |
x > ALL (subquery) | x > MAX(subquery) | Greater than every one |
x < ANY (subquery) | x < MAX(subquery) | Less than at least one |
x < ALL (subquery) | x < MIN(subquery) | Less than every one |
These three can often solve the same problem. Here's the same question answered three ways: which users have streamed a track from the album Sakura?
Using IN:
Using EXISTS:
Using JOIN:
All three return the same users: alice_m, carol_uk, and dave_kr. The differences are in readability, NULL safety, and sometimes performance.
Modern query optimizers in PostgreSQL, MySQL, and SQL Server are good at recognizing these patterns and choosing efficient execution plans regardless of which form you write. That said, there are situations where one form can perform better:
| Scenario | Tends to perform well |
|---|---|
| Subquery returns a small list | IN (hash lookup against a small set) |
| Subquery returns a large result | EXISTS (short-circuits on first match) |
| Outer table is small, inner table is large | EXISTS (fewer subquery executions) |
| Both tables are large with good indexes | Similar performance for all three |
| Anti-join (finding non-matches) | NOT EXISTS or LEFT JOIN + IS NULL |
These are tendencies, not guarantees. The actual execution plan depends on table statistics, indexes, and the optimizer's decisions. When performance matters, run EXPLAIN ANALYZE on your specific query and data.
| Question | Recommendation |
|---|---|
| Need to check if related rows exist? | EXISTS |
| Need to check if related rows DON'T exist? | NOT EXISTS |
| Filtering against a small, known set? | IN |
| Need columns from both tables? | JOIN |
| Anti-join and NULLs might be present? | NOT EXISTS (never NOT IN) |