The previous chapter introduced subqueries and briefly showed IN for testing set membership. But "does a related row exist?" is one of the most common questions in SQL, and there are several ways to ask it, each with different behavior around NULLs, different performance characteristics, and different readability trade-offs. This chapter covers all of them.

Sample Data

Insert the following rows to follow along:

SQL

IN with Subqueries

The previous chapter showed a quick example of IN with a subquery. Here we'll look at it properly.

IN tests whether a value matches any value in a list. That list can come from a hardcoded set or from a subquery. With a subquery, IN becomes a way to filter rows from one table based on data in another.

Example: Which tracks belong to albums in the Electronic genre (genre_id = 2)?

SQL

The subquery returns the album IDs for Nocturnal (2) and Bass Drop (5). The outer query finds tracks whose album_id is in that set: Afterglow, Shadows, Drop Zone, and Pulse.

IN works naturally when the subquery returns a single column. The database compares each outer row's value against the entire list returned by the inner query.

You can also negate it.

Example: Which tracks are NOT on albums in the Electronic genre?

SQL

This returns the six tracks from Pop, J-Pop, and House albums. NOT IN seems straightforward, but it has a serious gotcha involving NULLs that we'll cover later in this chapter.

Chaining IN with Multiple Conditions

IN subqueries combine well with other WHERE conditions.

Example: Find premium users who have streamed at least one track from a Pop album:

SQL

The subquery joins streams through tracks to albums and filters to Pop albums (genre_id = 1). The outer query restricts to premium users whose user_id appears in that result. alice_m and carol_uk both match.

IN with Multiple Columns

Some databases support comparing a tuple of columns against a subquery that returns matching columns. This is useful when a single column isn't enough to identify the relationship.

Find all playlist entries where the track was added on the same day it was streamed by the playlist's creator:

SQL

Multi-column IN is supported in PostgreSQL and MySQL. SQL Server does not support it directly, so you'd need to rewrite the condition using EXISTS or multiple AND conditions.

Feature	PostgreSQL	MySQL	SQL Server
Single-column IN subquery	Yes	Yes	Yes
Multi-column IN subquery	Yes	Yes	No

EXISTS

EXISTS takes a different approach than IN. Instead of comparing values, it asks a yes-or-no question: "Does the subquery return at least one row?" If the subquery produces any rows at all, EXISTS returns true. If it produces zero rows, EXISTS returns false.

The subquery inside EXISTS is almost always correlated, meaning it references a column from the outer query. This makes EXISTS check the condition for each row individually.

Example: Which users have streamed at least one track?

SQL

For each user in the outer query, the database runs the inner query with that user's user_id. If any stream row matches, EXISTS returns true and the user appears in the result. frank_j has no streams, so he's excluded.

A few things to note about this query:

SELECT 1 inside the EXISTS subquery is conventional but arbitrary. You could write SELECT *, SELECT 42, or SELECT s.stream_id. The database doesn't care what the subquery selects, only whether it returns at least one row. SELECT 1 is a hint to the reader that we're checking existence, not fetching values.

EXISTS stops as soon as it finds a match. If alice_m has 100 streams, the database doesn't scan all 100. It finds the first one, returns true, and moves on. This short-circuit behavior makes EXISTS efficient when matching rows are common.

EXISTS as a Semi-Join

In relational algebra, EXISTS implements a semi-join. A regular join between users and streams would produce one output row per stream (alice_m would appear 4 times because she has 4 streams). EXISTS produces each user at most once, no matter how many matching rows exist in the subquery. That's the "semi" part: it checks for existence without multiplying rows.

This same query written as a JOIN would need DISTINCT to get the same result:

SQL

Both produce the same output, but EXISTS expresses the intent more clearly: "give me users who have at least one stream." The join version says "combine users with their streams" and then deduplicates, which is a roundabout way to ask the same question.

Adding Conditions to EXISTS

The subquery inside EXISTS can be as complex as needed. Find users who have streamed a track by a verified artist:

SQL

The subquery joins streams through to artists, filtering to verified ones. For each outer user, it checks whether any of their streams lead to a verified artist. The correlation happens through s.user_id = u.user_id.

You can also use multiple EXISTS in the same WHERE clause. Find users who have streamed tracks from both Pop (genre_id = 1) and Electronic (genre_id = 2) genres:

SQL

Each EXISTS clause is independent. The first checks for Pop streams, the second for Electronic streams. Only users who satisfy both appear in the result. alice_m has streamed Pop tracks (Tidal from Ocean Drive) and Electronic tracks (Shadows from Nocturnal), so she qualifies.

NOT EXISTS

NOT EXISTS is the inverse: it returns true when the subquery returns zero rows. This implements an anti-join, finding rows in one table that have no matching rows in another.

Which users have never streamed anything?

SQL

For each user, the database checks whether any streams exist. If the subquery returns nothing, NOT EXISTS is true and the user appears. With our sample data, only frank_j has no streams.

Anti-joins are everywhere in real queries: customers who haven't purchased, artists with no followers, tracks not on any playlist. NOT EXISTS handles all of these cleanly.

Finding Artists Who Have Never Been Streamed

This requires joining through multiple tables because streams connect to artists through tracks and albums:

SQL

The subquery checks whether any stream leads back to the current artist. Neon Dreams has no streams in our data (nobody streamed Neon Skyline or Downtown... wait, bob_jones streamed Neon Skyline and Downtown). Let's trace through: every artist in our sample data has at least one stream. All five artists have tracks that appear in the streams table, so this query returns an empty result.

That's actually a useful outcome. If you expected "unstreamed artists" but got none, you know your data is more connected than you assumed. In production, with thousands of artists, you'd almost certainly find some.

Tracks Not on Any Playlist

SQL

The playlist_tracks table has entries for tracks 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, but let's check: tracks 1, 3, 8 are on playlist 1. Tracks 5, 9, 4 are on playlist 2. Tracks 6, 10 are on playlist 3. Tracks 7, 3, 4 are on playlist 4. Tracks 9, 2 are on playlist 5. Every track appears on at least one playlist, so this query also returns an empty result with our sample data.

The pattern itself is what matters. In a real dataset with thousands of tracks and selective playlists, NOT EXISTS would surface orphaned content that nobody has curated.

NOT EXISTS vs LEFT JOIN WHERE NULL

Another way to write an anti-join uses LEFT JOIN with a NULL check:

SQL

This returns the same result as the NOT EXISTS version: users with no matching streams. The LEFT JOIN keeps all users, and the WHERE clause filters to those where the join produced no match (stream_id is NULL because no stream row exists).

Both approaches work. NOT EXISTS tends to read more naturally for existence checks ("give me users where no stream exists"), while LEFT JOIN + IS NULL can be more familiar to people who think in terms of joins. Performance is typically identical because the optimizer recognizes both patterns and uses the same execution plan.

NOT IN and the NULL Trap

NOT IN looks like the simplest way to write an anti-join, but it has a dangerous edge case involving NULLs that can silently return wrong results.

Start with a query that works correctly. Find users who haven't created any playlists:

SQL

The subquery returns user IDs 1, 1, 2, 3, 5 (the creators of the five playlists). NOT IN filters to users whose ID is not in that list. dave_kr (4) and frank_j (6) appear.

This works because creator_user_id has no NULL values. Now consider a different query. Find artists whose country is not in the list of user countries:

SQL

You might expect this to return Neon Dreams (CA) and Aiko (JP), since no users have those countries. But this query returns zero rows.

The reason: emma_dev and frank_j have country values of 'US' and 'GB', but what about DJ Pulse? DJ Pulse has country = NULL. Wait, the issue is on the users side. Let me be precise.

Actually, let's look at the user countries. All six users have non-NULL country values: US, US, GB, KR, US, GB. So the subquery returns {'US', 'US', 'GB', 'KR', 'US', 'GB'}. No NULLs. That query would work fine.

The trap appears when the subquery's result contains NULL. Let's use the artists.country column instead. DJ Pulse has country = NULL.

SQL

The subquery returns {'US', 'GB', 'CA', 'JP', NULL}. Now NOT IN must check: is u.country not equal to every value in that list? For alice_m (country = 'US'), the check becomes:

'US' != 'US'? No. Already false, so NOT IN is false.

For dave_kr (country = 'KR'):

'KR' != 'US'? Yes.
'KR' != 'GB'? Yes.
'KR' != 'CA'? Yes.
'KR' != 'JP'? Yes.
'KR' != NULL? Unknown.

That final comparison with NULL produces UNKNOWN, not TRUE. NOT IN requires all comparisons to be TRUE. Since one is UNKNOWN, the entire NOT IN evaluates to UNKNOWN, and the row is excluded. This happens for every row, because every value compared to NULL produces UNKNOWN. The query returns zero rows.

This is SQL's three-valued logic at work. Any comparison with NULL yields UNKNOWN, and NOT IN is semantically equivalent to AND-ing together a series of != checks. One UNKNOWN in a chain of ANDs makes the whole thing UNKNOWN.

Why NOT EXISTS Is Safer

NOT EXISTS doesn't have this problem. Rewrite the same query:

SQL

For dave_kr, the subquery looks for artists where ar.country = 'KR'. No artist has country 'KR', so the subquery returns zero rows. NOT EXISTS returns true, and dave_kr appears in the result. The NULL in DJ Pulse's country doesn't interfere because NULL = 'KR' is UNKNOWN, which is not true, so that row simply doesn't match the subquery's WHERE clause. The subquery returns zero qualifying rows, and NOT EXISTS correctly returns true.

Operator	NULL in subquery result	Behavior
IN	No effect	Matches correctly
NOT IN	Poisons entire result	Returns zero rows when you expect matches
EXISTS	No effect	Checks for row existence, ignores NULL values
NOT EXISTS	No effect	Checks for row absence, ignores NULL values

The rule is simple: prefer NOT EXISTS over NOT IN whenever the subquery column could contain NULLs. NOT EXISTS is never wrong in cases where NOT IN silently breaks.

Fixing NOT IN When You Must Use It

If you prefer NOT IN for readability, filter out NULLs explicitly:

SQL

Adding WHERE ar.country IS NOT NULL removes the NULL from the list, and NOT IN works correctly again. But this requires you to remember the filter every time, and forgetting it produces no error. NOT EXISTS is the safer default.

ANY (SOME) and ALL

IN checks whether a value equals any value in a list. ANY and ALL generalize this to work with other comparison operators: >, <, >=, <=, !=.

ANY and SOME

value operator ANY (subquery) returns true if the comparison is true for at least one row from the subquery. SOME is a synonym for ANY; they're interchangeable.

IN (subquery) is equivalent to = ANY (subquery). They do the same thing.

Find tracks that are longer than at least one track from the album Nocturnal (album_id = 2):

SQL

Nocturnal has two tracks: Afterglow (198s) and Shadows (312s). > ANY means "greater than the smallest value in the list," so this returns tracks longer than 198 seconds. That's every track except Sunrise (142), Petals (90), Downtown (165), Pulse (180), and Afterglow (198) itself. Basically, any track over 198 seconds qualifies.

ALL

value operator ALL (subquery) returns true only if the comparison is true for every row from the subquery.

Find tracks that are longer than every track from Nocturnal:

SQL

Now the track must be longer than both 198 and 312 seconds. Only Bloom (348) and Drop Zone (420) qualify.

> ALL (subquery) is equivalent to > (SELECT MAX(...) FROM ...), and > ANY (subquery) is equivalent to > (SELECT MIN(...) FROM ...). The explicit MAX/MIN versions are often clearer, but ANY and ALL are useful when you want to compare with a more complex subquery without wrapping it in another aggregate.

ALL with an Empty Subquery

There's a subtle edge case: if the subquery returns zero rows, ALL returns true for every comparison. This is vacuously true, the same logic as "all unicorns can fly" being true because there are no unicorns to contradict it.

SQL

No album has album_id 999, so the subquery returns zero rows. > ALL (empty set) is true, so every track is returned. This is rarely what you'd want, so be careful when ALL's subquery might be empty.

NULL Behavior with ANY and ALL

Like NOT IN, ANY and ALL can produce unexpected results when the subquery contains NULLs. The behavior follows the same three-valued logic:

Expression	NULL in subquery	Result
`value = ANY (...)`	Ignored if other matches exist	True if any non-NULL matches
`value = ALL (...)`	UNKNOWN if no non-NULL contradicts	Can return UNKNOWN
`value > ANY (...)`	True if greater than any non-NULL	Works if at least one match
`value > ALL (...)`	UNKNOWN if greater than all non-NULLs	Can return UNKNOWN

In practice, ANY handles NULLs gracefully in most cases because it only needs one true comparison. ALL is more fragile because a single UNKNOWN comparison can make the entire expression UNKNOWN. If you're using ALL, make sure the subquery column doesn't contain NULLs, or filter them out.

Quick Reference

Expression	Equivalent To	Meaning
`x IN (subquery)`	`x = ANY (subquery)`	Equals at least one
`x NOT IN (subquery)`	`x != ALL (subquery)`	Not equal to any
`x > ANY (subquery)`	`x > MIN(subquery)`	Greater than at least one
`x > ALL (subquery)`	`x > MAX(subquery)`	Greater than every one
`x < ANY (subquery)`	`x < MAX(subquery)`	Less than at least one
`x < ALL (subquery)`	`x < MIN(subquery)`	Less than every one

Choosing Between EXISTS, IN, and JOIN

These three can often solve the same problem. Here's the same question answered three ways: which users have streamed a track from the album Sakura?

Using IN:

SQL

Using EXISTS:

SQL

Using JOIN:

SQL

All three return the same users: alice_m, carol_uk, and dave_kr. The differences are in readability, NULL safety, and sometimes performance.

Readability

IN reads naturally for simple membership tests: "where user_id is in this list."
EXISTS reads naturally for existence checks: "where a matching row exists."
JOIN reads naturally when you need columns from both tables in the output. If you only need columns from one table and have to add DISTINCT, EXISTS or IN is usually cleaner.

NULL Safety

IN is safe. NOT IN is not safe when NULLs are present.
EXISTS and NOT EXISTS are always NULL-safe.
JOIN and LEFT JOIN + IS NULL are always NULL-safe.

Performance

Modern query optimizers in PostgreSQL, MySQL, and SQL Server are good at recognizing these patterns and choosing efficient execution plans regardless of which form you write. That said, there are situations where one form can perform better:

Scenario	Tends to perform well
Subquery returns a small list	IN (hash lookup against a small set)
Subquery returns a large result	EXISTS (short-circuits on first match)
Outer table is small, inner table is large	EXISTS (fewer subquery executions)
Both tables are large with good indexes	Similar performance for all three
Anti-join (finding non-matches)	NOT EXISTS or LEFT JOIN + IS NULL

These are tendencies, not guarantees. The actual execution plan depends on table statistics, indexes, and the optimizer's decisions. When performance matters, run EXPLAIN ANALYZE on your specific query and data.

Decision Framework

Question	Recommendation
Need to check if related rows exist?	EXISTS
Need to check if related rows DON'T exist?	NOT EXISTS
Filtering against a small, known set?	IN
Need columns from both tables?	JOIN
Anti-join and NULLs might be present?	NOT EXISTS (never NOT IN)

EXISTS, NOT EXISTS, IN, and NOT IN