Last Updated: May 3, 2026
Sometimes the data you need lives in two separate queries. Maybe you want a combined activity feed that pulls from both follows and artist_follows, or you need to merge completed and refunded payments into a single report.
SQL gives you UNION and UNION ALL to stack result sets vertically, and picking the right one matters more than most people realize.
The follow data is designed so that combining user-follows and artist-follows produces interesting overlaps. Insert the following rows to follow along:
UNION ALL is the simpler of the two operators. It takes the result set from one query, takes the result set from another, and stacks them together. No filtering, no deduplication, just concatenation.
Say you want a list of all "follow" activity on the platform, combining both user-to-user follows and user-to-artist follows into one feed. These live in separate tables with different structures, but you can reshape them into matching columns:
This returns every row from the first query followed by every row from the second. If follows has 8 rows and artist_follows has 10 rows, the result has 18 rows. UNION ALL never removes anything.
A few things to notice about the structure:
user_id, follow_type, etc.) are ignored for naming purposes, though they still help readability.followed_user_id and artist_id are integers, so they can share the target_id column. The literal 'user' and 'artist' strings are both text, so the follow_type column works too.The 'user' and 'artist' string literals in the query above are a common pattern. When you combine rows from different sources, you often need a discriminator column so you can tell where each row came from. Without it, you would have a list of IDs with no way to know whether target_id = 3 means user 3 or artist 3.
SQL matches columns by position, not by name. The first column of the first query pairs with the first column of the second query, regardless of what they're called. This means you could write:
This runs without errors because both queries have two integer columns. But the result is nonsense: you've paired followed_user_id with user_id and follower_user_id with artist_id. The database doesn't check whether the pairing makes sense. That's your job.
UNION does the same stacking as UNION ALL, then adds one extra step: it removes duplicate rows from the combined result.
"Duplicate" here means every column in the row matches exactly. If two rows differ in even one column, they both survive.
On January 12, alice (user 1) followed carol (in follows) and alice also followed a track's artist in artist_follows. If both tables produce a row with (1, '2024-01-12'), UNION keeps only one copy. UNION ALL would keep both.
The deduplication considers all columns. If you add a source discriminator column (like the 'user' and 'artist' labels from earlier), no two rows would ever match because the source column would always differ. In that case, UNION and UNION ALL produce identical results, but UNION still pays the cost of checking for duplicates.
UNION only removes rows when both queries can produce identical output rows. This happens naturally in a few situations:
follows and artist_follows) where the same user appears in both.If your two queries pull from completely separate tables and include a source discriminator column, duplicates are structurally impossible. UNION ALL is the right choice in that case.
Every UNION or UNION ALL has three requirements:
Both queries must produce the same number of columns. If one query returns 3 columns and the other returns 4, the database raises an error.
What goes wrong here?
This fails because the first query has 3 columns and the second has 2.
Fix: Pad the shorter query with a NULL or a literal:
Now both queries have 3 columns. Since artists also has a country column, this works cleanly. If the second table lacked a matching column, you could use NULL AS country instead.
The database needs to be able to combine each column pair into a single output type. Integers with integers, text with text, timestamps with timestamps. Most databases will implicitly cast compatible types. For example, an INTEGER and a BIGINT will work because integers can widen to bigints. But pairing a text column with an integer column fails in most databases.
| Pairing | Works? | Result Type |
|---|---|---|
| INTEGER + INTEGER | Yes | INTEGER |
| INTEGER + BIGINT | Yes | BIGINT |
| VARCHAR + VARCHAR | Yes | VARCHAR |
| INTEGER + VARCHAR | No | Error in most databases |
| DATE + TIMESTAMP | Yes | TIMESTAMP |
| BOOLEAN + INTEGER | Depends | PostgreSQL: yes (auto-cast). MySQL: yes. SQL Server: no |
As covered earlier, columns match by position. The first column from each query combines into the first result column, the second into the second, and so on. Column names from the second (and any subsequent) query are ignored. Only the first query's column names or aliases appear in the result.
This means you should always verify that column positions align semantically, not just type-wise.
The performance difference between UNION and UNION ALL comes down to one thing: deduplication costs work.
UNION ALL just concatenates the two result sets. The database does no additional work beyond running both queries.
UNION concatenates and then sorts or hashes the combined result to find and remove duplicates. For small result sets, the difference is negligible. For large ones (millions of rows), it can be significant. Sorting is O(n log n), and the hash-based approach needs memory proportional to the number of distinct rows.
Use UNION ALL unless you specifically need deduplication. Three common scenarios where UNION ALL is safe:
'user' vs 'artist' label, no row from one query can match a row from the other.status = 'completed' and the other filters status = 'refunded'. No row can appear in both.If duplicates can appear and you want them removed, use UNION. But think about whether the duplicates actually matter for your use case. An analytics query that counts rows might give wrong results with UNION (undercounting), while a reporting query that lists items might show confusing repeats with UNION ALL.
You can chain multiple UNION or UNION ALL operators to combine three or more queries:
This builds a combined event timeline from three separate tables. Each UNION ALL stacks the next result set onto the previous ones.
You can mix them in the same statement, but the behavior can be surprising. In standard SQL and PostgreSQL, UNION has higher precedence than UNION ALL. Consider:
The precedence rules mean this is interpreted as:
Query B and Query C get deduplicated first (via UNION), then that result is concatenated with Query A (via UNION ALL). This is rarely the intent. If you need to control the order of operations, use parentheses (in databases that support them) or restructure into CTEs:
In practice, mixing UNION and UNION ALL in a single statement is confusing and error-prone. Pick one and stick with it, or break complex combinations into CTEs to make the logic explicit.
UNION and JOIN both combine data, but in fundamentally different directions.
A JOIN matches rows from two tables based on a condition and produces wider rows (more columns). A UNION stacks rows from two queries and produces taller results (more rows). The column count stays the same.
| Question | Use |
|---|---|
| "Show each user alongside their subscription info" | JOIN |
"List all user IDs who appear in either follows or artist_follows" | UNION |
| "For each stream, include the track title and artist name" | JOIN (multiple) |
| "Combine completed and refunded payments into one list" | UNION ALL |
| "Find users who have both a stream and a payment" | JOIN or INTERSECT |
| "Build a timeline of all user activity across different tables" | UNION ALL |
The key distinction: if you need to enrich rows with data from another table, use a JOIN. If you need to stack similar rows from different sources, use UNION.
One common mistake is using UNION when a JOIN is appropriate. For example, "show me each user's name and their most recent stream" requires a JOIN because you need columns from both users and streams in the same row. A UNION can't do this because it doesn't pair rows from different queries.
An ORDER BY at the end of a UNION or UNION ALL applies to the entire combined result, not just the last query.
This returns the 10 most recent follows across both tables, sorted by date. The ORDER BY and LIMIT sit outside both queries and operate on the merged result.
What goes wrong here?
In PostgreSQL, this is a syntax error. You cannot put ORDER BY on an individual query within a UNION unless you wrap it in a subquery. The database sees ORDER BY followed_at UNION ALL and doesn't know what to do with it.
If you need each component query sorted before combining (which is rare), wrap it:
This takes the 5 most recent user follows and the 5 most recent artist follows, then stacks them. The subqueries handle their own sorting and limiting independently. Note that the final result's order is not guaranteed unless you add an outer ORDER BY.
| Dialect | ORDER BY in individual UNION query | Subquery workaround |
|---|---|---|
| PostgreSQL | Syntax error | Yes, with subquery |
| MySQL | Allowed but ignored without LIMIT | Yes |
| SQL Server | Syntax error | Yes, with TOP or subquery |