Last Updated: May 3, 2026
SQL looks straightforward until it doesn't. A query runs without errors but returns wrong results, a filter silently drops rows it shouldn't, or an aggregation produces a number that's off by just enough to be dangerous. These aren't bugs in the database. They're defined behaviors that happen to conflict with what most people intuitively expect.
This chapter collects the most common gotchas into one place, each with a "what you'd expect" vs "what actually happens" breakdown and a clear fix.
Insert these rows to follow along:
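If you don't have the book's dataset handy, here's a minimal stand-in: the table and column names follow the chapter, while the specific usernames, dates, and durations are assumptions chosen to line up with the counts quoted later (six users, two NULL countries, NULL durations on streams 4 and 9, and so on).

```sql
CREATE TABLE users (
    user_id           INT PRIMARY KEY,
    username          VARCHAR(50),
    email             VARCHAR(100),
    country           VARCHAR(2),        -- NULL when unknown
    subscription_tier VARCHAR(20)
);

CREATE TABLE artists (
    artist_id INT PRIMARY KEY,
    name      VARCHAR(100),
    country   VARCHAR(2)
);

CREATE TABLE streams (
    stream_id         INT PRIMARY KEY,
    user_id           INT,
    artist_id         INT,
    played_at         TIMESTAMP,         -- column name assumed; the chapter never names it
    duration_listened INT                -- seconds; NULL when not recorded
);

CREATE TABLE payments (
    payment_id   INT PRIMARY KEY,
    user_id      INT,
    amount       DECIMAL(10,2),
    payment_date DATE
);

INSERT INTO users VALUES
    (1, 'alice_m', 'alice@example.com', 'US', 'premium'),
    (2, 'bob_k',   'bob@example.com',   'US', 'free'),
    (3, 'carol_s', 'carol@example.com', NULL, 'free'),
    (4, 'dave_r',  'dave@example.com',  'GB', 'family'),
    (5, 'erin_t',  'erin@example.com',  'DE', 'premium'),
    (6, 'frank_o', 'frank@example.com', NULL, 'free');

INSERT INTO artists VALUES
    (1, 'Neon Coast',   'US'),
    (2, 'Glass Harbor', 'GB'),
    (3, 'Echo Chamber', 'DE'),
    (4, 'Pixel Dreams', NULL);

INSERT INTO streams VALUES
    (1,  1, 1, '2024-01-15 10:00:00', 240),
    (2,  2, 1, '2024-01-15 11:30:00', 95),
    (3,  3, 2, '2024-01-15 14:05:00', 310),
    (4,  4, 3, '2024-01-15 18:20:00', NULL),
    (5,  1, 2, '2024-01-20 09:45:00', 180),
    (6,  5, 3, '2024-01-22 21:10:00', 60),
    (7,  6, 4, '2024-01-25 08:00:00', 45),
    (8,  2, 4, '2024-01-31 14:30:00', 400),
    (9,  3, 1, '2024-02-01 12:00:00', NULL),
    (10, 4, 2, '2024-02-10 16:40:00', 120);

INSERT INTO payments VALUES
    (1, 1, 9.99,  '2024-01-15'),
    (2, 2, 0.00,  '2024-01-15'),
    (3, 4, 12.99, '2024-01-15'),
    (4, 5, 14.99, '2024-01-15'),
    (5, 1, 9.99,  '2024-02-15');
```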
NULL represents an unknown value: not zero, not an empty string, not false. Any arithmetic operation involving NULL produces NULL, and any comparison with NULL produces UNKNOWN rather than TRUE or FALSE.
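Watch what happens when you do arithmetic on duration_listened:

```sql
SELECT stream_id,
       duration_listened,
       duration_listened + 10 AS plus_ten,
       duration_listened * 2  AS doubled
FROM streams;
```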
For streams where duration_listened is NULL (stream_id 4 and 9), both plus_ten and doubled are also NULL. NULL + 10 doesn't produce 10. It produces NULL, because adding anything to an unknown value still gives an unknown value.
The same logic applies to comparisons. This query returns no rows for NULL countries:
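```sql
SELECT username, country
FROM users
WHERE country = NULL;   -- looks reasonable, matches nothing
```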
Nothing comes back, even for Carol and Frank whose country is NULL. The expression NULL = NULL evaluates to UNKNOWN, not TRUE. SQL uses three-valued logic: TRUE, FALSE, and UNKNOWN. WHERE only keeps rows where the condition is TRUE, so UNKNOWN gets filtered out just like FALSE.
| Expression | Result | Why |
|---|---|---|
| NULL = NULL | UNKNOWN | Can't know if two unknowns are equal |
| NULL <> NULL | UNKNOWN | Same reason |
| NULL > 5 | UNKNOWN | Can't compare unknown to a value |
| NULL + 10 | NULL | Arithmetic with unknown stays unknown |
| NULL AND TRUE | UNKNOWN | Unknown might be FALSE |
| NULL OR TRUE | TRUE | TRUE regardless of the other side |
| NULL AND FALSE | FALSE | FALSE regardless of the other side |
The correct way to check for NULL is with IS NULL and IS NOT NULL:
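```sql
SELECT username, country
FROM users
WHERE country IS NULL;
```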
This returns Carol and Frank as expected.
When you need NULL to behave like a real value, use COALESCE:
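```sql
SELECT stream_id,
       COALESCE(duration_listened, 0) + 10 AS plus_ten
FROM streams;
```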
COALESCE takes the first non-NULL argument. This turns NULL durations into 0 before doing math, so the addition works as expected.
This is one of the most dangerous SQL gotchas because the query runs without errors but silently returns wrong results.
Say you want to find artists who are not from the US or GB:
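```sql
SELECT name, country
FROM artists
WHERE country NOT IN ('US', 'GB');
```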
This returns Echo Chamber (DE). But Pixel Dreams (country is NULL) is missing from the results. Why?
NOT IN is equivalent to a chain of <> comparisons combined with AND. The expression NULL NOT IN ('US', 'GB') expands to NULL <> 'US' AND NULL <> 'GB'. Each comparison with NULL produces UNKNOWN, and UNKNOWN AND UNKNOWN is UNKNOWN. WHERE discards UNKNOWN rows.
The problem gets worse when the subquery itself contains NULL values:
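```sql
SELECT name, country
FROM artists
WHERE country NOT IN (SELECT country FROM users);
```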
The users table has NULL values in country (Carol and Frank). This means the subquery returns {'US', 'GB', NULL, 'DE'}. Now every row in artists gets compared against NULL, every NOT IN evaluation includes an UNKNOWN term, and the entire query returns zero rows.
Zero rows. Not "all except US and GB." Zero.
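The fix is to express the same filter with NOT EXISTS:

```sql
SELECT a.name, a.country
FROM artists a
WHERE NOT EXISTS (
    SELECT 1
    FROM users u
    WHERE u.country = a.country
);
```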
NOT EXISTS handles NULLs correctly because it checks for the existence of matching rows rather than comparing values directly. When a.country is NULL, the inner query finds no rows where u.country = NULL is TRUE (because NULL = NULL is UNKNOWN), so NOT EXISTS returns TRUE, correctly including Pixel Dreams.
| Method | NULL in list/subquery | NULL in outer column | Safe? |
|---|---|---|---|
| NOT IN (values) | Breaks silently | Excluded silently | No |
| NOT IN (subquery) | Returns empty set | Excluded silently | No |
| NOT EXISTS | Handled correctly | Handled correctly | Yes |
Use NOT EXISTS as the default when filtering against another table. Reserve NOT IN for hardcoded lists where you're certain no NULLs are involved.
COUNT(*), COUNT(column), and COUNT(DISTINCT column) look similar but behave differently with NULLs:
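```sql
SELECT COUNT(*)                AS count_star,
       COUNT(country)          AS count_country,
       COUNT(DISTINCT country) AS count_distinct
FROM users;
```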
COUNT(*) counts every row regardless of NULLs. It returns 6. COUNT(country) skips rows where country is NULL (Carol and Frank), returning 4. COUNT(DISTINCT country) counts unique non-NULL values: US, GB, DE. That's 3.
This distinction matters when calculating rates and percentages:
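```sql
-- The metric here (share of users based in the US or GB) is just an illustration;
-- the * 1.0 sidesteps the integer-division gotcha covered next.
SELECT SUM(CASE WHEN country IN ('US', 'GB') THEN 1 ELSE 0 END) * 1.0
       / COUNT(country) AS share_of_known_countries
FROM users;

SELECT SUM(CASE WHEN country IN ('US', 'GB') THEN 1 ELSE 0 END) * 1.0
       / COUNT(*) AS share_of_all_users
FROM users;
```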
The first query divides 3 by 4 (75%), ignoring the two NULL-country users entirely. The second divides 3 by 6 (50%), which accurately reflects the proportion of all users.
You'll sometimes see COUNT(1) instead of COUNT(*). They're equivalent in every major database. The optimizer treats them identically. Use whichever your team prefers, but be consistent.
Most databases truncate integer division rather than returning a decimal result.
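Say you want the share of streams that ran long:

```sql
-- "long" means at least 240 seconds here; the threshold is an arbitrary choice
SELECT COUNT(CASE WHEN duration_listened >= 240 THEN 1 END) / COUNT(*) AS bad_ratio
FROM streams;
```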
In PostgreSQL or SQL Server, if 3 out of 10 streams are long, bad_ratio comes back as 0, not 0.3. Integer divided by integer gives an integer, and 3 / 10 truncates to 0.
There are several ways to fix this:
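```sql
SELECT long_streams * 1.0 / total_streams                  AS fix_multiply,        -- 1. multiply by 1.0
       CAST(long_streams AS DECIMAL(10,2)) / total_streams AS fix_cast_numerator,  -- 2. cast the numerator
       long_streams / CAST(total_streams AS DECIMAL(10,2)) AS fix_cast_denominator -- 3. cast the denominator
FROM (
    SELECT COUNT(CASE WHEN duration_listened >= 240 THEN 1 END) AS long_streams,
           COUNT(*) AS total_streams
    FROM streams
) counts;
```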
All three produce the correct decimal result. The key is making at least one operand non-integer before the division happens.
| Database | 5 / 2 Result | Fix |
|---|---|---|
| PostgreSQL | 2 | 5.0 / 2 or CAST(5 AS DECIMAL) / 2 |
| MySQL | 2.5 | None needed for /; only 5 DIV 2 truncates |
| SQL Server | 2 | 5.0 / 2 or CAST(5 AS DECIMAL(10,2)) / 2 |
PostgreSQL and SQL Server agree here: if both sides are integers, the result is an integer. MySQL is the outlier, since its / operator always produces a decimal and only the DIV operator truncates.
BETWEEN is inclusive on both ends. For dates, that's usually fine. For timestamps, it creates a subtle trap.
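Say you want every stream from January 15:

```sql
-- played_at is the assumed name of the streams timestamp column
SELECT stream_id, played_at
FROM streams
WHERE played_at BETWEEN '2024-01-15' AND '2024-01-15';
```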
This only matches streams at exactly midnight (2024-01-15 00:00:00). Every stream that happened after midnight on January 15 is excluded because 2024-01-15 10:00:00 is greater than 2024-01-15 00:00:00 (which is what the bare date string casts to as a timestamp).
The same problem occurs with date ranges:
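```sql
SELECT stream_id, played_at
FROM streams
WHERE played_at BETWEEN '2024-01-01' AND '2024-01-31';
```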
This misses every stream on January 31 that occurred after midnight. A stream at 2024-01-31 14:30:00 is greater than 2024-01-31 00:00:00, so it's excluded.
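The fix is a half-open range:

```sql
SELECT stream_id, played_at
FROM streams
WHERE played_at >= '2024-01-01'
  AND played_at <  '2024-02-01';
```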
This pattern (>= start, < exclusive end) captures every possible timestamp within January, no matter the time component. It's clearer about the boundary behavior and works correctly with any timestamp precision.
| Pattern | January 31 at 14:30 | Safe for timestamps? |
|---|---|---|
| BETWEEN '2024-01-01' AND '2024-01-31' | Excluded | No |
| BETWEEN '2024-01-01' AND '2024-01-31 23:59:59' | Included, but misses sub-second times | No |
| >= '2024-01-01' AND < '2024-02-01' | Included | Yes |
The half-open range approach is the standard recommendation for timestamp filtering. Avoid BETWEEN with timestamp columns entirely.
SQL doesn't execute in the order you write it. The logical execution order determines where you can reference what.
SELECT runs after WHERE and GROUP BY. That means aliases defined in SELECT don't exist yet when WHERE and GROUP BY execute.
What goes wrong here?
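```sql
SELECT user_id, COUNT(*) AS stream_count
FROM streams
WHERE stream_count > 1          -- fails: column "stream_count" does not exist
GROUP BY user_id;
```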
stream_count is defined in SELECT, but WHERE runs before SELECT. The alias doesn't exist at that point.
Fix: Use HAVING for post-aggregation filtering, or wrap in a subquery:
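```sql
-- Option 1: HAVING runs after GROUP BY, so it can filter on the aggregate
SELECT user_id, COUNT(*) AS stream_count
FROM streams
GROUP BY user_id
HAVING COUNT(*) > 1;

-- Option 2: wrap the aggregation so the alias exists by the time WHERE runs
SELECT user_id, stream_count
FROM (
    SELECT user_id, COUNT(*) AS stream_count
    FROM streams
    GROUP BY user_id
) per_user
WHERE stream_count > 1;
```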
There's one exception to this rule: ORDER BY runs after SELECT, so it can reference aliases.
| Clause | Can use SELECT aliases? | Why |
|---|---|---|
| WHERE | No | Runs before SELECT |
| GROUP BY | No (SQL Server), Yes (PostgreSQL, MySQL) | Standard says no |
| HAVING | No (standard), Yes (MySQL) | Must repeat the expression |
| ORDER BY | Yes | Runs after SELECT |
MySQL is the most permissive here, and PostgreSQL relaxes the rule for GROUP BY (and ORDER BY). SQL Server hews closest to the standard. Write queries that work everywhere by not relying on alias references in WHERE, GROUP BY, or HAVING.
When you use GROUP BY, every column in SELECT must either appear in the GROUP BY clause or be inside an aggregate function. PostgreSQL and SQL Server enforce this strictly. MySQL, by default, does not.
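Say you count streams per user and also want each user's email:

```sql
SELECT u.username, u.email, COUNT(*) AS stream_count
FROM streams s
JOIN users u ON u.user_id = s.user_id
GROUP BY u.username;
```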
PostgreSQL rejects this because email is in SELECT but not in GROUP BY or an aggregate. MySQL runs it and picks an arbitrary email value from the group. The result might be correct by coincidence, or it might silently return a wrong email.
This behavior in MySQL is controlled by the ONLY_FULL_GROUP_BY SQL mode. Modern MySQL versions (5.7.5+) enable it by default, so you'll get an error. But older installations or configurations that disable it will silently return unpredictable results.
Fix: Always list every non-aggregated column in GROUP BY:
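```sql
SELECT u.username, u.email, COUNT(*) AS stream_count
FROM streams s
JOIN users u ON u.user_id = s.user_id
GROUP BY u.username, u.email;
```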
If you need a column that's functionally dependent on the GROUP BY key (like email depending on username because username is unique), PostgreSQL allows it when grouping by the primary key. But the safest approach is to be explicit.
UNION removes duplicate rows from the combined result. UNION ALL keeps everything.
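A small illustration using the sample data, combining listener countries with artist countries:

```sql
-- UNION collapses 'US' (and the NULLs) appearing in both lists into a single row
SELECT country FROM users
UNION
SELECT country FROM artists;

-- UNION ALL keeps every occurrence and skips the deduplication pass entirely
SELECT country FROM users
UNION ALL
SELECT country FROM artists;
```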
The gotcha is performance. UNION has to sort or hash the entire combined result set to find and remove duplicates. On large tables, this is expensive. If you know the two sets are already disjoint (or if duplicates are acceptable), UNION ALL avoids that overhead entirely.
There's also a correctness gotcha. If the two queries can produce the same row and you need to count occurrences, UNION silently collapses them. For example, if Alice streamed from the US and is also a US user, UNION might drop what looks like a duplicate row.
| Scenario | Use |
|---|---|
| Combining results from different tables, duplicates possible and unwanted | UNION |
| Combining results that are guaranteed disjoint | UNION ALL |
| Building a report where each row must be counted | UNION ALL |
| Performance matters and duplicates are acceptable | UNION ALL |
The default should be UNION ALL unless you specifically need deduplication. Reach for UNION only when you have a clear reason to remove duplicates.
Window functions with ORDER BY use a default frame that catches people off guard.
When you write SUM(amount) OVER (ORDER BY payment_date), the default frame is not "all rows in the partition." It's RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This matters because RANGE groups together rows with the same ORDER BY value.
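Run it against the payments table:

```sql
SELECT payment_id, amount, payment_date,
       SUM(amount) OVER (ORDER BY payment_date) AS running_total
FROM payments
ORDER BY payment_id;
```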
Payments 1 through 4 all share the date 2024-01-15. With the default RANGE frame, the running total for all four rows is the same: the sum of all rows with payment_date <= '2024-01-15'. It doesn't accumulate row by row. It jumps to the group total.
| payment_id | amount | payment_date | running_total (RANGE, default) | running_total (ROWS) |
|---|---|---|---|---|
| 1 | 9.99 | 2024-01-15 | 37.97 | 9.99 |
| 2 | 0.00 | 2024-01-15 | 37.97 | 9.99 |
| 3 | 12.99 | 2024-01-15 | 37.97 | 22.98 |
| 4 | 14.99 | 2024-01-15 | 37.97 | 37.97 |
| 5 | 9.99 | 2024-02-15 | 47.96 | 47.96 |
All four January rows show 37.97 with the default RANGE frame. If you want a row-by-row running total, you need to specify ROWS explicitly:
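```sql
SELECT payment_id, amount, payment_date,
       SUM(amount) OVER (
           ORDER BY payment_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM payments
ORDER BY payment_id;
-- With tied dates, the order of accumulation among the tied rows is arbitrary;
-- add a tiebreaker (ORDER BY payment_date, payment_id) to make it deterministic.
```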
This same default frame creates an especially confusing result with LAST_VALUE:
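```sql
SELECT payment_id, amount,
       LAST_VALUE(amount) OVER (ORDER BY payment_date) AS last_val
FROM payments;
```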
You might expect last_val to show the amount from the final row in the result. Instead, it shows the amount from the current row (or the last row in the current RANGE group). The default frame ends at CURRENT ROW, so LAST_VALUE only sees up to the current position.
Fix: Extend the frame to the end of the partition:
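```sql
SELECT payment_id, amount,
       LAST_VALUE(amount) OVER (
           ORDER BY payment_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS last_val
FROM payments;
```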
Now every row shows the amount from the truly last row in the partition.
String comparisons in SQL have two gotchas that vary by database: trailing spaces and case sensitivity.
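Start with trailing spaces. Compare a username against the same value with a space tacked on the end (the CASE wrapper is just so the same statement runs everywhere):

```sql
SELECT CASE WHEN 'alice_m' = 'alice_m ' THEN 'equal' ELSE 'not equal' END AS trailing_space_check;
```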
| Database | 'alice_m' = 'alice_m ' | Behavior |
|---|---|---|
| PostgreSQL | FALSE | Compares strings exactly, trailing spaces matter |
| MySQL | Depends on collation | PAD SPACE collations (e.g. utf8mb4_general_ci) ignore trailing spaces; the 8.0 default utf8mb4_0900_ai_ci is NO PAD, so they count |
| SQL Server | TRUE | Also pads shorter string (ANSI behavior) |
This means a query like WHERE username = 'alice_m ' (with a trailing space) would find the row in SQL Server, and in MySQL under a PAD SPACE collation, but not in PostgreSQL or under MySQL 8.0's default collation. If your data has inconsistent trailing spaces from user input or imports, the results differ across databases.
Fix: Trim before comparing, or clean data on insert:
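```sql
-- Compare with trailing whitespace stripped on both sides
SELECT username
FROM users
WHERE RTRIM(username) = RTRIM('alice_m ');

-- Or normalize once, at write time
UPDATE users SET username = RTRIM(username);
```

Case sensitivity is the other half of the problem. Compare the same name in two different cases and the answer again depends on the database:

```sql
SELECT CASE WHEN 'alice_m' = 'ALICE_M' THEN 'equal' ELSE 'not equal' END AS case_check;
```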
| Database | 'alice_m' = 'ALICE_M' (default) | Why |
|---|---|---|
| PostgreSQL | FALSE | Default collation is case-sensitive |
| MySQL | TRUE | Default collation (utf8mb4_0900_ai_ci) is case-insensitive |
| SQL Server | TRUE | Default collation (SQL_Latin1_General_CP1_CI_AS) is case-insensitive |
The ci in MySQL's collation name stands for "case-insensitive." PostgreSQL treats uppercase and lowercase as distinct characters by default.
Fix for cross-database consistency: Use LOWER() or UPPER() explicitly:
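```sql
SELECT username
FROM users
WHERE LOWER(username) = LOWER('Alice_M');
```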
This works the same everywhere, regardless of collation settings.
The SQL standard does not guarantee row order from a subquery unless the outer query specifies ORDER BY. Some databases ignore ORDER BY inside subqueries entirely unless paired with LIMIT or TOP.
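For example, sorting inside a derived table and hoping the order carries through:

```sql
SELECT username, country
FROM (
    SELECT username, country
    FROM users
    ORDER BY username
) ordered_users;
```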
PostgreSQL, MySQL, and SQL Server may all return these rows in any order. The inner ORDER BY has no effect on the final result because the outer query doesn't specify an order.
What goes wrong here?
This might work in some databases, but it's not guaranteed. The optimizer can legally strip the inner ORDER BY because the standard doesn't require it to be preserved.
Fix: Put ORDER BY in the outermost query:
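```sql
SELECT username, country
FROM (
    SELECT username, country
    FROM users
) u
ORDER BY username;   -- only the outermost ORDER BY is guaranteed to apply
```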
If you need ordering within a subquery for a specific reason (like ROW_NUMBER()), the window function handles ordering independently and doesn't rely on the subquery's row order.
When you compare columns of one type with values of another type, the database has to convert one side to match the other. This implicit conversion can produce wrong results or kill performance.
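Sometimes the conversion is harmless:

```sql
SELECT username
FROM users
WHERE user_id = '3';   -- string literal compared against an integer column
```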
This works in most databases because they convert '3' to an integer. But consider this:
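```sql
SELECT username, subscription_tier
FROM users
WHERE subscription_tier = 0;   -- PostgreSQL rejects this outright; MySQL silently coerces
```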
In MySQL, this converts the VARCHAR to a number for comparison. The string 'free' converts to 0, 'premium' converts to 0, and 'family' converts to 0. So WHERE subscription_tier = 0 matches every row where the tier can't be parsed as a number, which is every row.
Even when coercion produces correct results, it can prevent index usage:
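```sql
-- assuming an index on streams.user_id
SELECT *
FROM streams
WHERE user_id = '3';
```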
Some databases convert '3' to 3 and use the index. Others convert the column to a string for comparison, which disables the index because the database has to evaluate the conversion for every row.
Fix: Always match types explicitly:
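```sql
SELECT *
FROM streams
WHERE user_id = 3;                    -- integer literal for an integer column

SELECT username, subscription_tier
FROM users
WHERE subscription_tier = 'free';     -- string literal for a string column
```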
Dates have their own coercion quirks:
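```sql
SELECT stream_id, played_at
FROM streams
WHERE played_at >= '2024-01-15';
```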
This works because ISO 8601 format (YYYY-MM-DD) is universally recognized. But non-standard formats depend on database and locale settings:
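```sql
SELECT stream_id, played_at
FROM streams
WHERE played_at >= '15/01/2024';   -- DD/MM or MM/DD? May error or match the wrong dates
```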
Fix: Always use ISO 8601 date format (YYYY-MM-DD) or explicit CAST/TO_DATE functions.