
SQL Gotchas and Tricky Behavior

Last Updated: May 3, 2026

9 min read

SQL looks straightforward until it doesn't. A query runs without errors but returns wrong results, a filter silently drops rows it shouldn't, or an aggregation produces a number that's off by just enough to be dangerous. These aren't bugs in the database. They're defined behaviors that happen to conflict with what most people intuitively expect.

This chapter collects the most common gotchas into one place, each with a "what you'd expect" vs "what actually happens" breakdown and a clear fix.

Sample Data

Insert these rows to follow along:

SQL
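The original sample-data block isn't shown here, so below is a hedged reconstruction of the schema implied by the examples that follow. Table and column names (users, artists, streams, payments, country, duration_listened, and so on) come from the surrounding text; types and everything else are assumptions.

```sql
-- Illustrative schema only; the exact rows aren't shown in the original.
-- Populate it so that: users has 6 rows with country NULL for Carol and
-- Frank; artists includes Echo Chamber (DE) and Pixel Dreams (NULL country);
-- streams has NULL duration_listened for stream_id 4 and 9.
CREATE TABLE users (
    user_id           INT PRIMARY KEY,
    username          VARCHAR(50),
    email             VARCHAR(100),
    country           VARCHAR(2),      -- 'US', 'GB', 'DE', or NULL
    subscription_tier VARCHAR(20)      -- e.g. 'free', 'premium', 'family'
);

CREATE TABLE artists (
    artist_id INT PRIMARY KEY,
    name      VARCHAR(100),
    country   VARCHAR(2)
);

CREATE TABLE streams (
    stream_id         INT PRIMARY KEY,
    user_id           INT,
    played_at         TIMESTAMP,
    duration_listened INT
);

CREATE TABLE payments (
    payment_id   INT PRIMARY KEY,
    user_id      INT,
    amount       DECIMAL(10,2),
    payment_date DATE
);
```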

NULL Arithmetic and Comparisons

NULL represents an unknown value, not zero, not an empty string, not false. Any arithmetic operation involving NULL produces NULL, and any comparison with NULL produces UNKNOWN rather than TRUE or FALSE.

SQL
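A sketch of the query being described, assuming the streams columns from the sample data:

```sql
SELECT stream_id,
       duration_listened,
       duration_listened + 10 AS plus_ten,
       duration_listened * 2  AS doubled
FROM streams;
```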

For streams where duration_listened is NULL (stream_id 4 and 9), both plus_ten and doubled are also NULL. NULL + 10 doesn't produce 10. It produces NULL, because adding anything to an unknown value still gives an unknown value.

The same logic applies to comparisons. This query returns no rows for NULL countries:

SQL
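Something like this, assuming the country column on users:

```sql
SELECT username, country
FROM users
WHERE country = NULL;   -- never TRUE: evaluates to UNKNOWN for every row
```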

Nothing comes back, even for Carol and Frank whose country is NULL. The expression NULL = NULL evaluates to UNKNOWN, not TRUE. SQL uses three-valued logic: TRUE, FALSE, and UNKNOWN. WHERE only keeps rows where the condition is TRUE, so UNKNOWN gets filtered out just like FALSE.

| Expression | Result | Why |
|---|---|---|
| NULL = NULL | UNKNOWN | Can't know if two unknowns are equal |
| NULL <> NULL | UNKNOWN | Same reason |
| NULL > 5 | UNKNOWN | Can't compare unknown to a value |
| NULL + 10 | NULL | Arithmetic with unknown stays unknown |
| NULL AND TRUE | UNKNOWN | Unknown might be FALSE |
| NULL OR TRUE | TRUE | TRUE regardless of the other side |
| NULL AND FALSE | FALSE | FALSE regardless of the other side |

The correct way to check for NULL is with IS NULL and IS NOT NULL:

SQL
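For example:

```sql
SELECT username, country
FROM users
WHERE country IS NULL;
```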

This returns Carol and Frank as expected.

COALESCE for Safe Defaults

When you need NULL to behave like a real value, use COALESCE:

SQL
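A minimal sketch; the 0 default is an assumption, so pick whatever default fits your data:

```sql
SELECT stream_id,
       COALESCE(duration_listened, 0) + 10 AS plus_ten
FROM streams;
```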

COALESCE takes the first non-NULL argument. This turns NULL durations into 0 before doing math, so the addition works as expected.

NOT IN with NULLs

This is one of the most dangerous SQL gotchas because the query runs without errors but silently returns wrong results.

Say you want to find artists who are not from the US or GB:

SQL
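Likely something along these lines:

```sql
SELECT name, country
FROM artists
WHERE country NOT IN ('US', 'GB');
```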

This returns Echo Chamber (DE). But Pixel Dreams (country is NULL) is missing from the results. Why?

NOT IN is equivalent to a chain of <> comparisons combined with AND. The expression NULL NOT IN ('US', 'GB') expands to NULL <> 'US' AND NULL <> 'GB'. Each comparison with NULL produces UNKNOWN, and UNKNOWN AND UNKNOWN is UNKNOWN. WHERE discards UNKNOWN rows.

The problem gets worse when the subquery itself contains NULL values:

SQL
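For example, filtering artists against the countries present in users:

```sql
SELECT name, country
FROM artists
WHERE country NOT IN (SELECT country FROM users);
```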

The users table has NULL values in country (Carol and Frank). This means the subquery returns {'US', 'GB', NULL, 'DE'}. Now every row in artists gets compared against NULL, every NOT IN evaluation includes an UNKNOWN term, and the entire query returns zero rows.

Zero rows. Not "all except US and GB." Zero.

The Fix: Use NOT EXISTS

SQL
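A sketch of the rewrite, using a correlated subquery instead of value comparison:

```sql
SELECT a.name, a.country
FROM artists a
WHERE NOT EXISTS (
    SELECT 1
    FROM users u
    WHERE u.country = a.country
);
```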

NOT EXISTS handles NULLs correctly because it checks for the existence of matching rows rather than comparing values directly. When a.country is NULL, the inner query finds no rows where u.country = NULL is TRUE (because NULL = NULL is UNKNOWN), so NOT EXISTS returns TRUE, correctly including Pixel Dreams.

| Method | NULL in list/subquery | NULL in outer column | Safe? |
|---|---|---|---|
| NOT IN (values) | Breaks silently | Excluded silently | No |
| NOT IN (subquery) | Returns empty set | Excluded silently | No |
| NOT EXISTS | Handled correctly | Handled correctly | Yes |

Use NOT EXISTS as the default when filtering against another table. Reserve NOT IN for hardcoded lists where you're certain no NULLs are involved.

COUNT(*) vs COUNT(column)

These two forms of COUNT look similar but behave differently with NULLs.

SQL
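Something like:

```sql
SELECT COUNT(*)                AS total_rows,       -- 6
       COUNT(country)          AS with_country,     -- 4 (skips NULLs)
       COUNT(DISTINCT country) AS unique_countries  -- 3 (US, GB, DE)
FROM users;
```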

COUNT(*) counts every row regardless of NULLs. It returns 6. COUNT(country) skips rows where country is NULL (Carol and Frank), returning 4. COUNT(DISTINCT country) counts unique non-NULL values: US, GB, DE. That's 3.

This distinction matters when calculating rates and percentages:

SQL
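The original condition isn't shown; the sketch below assumes the numerator counts premium subscribers, and that three of the six users qualify. Any condition demonstrates the same denominator effect.

```sql
-- Denominator skips NULL-country users: 3 / 4 = 75%
SELECT COUNT(CASE WHEN subscription_tier = 'premium' THEN 1 END) * 100.0
       / COUNT(country) AS rate_of_known
FROM users;

-- Denominator counts every user: 3 / 6 = 50%
SELECT COUNT(CASE WHEN subscription_tier = 'premium' THEN 1 END) * 100.0
       / COUNT(*) AS rate_of_all
FROM users;
```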

The first query divides 3 by 4 (75%), ignoring the two NULL-country users entirely. The second divides 3 by 6 (50%), which accurately reflects the proportion of all users.

COUNT(1) vs COUNT(*)

You'll sometimes see COUNT(1) instead of COUNT(*). They're equivalent in every major database. The optimizer treats them identically. Use whichever your team prefers, but be consistent.

Integer Division Truncation

Most databases truncate integer division rather than returning a decimal result.

SQL
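Possibly a ratio like this, where the 300-second threshold for a "long" stream is an assumption:

```sql
-- Integer count divided by integer count
SELECT SUM(CASE WHEN duration_listened >= 300 THEN 1 ELSE 0 END)
       / COUNT(*) AS bad_ratio
FROM streams;
```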

If 3 out of 10 streams are long, bad_ratio returns 0, not 0.3. Integer divided by integer gives an integer, and 3 / 10 truncates to 0. (MySQL is the notable exception: its / operator returns a decimal, and only its DIV operator truncates.)

There are several ways to fix this:

SQL
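For instance, using the 3-out-of-10 numbers from above:

```sql
SELECT 3 * 1.0 / 10               AS via_multiply,         -- 0.3
       3.0 / 10                   AS via_decimal_literal,  -- 0.3
       CAST(3 AS DECIMAL(10,2)) / 10 AS via_cast;
```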

All three produce the correct decimal result. The key is making at least one operand non-integer before the division happens.

| Database | 5 / 2 Result | Fix |
|---|---|---|
| PostgreSQL | 2 | 5.0 / 2 or CAST(5 AS DECIMAL) / 2 |
| MySQL | 2.5 | / already returns a decimal; only 5 DIV 2 truncates |
| SQL Server | 2 | 5.0 / 2 or CAST(5 AS DECIMAL(10,2)) / 2 |

PostgreSQL and SQL Server truncate whenever both sides are integers. MySQL's / operator returns a decimal, and truncation only happens with its dedicated DIV operator. Don't rely on any one database's behavior; make the intended type explicit.

BETWEEN and Timestamp Boundaries

BETWEEN is inclusive on both ends. For dates, that's usually fine. For timestamps, it creates a subtle trap.

SQL
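Probably a filter like this, assuming the played_at timestamp column:

```sql
-- Intends "all streams on January 15"
SELECT *
FROM streams
WHERE played_at = '2024-01-15';  -- matches only 2024-01-15 00:00:00
```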

This only matches streams at exactly midnight (2024-01-15 00:00:00). Every stream that happened after midnight on January 15 is excluded because 2024-01-15 10:00:00 is greater than 2024-01-15 00:00:00 (which is what the bare date string casts to as a timestamp).

The same problem occurs with date ranges:

SQL
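For example:

```sql
SELECT *
FROM streams
WHERE played_at BETWEEN '2024-01-01' AND '2024-01-31';
```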

This misses every stream on January 31 that occurred after midnight. A stream at 2024-01-31 14:30:00 is greater than 2024-01-31 00:00:00, so it's excluded.

The Fix: Use Half-Open Ranges

SQL
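The same January filter, written as a half-open range:

```sql
SELECT *
FROM streams
WHERE played_at >= '2024-01-01'
  AND played_at <  '2024-02-01';
```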

This pattern (>= start, < exclusive end) captures every possible timestamp within January, no matter the time component. It's clearer about the boundary behavior and works correctly with any timestamp precision.

| Pattern | January 31 at 14:30 | Safe for timestamps? |
|---|---|---|
| BETWEEN '2024-01-01' AND '2024-01-31' | Excluded | No |
| BETWEEN '2024-01-01' AND '2024-01-31 23:59:59' | Included, but misses sub-second times | No |
| >= '2024-01-01' AND < '2024-02-01' | Included | Yes |

The half-open range approach is the standard recommendation for timestamp filtering. Avoid BETWEEN with timestamp columns entirely.

Alias Scope and Execution Order

SQL doesn't execute in the order you write it. The logical execution order determines where you can reference what.

SELECT runs after WHERE and GROUP BY. That means aliases defined in SELECT don't exist yet when WHERE and GROUP BY execute.

What goes wrong here?

SQL
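Something like this, which fails with an "unknown column" style error:

```sql
SELECT user_id, COUNT(*) AS stream_count
FROM streams
WHERE stream_count > 1   -- error: the alias isn't visible yet
GROUP BY user_id;
```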

stream_count is defined in SELECT, but WHERE runs before SELECT. The alias doesn't exist at that point.

Fix: Use HAVING for post-aggregation filtering, or wrap in a subquery:

SQL
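Both fixes sketched out:

```sql
-- HAVING filters after aggregation
SELECT user_id, COUNT(*) AS stream_count
FROM streams
GROUP BY user_id
HAVING COUNT(*) > 1;

-- Or wrap in a subquery so the alias exists in the outer WHERE
SELECT *
FROM (
    SELECT user_id, COUNT(*) AS stream_count
    FROM streams
    GROUP BY user_id
) AS counts
WHERE stream_count > 1;
```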

There's one exception to this rule: ORDER BY runs after SELECT, so it can reference aliases.

SQL
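For example:

```sql
SELECT user_id, COUNT(*) AS stream_count
FROM streams
GROUP BY user_id
ORDER BY stream_count DESC;  -- alias is legal here
```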
| Clause | Can use SELECT aliases? | Why |
|---|---|---|
| WHERE | No | Runs before SELECT |
| GROUP BY | No (standard, SQL Server), Yes (PostgreSQL, MySQL) | Standard says no |
| HAVING | No (standard, PostgreSQL, SQL Server), Yes (MySQL) | Must repeat the expression |
| ORDER BY | Yes | Runs after SELECT |

MySQL is the most permissive here. PostgreSQL accepts aliases in GROUP BY as an extension but not in HAVING, while SQL Server follows the standard strictly. Write queries that work everywhere by not relying on alias references in WHERE, GROUP BY, or HAVING.

GROUP BY and Non-Aggregated Columns

When you use GROUP BY, every column in SELECT must either appear in the GROUP BY clause or be inside an aggregate function. PostgreSQL and SQL Server enforce this strictly. MySQL, by default, does not.

SQL
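Likely a query along these lines:

```sql
SELECT country, email, COUNT(*) AS user_count
FROM users
GROUP BY country;   -- email is neither grouped nor aggregated
```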

PostgreSQL rejects this because email is in SELECT but not in GROUP BY or an aggregate. MySQL runs it and picks an arbitrary email value from the group. The result might be correct by coincidence, or it might silently return a wrong email.

This behavior in MySQL is controlled by the ONLY_FULL_GROUP_BY SQL mode. Modern MySQL versions (5.7.5+) enable it by default, so you'll get an error. But older installations or configurations that disable it will silently return unpredictable results.

Fix: Always list every non-aggregated column in GROUP BY:

SQL
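The corrected version:

```sql
SELECT country, email, COUNT(*) AS user_count
FROM users
GROUP BY country, email;
```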

If you need a column that's functionally dependent on the GROUP BY key (like email depending on username because username is unique), PostgreSQL allows it when grouping by the primary key. But the safest approach is to be explicit.

UNION vs UNION ALL

UNION removes duplicate rows from the combined result. UNION ALL keeps everything.

SQL
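For example, combining user and artist countries:

```sql
SELECT country FROM users
UNION                -- deduplicates the combined result
SELECT country FROM artists;

SELECT country FROM users
UNION ALL            -- keeps every row, duplicates included
SELECT country FROM artists;
```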

The gotcha is performance. UNION has to sort or hash the entire combined result set to find and remove duplicates. On large tables, this is expensive. If you know the two sets are already disjoint (or if duplicates are acceptable), UNION ALL avoids that overhead entirely.

There's also a correctness gotcha. If the two queries can produce the same row and you need to count occurrences, UNION silently collapses them. For example, if Alice streamed from the US and is also a US user, UNION might drop what looks like a duplicate row.

| Scenario | Use |
|---|---|
| Combining results from different tables, duplicates possible and unwanted | UNION |
| Combining results that are guaranteed disjoint | UNION ALL |
| Building a report where each row must be counted | UNION ALL |
| Performance matters and duplicates are acceptable | UNION ALL |

The default should be UNION ALL unless you specifically need deduplication. Reach for UNION only when you have a clear reason to remove duplicates.

Window Function Frame Defaults

Window functions with ORDER BY use a default frame that catches people off guard.

When you write SUM(amount) OVER (ORDER BY payment_date), the default frame is not "all rows in the partition." It's RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This matters because RANGE groups together rows with the same ORDER BY value.

SQL
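The running total with the default frame:

```sql
SELECT payment_id, amount, payment_date,
       SUM(amount) OVER (ORDER BY payment_date) AS running_total
FROM payments;
```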

Payments 1 through 4 all share the date 2024-01-15. With the default RANGE frame, the running total for all four rows is the same: the sum of all rows with payment_date <= '2024-01-15'. It doesn't accumulate row by row. It jumps to the group total.

| payment_id | amount | payment_date | running_total (RANGE, default) | running_total (ROWS) |
|---|---|---|---|---|
| 1 | 9.99 | 2024-01-15 | 37.97 | 9.99 |
| 2 | 0.00 | 2024-01-15 | 37.97 | 9.99 |
| 3 | 12.99 | 2024-01-15 | 37.97 | 22.98 |
| 4 | 14.99 | 2024-01-15 | 37.97 | 37.97 |
| 5 | 9.99 | 2024-02-15 | 47.96 | 47.96 |

All four January rows show 37.97 with the default RANGE frame. If you want a row-by-row running total, you need to specify ROWS explicitly:

SQL
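The explicit ROWS frame:

```sql
SELECT payment_id, amount, payment_date,
       SUM(amount) OVER (
           ORDER BY payment_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM payments;
```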

The LAST_VALUE Trap

This same default frame creates an especially confusing result with LAST_VALUE:

SQL
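For example:

```sql
SELECT payment_id, amount,
       LAST_VALUE(amount) OVER (ORDER BY payment_date) AS last_val
FROM payments;
```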

You might expect last_val to show the amount from the final row in the result. Instead, it shows the amount from the current row (or the last row in the current RANGE group). The default frame ends at CURRENT ROW, so LAST_VALUE only sees up to the current position.

Fix: Extend the frame to the end of the partition:

SQL
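The corrected frame:

```sql
SELECT payment_id, amount,
       LAST_VALUE(amount) OVER (
           ORDER BY payment_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS last_val
FROM payments;
```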

Now every row shows the amount from the truly last row in the partition.

String Comparison Surprises

String comparisons in SQL have two gotchas that vary by database: trailing spaces and case sensitivity.

Trailing Spaces

SQL
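Presumably a comparison like this (the CASE wrapper is just to make the result portable across databases):

```sql
SELECT CASE WHEN 'alice' = 'alice ' THEN 'TRUE' ELSE 'FALSE' END AS are_equal;
```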
| Database | Result | Behavior |
|---|---|---|
| PostgreSQL | FALSE | Compares strings exactly; trailing spaces matter |
| MySQL | Depends on collation | PAD SPACE collations (e.g. utf8mb4_general_ci) pad the shorter string and return TRUE; the 8.0 default utf8mb4_0900_ai_ci is NO PAD and returns FALSE |
| SQL Server | TRUE | Pads the shorter string (ANSI PAD SPACE behavior) |

This means a query like WHERE username = 'alice_m ' (with a trailing space) would find the row in SQL Server and in MySQL under a PAD SPACE collation, but not in PostgreSQL or under MySQL 8.0's default collation. If your data has inconsistent trailing spaces from user input or imports, the results differ across databases.

Fix: Trim before comparing, or clean data on insert:

SQL
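For example:

```sql
SELECT username
FROM users
WHERE TRIM(username) = 'alice_m';
```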

Case Sensitivity

SQL
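Presumably something like:

```sql
SELECT CASE WHEN 'Alice' = 'alice' THEN 'TRUE' ELSE 'FALSE' END AS are_equal;
```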
| Database | Default Result | Why |
|---|---|---|
| PostgreSQL | FALSE | Default collation is case-sensitive |
| MySQL | TRUE | Default collation (utf8mb4_0900_ai_ci) is case-insensitive |
| SQL Server | TRUE | Default collation (SQL_Latin1_General_CP1_CI_AS) is case-insensitive |

The ci in MySQL's collation name stands for "case-insensitive." PostgreSQL treats uppercase and lowercase as distinct characters by default.

Fix for cross-database consistency: Use LOWER() or UPPER() explicitly:

SQL
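For example:

```sql
SELECT username
FROM users
WHERE LOWER(username) = LOWER('Alice_M');
```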

This works the same everywhere, regardless of collation settings.

ORDER BY in Subqueries

The SQL standard does not guarantee row order from a subquery unless the outer query specifies ORDER BY. Some databases ignore ORDER BY inside subqueries entirely unless paired with LIMIT or TOP.

SQL
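Something like:

```sql
SELECT username, country
FROM (
    SELECT username, country
    FROM users
    ORDER BY username      -- the optimizer may discard this
) AS sorted;
```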

PostgreSQL, MySQL, and SQL Server may all return these rows in any order. The inner ORDER BY has no effect on the final result because the outer query doesn't specify an order.

What goes wrong here?

SQL
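The original query isn't shown; one common version of the mistake is selecting from an ordered subquery and hoping the order carries through:

```sql
-- Hoping the inner ORDER BY survives to the final result
SELECT username
FROM (
    SELECT username, country
    FROM users
    ORDER BY country
) AS by_country;
```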

This might work in some databases, but it's not guaranteed. The optimizer can legally strip the inner ORDER BY because the standard doesn't require it to be preserved.

Fix: Put ORDER BY in the outermost query:

SQL
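For example:

```sql
SELECT username, country
FROM users
ORDER BY username;
```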

If you need ordering within a subquery for a specific reason (like ROW_NUMBER()), the window function handles ordering independently and doesn't rely on the subquery's row order.

Implicit Type Coercion

When you compare columns of one type with values of another type, the database has to convert one side to match the other. This implicit conversion can produce wrong results or kill performance.

SQL
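A typical example, assuming user_id is an INT column:

```sql
SELECT *
FROM streams
WHERE user_id = '3';   -- string literal against an INT column
```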

This works in most databases because they convert '3' to an integer. But consider this:

SQL
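Likely the reverse case, a numeric literal against a VARCHAR column:

```sql
-- In MySQL, 'free', 'premium', and 'family' all cast to 0,
-- so this matches every row
SELECT username, subscription_tier
FROM users
WHERE subscription_tier = 0;
```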

In MySQL, this converts the VARCHAR to a number for comparison. The string 'free' converts to 0, 'premium' converts to 0, and 'family' converts to 0. So WHERE subscription_tier = 0 matches every row where the tier can't be parsed as a number, which is every row.

Performance Impact

Even when coercion produces correct results, it can prevent index usage:

SQL
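The same mismatched-type comparison from above; whether an index on user_id is used depends on which side the database chooses to convert:

```sql
SELECT *
FROM streams
WHERE user_id = '3';   -- implicit conversion may disable the index
```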

Some databases convert '3' to 3 and use the index. Others convert the column to a string for comparison, which disables the index because the database has to evaluate the conversion for every row.

Fix: Always match types explicitly:

SQL
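The explicit version:

```sql
SELECT *
FROM streams
WHERE user_id = 3;     -- literal matches the column type; index stays usable
```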

Date String Comparisons

Dates have their own coercion quirks:

SQL
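For example:

```sql
SELECT *
FROM streams
WHERE played_at >= '2024-01-15';   -- ISO 8601: parsed consistently
```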

This works because ISO 8601 format (YYYY-MM-DD) is universally recognized. But non-standard formats depend on database and locale settings:

SQL
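A sketch of the problem with a locale-dependent format:

```sql
-- Ambiguous: January 15 or day 1 of month 15? Parsing depends on
-- database and locale settings, and may error or pick the wrong date
SELECT *
FROM streams
WHERE played_at >= '01/15/2024';
```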

Fix: Always use ISO 8601 date format (YYYY-MM-DD) or explicit CAST/TO_DATE functions.