Last Updated: May 2, 2026
SELECT is how you retrieve data from a database. It defines what comes back, in what shape, and under what names. In this chapter we'll look at different ways to retrieve data using SELECT.
To follow along with the examples in this chapter, insert below data in the StreamFlow tables:
A SELECT statement has a predictable structure. At minimum, it needs two clauses: SELECT (what columns) and FROM (which table).
This pulls two columns from every row in tracks. The full anatomy of a SELECT query includes several optional clauses, each with a specific role:
Most of these clauses are covered in their own chapters later. This chapter focuses on the SELECT and FROM clauses themselves, plus how all the pieces fit together in terms of execution order.
The diagram shows the clauses in the order you write them. The order the database processes them is different, and that distinction matters. We will get to execution order after covering what SELECT itself can do.
The most common pattern is to list only the columns you actually need.
This is explicit, readable, and tells the database exactly what data to fetch. When a table has 20 columns but you only need 3, the database can skip reading the rest entirely (depending on storage format and indexing).
The asterisk means "all columns from all tables in FROM":
This returns every column in tracks: track_id, album_id, title, duration_seconds, track_number, and metadata. It is convenient for exploration, but problematic in production code and interview answers for three reasons:
metadata (JSONB) get read and transferred even when you only need the track title. The database cannot optimize what it reads if you ask for everything.country column, SELECT * returns both, and the application has to figure out which is which.Two situations where SELECT * is fine:
The SELECT * inside EXISTS could be SELECT 1 or SELECT s.stream_id, it makes no difference. The database optimizes them all the same way.
For interview answers and anything resembling production code, it's almost always a good idea to name your columns explicitly.
Aliases give columns different names in the result set. The keyword AS makes the rename explicit:
The result columns are now labeled track_title and length_sec instead of the original column names. Aliases are especially useful for computed columns (covered in the next section) where the database would otherwise generate an ugly default name like ?column? or (duration_seconds / 60).
The AS keyword is optional in most databases. These two lines produce the same result:
The second form works but hurts readability. When you omit AS, it is easy to confuse a missing comma with an alias. Consider this:
Is that three columns with commas missing, or one column aliased twice? Use AS explicitly to avoid ambiguity.
If your alias needs spaces, reserved words, or mixed case, wrap it in double quotes (PostgreSQL, Oracle) or square brackets (SQL Server):
| Need | PostgreSQL | MySQL | SQL Server |
|---|---|---|---|
| Alias with spaces | "Track Title" | `Track Title` | [Track Title] |
| Alias is reserved word | "order" | `order` | [order] |
In practice, keep aliases simple and avoid spaces. Use snake_case (track_title) for programmatic access.
SELECT is not limited to raw column values. You can include expressions that compute new values from existing columns.
The most common computed columns involve math on numeric columns:
Notice the difference between dividing by 60 and 60.0. Integer division truncates: 237 / 60 = 3, not 3.95. Dividing by a decimal forces decimal arithmetic: 237 / 60.0 = 3.95.
You can combine multiple columns in a single expression:
Building strings from multiple columns or mixing columns with literal text:
The || operator is the SQL standard for concatenation and works in PostgreSQL, Oracle, and SQLite. MySQL uses a function instead.
| Operation | PostgreSQL | MySQL | SQL Server |
|---|---|---|---|
| Concatenate strings | 'a' || 'b' | CONCAT('a', 'b') | 'a' + 'b' or CONCAT('a', 'b') |
| NULL behavior | 'a' || NULL = NULL | CONCAT('a', NULL) = 'a' | 'a' + NULL = NULL |
The NULL behavior row is important. In PostgreSQL, concatenating anything with NULL produces NULL. MySQL's CONCAT function silently ignores NULLs. This means the artist_label expression above would return NULL for Chromatic (whose country is NULL) in PostgreSQL, which is why the query includes a WHERE filter.
A safer cross-dialect approach uses COALESCE:
Sometimes you need to change a value's type within an expression. PostgreSQL uses the :: cast syntax:
The standard SQL cast syntax works across all databases:
| Cast syntax | PostgreSQL | MySQL | SQL Server |
|---|---|---|---|
| Short form | value::TYPE | N/A | N/A |
| Standard | CAST(value AS TYPE) | CAST(value AS TYPE) | CAST(value AS TYPE) |
| Function | N/A | CONVERT(value, TYPE) | CONVERT(TYPE, value) |
Use CAST() for portability. Use :: in PostgreSQL-specific code where brevity helps readability.
A realistic query often mixes arithmetic, concatenation, and casts:
This query computes duration in minutes (rounded to 1 decimal) and builds a "track X of Y" string using a scalar subquery. Every computed column gets a meaningful alias so the result set is readable.
You write a query in this order: SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT. But the database processes those clauses in a completely different sequence. Understanding this execution order is the single most useful mental model for predicting query behavior, debugging errors, and answering interview questions about why something "should work" but does not.
The database starts by identifying which table (or tables, in the case of joins) to read from. This establishes the working set of rows and columns available to all subsequent steps.
The row filter runs next. It evaluates each row from the FROM step and keeps only those that satisfy the condition. At this point, SELECT has not run yet, so column aliases do not exist. Only raw column names from the source tables are available.
If present, the remaining rows get collapsed into groups. After this step, you can no longer reference individual row values, only group keys and aggregate results.
A filter on groups (similar to WHERE but applied after aggregation). It can reference aggregate functions because groups already exist at this point.
Now the database computes the output columns. Aliases are created here. Expressions are evaluated. This is the first moment your computed columns and renamed fields come into existence.
Sorting happens after SELECT, which is why ORDER BY can reference aliases. The alias exists by the time sorting runs.
The final step caps the number of rows returned. It runs on the already-sorted result, taking only the first N rows.
The execution order explains three common confusions:
1. Why aliases fail in WHERE:
WHERE runs at step 2. The alias minutes is not created until step 5. The database has no idea what minutes refers to when it processes the filter.
2. Why aliases work in ORDER BY:
ORDER BY runs at step 6, after SELECT (step 5) has created the alias. By the time sorting happens, minutes is a known column in the result set.
3. Why you cannot use aggregate functions in WHERE:
WHERE (step 2) runs before GROUP BY (step 3). Groups do not exist yet when WHERE evaluates, so there is nothing to count. HAVING (step 4) runs after grouping, so aggregates are available.
Here is a query that uses most clauses, traced through execution order:
| Step | Clause | What happens |
|---|---|---|
| 1 | FROM tracks | Start with all 10 rows in tracks |
| 2 | WHERE duration_seconds > 150 | Remove Crackle (156 is > 150, stays). All 10 rows pass since all are > 150 except |
| 3 | GROUP BY album_id | Form 5 groups (albums 1-5), each with 2 rows |
| 4 | HAVING COUNT(*) >= 2 | All 5 groups have exactly 2 tracks, so all pass |
| 5 | SELECT | Compute track_count and avg_minutes for each group |
| 6 | ORDER BY avg_minutes DESC | Sort by average duration descending |
| 7 | LIMIT 3 | Return only the top 3 groups |
The trace shows that even though SELECT appears first in the query text, it is one of the last things to execute.
MySQL allows aliases in HAVING and in some GROUP BY contexts, which violates the standard execution model. This is a MySQL-specific extension. PostgreSQL and SQL Server follow the standard strictly.
| Clause | Can use SELECT alias? | Why? |
|---|---|---|
| WHERE | No | Runs before SELECT |
| GROUP BY | No (standard) / Yes (MySQL) | Runs before SELECT |
| HAVING | No (standard) / Yes (MySQL) | Runs before SELECT |
| ORDER BY | Yes | Runs after SELECT |
| LIMIT | N/A (no column references) | Runs last |