SELECT is how you retrieve data from a database. It defines what comes back, in what shape, and under what names. In this chapter we'll look at different ways to retrieve data using SELECT.

Sample Data

To follow along with the examples in this chapter, insert below data in the StreamFlow tables:

SQL

Anatomy of a SELECT Query

A SELECT statement has a predictable structure. At minimum, it needs two clauses: SELECT (what columns) and FROM (which table).

SQL

This pulls two columns from every row in tracks. The full anatomy of a SELECT query includes several optional clauses, each with a specific role:

SQL

Most of these clauses are covered in their own chapters later. This chapter focuses on the SELECT and FROM clauses themselves, plus how all the pieces fit together in terms of execution order.

The diagram shows the clauses in the order you write them. The order the database processes them is different, and that distinction matters. We will get to execution order after covering what SELECT itself can do.

Selecting Columns

Specific Columns

The most common pattern is to list only the columns you actually need.

SQL

This is explicit, readable, and tells the database exactly what data to fetch. When a table has 20 columns but you only need 3, the database can skip reading the rest entirely (depending on storage format and indexing).

SELECT *

The asterisk means "all columns from all tables in FROM":

SQL

This returns every column in tracks: track_id, album_id, title, duration_seconds, track_number, and metadata. It is convenient for exploration, but problematic in production code and interview answers for three reasons:

Schema changes break things silently. If someone adds a column to the table, your query's result set changes shape without any modification to your code. Application code expecting 5 columns suddenly gets 6.
Performance waste. Large columns like metadata (JSONB) get read and transferred even when you only need the track title. The database cannot optimize what it reads if you ask for everything.
Ambiguity in joins. When joining two tables that both have a country column, SELECT * returns both, and the application has to figure out which is which.

When SELECT * Is Acceptable

Two situations where SELECT * is fine:

In EXISTS subqueries, the database never actually reads the columns. It only checks whether at least one row matches:

SQL

The SELECT * inside EXISTS could be SELECT 1 or SELECT s.stream_id, it makes no difference. The database optimizes them all the same way.

In ad-hoc exploration, when you are investigating a table's contents interactively and do not care about efficiency:

SQL

For interview answers and anything resembling production code, it's almost always a good idea to name your columns explicitly.

Column Aliases

Aliases give columns different names in the result set. The keyword AS makes the rename explicit:

SQL

The result columns are now labeled track_title and length_sec instead of the original column names. Aliases are especially useful for computed columns (covered in the next section) where the database would otherwise generate an ugly default name like ?column? or (duration_seconds / 60).

Implicit Aliases

The AS keyword is optional in most databases. These two lines produce the same result:

SQL

The second form works but hurts readability. When you omit AS, it is easy to confuse a missing comma with an alias. Consider this:

SQL

Is that three columns with commas missing, or one column aliased twice? Use AS explicitly to avoid ambiguity.

Quoting Aliases

If your alias needs spaces, reserved words, or mixed case, wrap it in double quotes (PostgreSQL, Oracle) or square brackets (SQL Server):

SQL

Need	PostgreSQL	MySQL	SQL Server
Alias with spaces	`"Track Title"`	`Track Title`	`[Track Title]`
Alias is reserved word	`"order"`	`order`	`[order]`

In practice, keep aliases simple and avoid spaces. Use snake_case (track_title) for programmatic access.

Expressions and Computed Columns

SELECT is not limited to raw column values. You can include expressions that compute new values from existing columns.

Arithmetic Expressions

The most common computed columns involve math on numeric columns:

SQL

Notice the difference between dividing by 60 and 60.0. Integer division truncates: 237 / 60 = 3, not 3.95. Dividing by a decimal forces decimal arithmetic: 237 / 60.0 = 3.95.

You can combine multiple columns in a single expression:

SQL

String Concatenation

Building strings from multiple columns or mixing columns with literal text:

SQL

The || operator is the SQL standard for concatenation and works in PostgreSQL, Oracle, and SQLite. MySQL uses a function instead.

Operation	PostgreSQL	MySQL	SQL Server
Concatenate strings	`'a' \|\| 'b'`	`CONCAT('a', 'b')`	`'a' + 'b'` or `CONCAT('a', 'b')`
NULL behavior	`'a' \|\| NULL` = NULL	`CONCAT('a', NULL)` = `'a'`	`'a' + NULL` = NULL

The NULL behavior row is important. In PostgreSQL, concatenating anything with NULL produces NULL. MySQL's CONCAT function silently ignores NULLs. This means the artist_label expression above would return NULL for Chromatic (whose country is NULL) in PostgreSQL, which is why the query includes a WHERE filter.

A safer cross-dialect approach uses COALESCE:

SQL

Type Casting in Expressions

Sometimes you need to change a value's type within an expression. PostgreSQL uses the :: cast syntax:

SQL

The standard SQL cast syntax works across all databases:

SQL

Cast syntax	PostgreSQL	MySQL	SQL Server
Short form	`value::TYPE`	N/A	N/A
Standard	`CAST(value AS TYPE)`	`CAST(value AS TYPE)`	`CAST(value AS TYPE)`
Function	N/A	`CONVERT(value, TYPE)`	`CONVERT(TYPE, value)`

Use CAST() for portability. Use :: in PostgreSQL-specific code where brevity helps readability.

Combining Multiple Expression Types

A realistic query often mixes arithmetic, concatenation, and casts:

SQL

This query computes duration in minutes (rounded to 1 decimal) and builds a "track X of Y" string using a scalar subquery. Every computed column gets a meaningful alias so the result set is readable.

SQL Execution Order

You write a query in this order: SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT. But the database processes those clauses in a completely different sequence. Understanding this execution order is the single most useful mental model for predicting query behavior, debugging errors, and answering interview questions about why something "should work" but does not.

Loading simulation...

Step 1: FROM

The database starts by identifying which table (or tables, in the case of joins) to read from. This establishes the working set of rows and columns available to all subsequent steps.

Step 2: WHERE

The row filter runs next. It evaluates each row from the FROM step and keeps only those that satisfy the condition. At this point, SELECT has not run yet, so column aliases do not exist. Only raw column names from the source tables are available.

Step 3: GROUP BY

If present, the remaining rows get collapsed into groups. After this step, you can no longer reference individual row values, only group keys and aggregate results.

Step 4: HAVING

A filter on groups (similar to WHERE but applied after aggregation). It can reference aggregate functions because groups already exist at this point.

Step 5: SELECT

Now the database computes the output columns. Aliases are created here. Expressions are evaluated. This is the first moment your computed columns and renamed fields come into existence.

Step 6: ORDER BY

Sorting happens after SELECT, which is why ORDER BY can reference aliases. The alias exists by the time sorting runs.

Step 7: LIMIT

The final step caps the number of rows returned. It runs on the already-sorted result, taking only the first N rows.

Why This Matters

The execution order explains three common confusions:

1. Why aliases fail in WHERE:

SQL

WHERE runs at step 2. The alias minutes is not created until step 5. The database has no idea what minutes refers to when it processes the filter.

2. Why aliases work in ORDER BY:

SQL

ORDER BY runs at step 6, after SELECT (step 5) has created the alias. By the time sorting happens, minutes is a known column in the result set.

3. Why you cannot use aggregate functions in WHERE:

SQL

WHERE (step 2) runs before GROUP BY (step 3). Groups do not exist yet when WHERE evaluates, so there is nothing to count. HAVING (step 4) runs after grouping, so aggregates are available.

A Complete Trace

Here is a query that uses most clauses, traced through execution order:

SQL

Step	Clause	What happens
1	FROM tracks	Start with all 10 rows in `tracks`
2	WHERE duration_seconds > 150	Remove Crackle (156 is > 150, stays). All 10 rows pass since all are > 150 except
3	GROUP BY album_id	Form 5 groups (albums 1-5), each with 2 rows
4	HAVING COUNT(*) >= 2	All 5 groups have exactly 2 tracks, so all pass
5	SELECT	Compute track_count and avg_minutes for each group
6	ORDER BY avg_minutes DESC	Sort by average duration descending
7	LIMIT 3	Return only the top 3 groups

The trace shows that even though SELECT appears first in the query text, it is one of the last things to execute.

MySQL Exception

MySQL allows aliases in HAVING and in some GROUP BY contexts, which violates the standard execution model. This is a MySQL-specific extension. PostgreSQL and SQL Server follow the standard strictly.

Clause	Can use SELECT alias?	Why?
WHERE	No	Runs before SELECT
GROUP BY	No (standard) / Yes (MySQL)	Runs before SELECT
HAVING	No (standard) / Yes (MySQL)	Runs before SELECT
ORDER BY	Yes	Runs after SELECT
LIMIT	N/A (no column references)	Runs last

Quiz

The SELECT Statement Quiz

10 quizzes