Last Updated: May 3, 2026
Every query in this course so far has assumed the tables already exist. Someone had to define what columns the users table has, what types those columns hold, and what rules the data must follow.
That's what DDL (Data Definition Language) does. This chapter covers how to create tables, choose data types, set up constraints, and modify tables after they're in production.
The examples in this chapter create new tables and modify existing ones.
These tables already exist:
Insert the data below to follow along:
Note: This chapter modifies data, so you may want to re-run these statements if you need a clean starting point.
The basic syntax defines a table name, a list of columns, and their types:
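In outline (bracketed parts are optional):

```sql
CREATE TABLE table_name (
    column_name  data_type  [constraints],
    other_column data_type  [constraints]
);
```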
Each column needs a name and a data type at minimum. Constraints are optional but almost always present in real schemas. Here's a simple example that creates a table to track user listening sessions:
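A sketch of that table, matching the column-by-column description that follows (the table and column names are illustrative):

```sql
CREATE TABLE listening_sessions (
    session_id   SERIAL PRIMARY KEY,
    user_id      INT NOT NULL,
    started_at   TIMESTAMP WITH TIME ZONE NOT NULL,
    ended_at     TIMESTAMP WITH TIME ZONE,
    total_tracks INT DEFAULT 0
);
```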
This defines five columns. session_id is an auto-incrementing integer primary key. user_id and started_at are required (NOT NULL). ended_at can be NULL because the session might still be active. total_tracks defaults to 0 if not provided.
Running CREATE TABLE on a table that already exists produces an error:
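For example, repeating the statement above (error text is PostgreSQL's):

```sql
CREATE TABLE listening_sessions (session_id SERIAL PRIMARY KEY);
-- ERROR:  relation "listening_sessions" already exists
```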
Adding IF NOT EXISTS skips the creation quietly when the table already exists:
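```sql
CREATE TABLE IF NOT EXISTS listening_sessions (
    session_id SERIAL PRIMARY KEY
);
-- NOTICE:  relation "listening_sessions" already exists, skipping
```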
This is useful in migration scripts and setup scripts that might run more than once. Note that IF NOT EXISTS does not check whether the existing table has the same columns or types. It only checks the name.
Constraints can be declared in two places. Inline constraints go on the column definition:
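A sketch using the chapter's users columns:

```sql
CREATE TABLE users (
    user_id  SERIAL PRIMARY KEY,         -- inline constraint
    username VARCHAR(50) NOT NULL UNIQUE, -- two inline constraints
    country  VARCHAR(2)
);
```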
Table-level constraints go after all column definitions. You need this style for composite keys and multi-column constraints:
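For example, a playlist-to-track association table with a composite primary key:

```sql
CREATE TABLE playlist_tracks (
    playlist_id INT NOT NULL,
    track_id    INT NOT NULL,
    added_at    TIMESTAMP WITH TIME ZONE,
    PRIMARY KEY (playlist_id, track_id)   -- table-level constraint
);
```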
The composite primary key (playlist_id, track_id) can't be expressed inline because it spans two columns. The same applies to multi-column UNIQUE constraints and CHECK constraints that reference multiple columns.
Both styles are equivalent for single-column constraints. Use inline for readability when the constraint applies to one column, and table-level when it spans multiple columns.
Choosing the right data type matters more than it seems. A bad choice wastes storage, hurts performance, or fails to validate data properly. Here are the types you'll use most often.
| Type | Size | Range | Use For |
|---|---|---|---|
| SMALLINT | 2 bytes | -32,768 to 32,767 | Track numbers, small counts |
| INT (INTEGER) | 4 bytes | -2.1 billion to 2.1 billion | Most IDs, counts |
| BIGINT | 8 bytes | -9.2 quintillion to 9.2 quintillion | High-volume IDs (streams, impressions) |
| SERIAL | 4 bytes (auto) | Same as INT | Auto-incrementing PKs |
| BIGSERIAL | 8 bytes (auto) | Same as BIGINT | Auto-incrementing PKs for high-volume tables |
StreamFlow uses SERIAL for low-volume tables (users, artists, genres) and BIGSERIAL for high-volume ones (streams, ad_impressions). The deciding factor is whether the table could approach 2.1 billion rows over its lifetime. A music streaming platform can generate billions of stream events, so streams.stream_id uses BIGSERIAL.
| Type | Precision | Use For |
|---|---|---|
| DECIMAL(p, s) / NUMERIC(p, s) | Exact, user-defined | Money, financial calculations |
| REAL | 6 decimal digits | Approximate scientific values |
| DOUBLE PRECISION | 15 decimal digits | Approximate scientific values |
For money, always use DECIMAL or NUMERIC with fixed precision. StreamFlow's payments.amount is DECIMAL(8,2), meaning up to 999,999.99. Floating-point types like REAL introduce rounding errors that are unacceptable for financial data:
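A quick demonstration in PostgreSQL (the exact display of the float result depends on your server version; recent versions print the full round-trip value):

```sql
SELECT 0.1::DOUBLE PRECISION + 0.2::DOUBLE PRECISION;  -- 0.30000000000000004
SELECT 0.1::NUMERIC + 0.2::NUMERIC;                    -- 0.3, exact
```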
| Type | Behavior | Use For |
|---|---|---|
| VARCHAR(n) | Variable-length, max n characters | Bounded strings (usernames, emails, country codes) |
| CHAR(n) | Fixed-length, padded with spaces | Fixed-format codes (rarely used) |
| TEXT | Variable-length, no limit | Unbounded text (descriptions, bios) |
In PostgreSQL, there's no performance difference between VARCHAR(n) and TEXT. VARCHAR(n) acts as a validation constraint, rejecting values longer than n characters. TEXT accepts any length.
StreamFlow uses VARCHAR with specific limits: username VARCHAR(50), email VARCHAR(100), country VARCHAR(2). These limits serve as data validation. A country code should never be longer than 2 characters. If it is, something is wrong with the input, and the database should reject it.
When to use TEXT: for columns where you genuinely don't know the maximum length, like user bios, descriptions, or free-form notes. When to use VARCHAR(n): for columns where there's a natural upper bound and you want the database to enforce it.
CHAR(n) pads shorter values with trailing spaces, which causes subtle bugs with string comparisons. It's almost never the right choice.
| Feature | PostgreSQL | MySQL | SQL Server |
|---|---|---|---|
| Unbounded string | TEXT | TEXT or LONGTEXT | VARCHAR(MAX) or NVARCHAR(MAX) |
| Max VARCHAR length | VARCHAR(10485760) | VARCHAR(65535) | VARCHAR(8000) or VARCHAR(MAX) |
| Type | Stores | Example |
|---|---|---|
| DATE | Date only | '2024-03-15' |
| TIME | Time only | '14:30:00' |
| TIMESTAMP | Date + time | '2024-03-15 14:30:00' |
| TIMESTAMP WITH TIME ZONE | Date + time + timezone | '2024-03-15 14:30:00+00' |
| INTERVAL | Duration | '3 hours 30 minutes' |
StreamFlow uses DATE for values that don't need time precision (signup_date, release_date, payment_date) and TIMESTAMP WITH TIME ZONE for events where the exact moment matters (streams.started_at, playlists.created_at).
A common mistake is using TIMESTAMP (without time zone) for event data. If your users are in different time zones, a bare TIMESTAMP is ambiguous. Was "2024-03-15 14:30:00" in UTC, EST, or JST? Use TIMESTAMP WITH TIME ZONE and store everything in UTC.
PostgreSQL has a native BOOLEAN type that stores TRUE, FALSE, or NULL:
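A minimal sketch (the table and column names here are illustrative, not part of the StreamFlow schema):

```sql
CREATE TABLE user_settings (
    user_id      INT PRIMARY KEY,
    email_opt_in BOOLEAN NOT NULL DEFAULT TRUE
);
```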
MySQL doesn't have a true BOOLEAN type. BOOLEAN is an alias for TINYINT(1), where 0 is false and 1 is true. SQL Server uses BIT with values 0 and 1.
PostgreSQL offers two JSON types:
| Type | Storage | Indexing | Use For |
|---|---|---|---|
| JSON | Raw text | No GIN index support | Rare, mostly for exact preservation of formatting |
| JSONB | Binary, parsed | GIN indexes | Almost always the right choice |
StreamFlow stores flexible metadata in JSONB columns: tracks.metadata holds BPM, mood tags, and other attributes that vary by track. streams.device_info captures device type, OS, and app version.
JSONB is better than JSON in nearly every case. It's faster to query, supports indexing, and removes duplicate keys. The only reason to use JSON is if you need to preserve the exact formatting or key ordering of the original input, which is rare.
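A sketch of how tracks.metadata might be declared and queried. The ->> operator extracts a field as text; the key names (bpm, mood) follow the chapter's description but the exact shape of the JSON is assumed:

```sql
CREATE TABLE tracks (
    track_id SERIAL PRIMARY KEY,
    title    VARCHAR(200) NOT NULL,
    metadata JSONB
);

SELECT title, metadata->>'mood' AS mood
FROM tracks
WHERE (metadata->>'bpm')::INT > 120;
```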
UUID (Universally Unique Identifier) generates 128-bit identifiers like 550e8400-e29b-41d4-a716-446655440000. PostgreSQL has a native UUID type:
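A sketch using gen_random_uuid(), which is built in from PostgreSQL 13 onward (the table name is illustrative):

```sql
CREATE TABLE api_keys (
    key_id     UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```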
UUIDs are useful when you need IDs that are unique across systems (distributed databases, APIs, public-facing identifiers). The trade-off is size: UUID takes 16 bytes vs 4 bytes for INT. For internal tables with auto-incrementing keys, SERIAL or BIGSERIAL is more efficient. For public-facing IDs or distributed systems, UUID avoids the coordination problem of sequential integers.
A few rules of thumb:
Single column:
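```sql
CREATE TABLE genres (
    genre_id SERIAL PRIMARY KEY,
    name     VARCHAR(50) NOT NULL
);
```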
Composite (table-level):
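```sql
CREATE TABLE playlist_tracks (
    playlist_id INT,
    track_id    INT,
    PRIMARY KEY (playlist_id, track_id)
);
```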
A foreign key links one table to another. The basic syntax references the parent table and column:
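For example, albums referencing artists (column names follow the chapter's schema):

```sql
CREATE TABLE albums (
    album_id  SERIAL PRIMARY KEY,
    title     VARCHAR(200) NOT NULL,
    artist_id INT REFERENCES artists(artist_id)
);
```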
But what happens when you delete an artist who has albums? By default, the database blocks the deletion:
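Something like this, with PostgreSQL's error text (the ID value is arbitrary and the constraint name is the auto-generated default):

```sql
DELETE FROM artists WHERE artist_id = 7;
-- ERROR:  update or delete on table "artists" violates foreign key
--         constraint "albums_artist_id_fkey" on table "albums"
```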
The ON DELETE clause controls this behavior. There are four options:
| Action | Behavior | Use When |
|---|---|---|
| RESTRICT (default) | Block the delete if child rows exist | You want to prevent accidental data loss |
| CASCADE | Delete child rows automatically | Child data has no meaning without the parent |
| SET NULL | Set the FK column to NULL in child rows | The relationship is optional |
| SET DEFAULT | Set the FK column to its DEFAULT value | Rare, requires a valid default |
Here's CASCADE in action. If you delete a playlist, its track associations are automatically removed:
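```sql
CREATE TABLE playlist_tracks (
    playlist_id INT REFERENCES playlists(playlist_id) ON DELETE CASCADE,
    track_id    INT REFERENCES tracks(track_id) ON DELETE RESTRICT,
    PRIMARY KEY (playlist_id, track_id)
);

DELETE FROM playlists WHERE playlist_id = 3;
-- matching rows in playlist_tracks are removed automatically
```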
This table uses different actions for its two foreign keys. Deleting a playlist cascades (removes all associations), but deleting a track is blocked if it's in any playlist. The choice depends on the business logic: playlist membership is disposable, but you probably don't want to silently remove a track from every playlist it appears in.
SET NULL works for optional relationships:
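```sql
CREATE TABLE albums (
    album_id  SERIAL PRIMARY KEY,
    title     VARCHAR(200) NOT NULL,
    artist_id INT REFERENCES artists(artist_id) ON DELETE SET NULL
);
```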
If an artist is deleted, their albums stay in the database with artist_id set to NULL. This makes sense for "orphaned" data you want to keep. Note that artist_id cannot be declared NOT NULL when using SET NULL; otherwise the database couldn't set it to NULL on delete.
ON UPDATE follows the same pattern. ON UPDATE CASCADE propagates changes to the parent key down to child rows. This is mostly relevant when using natural keys that might change. With surrogate keys (auto-incrementing integers), the parent key never changes, so ON UPDATE is less important.
A table can have multiple UNIQUE constraints. Multi-column uniqueness uses table-level syntax:
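A sketch (the table name user_follows_artists is assumed):

```sql
CREATE TABLE user_follows_artists (
    follow_id   SERIAL PRIMARY KEY,
    user_id     INT NOT NULL REFERENCES users(user_id),
    artist_id   INT NOT NULL REFERENCES artists(artist_id),
    followed_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    UNIQUE (user_id, artist_id)
);
```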
This ensures a user can follow an artist only once. The pair (user_id, artist_id) must be unique, but each individual column can repeat.
CHECK constraints validate column values against a condition:
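For example, requiring a positive track duration:

```sql
CREATE TABLE tracks (
    track_id         SERIAL PRIMARY KEY,
    title            VARCHAR(200) NOT NULL,
    duration_seconds INT CHECK (duration_seconds > 0)
);
```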
Multi-column checks use table-level syntax:
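```sql
CREATE TABLE listening_sessions (
    session_id SERIAL PRIMARY KEY,
    started_at TIMESTAMP WITH TIME ZONE NOT NULL,
    ended_at   TIMESTAMP WITH TIME ZONE,
    CHECK (ended_at IS NULL OR ended_at > started_at)
);
```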
This ensures that if an end date exists, it comes after the start date. CHECK constraints can reference multiple columns in the same row, but they cannot reference other tables.
DEFAULT provides a value when one isn't specified during INSERT:
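A sketch showing the three kinds of default mentioned below (the plan and api_token column names are illustrative):

```sql
CREATE TABLE users (
    user_id    SERIAL PRIMARY KEY,
    username   VARCHAR(50) NOT NULL,
    plan       VARCHAR(10) DEFAULT 'free',            -- constant
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(), -- function call
    api_token  UUID DEFAULT gen_random_uuid()          -- expression
);
```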
DEFAULT values can be constants ('free'), function calls (NOW()), or expressions (gen_random_uuid()). They only apply when the column is omitted from the INSERT. Explicitly inserting NULL overrides the default and stores NULL (unless the column also has a NOT NULL constraint).
Tables aren't static. Requirements change, new features need new columns, and mistakes need fixing. ALTER TABLE modifies an existing table without dropping and recreating it.
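Adding a column is the most common change (the is_public column name is illustrative):

```sql
ALTER TABLE playlists ADD COLUMN is_public BOOLEAN DEFAULT FALSE;
```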
The new column is added to all existing rows with the default value (FALSE in this case). If no default is specified, existing rows get NULL.
Adding a NOT NULL column to a table that already has data requires a default:
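```sql
-- the is_explicit column name is illustrative
ALTER TABLE tracks ADD COLUMN is_explicit BOOLEAN NOT NULL DEFAULT FALSE;
```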
Without a default, the database can't fill in existing rows and the command fails:
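```sql
ALTER TABLE tracks ADD COLUMN is_explicit BOOLEAN NOT NULL;
-- ERROR:  column "is_explicit" of relation "tracks" contains null values
```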
If other objects depend on the column (views, indexes, constraints), you need CASCADE:
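```sql
-- legacy_flag is a hypothetical column with dependent objects
ALTER TABLE users DROP COLUMN legacy_flag CASCADE;
```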
This drops the column and any dependent objects. Be careful with CASCADE on DROP COLUMN. It might drop a view you didn't know existed.
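Renaming a column changes only its name. A sketch, assuming a switch from seconds to milliseconds:

```sql
ALTER TABLE tracks RENAME COLUMN duration_seconds TO duration_ms;
```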
This does not affect the data. Existing values stay the same. You'd need a separate UPDATE to convert seconds to milliseconds.
PostgreSQL can convert between compatible types automatically (INT to BIGINT, VARCHAR(50) to VARCHAR(100)). Incompatible conversions need a USING clause:
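For example, converting a text column to an integer (assuming a hypothetical track_number column stored as VARCHAR):

```sql
ALTER TABLE tracks
    ALTER COLUMN track_number TYPE INT
    USING track_number::INT;
```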
The USING clause tells PostgreSQL how to convert existing values. Without it, the database tries an implicit cast, which fails for many type changes.
What goes wrong here?
Fix: This conversion doesn't make sense semantically, but if you did need to change a column's type and the cast isn't automatic, provide a USING clause with the conversion expression.
Add a CHECK constraint:
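```sql
ALTER TABLE tracks
    ADD CONSTRAINT positive_duration CHECK (duration_seconds > 0);
```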
Add a foreign key:
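```sql
ALTER TABLE albums
    ADD CONSTRAINT fk_albums_artist
    FOREIGN KEY (artist_id) REFERENCES artists(artist_id);
```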
Drop a constraint by name:
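```sql
-- positive_duration is an example of an explicitly chosen constraint name
ALTER TABLE tracks DROP CONSTRAINT positive_duration;
```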
This is why naming constraints matters. If you don't name them, PostgreSQL auto-generates names like tracks_duration_seconds_check, which are harder to reference later.
SET NOT NULL fails if the column contains any NULL values. You need to fill them in first:
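A sketch using the users.country column (the backfill value 'US' is a placeholder; use whatever makes sense for your data):

```sql
UPDATE users SET country = 'US' WHERE country IS NULL;  -- backfill first
ALTER TABLE users ALTER COLUMN country SET NOT NULL;
```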
Renaming a table with ALTER TABLE ... RENAME TO is safe at the database level: foreign keys and indexes that reference the table are updated automatically, and in PostgreSQL even views keep working, because they track the table internally rather than by name. Application code and SQL scripts that reference the old name, however, will break.
Three different ways to remove data, each with different scope and behavior.
Removes the table entirely: data, structure, indexes, constraints, everything.
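```sql
DROP TABLE listening_sessions;
```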
If other tables have foreign keys pointing to this table, the drop fails:
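Something like this (error text abbreviated from PostgreSQL's):

```sql
DROP TABLE artists;
-- ERROR:  cannot drop table artists because other objects depend on it
-- DETAIL:  constraint albums_artist_id_fkey on table albums depends on table artists
```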
CASCADE drops the table and all dependent foreign key constraints (not the dependent tables themselves):
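```sql
DROP TABLE artists CASCADE;
-- NOTICE:  drop cascades to constraint albums_artist_id_fkey on table albums
```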
RESTRICT (the default) blocks the drop when dependencies exist. It's the same as omitting CASCADE.
Like CREATE TABLE, DROP TABLE supports IF EXISTS to avoid errors when the table doesn't exist:
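```sql
DROP TABLE IF EXISTS old_staging_table;
```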
Removes all rows but keeps the table structure, indexes, and constraints:
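```sql
TRUNCATE TABLE listening_sessions;
```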
TRUNCATE is much faster than DELETE for removing all rows because it doesn't log individual row deletions. It also resets SERIAL/BIGSERIAL sequences by default in PostgreSQL.
If the table has child rows in other tables, TRUNCATE fails unless you use CASCADE:
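```sql
-- CASCADE also truncates every table with a foreign key into playlists
TRUNCATE TABLE playlists CASCADE;
```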
| Aspect | DROP TABLE | TRUNCATE | DELETE |
|---|---|---|---|
| Removes data | Yes | Yes | Yes (with WHERE) |
| Removes structure | Yes | No | No |
| Can filter rows | No | No | Yes |
| Resets sequences | N/A | Yes (PostgreSQL) | No |
| Fires row-level triggers | No | No | Yes |
| Can be rolled back | Depends on DB | Depends on DB | Yes |
| Speed (large tables) | Fast | Fast | Slow (row-by-row) |
Use DELETE when you want to remove specific rows. Use TRUNCATE when you want to empty a table fast. Use DROP when you want to remove the table entirely.
CTAS creates a new table from the result of a query. The new table's columns match the query's output columns:
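A sketch producing four output columns; the join path from artists through tracks to streams is assumed here, since the chapter doesn't spell out the full schema:

```sql
CREATE TABLE top_artists AS
SELECT a.artist_id,
       a.name,
       COUNT(s.stream_id)        AS total_streams,
       COUNT(DISTINCT s.user_id) AS unique_listeners
FROM artists a
JOIN tracks t  ON t.artist_id = a.artist_id   -- join path assumed
JOIN streams s ON s.track_id  = t.track_id
GROUP BY a.artist_id, a.name;
```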
This creates a top_artists table with four columns, populated with the query results. It's useful for creating summary tables, snapshots, or staging data for analysis.
There's a critical gotcha: CTAS copies data and column types, but not constraints. The new table has no primary key, no foreign keys, no NOT NULL constraints, no defaults, and no indexes. You need to add them manually:
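For example:

```sql
ALTER TABLE top_artists ADD PRIMARY KEY (artist_id);
ALTER TABLE top_artists ALTER COLUMN name SET NOT NULL;
```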
CTAS also supports IF NOT EXISTS:
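```sql
CREATE TABLE IF NOT EXISTS top_artists AS
SELECT artist_id, name FROM artists;
```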
The query in a CTAS can be as complex as any SELECT: joins, aggregations, window functions, CTEs. The result is materialized into a real table.
| Operation | PostgreSQL | MySQL | SQL Server |
|---|---|---|---|
| CTAS | CREATE TABLE ... AS SELECT | CREATE TABLE ... AS SELECT | SELECT ... INTO new_table FROM ... |
Stored procedures and functions are blocks of reusable SQL logic that live on the database server. Interviewers rarely deep-dive on these for individual contributor roles, but you should know what they are.
A function takes inputs, runs logic, and returns a result. PostgreSQL uses CREATE FUNCTION:
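A sketch of a simple SQL-language function (the name and logic are illustrative):

```sql
CREATE FUNCTION track_minutes(seconds INT)
RETURNS NUMERIC AS $$
    SELECT ROUND(seconds / 60.0, 2);
$$ LANGUAGE SQL;

-- functions can be called directly in a SELECT
SELECT title, track_minutes(duration_seconds) FROM tracks;
```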
A stored procedure is similar but doesn't return a value directly. It performs actions like inserting, updating, or deleting data. PostgreSQL added procedures in version 11 with CREATE PROCEDURE:
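A sketch of a cleanup procedure (the name and logic are illustrative):

```sql
CREATE PROCEDURE purge_old_sessions(cutoff TIMESTAMP WITH TIME ZONE)
LANGUAGE SQL
AS $$
    DELETE FROM listening_sessions WHERE started_at < cutoff;
$$;

-- procedures are invoked with CALL, not SELECT
CALL purge_old_sessions(NOW() - INTERVAL '90 days');
```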
Common use cases for stored procedures and functions: complex business logic that needs to run close to the data, data migrations, scheduled cleanup jobs, and encapsulating multi-step operations into a single call.
| Feature | Function | Procedure |
|---|---|---|
| Returns a value | Yes | No (uses OUT parameters) |
| Can be used in SELECT | Yes | No |
| Can manage transactions | No | Yes (COMMIT/ROLLBACK inside) |
| Called with | SELECT function() | CALL procedure() |
A trigger is a function that runs automatically when a specific event happens on a table: INSERT, UPDATE, or DELETE. They're useful for audit logging, enforcing complex business rules, and keeping derived data in sync.
Here's a trigger that logs every change to the artists table into an audit table:
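A sketch consistent with the description below; the audit table's exact columns are assumed:

```sql
CREATE TABLE artist_audit_log (
    log_id     SERIAL PRIMARY KEY,
    artist_id  INT,
    operation  TEXT,
    changed_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE FUNCTION log_artist_change() RETURNS TRIGGER AS $$
BEGIN
    -- NEW is NULL on DELETE, OLD is NULL on INSERT; TG_OP names the operation
    INSERT INTO artist_audit_log (artist_id, operation)
    VALUES (COALESCE(NEW.artist_id, OLD.artist_id), TG_OP);
    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER artists_audit
AFTER INSERT OR UPDATE OR DELETE ON artists
FOR EACH ROW EXECUTE FUNCTION log_artist_change();
```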
After this trigger is in place, every INSERT, UPDATE, or DELETE on artists automatically creates a row in artist_audit_log with the artist ID and the operation type.
Common trigger use cases include audit logging, enforcing complex business rules, and keeping derived data in sync across tables.
Triggers can make debugging harder because they run implicitly. A simple UPDATE might cascade into multiple trigger executions across several tables. Use them for cross-cutting concerns like audit logging, not for core business logic.
| Timing | Fires | Use For |
|---|---|---|
| BEFORE | Before the row change is applied | Validation, modifying the incoming data |
| AFTER | After the row change is applied | Audit logging, notifications, syncing derived data |
| INSTEAD OF | Replaces the original operation | Making views updatable |