Last Updated: May 3, 2026
Every query in this course so far has assumed the tables already exist. Someone had to define what columns the users table has, what types those columns hold, and what rules the data must follow.
That's what DDL (Data Definition Language) does. This chapter covers how to create tables, choose data types, set up constraints, and modify tables after they're in production.
The examples in this chapter create new tables and modify existing ones.
These tables already exist:
Insert the data below to follow along:
Note: This chapter modifies data, so you may want to re-run these statements if you need a clean starting point.
The basic syntax defines a table name, a list of columns, and their types:
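In outline (bracketed parts are optional):

```sql
CREATE TABLE table_name (
    column_name  data_type  [constraints],
    other_column data_type  [constraints]
);
```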
Each column needs a name and a data type at minimum. Constraints are optional but almost always present in real schemas. Here's a simple example that creates a table to track user listening sessions:
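A sketch of that table, matching the column-by-column description that follows (the table and column names are illustrative):

```sql
CREATE TABLE listening_sessions (
    session_id   SERIAL PRIMARY KEY,
    user_id      INT NOT NULL,
    started_at   TIMESTAMP WITH TIME ZONE NOT NULL,
    ended_at     TIMESTAMP WITH TIME ZONE,
    total_tracks INT DEFAULT 0
);
```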
This defines five columns. session_id is an auto-incrementing integer primary key. user_id and started_at are required (NOT NULL). ended_at can be NULL because the session might still be active. total_tracks defaults to 0 if not provided.
Running CREATE TABLE on a table that already exists produces an error:
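For example, repeating the statement above (error text is PostgreSQL's):

```sql
CREATE TABLE listening_sessions (session_id SERIAL PRIMARY KEY);
-- ERROR:  relation "listening_sessions" already exists
```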
Adding IF NOT EXISTS skips the creation quietly when the table already exists:
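```sql
CREATE TABLE IF NOT EXISTS listening_sessions (
    session_id SERIAL PRIMARY KEY
);
-- NOTICE:  relation "listening_sessions" already exists, skipping
```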
This is useful in migration scripts and setup scripts that might run more than once. Note that IF NOT EXISTS does not check whether the existing table has the same columns or types. It only checks the name.
Constraints can be declared in two places. Inline constraints go on the column definition:
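A sketch using the chapter's users columns:

```sql
CREATE TABLE users (
    user_id  SERIAL PRIMARY KEY,         -- inline constraint
    username VARCHAR(50) NOT NULL UNIQUE, -- two inline constraints
    country  VARCHAR(2)
);
```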
Table-level constraints go after all column definitions. You need this style for composite keys and multi-column constraints:
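For example, a playlist-to-track association table with a composite primary key:

```sql
CREATE TABLE playlist_tracks (
    playlist_id INT NOT NULL,
    track_id    INT NOT NULL,
    added_at    TIMESTAMP WITH TIME ZONE,
    PRIMARY KEY (playlist_id, track_id)   -- table-level constraint
);
```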
The composite primary key (playlist_id, track_id) can't be expressed inline because it spans two columns. The same applies to multi-column UNIQUE constraints and CHECK constraints that reference multiple columns.
Both styles are equivalent for single-column constraints. Use inline for readability when the constraint applies to one column, and table-level when it spans multiple columns.
Choosing the right data type matters more than it seems. A bad choice wastes storage, hurts performance, or fails to validate data properly. Here are the types you'll use most often.
| Type | Size | Range | Use For |
|---|---|---|---|
| SMALLINT | 2 bytes | -32,768 to 32,767 | Track numbers, small counts |
| INT (INTEGER) | 4 bytes | -2.1 billion to 2.1 billion | Most IDs, counts |
| BIGINT | 8 bytes | -9.2 quintillion to 9.2 quintillion | High-volume IDs (streams, impressions) |
| SERIAL | 4 bytes (auto) | Same as INT | Auto-incrementing PKs |
| BIGSERIAL | 8 bytes (auto) | Same as BIGINT | Auto-incrementing PKs for high-volume tables |
StreamFlow uses SERIAL for low-volume tables (users, artists, genres) and BIGSERIAL for high-volume ones (streams, ad_impressions). The deciding factor is whether the table could approach 2.1 billion rows over its lifetime. A music streaming platform can generate billions of stream events, so streams.stream_id uses BIGSERIAL.
| Type | Precision | Use For |
|---|---|---|
| DECIMAL(p, s) / NUMERIC(p, s) | Exact, user-defined | Money, financial calculations |
| REAL | 6 decimal digits | Approximate scientific values |
| DOUBLE PRECISION | 15 decimal digits | Approximate scientific values |
For money, always use DECIMAL or NUMERIC with fixed precision. StreamFlow's payments.amount is DECIMAL(8,2), meaning up to 999,999.99. Floating-point types like REAL introduce rounding errors that are unacceptable for financial data:
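A quick demonstration in PostgreSQL (the exact display of the float result depends on your server version; recent versions print the full round-trip value):

```sql
SELECT 0.1::DOUBLE PRECISION + 0.2::DOUBLE PRECISION;  -- 0.30000000000000004
SELECT 0.1::NUMERIC + 0.2::NUMERIC;                    -- 0.3, exact
```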
| Type | Behavior | Use For |
|---|---|---|
| VARCHAR(n) | Variable-length, max n characters | Bounded strings (usernames, emails, country codes) |
| CHAR(n) | Fixed-length, padded with spaces | Fixed-format codes (rarely used) |
| TEXT | Variable-length, no limit | Unbounded text (descriptions, bios) |
In PostgreSQL, there's no performance difference between VARCHAR(n) and TEXT. VARCHAR(n) acts as a validation constraint, rejecting values longer than n characters. TEXT accepts any length.
StreamFlow uses VARCHAR with specific limits: username VARCHAR(50), email VARCHAR(100), country VARCHAR(2). These limits serve as data validation. A country code should never be longer than 2 characters. If it is, something is wrong with the input, and the database should reject it.
When to use TEXT: for columns where you genuinely don't know the maximum length, like user bios, descriptions, or free-form notes. When to use VARCHAR(n): for columns where there's a natural upper bound and you want the database to enforce it.
CHAR(n) pads shorter values with trailing spaces, which causes subtle bugs with string comparisons. It's almost never the right choice.
| Feature | PostgreSQL | MySQL | SQL Server |
|---|---|---|---|
| Unbounded string | TEXT | TEXT or LONGTEXT | VARCHAR(MAX) or NVARCHAR(MAX) |
| Max VARCHAR length | VARCHAR(10485760) | VARCHAR(65535) | VARCHAR(8000) or VARCHAR(MAX) |
| Type | Stores | Example |
|---|---|---|
| DATE | Date only | '2024-03-15' |
| TIME | Time only | '14:30:00' |
| TIMESTAMP | Date + time | '2024-03-15 14:30:00' |
| TIMESTAMP WITH TIME ZONE | Date + time + timezone | '2024-03-15 14:30:00+00' |
| INTERVAL | Duration | '3 hours 30 minutes' |
StreamFlow uses DATE for values that don't need time precision (signup_date, release_date, payment_date) and TIMESTAMP WITH TIME ZONE for events where the exact moment matters (streams.started_at, playlists.created_at).
A common mistake is using TIMESTAMP (without time zone) for event data. If your users are in different time zones, a bare TIMESTAMP is ambiguous. Was "2024-03-15 14:30:00" in UTC, EST, or JST? Use TIMESTAMP WITH TIME ZONE and store everything in UTC.
PostgreSQL has a native BOOLEAN type that stores TRUE, FALSE, or NULL:
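A minimal sketch (the table and column names here are illustrative, not part of the StreamFlow schema):

```sql
CREATE TABLE user_settings (
    user_id      INT PRIMARY KEY,
    email_opt_in BOOLEAN NOT NULL DEFAULT TRUE
);
```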
MySQL doesn't have a true BOOLEAN type. BOOLEAN is an alias for TINYINT(1), where 0 is false and 1 is true. SQL Server uses BIT with values 0 and 1.
PostgreSQL offers two JSON types:
| Type | Storage | Indexing | Use For |
|---|---|---|---|
| JSON | Raw text | No GIN index support | Rare, mostly for exact preservation of formatting |
| JSONB | Binary, parsed | GIN indexes | Almost always the right choice |
StreamFlow stores flexible metadata in JSONB columns: tracks.metadata holds BPM, mood tags, and other attributes that vary by track. streams.device_info captures device type, OS, and app version.
JSONB is better than JSON in nearly every case. It's faster to query, supports indexing, and removes duplicate keys. The only reason to use JSON is if you need to preserve the exact formatting or key ordering of the original input, which is rare.
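A sketch of how tracks.metadata might be declared and queried. The ->> operator extracts a field as text; the key names (bpm, mood) follow the chapter's description but the exact shape of the JSON is assumed:

```sql
CREATE TABLE tracks (
    track_id SERIAL PRIMARY KEY,
    title    VARCHAR(200) NOT NULL,
    metadata JSONB
);

SELECT title, metadata->>'mood' AS mood
FROM tracks
WHERE (metadata->>'bpm')::INT > 120;
```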
UUID (Universally Unique Identifier) generates 128-bit identifiers like 550e8400-e29b-41d4-a716-446655440000. PostgreSQL has a native UUID type:
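A sketch using gen_random_uuid(), which is built in from PostgreSQL 13 onward (the table name is illustrative):

```sql
CREATE TABLE api_keys (
    key_id     UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```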
UUIDs are useful when you need IDs that are unique across systems (distributed databases, APIs, public-facing identifiers). The trade-off is size: UUID takes 16 bytes vs 4 bytes for INT. For internal tables with auto-incrementing keys, SERIAL or BIGSERIAL is more efficient. For public-facing IDs or distributed systems, UUID avoids the coordination problem of sequential integers.
A few rules of thumb:
Single column:
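```sql
CREATE TABLE genres (
    genre_id SERIAL PRIMARY KEY,
    name     VARCHAR(50) NOT NULL
);
```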
Composite (table-level):
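```sql
CREATE TABLE playlist_tracks (
    playlist_id INT,
    track_id    INT,
    PRIMARY KEY (playlist_id, track_id)
);
```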
A foreign key links one table to another. The basic syntax references the parent table and column:
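For example, albums referencing artists (column names follow the chapter's schema):

```sql
CREATE TABLE albums (
    album_id  SERIAL PRIMARY KEY,
    title     VARCHAR(200) NOT NULL,
    artist_id INT REFERENCES artists(artist_id)
);
```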
But what happens when you delete an artist who has albums? By default, the database blocks the deletion:
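Something like this, with PostgreSQL's error text (the ID value is arbitrary and the constraint name is the auto-generated default):

```sql
DELETE FROM artists WHERE artist_id = 7;
-- ERROR:  update or delete on table "artists" violates foreign key
--         constraint "albums_artist_id_fkey" on table "albums"
```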
The ON DELETE clause controls this behavior. There are four options:
| Action | Behavior | Use When |
|---|---|---|
| RESTRICT (default) | Block the delete if child rows exist | You want to prevent accidental data loss |
| CASCADE | Delete child rows automatically | Child data has no meaning without the parent |
| SET NULL | Set the FK column to NULL in child rows | The relationship is optional |
| SET DEFAULT | Set the FK column to its DEFAULT value | Rare, requires a valid default |
Here's CASCADE in action. If you delete a playlist, its track associations are automatically removed:
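```sql
CREATE TABLE playlist_tracks (
    playlist_id INT REFERENCES playlists(playlist_id) ON DELETE CASCADE,
    track_id    INT REFERENCES tracks(track_id) ON DELETE RESTRICT,
    PRIMARY KEY (playlist_id, track_id)
);

DELETE FROM playlists WHERE playlist_id = 3;
-- matching rows in playlist_tracks are removed automatically
```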
This table uses different actions for its two foreign keys. Deleting a playlist cascades (removes all associations), but deleting a track is blocked if it's in any playlist. The choice depends on the business logic: playlist membership is disposable, but you probably don't want to silently remove a track from every playlist it appears in.
SET NULL works for optional relationships:
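```sql
CREATE TABLE albums (
    album_id  SERIAL PRIMARY KEY,
    title     VARCHAR(200) NOT NULL,
    artist_id INT REFERENCES artists(artist_id) ON DELETE SET NULL
);
```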
If an artist is deleted, their albums stay in the database with artist_id set to NULL. This makes sense for "orphaned" data you want to keep. Note that artist_id cannot be declared NOT NULL when using SET NULL; otherwise the database couldn't set it to NULL on delete.
ON UPDATE follows the same pattern. ON UPDATE CASCADE propagates changes to the parent key down to child rows. This is mostly relevant when using natural keys that might change. With surrogate keys (auto-incrementing integers), the parent key never changes, so ON UPDATE is less important.
A table can have multiple UNIQUE constraints. Multi-column uniqueness uses table-level syntax:
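A sketch (the table name user_follows_artists is assumed):

```sql
CREATE TABLE user_follows_artists (
    follow_id   SERIAL PRIMARY KEY,
    user_id     INT NOT NULL REFERENCES users(user_id),
    artist_id   INT NOT NULL REFERENCES artists(artist_id),
    followed_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    UNIQUE (user_id, artist_id)
);
```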
This ensures a user can follow an artist only once. The pair (user_id, artist_id) must be unique, but each individual column can repeat.
CHECK constraints validate column values against a condition:
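For example, requiring a positive track duration:

```sql
CREATE TABLE tracks (
    track_id         SERIAL PRIMARY KEY,
    title            VARCHAR(200) NOT NULL,
    duration_seconds INT CHECK (duration_seconds > 0)
);
```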
Multi-column checks use table-level syntax:
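```sql
CREATE TABLE listening_sessions (
    session_id SERIAL PRIMARY KEY,
    started_at TIMESTAMP WITH TIME ZONE NOT NULL,
    ended_at   TIMESTAMP WITH TIME ZONE,
    CHECK (ended_at IS NULL OR ended_at > started_at)
);
```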
This ensures that if an end date exists, it comes after the start date. CHECK constraints can reference multiple columns in the same row, but they cannot reference other tables.
DEFAULT provides a value when one isn't specified during INSERT:
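A sketch showing the three kinds of default mentioned below (the plan and api_token column names are illustrative):

```sql
CREATE TABLE users (
    user_id    SERIAL PRIMARY KEY,
    username   VARCHAR(50) NOT NULL,
    plan       VARCHAR(10) DEFAULT 'free',            -- constant
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(), -- function call
    api_token  UUID DEFAULT gen_random_uuid()          -- expression
);
```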
DEFAULT values can be constants ('free'), function calls (NOW()), or expressions (gen_random_uuid()). They only apply when the column is omitted from the INSERT. Explicitly inserting NULL overrides the default and stores NULL (unless the column also has a NOT NULL constraint).
Tables aren't static. Requirements change, new features need new columns, and mistakes need fixing. ALTER TABLE modifies an existing table without dropping and recreating it.
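Adding a column is the most common change (the is_public column name is illustrative):

```sql
ALTER TABLE playlists ADD COLUMN is_public BOOLEAN DEFAULT FALSE;
```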
The new column is added to all existing rows with the default value (FALSE in this case). If no default is specified, existing rows get NULL.
Adding a NOT NULL column to a table that already has data requires a default:
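```sql
-- the is_explicit column name is illustrative
ALTER TABLE tracks ADD COLUMN is_explicit BOOLEAN NOT NULL DEFAULT FALSE;
```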
Without a default, the database can't fill in existing rows and the command fails:
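```sql
ALTER TABLE tracks ADD COLUMN is_explicit BOOLEAN NOT NULL;
-- ERROR:  column "is_explicit" of relation "tracks" contains null values
```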
If other objects depend on the column (views, indexes, constraints), you need CASCADE:
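```sql
-- legacy_flag is a hypothetical column with dependent objects
ALTER TABLE users DROP COLUMN legacy_flag CASCADE;
```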
This drops the column and any dependent objects. Be careful with CASCADE on DROP COLUMN. It might drop a view you didn't know existed.
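Renaming a column changes only its name. A sketch, assuming a switch from seconds to milliseconds:

```sql
ALTER TABLE tracks RENAME COLUMN duration_seconds TO duration_ms;
```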
This does not affect the data. Existing values stay the same. You'd need a separate UPDATE to convert seconds to milliseconds.
PostgreSQL can convert between compatible types automatically (INT to BIGINT, VARCHAR(50) to VARCHAR(100)). Incompatible conversions need a USING clause:
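For example, converting a text column to an integer (assuming a hypothetical track_number column stored as VARCHAR):

```sql
ALTER TABLE tracks
    ALTER COLUMN track_number TYPE INT
    USING track_number::INT;
```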
The USING clause tells PostgreSQL how to convert existing values. Without it, the database tries an implicit cast, which fails for many type changes.
What goes wrong here?
Fix: This conversion doesn't make sense semantically, but if you did need to change a column's type and the cast isn't automatic, provide a USING clause with the conversion expression.
Add a CHECK constraint:
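```sql
ALTER TABLE tracks
    ADD CONSTRAINT positive_duration CHECK (duration_seconds > 0);
```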
Add a foreign key:
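```sql
ALTER TABLE albums
    ADD CONSTRAINT fk_albums_artist
    FOREIGN KEY (artist_id) REFERENCES artists(artist_id);
```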
Drop a constraint by name:
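```sql
-- positive_duration is an example of an explicitly chosen constraint name
ALTER TABLE tracks DROP CONSTRAINT positive_duration;
```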
This is why naming constraints matters. If you don't name them, PostgreSQL auto-generates names like tracks_duration_seconds_check, which are harder to reference later.
SET NOT NULL fails if the column contains any NULL values. You need to fill them in first:
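A sketch using the users.country column (the backfill value 'US' is a placeholder; use whatever makes sense for your data):

```sql
UPDATE users SET country = 'US' WHERE country IS NULL;  -- backfill first
ALTER TABLE users ALTER COLUMN country SET NOT NULL;
```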
Renaming a table with ALTER TABLE ... RENAME TO is safe at the database level: foreign keys and indexes that reference the table are updated automatically, and in PostgreSQL even views keep working, because they track the table internally rather than by name. Application code and SQL scripts that reference the old name, however, will break.
Three different ways to remove data, each with different scope and behavior.
Removes the table entirely: data, structure, indexes, constraints, everything.
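```sql
DROP TABLE listening_sessions;
```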
If other tables have foreign keys pointing to this table, the drop fails:
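Something like this (error text abbreviated from PostgreSQL's):

```sql
DROP TABLE artists;
-- ERROR:  cannot drop table artists because other objects depend on it
-- DETAIL:  constraint albums_artist_id_fkey on table albums depends on table artists
```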
CASCADE drops the table and all dependent foreign key constraints (not the dependent tables themselves):
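```sql
DROP TABLE artists CASCADE;
-- NOTICE:  drop cascades to constraint albums_artist_id_fkey on table albums
```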
RESTRICT (the default) blocks the drop when dependencies exist. It's the same as omitting CASCADE.
Like CREATE TABLE, DROP TABLE supports IF EXISTS to avoid errors when the table doesn't exist:
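```sql
DROP TABLE IF EXISTS old_staging_table;
```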
Removes all rows but keeps the table structure, indexes, and constraints:
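```sql
TRUNCATE TABLE listening_sessions;
```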
TRUNCATE is much faster than DELETE for removing all rows because it doesn't log individual row deletions. It also resets SERIAL/BIGSERIAL sequences by default in PostgreSQL.
If the table has child rows in other tables, TRUNCATE fails unless you use CASCADE:
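```sql
-- CASCADE also truncates every table with a foreign key into playlists
TRUNCATE TABLE playlists CASCADE;
```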
| Aspect | DROP TABLE | TRUNCATE | DELETE |
|---|---|---|---|
| Removes data | Yes | Yes | Yes (with WHERE) |
| Removes structure | Yes | No | No |
| Can filter rows | No | No | Yes |
| Resets sequences | N/A | Yes (PostgreSQL) | No |
| Fires row-level triggers | No | No | Yes |
| Can be rolled back | Depends on DB | Depends on DB | Yes |
| Speed (large tables) | Fast | Fast | Slow (row-by-row) |
Use DELETE when you want to remove specific rows. Use TRUNCATE when you want to empty a table fast. Use DROP when you want to remove the table entirely.
CTAS creates a new table from the result of a query. The new table's columns match the query's output columns:
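A sketch producing four output columns; the join path from artists through tracks to streams is assumed here, since the chapter doesn't spell out the full schema:

```sql
CREATE TABLE top_artists AS
SELECT a.artist_id,
       a.name,
       COUNT(s.stream_id)        AS total_streams,
       COUNT(DISTINCT s.user_id) AS unique_listeners
FROM artists a
JOIN tracks t  ON t.artist_id = a.artist_id   -- join path assumed
JOIN streams s ON s.track_id  = t.track_id
GROUP BY a.artist_id, a.name;
```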
This creates a top_artists table with four columns, populated with the query results. It's useful for creating summary tables, snapshots, or staging data for analysis.
There's a critical gotcha: CTAS copies data and column types, but not constraints. The new table has no primary key, no foreign keys, no NOT NULL constraints, no defaults, and no indexes. You need to add them manually:
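For example:

```sql
ALTER TABLE top_artists ADD PRIMARY KEY (artist_id);
ALTER TABLE top_artists ALTER COLUMN name SET NOT NULL;
```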
CTAS also supports IF NOT EXISTS:
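```sql
CREATE TABLE IF NOT EXISTS top_artists AS
SELECT artist_id, name FROM artists;
```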
The query in a CTAS can be as complex as any SELECT: joins, aggregations, window functions, CTEs. The result is materialized into a real table.
| Operation | PostgreSQL | MySQL | SQL Server |
|---|---|---|---|
| CTAS | CREATE TABLE ... AS SELECT | CREATE TABLE ... AS SELECT | SELECT ... INTO new_table FROM ... |
Stored procedures and functions are blocks of reusable SQL logic that live on the database server. Interviewers rarely deep-dive on these for individual contributor roles, but you should know what they are.
A function takes inputs, runs logic, and returns a result. PostgreSQL uses CREATE FUNCTION:
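A sketch of a simple SQL-language function (the name and logic are illustrative):

```sql
CREATE FUNCTION track_minutes(seconds INT)
RETURNS NUMERIC AS $$
    SELECT ROUND(seconds / 60.0, 2);
$$ LANGUAGE SQL;

-- functions can be called directly in a SELECT
SELECT title, track_minutes(duration_seconds) FROM tracks;
```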
A stored procedure is similar but doesn't return a value directly. It performs actions like inserting, updating, or deleting data. PostgreSQL added procedures in version 11 with CREATE PROCEDURE:
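A sketch of a cleanup procedure (the name and logic are illustrative):

```sql
CREATE PROCEDURE purge_old_sessions(cutoff TIMESTAMP WITH TIME ZONE)
LANGUAGE SQL
AS $$
    DELETE FROM listening_sessions WHERE started_at < cutoff;
$$;

-- procedures are invoked with CALL, not SELECT
CALL purge_old_sessions(NOW() - INTERVAL '90 days');
```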
Common use cases for stored procedures and functions: complex business logic that needs to run close to the data, data migrations, scheduled cleanup jobs, and encapsulating multi-step operations into a single call.
| Feature | Function | Procedure |
|---|---|---|
| Returns a value | Yes | No (uses OUT parameters) |
| Can be used in SELECT | Yes | No |
| Can manage transactions | No | Yes (COMMIT/ROLLBACK inside) |
| Called with | SELECT function() | CALL procedure() |
A trigger is a function that runs automatically when a specific event happens on a table: INSERT, UPDATE, or DELETE. They're useful for audit logging, enforcing complex business rules, and keeping derived data in sync.
Here's a trigger that logs every change to the artists table into an audit table:
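A sketch consistent with the description below; the audit table's exact columns are assumed:

```sql
CREATE TABLE artist_audit_log (
    log_id     SERIAL PRIMARY KEY,
    artist_id  INT,
    operation  TEXT,
    changed_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE FUNCTION log_artist_change() RETURNS TRIGGER AS $$
BEGIN
    -- NEW is NULL on DELETE, OLD is NULL on INSERT; TG_OP names the operation
    INSERT INTO artist_audit_log (artist_id, operation)
    VALUES (COALESCE(NEW.artist_id, OLD.artist_id), TG_OP);
    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER artists_audit
AFTER INSERT OR UPDATE OR DELETE ON artists
FOR EACH ROW EXECUTE FUNCTION log_artist_change();
```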
After this trigger is in place, every INSERT, UPDATE, or DELETE on artists automatically creates a row in artist_audit_log with the artist ID and the operation type.
Common trigger use cases include audit logging, enforcing complex business rules, and keeping derived data in sync across tables.
Triggers can make debugging harder because they run implicitly. A simple UPDATE might cascade into multiple trigger executions across several tables. Use them for cross-cutting concerns like audit logging, not for core business logic.
| Timing | Fires | Use For |
|---|---|---|
| BEFORE | Before the row change is applied | Validation, modifying the incoming data |
| AFTER | After the row change is applied | Audit logging, notifications, syncing derived data |
| INSTEAD OF | Replaces the original operation | Making views updatable |