The relational model gives you data structure, but structure alone is not enough to prevent bad data.

Without enforcement rules, problems can slip in quickly. You might end up with duplicate user IDs, streams linked to users who do not exist, or payments with invalid values such as negative amounts.

This is where keys and constraints come in.

Keys make sure each row can be uniquely identified, while constraints define what data is allowed. Together, they protect data quality and help keep your database consistent, accurate, and reliable.

Primary Keys

Every table needs a reliable way to identify each row.

Imagine two users both named “alex.” If you run an update, which one should the database modify? Or if a stream event doesn’t have a unique identifier, how does your application refer to that exact record?

This is what a primary key solves.

A primary key is a column (or a combination of columns) that uniquely identifies every row in a table. The database enforces two strict rules:

Each value must be unique
No value can be NULL

Together, these guarantees ensure that every row can be referenced unambiguously.

In StreamFlow, users.user_id is the primary key for users table:

SQL

In PostgreSQL, you’ll often see primary keys defined using SERIAL, which automatically generates an incrementing integer for each new row.

That means you don’t need to manually assign IDs. When you insert a new user, the database picks the next available number for you.

SQL

This auto-generation is convenient, but the real value of a primary key is the uniqueness guarantee. If you try to insert a row with a duplicate primary key, the database will reject it.

Example: if you manually insert a user_id that already exists, the query will fail.

SQL

Not all tables can rely on regular integers. For high-volume tables, you’ll often see BIGSERIAL, which generates a BIGINT instead of an INT.

For example, StreamFlow’s streams table uses BIGSERIAL because a popular platform can generate billions of stream events. Tables like ad_impressions follow the same pattern.

SQL

Lower-volume tables, such as users, artists, or playlists, typically use SERIAL because they’re unlikely to hit that limit.

Every table has exactly one primary key. You can't declare two. Behind the scenes, the database automatically creates an index on the primary key, which makes lookups by ID extremely fast.

Natural vs Surrogate Keys

A primary key can be any column that is both unique and not null, but choosing the right column matters. In practice, there are two common approaches.

Natural Key

A natural key uses a value that already exists in the data and is naturally unique.

Examples: Email address, ISBN for books, Social Security number, Country code

These values already have meaning in the real world.

Surrogate Key

A surrogate key is an artificial identifier created only to identify rows. It has no business meaning.

Common examples: SERIAL, BIGSERIAL,UUID

These values exist purely to identify rows.

In StreamFlow, most tables use surrogate keys. For example, the users table uses user_id as the primary key, even though fields like username or email might also be unique.

This is intentional. Surrogate keys give you more flexibility and stability over time, which becomes especially important as your system grows.

Factor	Natural Key (email)	Surrogate Key (user_id)
Stability	Emails change	Auto-generated, never changes
Storage size	VARCHAR(100), up to 100 bytes	INT, 4 bytes
Join performance	String comparison is slower	Integer comparison is fast
Readability	`WHERE email = 'alice@example.com'` is clear	`WHERE user_id = 42` requires a lookup
Foreign key references	Every child table stores a 100-byte string	Every child table stores a 4-byte integer

Imagine using email as the primary key for users. Then tables like streams, playlists, follows, and payments would need to store the user’s full email address in every related row.

With billions of records, that creates:

More storage usage
Larger indexes
Slower joins
More expensive updates

It also creates a maintenance problem. If a user changes their email, every referencing table would need updates. That is exactly the kind of problem surrogate keys avoid.

That said, natural keys work well in some situations. Reference tables with stable, standardized values are good candidates. If StreamFlow had a separate countries table, the ISO country code ('US', 'GB', 'BR') would be a reasonable natural key because ISO codes are short, stable, and universally recognized. Currency codes ('USD', 'EUR') follow the same logic.

The decision comes down to a few questions:

If any answer leads to "no," a surrogate key is the safer choice. In practice, most application tables use surrogate keys, and natural uniqueness is enforced through separate UNIQUE constraints (like users.username and users.email).

What About UUIDs?

A UUID (Universally Unique Identifier) is a 128-bit value that’s generated in a way that makes collisions extremely unlikely, even across different systems. This makes it especially useful in distributed setups, where multiple servers need to generate IDs independently without coordinating with each other.

There are trade-offs, though:

Size: UUIDs take 16 bytes, compared to 4 bytes for an INT
Readability: A value like a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11 is much harder to work with than something like 42

For a single-database system like StreamFlow, SERIAL is usually the simpler and more practical choice.

Foreign Keys

Tables are connected through shared columns.

For example, the streams table has a user_id column that points to users.user_id. This is how each stream event is linked to the user who generated it.

But here’s the catch: just having matching column names doesn’t enforce anything.

Without a constraint, you could insert a stream with user_id = 99999, even if no such user exists. The database wouldn’t stop you, and now you have inconsistent data.

A foreign key constraint tells the database: "this column's values must exist in the referenced table's column." If the referenced row doesn't exist, the insert fails.

SQL

The REFERENCES users(user_id) clause is the foreign key. It creates a rule: every value in streams.user_id must match an existing value in users.user_id.

The FK Chain

Foreign keys don’t just link two tables, they create dependency chains across your entire database.

In StreamFlow, the chain from a stream event back to an artist looks like this:

Arrows point from the referenced (parent) table to the referencing (child) table.

This has a practical implication: you can’t insert data in any order you want.

To insert a stream → the user and track must exist
To insert a track → the album must exist
To insert an album → the artist and genre must exist

So the correct order becomes: artists and genres → albums → tracks → streams

Self-Referencing Foreign Keys

A foreign key can reference the same table it belongs to. StreamFlow has a few examples:

genres.parent_genre_id references genres.genre_id (genre hierarchies like Rock -> Alternative Rock)
artists.manager_artist_id references artists.artist_id (management hierarchies)
playlists.forked_from_playlist_id references playlists.playlist_id (forked playlists)

All three columns allow NULL. The root of any hierarchy has no parent, so parent_genre_id is NULL for top-level genres like "Rock" or "Electronic." Similarly, not every artist has a manager, and most playlists aren't forks.

SQL

ON DELETE: What Happens When You Remove a Parent Row?

Foreign keys protect data on insert, but what about deletion? If you delete an artist from the artists table, what happens to their albums?

The answer depends on the referential action you specify. PostgreSQL supports five options:

Action	Behavior	When to Use
`RESTRICT`	Block the delete if any child rows exist	Prevent accidental data loss
`NO ACTION`	Same as RESTRICT (default in PostgreSQL)	Default behavior
`CASCADE`	Delete all child rows automatically	Cleanup dependent data (e.g., playlist -> playlist_tracks)
`SET NULL`	Set the FK column to NULL in child rows	Optional relationships (e.g., `albums.genre_id`)
`SET DEFAULT`	Set the FK column to its DEFAULT value	Rare, requires a DEFAULT that satisfies the FK

You specify the action in the foreign key declaration:

SQL

With ON DELETE CASCADE, deleting a playlist automatically removes all its entries from playlist_tracks. This makes sense because playlist-track associations have no meaning without the playlist.

But CASCADE is dangerous in other contexts. Deleting a user with CASCADE on streams would wipe out all their listening history, which is probably valuable analytics data. For tables like that, RESTRICT is safer: it forces you to handle the child data explicitly before deleting the parent.

What goes wrong here?

SQL

Fix: Either delete the dependent rows first, reassign them to another user, or use CASCADE if automatic cleanup is the intended behavior.

The ON UPDATE clause works similarly. ON UPDATE CASCADE means if a parent's primary key value changes, the new value propagates to all child rows. This is rare in practice because surrogate keys (SERIAL) almost never change.

Dialect	Default ON DELETE	Syntax Differences
PostgreSQL	`NO ACTION`	Standard syntax
MySQL	`RESTRICT`	Same effect as NO ACTION
SQL Server	`NO ACTION`	Same syntax, also supports `ON DELETE SET DEFAULT`

The practical difference between RESTRICT and NO ACTION is subtle: RESTRICT checks immediately, while NO ACTION checks at the end of the statement (or transaction in PostgreSQL). For most purposes, they behave the same.

Composite Keys

Some tables don't have a single column that uniquely identifies each row. The playlist_tracks table is a good example. Neither playlist_id nor track_id alone is unique: the same playlist appears in many rows (one per track), and the same track appears in many rows (one per playlist). But the combination of playlist_id and track_id is unique, because a specific track can only appear once in a specific playlist.

That combination is a composite primary key:

SQL

The PRIMARY KEY (playlist_id, track_id) declaration means: each individual column can repeat, but the same pair cannot appear twice.

playlist_id	track_id	position	Allowed?
1	5	1	Yes
1	8	2	Yes (same playlist, different track)
2	5	1	Yes (same track, different playlist)
1	5	3	No, (1, 5) already exists

StreamFlow has three tables with composite primary keys, all junction tables:

SQL

The composite key in follows prevents a user from following the same person twice. The composite key in artist_follows prevents a user from following the same artist twice.

Composite PK vs Surrogate PK with UNIQUE

An alternative design for playlist_tracks would be to add a surrogate primary key and enforce uniqueness separately:

SQL

Both designs prevent duplicate playlist-track pairs. The difference is practical:

Factor	Composite PK	Surrogate PK + UNIQUE
Extra column	No	Yes (the `id` column)
Referencing from other tables	Must use both columns	Can use single `id` column
Index count	One (the composite PK index)	Two (PK index + UNIQUE index)
Simplicity	Simpler for junction tables	Simpler for tables referenced by others

For junction tables that no other table references, composite primary keys are the cleaner choice. If other tables need to reference specific rows (e.g., a playlist_track_ratings table), a surrogate key makes foreign key references easier.

Column Constraints

Beyond keys, SQL provides four constraints that control what values individual columns can hold.

UNIQUE

A UNIQUE constraint ensures no two rows have the same value in that column. The users table has two UNIQUE constraints beyond its primary key:

SQL

Both username and email must be unique across all users. This is how StreamFlow can use a surrogate key (user_id) for internal references while still guaranteeing that no two users share the same username or email.

UNIQUE differs from PRIMARY KEY in two important ways. First, a table can have multiple UNIQUE constraints (like username and email) but only one PRIMARY KEY.

Second, UNIQUE columns allow NULL values in PostgreSQL, and multiple rows can have NULL in a UNIQUE column without violating the constraint. The logic is that NULL means "unknown," and two unknown values aren't considered equal.

You can also create a multi-column UNIQUE constraint that ensures the combination is unique:

SQL

This allows different artists to have albums with the same title, but the same artist cannot release two albums with identical titles.

NOT NULL

NOT NULL forces a column to always contain a value. Any INSERT or UPDATE that would set the column to NULL gets rejected.

StreamFlow uses NOT NULL on columns where a missing value would break business logic:

Table	NOT NULL Columns	Why Required
`users`	`username`, `email`, `signup_date`	Can't have a user without identity or signup date
`tracks`	`title`, `duration_seconds`, `album_id`	A track needs a name, length, and parent album
`streams`	`user_id`, `track_id`, `started_at`	A stream event is meaningless without who, what, and when
`payments`	`amount`, `payment_date`, `status`	A payment record needs all core fields

Columns that intentionally allow NULL represent optional or unknown data:

Column	Why NULL Is Allowed
`users.country`	User hasn't provided location yet
`albums.release_date`	Album hasn't been released yet
`subscriptions.end_date`	NULL means the subscription is still active
`tracks.track_number`	Singles or bonus tracks may not have a position
`ad_impressions.user_id`	Impression shown to an anonymous (logged-out) user

The decision to use NOT NULL is a design choice about what data your application requires. If a column is NOT NULL, the application must always provide a value, which can be inconvenient for partial data entry. If a column allows NULL, every query that touches it needs to handle the NULL case correctly (as we saw in the previous chapter). There is no free lunch.

CHECK

CHECK constraints enforce conditions that data types alone cannot express. An INT column accepts any integer, including negatives. A VARCHAR(10) column accepts any string up to 10 characters, including gibberish. CHECK lets you define business rules at the database level.

The StreamFlow schema doesn't include CHECK constraints, but several tables would benefit from them. Adding a CHECK constraint to an existing table uses ALTER TABLE:

SQL

Now any INSERT or UPDATE that sets plan to a value outside that list gets rejected:

SQL

Other useful CHECK constraints for StreamFlow:

SQL

That last one is interesting. The composite primary key on follows prevents duplicate follow relationships, but it doesn't prevent a user from following themselves. Only a CHECK constraint can enforce that kind of cross-column business rule.

CHECK constraints can reference multiple columns in the same row (like refund_amount <= amount), but they cannot reference other tables. If you need to validate against data in another table, foreign keys or application-level logic are the right tools.

Dialect	CHECK Support
PostgreSQL	Full support, enforced on INSERT and UPDATE
MySQL	Supported since MySQL 8.0.16, ignored in earlier versions
SQL Server	Full support

DEFAULT

DEFAULT provides an automatic value for a column when the INSERT statement doesn't include it. StreamFlow uses DEFAULT in several places:

SQL

When you insert a new user without specifying subscription_tier, the database automatically sets it to 'free':

SQL

DEFAULT NOW() is a common pattern for timestamp columns. It captures the exact moment the row was inserted without the application needing to provide it.

What goes wrong here?

SQL

DEFAULT only fires when the column is omitted from the INSERT. If you explicitly pass NULL, the database uses NULL. To prevent this, combine DEFAULT with NOT NULL:

SQL

Now omitting the column gives 'free', and explicitly passing NULL gives an error. The two constraints work together: NOT NULL blocks bad values, DEFAULT provides a good one.

Constraints Working Together

Real tables combine multiple constraints. Here's the payments table rewritten with explicit constraint names and additional CHECK constraints:

SQL

Naming constraints matters because error messages include the constraint name. violates constraint "chk_payment_amount" is much more helpful than violates constraint "payments_check". The naming convention chk_ for CHECK, fk_ for foreign keys, and pk_ for primary keys makes it easy to identify what kind of rule was violated.

Each constraint in this table serves a distinct purpose:

Constraint	Type	What It Prevents
`pk_payments`	PRIMARY KEY	Duplicate or NULL payment IDs
`fk_payments_user`	FOREIGN KEY	Payments for nonexistent users
`chk_payment_amount`	CHECK	Negative payment amounts
`NOT NULL` on `amount`	NOT NULL	Missing payment amounts
`chk_currency`	CHECK	Invalid currency codes
`DEFAULT 'USD'` on `currency`	DEFAULT	Missing currency defaults to USD
`chk_status`	CHECK	Invalid payment statuses
`chk_refund`	CHECK	Refund exceeding original amount, or negative refunds
`DEFAULT 0` on `refund_amount`	DEFAULT	Missing refund defaults to zero

This layered approach catches bad data at the database level regardless of which application, script, or manual query inserts it. Application-level validation is still important for user-friendly error messages, but database constraints are the last line of defense.

Quiz

Keys and Constraints Quiz

10 quizzes