Last Updated: May 1, 2026
The relational model gives your data structure, but structure alone is not enough to prevent bad data.
Without enforcement rules, problems can slip in quickly. You might end up with duplicate user IDs, streams linked to users who do not exist, or payments with invalid values such as negative amounts.
This is where keys and constraints come in.
Keys make sure each row can be uniquely identified, while constraints define what data is allowed. Together, they protect data quality and help keep your database consistent, accurate, and reliable.
Every table needs a reliable way to identify each row.
Imagine two users both named “alex.” If you run an update, which one should the database modify? Or if a stream event doesn’t have a unique identifier, how does your application refer to that exact record?
This is what a primary key solves.
A primary key is a column (or a combination of columns) that uniquely identifies every row in a table. The database enforces two strict rules:
Together, these guarantees ensure that every row can be referenced unambiguously.
In StreamFlow, users.user_id is the primary key for users table:
In PostgreSQL, you’ll often see primary keys defined using SERIAL, which automatically generates an incrementing integer for each new row.
That means you don’t need to manually assign IDs. When you insert a new user, the database picks the next available number for you.
This auto-generation is convenient, but the real value of a primary key is the uniqueness guarantee. If you try to insert a row with a duplicate primary key, the database will reject it.
Example: if you manually insert a user_id that already exists, the query will fail.
Not all tables can rely on regular integers. For high-volume tables, you’ll often see BIGSERIAL, which generates a BIGINT instead of an INT.
For example, StreamFlow’s streams table uses BIGSERIAL because a popular platform can generate billions of stream events. Tables like ad_impressions follow the same pattern.
Lower-volume tables, such as users, artists, or playlists, typically use SERIAL because they’re unlikely to hit that limit.
Every table has exactly one primary key. You can't declare two. Behind the scenes, the database automatically creates an index on the primary key, which makes lookups by ID extremely fast.
A primary key can be any column that is both unique and not null, but choosing the right column matters. In practice, there are two common approaches.
A natural key uses a value that already exists in the data and is naturally unique.
Examples: Email address, ISBN for books, Social Security number, Country code
These values already have meaning in the real world.
A surrogate key is an artificial identifier created only to identify rows. It has no business meaning.
Common examples: SERIAL, BIGSERIAL,UUID
These values exist purely to identify rows.
In StreamFlow, most tables use surrogate keys. For example, the users table uses user_id as the primary key, even though fields like username or email might also be unique.
This is intentional. Surrogate keys give you more flexibility and stability over time, which becomes especially important as your system grows.
| Factor | Natural Key (email) | Surrogate Key (user_id) |
|---|---|---|
| Stability | Emails change | Auto-generated, never changes |
| Storage size | VARCHAR(100), up to 100 bytes | INT, 4 bytes |
| Join performance | String comparison is slower | Integer comparison is fast |
| Readability | WHERE email = 'alice@example.com' is clear | WHERE user_id = 42 requires a lookup |
| Foreign key references | Every child table stores a 100-byte string | Every child table stores a 4-byte integer |
Imagine using email as the primary key for users. Then tables like streams, playlists, follows, and payments would need to store the user’s full email address in every related row.
With billions of records, that creates:
It also creates a maintenance problem. If a user changes their email, every referencing table would need updates. That is exactly the kind of problem surrogate keys avoid.
That said, natural keys work well in some situations. Reference tables with stable, standardized values are good candidates. If StreamFlow had a separate countries table, the ISO country code ('US', 'GB', 'BR') would be a reasonable natural key because ISO codes are short, stable, and universally recognized. Currency codes ('USD', 'EUR') follow the same logic.
The decision comes down to a few questions:
If any answer leads to "no," a surrogate key is the safer choice. In practice, most application tables use surrogate keys, and natural uniqueness is enforced through separate UNIQUE constraints (like users.username and users.email).
A UUID (Universally Unique Identifier) is a 128-bit value that’s generated in a way that makes collisions extremely unlikely, even across different systems. This makes it especially useful in distributed setups, where multiple servers need to generate IDs independently without coordinating with each other.
There are trade-offs, though:
INTa0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11 is much harder to work with than something like 42For a single-database system like StreamFlow, SERIAL is usually the simpler and more practical choice.
Tables are connected through shared columns.
For example, the streams table has a user_id column that points to users.user_id. This is how each stream event is linked to the user who generated it.
But here’s the catch: just having matching column names doesn’t enforce anything.
Without a constraint, you could insert a stream with user_id = 99999, even if no such user exists. The database wouldn’t stop you, and now you have inconsistent data.
A foreign key constraint tells the database: "this column's values must exist in the referenced table's column." If the referenced row doesn't exist, the insert fails.
The REFERENCES users(user_id) clause is the foreign key. It creates a rule: every value in streams.user_id must match an existing value in users.user_id.
Foreign keys don’t just link two tables, they create dependency chains across your entire database.
In StreamFlow, the chain from a stream event back to an artist looks like this:
Arrows point from the referenced (parent) table to the referencing (child) table.
This has a practical implication: you can’t insert data in any order you want.
So the correct order becomes: artists and genres → albums → tracks → streams
A foreign key can reference the same table it belongs to. StreamFlow has a few examples:
genres.parent_genre_id references genres.genre_id (genre hierarchies like Rock -> Alternative Rock)artists.manager_artist_id references artists.artist_id (management hierarchies)playlists.forked_from_playlist_id references playlists.playlist_id (forked playlists)All three columns allow NULL. The root of any hierarchy has no parent, so parent_genre_id is NULL for top-level genres like "Rock" or "Electronic." Similarly, not every artist has a manager, and most playlists aren't forks.
Foreign keys protect data on insert, but what about deletion? If you delete an artist from the artists table, what happens to their albums?
The answer depends on the referential action you specify. PostgreSQL supports five options:
| Action | Behavior | When to Use |
|---|---|---|
RESTRICT | Block the delete if any child rows exist | Prevent accidental data loss |
NO ACTION | Same as RESTRICT (default in PostgreSQL) | Default behavior |
CASCADE | Delete all child rows automatically | Cleanup dependent data (e.g., playlist -> playlist_tracks) |
SET NULL | Set the FK column to NULL in child rows | Optional relationships (e.g., albums.genre_id) |
SET DEFAULT | Set the FK column to its DEFAULT value | Rare, requires a DEFAULT that satisfies the FK |
You specify the action in the foreign key declaration:
With ON DELETE CASCADE, deleting a playlist automatically removes all its entries from playlist_tracks. This makes sense because playlist-track associations have no meaning without the playlist.
But CASCADE is dangerous in other contexts. Deleting a user with CASCADE on streams would wipe out all their listening history, which is probably valuable analytics data. For tables like that, RESTRICT is safer: it forces you to handle the child data explicitly before deleting the parent.
What goes wrong here?
Fix: Either delete the dependent rows first, reassign them to another user, or use CASCADE if automatic cleanup is the intended behavior.
The ON UPDATE clause works similarly. ON UPDATE CASCADE means if a parent's primary key value changes, the new value propagates to all child rows. This is rare in practice because surrogate keys (SERIAL) almost never change.
| Dialect | Default ON DELETE | Syntax Differences |
|---|---|---|
| PostgreSQL | NO ACTION | Standard syntax |
| MySQL | RESTRICT | Same effect as NO ACTION |
| SQL Server | NO ACTION | Same syntax, also supports ON DELETE SET DEFAULT |
The practical difference between RESTRICT and NO ACTION is subtle: RESTRICT checks immediately, while NO ACTION checks at the end of the statement (or transaction in PostgreSQL). For most purposes, they behave the same.
Some tables don't have a single column that uniquely identifies each row. The playlist_tracks table is a good example. Neither playlist_id nor track_id alone is unique: the same playlist appears in many rows (one per track), and the same track appears in many rows (one per playlist). But the combination of playlist_id and track_id is unique, because a specific track can only appear once in a specific playlist.
That combination is a composite primary key:
The PRIMARY KEY (playlist_id, track_id) declaration means: each individual column can repeat, but the same pair cannot appear twice.
| playlist_id | track_id | position | Allowed? |
|---|---|---|---|
| 1 | 5 | 1 | Yes |
| 1 | 8 | 2 | Yes (same playlist, different track) |
| 2 | 5 | 1 | Yes (same track, different playlist) |
| 1 | 5 | 3 | No, (1, 5) already exists |
StreamFlow has three tables with composite primary keys, all junction tables:
The composite key in follows prevents a user from following the same person twice. The composite key in artist_follows prevents a user from following the same artist twice.
An alternative design for playlist_tracks would be to add a surrogate primary key and enforce uniqueness separately:
Both designs prevent duplicate playlist-track pairs. The difference is practical:
| Factor | Composite PK | Surrogate PK + UNIQUE |
|---|---|---|
| Extra column | No | Yes (the id column) |
| Referencing from other tables | Must use both columns | Can use single id column |
| Index count | One (the composite PK index) | Two (PK index + UNIQUE index) |
| Simplicity | Simpler for junction tables | Simpler for tables referenced by others |
For junction tables that no other table references, composite primary keys are the cleaner choice. If other tables need to reference specific rows (e.g., a playlist_track_ratings table), a surrogate key makes foreign key references easier.
Beyond keys, SQL provides four constraints that control what values individual columns can hold.
A UNIQUE constraint ensures no two rows have the same value in that column. The users table has two UNIQUE constraints beyond its primary key:
Both username and email must be unique across all users. This is how StreamFlow can use a surrogate key (user_id) for internal references while still guaranteeing that no two users share the same username or email.
UNIQUE differs from PRIMARY KEY in two important ways. First, a table can have multiple UNIQUE constraints (like username and email) but only one PRIMARY KEY.
Second, UNIQUE columns allow NULL values in PostgreSQL, and multiple rows can have NULL in a UNIQUE column without violating the constraint. The logic is that NULL means "unknown," and two unknown values aren't considered equal.
You can also create a multi-column UNIQUE constraint that ensures the combination is unique:
This allows different artists to have albums with the same title, but the same artist cannot release two albums with identical titles.
NOT NULL forces a column to always contain a value. Any INSERT or UPDATE that would set the column to NULL gets rejected.
StreamFlow uses NOT NULL on columns where a missing value would break business logic:
| Table | NOT NULL Columns | Why Required |
|---|---|---|
users | username, email, signup_date | Can't have a user without identity or signup date |
tracks | title, duration_seconds, album_id | A track needs a name, length, and parent album |
streams | user_id, track_id, started_at | A stream event is meaningless without who, what, and when |
payments | amount, payment_date, status | A payment record needs all core fields |
Columns that intentionally allow NULL represent optional or unknown data:
| Column | Why NULL Is Allowed |
|---|---|
users.country | User hasn't provided location yet |
albums.release_date | Album hasn't been released yet |
subscriptions.end_date | NULL means the subscription is still active |
tracks.track_number | Singles or bonus tracks may not have a position |
ad_impressions.user_id | Impression shown to an anonymous (logged-out) user |
The decision to use NOT NULL is a design choice about what data your application requires. If a column is NOT NULL, the application must always provide a value, which can be inconvenient for partial data entry. If a column allows NULL, every query that touches it needs to handle the NULL case correctly (as we saw in the previous chapter). There is no free lunch.
CHECK constraints enforce conditions that data types alone cannot express. An INT column accepts any integer, including negatives. A VARCHAR(10) column accepts any string up to 10 characters, including gibberish. CHECK lets you define business rules at the database level.
The StreamFlow schema doesn't include CHECK constraints, but several tables would benefit from them. Adding a CHECK constraint to an existing table uses ALTER TABLE:
Now any INSERT or UPDATE that sets plan to a value outside that list gets rejected:
Other useful CHECK constraints for StreamFlow:
That last one is interesting. The composite primary key on follows prevents duplicate follow relationships, but it doesn't prevent a user from following themselves. Only a CHECK constraint can enforce that kind of cross-column business rule.
CHECK constraints can reference multiple columns in the same row (like refund_amount <= amount), but they cannot reference other tables. If you need to validate against data in another table, foreign keys or application-level logic are the right tools.
| Dialect | CHECK Support |
|---|---|
| PostgreSQL | Full support, enforced on INSERT and UPDATE |
| MySQL | Supported since MySQL 8.0.16, ignored in earlier versions |
| SQL Server | Full support |
DEFAULT provides an automatic value for a column when the INSERT statement doesn't include it. StreamFlow uses DEFAULT in several places:
When you insert a new user without specifying subscription_tier, the database automatically sets it to 'free':
DEFAULT NOW() is a common pattern for timestamp columns. It captures the exact moment the row was inserted without the application needing to provide it.
What goes wrong here?
DEFAULT only fires when the column is omitted from the INSERT. If you explicitly pass NULL, the database uses NULL. To prevent this, combine DEFAULT with NOT NULL:
Now omitting the column gives 'free', and explicitly passing NULL gives an error. The two constraints work together: NOT NULL blocks bad values, DEFAULT provides a good one.
Real tables combine multiple constraints. Here's the payments table rewritten with explicit constraint names and additional CHECK constraints:
Naming constraints matters because error messages include the constraint name. violates constraint "chk_payment_amount" is much more helpful than violates constraint "payments_check". The naming convention chk_ for CHECK, fk_ for foreign keys, and pk_ for primary keys makes it easy to identify what kind of rule was violated.
Each constraint in this table serves a distinct purpose:
| Constraint | Type | What It Prevents |
|---|---|---|
pk_payments | PRIMARY KEY | Duplicate or NULL payment IDs |
fk_payments_user | FOREIGN KEY | Payments for nonexistent users |
chk_payment_amount | CHECK | Negative payment amounts |
NOT NULL on amount | NOT NULL | Missing payment amounts |
chk_currency | CHECK | Invalid currency codes |
DEFAULT 'USD' on currency | DEFAULT | Missing currency defaults to USD |
chk_status | CHECK | Invalid payment statuses |
chk_refund | CHECK | Refund exceeding original amount, or negative refunds |
DEFAULT 0 on refund_amount | DEFAULT | Missing refund defaults to zero |
This layered approach catches bad data at the database level regardless of which application, script, or manual query inserts it. Application-level validation is still important for user-friendly error messages, but database constraints are the last line of defense.