Last Updated: May 3, 2026
Before writing any CREATE TABLE statement, you need a way to think about the data: what things exist, what properties they have, and how they connect to each other. Entity-Relationship (ER) modeling gives you a structured approach for turning business requirements into a relational schema. Get the ER model right, and the tables almost write themselves. Get it wrong, and you end up refactoring under pressure.
Run these statements to set up the tables and data used throughout this chapter.
ER modeling starts with three concepts.
An entity is a distinct thing that the system needs to track. In StreamFlow, users, artists, albums, tracks, and playlists are all entities. Each one has its own identity and can exist independently. A user exists whether or not they have ever streamed a song, and an artist exists whether or not they have released an album.
An attribute is a property that describes an entity. A user has a username, email, country, and signup date. An artist has a name and a verified status. Attributes are the columns in the eventual table.
A relationship is an association between two entities. A user creates a playlist. An artist releases an album. An album contains tracks. Relationships are what connect the tables together through foreign keys.
The distinction between entity and attribute is not always obvious. Should "country" be an entity with its own table, or just an attribute on the users table? The answer depends on what the system needs. If country is just a two-character code stored alongside the user, it is an attribute. If the system needs to store additional information about countries (population, timezone, currency, licensing rules), then country becomes its own entity with its own table.
The rule of thumb: if something has multiple attributes of its own and other entities need to reference it, it is probably an entity. If it is a single value that describes another entity, it is an attribute.
In StreamFlow's case, the entities are clear. Users, artists, albums, tracks, genres, playlists, streams, payments, and campaigns are all things with their own identities and multiple attributes. Country, email, and track duration are attributes because they describe a single property of an entity.
ER diagrams use a visual language to represent entities, attributes, and relationships. Several notation systems exist, but Crow's foot notation is the most widely used in industry and the one you are most likely to encounter in interviews, design reviews, and database documentation.
In Crow's foot notation, entities are rectangles, attributes are listed inside the rectangle, and relationships are lines connecting the rectangles. The key feature is how cardinality is shown at each end of the line:
The symbols read from the entity outward:
| Symbol | Meaning | Example |
|---|---|---|
|| (single line) | Exactly one | An album belongs to exactly one artist |
o| (circle + line) | Zero or one | A user may have zero or one active subscription |
|{ (line + crow's foot) | One or more | An album has one or more tracks |
o{ (circle + crow's foot) | Zero or more | An artist has zero or more albums |
The "crow's foot" is the fork-shaped symbol that indicates "many." The circle indicates "zero is possible" (optional), while the single line indicates "at least one" (mandatory).
Here is a subset of the StreamFlow schema in Crow's foot notation using Mermaid's ER diagram syntax:
Reading this diagram: an artist releases zero or more albums (||--o{). An album contains one or more tracks (||--|{), because an album with no tracks would not make sense in StreamFlow's model. A user creates zero or more playlists. The playlist_tracks table sits between playlists and tracks, representing their many-to-many relationship.
You may occasionally see Chen notation, the original ER notation from Peter Chen's 1976 paper. It uses rectangles for entities, ovals for attributes, and diamonds for relationships, with numbers (1, N, M) labeling cardinality.
Chen notation is mostly used in academic textbooks. In practice, Crow's foot is dominant because it is more compact: attributes are listed inside the entity rectangle instead of floating around as separate ovals, and cardinality is visible at a glance from the line endings. If you see Chen notation in an interview, the concepts are the same. Only the visual symbols differ.
A one-to-one relationship means each row in table A relates to at most one row in table B, and vice versa. This is the rarest cardinality type. It usually appears when you want to split an entity's attributes into two tables for organizational or security reasons.
StreamFlow's current schema does not have a strict 1:1 relationship, so consider a hypothetical extension: storing sensitive account information (password hashes, two-factor settings) in a separate user_credentials table.
The critical detail is the UNIQUE constraint on user_id. Without it, this would be a one-to-many relationship (one user could have multiple credential rows). The UNIQUE constraint enforces that each user has at most one credentials row.
1:1 relationships exist for a reason, not just to split tables arbitrarily. Common motivations include:
users table means 90% of rows carry NULL values for columns they never use.In many cases, a 1:1 relationship is a sign that the two tables should be merged. If every user always has credentials and you don't need separate access control, putting the credential columns directly on users is simpler. Fewer tables means fewer joins.
The decision comes down to whether the split solves a real problem (security, performance, optional data) or just adds complexity for no benefit.
One-to-many is the relationship you will see most often. One artist releases many albums. One album contains many tracks. One user creates many playlists. The "one" side is the parent, and the "many" side is the child.
In SQL, one-to-many is implemented by placing a foreign key on the child table that references the parent's primary key:
The artist_id column on albums points back to the artists table. Each album has exactly one artist (the FK value), but each artist can have many albums (many rows in albums can share the same artist_id).
StreamFlow has several one-to-many relationships:
| Parent (one) | Child (many) | FK Column | Meaning |
|---|---|---|---|
artists | albums | albums.artist_id | An artist releases albums |
albums | tracks | tracks.album_id | An album contains tracks |
users | streams | streams.user_id | A user generates streams |
users | playlists | playlists.creator_user_id | A user creates playlists |
users | payments | payments.user_id | A user makes payments |
users | subscriptions | subscriptions.user_id | A user holds subscriptions |
campaigns | ad_impressions | ad_impressions.campaign_id | A campaign has impressions |
The NOT NULL constraint on the foreign key determines whether the relationship is mandatory or optional:
In StreamFlow, albums.artist_id is NOT NULL because an album without an artist does not make sense. But ad_impressions.user_id is nullable because an impression can be shown to an anonymous visitor who is not logged in.
A common modeling mistake is placing the foreign key on the wrong table. If you want to model "one artist has many albums," you might be tempted to put an album_id column on the artists table:
What goes wrong here?
This only stores a single album per artist. There is no way to record a second or third album. The foreign key must go on the "many" side (albums), so multiple album rows can point to the same artist.
Fix: Put the FK on the child table:
The rule is simple: in a one-to-many relationship, the foreign key always goes on the "many" side.
A many-to-many relationship means entities on both sides can relate to multiple entities on the other side. A playlist contains many tracks, and a track can appear in many playlists. A user follows many users, and a user can be followed by many users.
Relational databases cannot represent many-to-many relationships directly with a single foreign key. A foreign key column holds one value per row, so it can only point to one related entity. To model many-to-many, you introduce a junction table (also called a bridge table, join table, or associative table) that sits between the two entities.
StreamFlow's playlist_tracks table is a textbook junction table:
The composite primary key (playlist_id, track_id) ensures that the same track cannot appear in the same playlist twice. Each row represents one specific pairing: "track X is in playlist Y at position Z."
Without the junction table, you would face the same problem as putting the FK on the wrong side. A track_id column on playlists would only allow one track per playlist. A playlist_id column on tracks would only allow a track to belong to one playlist. Neither captures the full relationship.
Junction tables often carry their own attributes that describe the relationship itself, not either entity. In playlist_tracks:
position describes the ordering of a track within a specific playlist. It belongs to the relationship, not to the track or the playlist.added_at records when the track was added to the playlist.These attributes would have nowhere to live without the junction table. They don't describe a playlist (which has a title and creator) or a track (which has a duration and album). They describe the specific pairing of a playlist and a track.
The follows table models a many-to-many relationship between users and themselves:
This is still a junction table, but both foreign keys point to the same entity table (users). Alice can follow Bob, and Bob can follow Alice. The composite primary key prevents duplicate follow relationships, but the direction matters: (1, 2) means user 1 follows user 2, and (2, 1) means user 2 follows user 1. These are two distinct rows.
Similarly, artist_follows connects users and artists:
A user can follow many artists, and an artist can be followed by many users. The junction table captures each pairing with a timestamp.
To find all tracks in a playlist, you join through the junction table:
To find all playlists that contain a specific track:
The junction table is always in the middle of the join. Queries go from one entity, through the junction table, to the other entity.
A self-referencing relationship (also called a recursive relationship) is when an entity relates to itself. StreamFlow has three examples of this, each modeling a different kind of hierarchy.
The genres table uses parent_genre_id to create a tree structure:
"Alternative Rock" has parent_genre_id pointing to "Rock." "Smooth Jazz" points to "Jazz." Top-level genres like "Rock" and "Jazz" have parent_genre_id set to NULL, indicating they are root nodes.
To find all sub-genres of a genre:
Notice that the genres table is joined to itself. The aliases (child, parent) are essential here because without them, the query would be ambiguous about which instance of genres you mean.
The artists table has manager_artist_id, which references another artist who acts as a manager or label representative:
In the sample data, "Skyline Collective" has manager_artist_id = 1 (The Resonants), and "Echo Chamber" has manager_artist_id = 2 (Nova). This models a simple one-to-many self-reference: one artist manages zero or more other artists.
The playlists table uses forked_from_playlist_id to track playlist derivation:
When a user forks someone else's playlist, the new playlist records which original it was copied from. In the sample data, "Morning Vibes Remix" (playlist 4) was forked from "Morning Vibes" (playlist 1). This creates a one-to-many self-reference: one playlist can be the source of many forks.
All three examples follow the same SQL pattern: a nullable foreign key column that references the same table's primary key. The column is nullable because root-level entities (top genres, independent artists, original playlists) have no parent to reference.
| Table | Self-Reference Column | Meaning |
|---|---|---|
genres | parent_genre_id | Genre hierarchy (Rock > Alternative Rock) |
artists | manager_artist_id | Management/label relationship |
playlists | forked_from_playlist_id | Playlist derivation chain |
When querying self-referencing tables, you always need a self-join with meaningful aliases that clarify the role of each instance.
Translating an ER diagram into SQL tables follows a systematic process. The steps are mechanical enough that you can apply them to any ER model.
Every entity in the ER diagram maps to a table. Give it a primary key (usually a SERIAL or BIGSERIAL surrogate key) and add the attributes as columns.
For each one-to-many relationship, add a foreign key column to the "many" side table. The column references the primary key of the "one" side.
Decide whether the FK should be NOT NULL (mandatory: every album must have an artist) or nullable (optional: this entity may exist without the relationship).
For each many-to-many relationship, create a new table with two foreign keys (one to each entity) and a composite primary key.
Add any relationship-specific attributes (position, timestamps, etc.) to the junction table.
For a one-to-one relationship, add a foreign key with a UNIQUE constraint to one of the two tables. Typically, the FK goes on the table that represents the "optional" or "dependent" side.
For self-referencing relationships, add a nullable FK column that references the same table's primary key.
| Mistake | What Happens | Fix |
|---|---|---|
| FK on the "one" side of 1:N | Only one child per parent | Move FK to the "many" side |
| No junction table for M:N | Can only store one side of the relationship | Create a junction table with composite PK |
| Non-unique FK for 1:1 | Becomes 1:N accidentally | Add UNIQUE constraint to the FK |
| NOT NULL on self-reference FK | Root entities cannot exist | Make self-referencing FKs nullable |
| Missing composite PK on junction table | Duplicate relationships allowed | Use composite PK on both FK columns |
Here is the complete StreamFlow schema with all 14 tables and their relationships:
Walking through the key relationships:
artists → albums → tracks forms a strict hierarchy. Each artist has albums, each album has tracks. All one-to-many, all mandatory (every track must belong to an album, every album must belong to an artist).users → streams and users → playlists are one-to-many. A user can have zero streams (just signed up) or millions.playlists ↔ tracks (via playlist_tracks), users ↔ users (via follows), and users ↔ artists (via artist_follows).genres.parent_genre_id, artists.manager_artist_id, and playlists.forked_from_playlist_id each create hierarchies within a single table.users and tracks, making it the central fact table for the entire platform's activity data.