Before writing any CREATE TABLE statement, you need a way to think about the data: what things exist, what properties they have, and how they connect to each other. Entity-Relationship (ER) modeling gives you a structured approach for turning business requirements into a relational schema. Get the ER model right, and the tables almost write themselves. Get it wrong, and you end up refactoring under pressure.

Sample Data

Run these statements to set up the tables and data used throughout this chapter.

SQL

Entities, Attributes, and Relationships

ER modeling starts with three concepts.

An entity is a distinct thing that the system needs to track. In StreamFlow, users, artists, albums, tracks, and playlists are all entities. Each one has its own identity and can exist independently. A user exists whether or not they have ever streamed a song, and an artist exists whether or not they have released an album.

An attribute is a property that describes an entity. A user has a username, email, country, and signup date. An artist has a name and a verified status. Attributes are the columns in the eventual table.

A relationship is an association between two entities. A user creates a playlist. An artist releases an album. An album contains tracks. Relationships are what connect the tables together through foreign keys.

The distinction between entity and attribute is not always obvious. Should "country" be an entity with its own table, or just an attribute on the users table? The answer depends on what the system needs. If country is just a two-character code stored alongside the user, it is an attribute. If the system needs to store additional information about countries (population, timezone, currency, licensing rules), then country becomes its own entity with its own table.

The rule of thumb: if something has multiple attributes of its own and other entities need to reference it, it is probably an entity. If it is a single value that describes another entity, it is an attribute.

In StreamFlow's case, the entities are clear. Users, artists, albums, tracks, genres, playlists, streams, payments, and campaigns are all things with their own identities and multiple attributes. Country, email, and track duration are attributes because they describe a single property of an entity.

ER Diagram Notation

ER diagrams use a visual language to represent entities, attributes, and relationships. Several notation systems exist, but Crow's foot notation is the most widely used in industry and the one you are most likely to encounter in interviews, design reviews, and database documentation.

Crow's foot Notation

In Crow's foot notation, entities are rectangles, attributes are listed inside the rectangle, and relationships are lines connecting the rectangles. The key feature is how cardinality is shown at each end of the line:

The symbols read from the entity outward:

Symbol	Meaning	Example
`\|\|` (single line)	Exactly one	An album belongs to exactly one artist
`o\|` (circle + line)	Zero or one	A user may have zero or one active subscription
`\|{` (line + crow's foot)	One or more	An album has one or more tracks
`o{` (circle + crow's foot)	Zero or more	An artist has zero or more albums

The "crow's foot" is the fork-shaped symbol that indicates "many." The circle indicates "zero is possible" (optional), while the single line indicates "at least one" (mandatory).

Here is a subset of the StreamFlow schema in Crow's foot notation using Mermaid's ER diagram syntax:

Reading this diagram: an artist releases zero or more albums (||--o{). An album contains one or more tracks (||--|{), because an album with no tracks would not make sense in StreamFlow's model. A user creates zero or more playlists. The playlist_tracks table sits between playlists and tracks, representing their many-to-many relationship.

Chen Notation

You may occasionally see Chen notation, the original ER notation from Peter Chen's 1976 paper. It uses rectangles for entities, ovals for attributes, and diamonds for relationships, with numbers (1, N, M) labeling cardinality.

Chen notation is mostly used in academic textbooks. In practice, Crow's foot is dominant because it is more compact: attributes are listed inside the entity rectangle instead of floating around as separate ovals, and cardinality is visible at a glance from the line endings. If you see Chen notation in an interview, the concepts are the same. Only the visual symbols differ.

Cardinality: One-to-One (1:1)

A one-to-one relationship means each row in table A relates to at most one row in table B, and vice versa. This is the rarest cardinality type. It usually appears when you want to split an entity's attributes into two tables for organizational or security reasons.

StreamFlow's current schema does not have a strict 1:1 relationship, so consider a hypothetical extension: storing sensitive account information (password hashes, two-factor settings) in a separate user_credentials table.

SQL

The critical detail is the UNIQUE constraint on user_id. Without it, this would be a one-to-many relationship (one user could have multiple credential rows). The UNIQUE constraint enforces that each user has at most one credentials row.

When 1:1 Relationships Make Sense

1:1 relationships exist for a reason, not just to split tables arbitrarily. Common motivations include:

Access control: Sensitive data (credentials, payment details) lives in a separate table with tighter permissions. Application code that reads user profiles never needs access to the credentials table.
Performance: A table with many columns where some are rarely queried can be split. The frequently accessed columns stay in the main table, and the rarely accessed ones move to a secondary table. This keeps the main table's rows smaller and more cache-friendly.
Optional data: When only some entities have the extra attributes. If only 10% of users enable two-factor authentication, putting those columns on the users table means 90% of rows carry NULL values for columns they never use.

The Alternative: Just Use One Table

In many cases, a 1:1 relationship is a sign that the two tables should be merged. If every user always has credentials and you don't need separate access control, putting the credential columns directly on users is simpler. Fewer tables means fewer joins.

The decision comes down to whether the split solves a real problem (security, performance, optional data) or just adds complexity for no benefit.

Cardinality: One-to-Many (1:N)

One-to-many is the relationship you will see most often. One artist releases many albums. One album contains many tracks. One user creates many playlists. The "one" side is the parent, and the "many" side is the child.

In SQL, one-to-many is implemented by placing a foreign key on the child table that references the parent's primary key:

SQL

The artist_id column on albums points back to the artists table. Each album has exactly one artist (the FK value), but each artist can have many albums (many rows in albums can share the same artist_id).

StreamFlow has several one-to-many relationships:

Parent (one)	Child (many)	FK Column	Meaning
`artists`	`albums`	`albums.artist_id`	An artist releases albums
`albums`	`tracks`	`tracks.album_id`	An album contains tracks
`users`	`streams`	`streams.user_id`	A user generates streams
`users`	`playlists`	`playlists.creator_user_id`	A user creates playlists
`users`	`payments`	`payments.user_id`	A user makes payments
`users`	`subscriptions`	`subscriptions.user_id`	A user holds subscriptions
`campaigns`	`ad_impressions`	`ad_impressions.campaign_id`	A campaign has impressions

Mandatory vs. Optional

The NOT NULL constraint on the foreign key determines whether the relationship is mandatory or optional:

SQL

In StreamFlow, albums.artist_id is NOT NULL because an album without an artist does not make sense. But ad_impressions.user_id is nullable because an impression can be shown to an anonymous visitor who is not logged in.

What Goes Wrong: FK on the Wrong Side

A common modeling mistake is placing the foreign key on the wrong table. If you want to model "one artist has many albums," you might be tempted to put an album_id column on the artists table:

What goes wrong here?

SQL

This only stores a single album per artist. There is no way to record a second or third album. The foreign key must go on the "many" side (albums), so multiple album rows can point to the same artist.

Fix: Put the FK on the child table:

SQL

The rule is simple: in a one-to-many relationship, the foreign key always goes on the "many" side.

Cardinality: Many-to-Many (M:N)

A many-to-many relationship means entities on both sides can relate to multiple entities on the other side. A playlist contains many tracks, and a track can appear in many playlists. A user follows many users, and a user can be followed by many users.

Relational databases cannot represent many-to-many relationships directly with a single foreign key. A foreign key column holds one value per row, so it can only point to one related entity. To model many-to-many, you introduce a junction table (also called a bridge table, join table, or associative table) that sits between the two entities.

Playlists and Tracks

StreamFlow's playlist_tracks table is a textbook junction table:

SQL

The composite primary key (playlist_id, track_id) ensures that the same track cannot appear in the same playlist twice. Each row represents one specific pairing: "track X is in playlist Y at position Z."

Without the junction table, you would face the same problem as putting the FK on the wrong side. A track_id column on playlists would only allow one track per playlist. A playlist_id column on tracks would only allow a track to belong to one playlist. Neither captures the full relationship.

Extra Attributes on Junction Tables

Junction tables often carry their own attributes that describe the relationship itself, not either entity. In playlist_tracks:

position describes the ordering of a track within a specific playlist. It belongs to the relationship, not to the track or the playlist.
added_at records when the track was added to the playlist.

These attributes would have nowhere to live without the junction table. They don't describe a playlist (which has a title and creator) or a track (which has a duration and album). They describe the specific pairing of a playlist and a track.

User-to-User Follows

The follows table models a many-to-many relationship between users and themselves:

SQL

This is still a junction table, but both foreign keys point to the same entity table (users). Alice can follow Bob, and Bob can follow Alice. The composite primary key prevents duplicate follow relationships, but the direction matters: (1, 2) means user 1 follows user 2, and (2, 1) means user 2 follows user 1. These are two distinct rows.

Similarly, artist_follows connects users and artists:

SQL

A user can follow many artists, and an artist can be followed by many users. The junction table captures each pairing with a timestamp.

Querying Through Junction Tables

To find all tracks in a playlist, you join through the junction table:

SQL

To find all playlists that contain a specific track:

SQL

The junction table is always in the middle of the join. Queries go from one entity, through the junction table, to the other entity.

Self-Referencing Relationships

A self-referencing relationship (also called a recursive relationship) is when an entity relates to itself. StreamFlow has three examples of this, each modeling a different kind of hierarchy.

Genre Hierarchies

The genres table uses parent_genre_id to create a tree structure:

SQL

"Alternative Rock" has parent_genre_id pointing to "Rock." "Smooth Jazz" points to "Jazz." Top-level genres like "Rock" and "Jazz" have parent_genre_id set to NULL, indicating they are root nodes.

To find all sub-genres of a genre:

SQL

Notice that the genres table is joined to itself. The aliases (child, parent) are essential here because without them, the query would be ambiguous about which instance of genres you mean.

Artist Management

The artists table has manager_artist_id, which references another artist who acts as a manager or label representative:

SQL

In the sample data, "Skyline Collective" has manager_artist_id = 1 (The Resonants), and "Echo Chamber" has manager_artist_id = 2 (Nova). This models a simple one-to-many self-reference: one artist manages zero or more other artists.

Playlist Forking

The playlists table uses forked_from_playlist_id to track playlist derivation:

SQL

When a user forks someone else's playlist, the new playlist records which original it was copied from. In the sample data, "Morning Vibes Remix" (playlist 4) was forked from "Morning Vibes" (playlist 1). This creates a one-to-many self-reference: one playlist can be the source of many forks.

Self-Referencing Patterns

All three examples follow the same SQL pattern: a nullable foreign key column that references the same table's primary key. The column is nullable because root-level entities (top genres, independent artists, original playlists) have no parent to reference.

Table	Self-Reference Column	Meaning
`genres`	`parent_genre_id`	Genre hierarchy (Rock > Alternative Rock)
`artists`	`manager_artist_id`	Management/label relationship
`playlists`	`forked_from_playlist_id`	Playlist derivation chain

When querying self-referencing tables, you always need a self-join with meaningful aliases that clarify the role of each instance.

From ER Diagram to Schema

Translating an ER diagram into SQL tables follows a systematic process. The steps are mechanical enough that you can apply them to any ER model.

Step 1: Each Entity Becomes a Table

Every entity in the ER diagram maps to a table. Give it a primary key (usually a SERIAL or BIGSERIAL surrogate key) and add the attributes as columns.

SQL

Step 2: One-to-Many Relationships Become Foreign Keys

For each one-to-many relationship, add a foreign key column to the "many" side table. The column references the primary key of the "one" side.

SQL

Decide whether the FK should be NOT NULL (mandatory: every album must have an artist) or nullable (optional: this entity may exist without the relationship).

Step 3: Many-to-Many Relationships Become Junction Tables

For each many-to-many relationship, create a new table with two foreign keys (one to each entity) and a composite primary key.

SQL

Add any relationship-specific attributes (position, timestamps, etc.) to the junction table.

Step 4: One-to-One Relationships Become FK + UNIQUE

For a one-to-one relationship, add a foreign key with a UNIQUE constraint to one of the two tables. Typically, the FK goes on the table that represents the "optional" or "dependent" side.

SQL

Step 5: Self-References Become FK to Same Table

For self-referencing relationships, add a nullable FK column that references the same table's primary key.

SQL

Common Translation Mistakes

Mistake	What Happens	Fix
FK on the "one" side of 1:N	Only one child per parent	Move FK to the "many" side
No junction table for M:N	Can only store one side of the relationship	Create a junction table with composite PK
Non-unique FK for 1:1	Becomes 1:N accidentally	Add UNIQUE constraint to the FK
NOT NULL on self-reference FK	Root entities cannot exist	Make self-referencing FKs nullable
Missing composite PK on junction table	Duplicate relationships allowed	Use composite PK on both FK columns

The Full StreamFlow ER Diagram

Here is the complete StreamFlow schema with all 14 tables and their relationships:

Walking through the key relationships:

The music catalog chain: artists → albums → tracks forms a strict hierarchy. Each artist has albums, each album has tracks. All one-to-many, all mandatory (every track must belong to an album, every album must belong to an artist).
User activity: users → streams and users → playlists are one-to-many. A user can have zero streams (just signed up) or millions.
Many-to-many through junction tables: playlists ↔ tracks (via playlist_tracks), users ↔ users (via follows), and users ↔ artists (via artist_follows).
Self-references: genres.parent_genre_id, artists.manager_artist_id, and playlists.forked_from_playlist_id each create hierarchies within a single table.
The `streams` table sits at a busy intersection. It references both users and tracks, making it the central fact table for the entire platform's activity data.

Entity-Relationship Modeling