Last Updated: May 1, 2026
SQL databases store data in tables, with clear rules for structure, data types, and relationships.
These rules are not there by accident. They help keep data consistent, reliable, and predictable, while preventing subtle issues that can quietly break applications and take hours to debug.
In this chapter, we’ll build a strong understanding of how tables really work. You’ll learn how data types shape your data, why NULL is more nuanced than it first appears, and how tables connect to model real-world relationships.
A table is a two-dimensional structure where each row represents one entity (a user, a track, a payment) and each column represents one attribute of that entity (a username, a duration, an amount). Every row in the same table has the same set of columns, and every column holds values of a single data type.
The formal terminology comes from relational algebra, the mathematical foundation behind SQL. A table is a relation, a row is a tuple, and a column is an attribute. You'll see these terms in academic papers and occasionally in interviews, but in practice most people just say table, row, and column.
Here's a simplified view of the users table in StreamFlow:
| user_id | username | email | country | subscription_tier |
|---|---|---|---|---|
| 1 | alice_m | alice@example.com | US | premium |
| 2 | bob_jones | bob@example.com | GB | free |
| 3 | charlie_k | charlie@example.com | NULL | premium |
| 4 | diana_r | diana@example.com | BR | family |
A few things to notice. Every column has a specific data type: user_id is an integer, username is a string, country is a two-character string. The country column for charlie_k is NULL, meaning the value is missing or unknown. And each row is a complete record of one user.
The schema of a table is its structural definition: column names, data types, and constraints. The data is the actual rows stored in the table. You can think of the schema as the blueprint and the data as the building. The schema for users looks like this:
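A sketch of what that definition might look like in PostgreSQL (the length limits, defaults, and exact column set are assumptions based on the examples in this chapter):

```sql
CREATE TABLE users (
    user_id           SERIAL PRIMARY KEY,                  -- auto-generated integer ID
    username          VARCHAR(50)  NOT NULL,               -- required
    email             VARCHAR(255) NOT NULL,
    country           VARCHAR(2),                          -- ISO code; NULL allowed (unknown)
    timezone          VARCHAR(50),                         -- NULL allowed
    subscription_tier VARCHAR(20)  NOT NULL DEFAULT 'free',
    signup_date       DATE         NOT NULL DEFAULT CURRENT_DATE
);
```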
This definition tells the database exactly what kind of data each column can hold, which columns are required, and what defaults to use when a value isn't provided. The database enforces these rules on every insert and update. We'll dig deeper into the constraint mechanics (PRIMARY KEY, UNIQUE, NOT NULL) in the next chapter.
Every column in a table has a data type, and the database enforces it. If you try to insert the string 'hello' into an integer column, the database will reject it.
That strictness is a good thing. It catches invalid data at the moment it enters the system instead of letting it quietly spread through your application and cause harder-to-debug problems later.
Most data types fall into a few major categories.
Numeric types store numbers.
Integers are used to store whole numbers.
Most databases also provide ways to generate them automatically for IDs:
- SERIAL or BIGSERIAL (PostgreSQL)
- AUTO_INCREMENT (MySQL)

These are commonly used for IDs, where each new row needs a unique number.
INT is used for standard integer values. When you expect much larger numbers, you use BIGINT. For example, a high-traffic streaming platform might use BIGSERIAL for streams.stream_id, since the system could generate billions of rows over time.
Decimal types like DECIMAL(8,2) are used when you need exact precision. This is especially important for things like money, where even small rounding errors from floating-point types can cause real problems.
If payments.amount uses DECIMAL(8,2), it means 8 significant digits in total, with 2 of them after the decimal point. So it can store values from -999999.99 to 999999.99 with full precision.
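As a sketch, here's how such a column might be declared and how PostgreSQL treats values at the edges (the table layout beyond amount is an assumption):

```sql
CREATE TABLE payments (
    payment_id SERIAL PRIMARY KEY,
    amount     DECIMAL(8,2) NOT NULL  -- 8 digits total, 2 after the decimal point
);

INSERT INTO payments (amount) VALUES (999999.99);  -- fits exactly
INSERT INTO payments (amount) VALUES (9.999);      -- rounded to 10.00 to match the scale
INSERT INTO payments (amount) VALUES (1000000.00); -- rejected: numeric field overflow
```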
VARCHAR(n) stores variable-length strings up to a maximum of n characters.
For example:
- users.country might use VARCHAR(2), because ISO country codes are exactly two characters
- tracks.title might use VARCHAR(200) to support longer names

If you try to insert a value that exceeds the limit, the database will reject it with an error.
TEXT stores strings with no fixed upper limit.
In databases like PostgreSQL, VARCHAR without a length limit and TEXT behave the same way internally. The difference is mostly about clarity.
Using something like VARCHAR(2) for a country code makes your intent obvious. TEXT is more flexible, but it doesn’t communicate constraints as clearly.
DATE stores just the calendar date, without any time information.
This is useful when the exact time doesn’t matter. For example, users.signup_date or albums.release_date only need to capture the day, not the hour or minute.
TIMESTAMP WITH TIME ZONE (often written as TIMESTAMPTZ) stores a precise moment in time, along with timezone context.
This matters when timing is important. For example, streams.started_at needs to capture exactly when a stream began, down to the second, and across different timezones.
When you query this type, the database automatically converts the timestamp to your session’s timezone, which makes it much easier to work with global data.
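A small sketch of that conversion (the timezone names are just examples):

```sql
CREATE TABLE streams (
    stream_id  BIGSERIAL PRIMARY KEY,
    started_at TIMESTAMPTZ NOT NULL DEFAULT now()  -- stores an absolute instant
);

SET TIME ZONE 'America/New_York';
SELECT started_at FROM streams;  -- instants displayed in New York time

SET TIME ZONE 'UTC';
SELECT started_at FROM streams;  -- the same instants, displayed in UTC
```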
BOOLEAN stores simple true or false values.
You’ll often see it used for flags like:
- artists.verified
- playlists.is_public

In PostgreSQL, booleans are flexible in how they’re written. You can use true or false, but values like 't', 'f', 1, 0, or even 'yes' and 'no' are also accepted.
JSONB (binary JSON in PostgreSQL) is used to store semi-structured data that doesn’t fit neatly into fixed columns.
For example, the tracks.metadata column might store details like BPM, musical key, or mood tags. The challenge is that not every track has the same set of attributes. Some might have BPM and key, others might include mood or genre-specific properties.
This is where JSONB helps. Instead of creating dozens or hundreds of optional columns, you can store this flexible data in a single column.
JSONB works well when the set of attributes varies from row to row and new attributes can appear over time without schema changes.
The trade-off is that JSONB columns are harder to query, index, and validate than regular columns.
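A sketch showing both sides of that trade-off (the metadata fields shown are assumptions, since they vary per track):

```sql
CREATE TABLE tracks (
    track_id SERIAL PRIMARY KEY,
    title    VARCHAR(200) NOT NULL,
    metadata JSONB                      -- flexible, per-track attributes
);

INSERT INTO tracks (title, metadata)
VALUES ('Midnight Run', '{"bpm": 124, "key": "A minor", "mood": ["dark", "driving"]}');

-- ->> extracts a field as text; a cast is needed to compare numerically
SELECT title
FROM tracks
WHERE (metadata->>'bpm')::int > 120;
```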
Most data types are standard SQL, but a few vary across databases:
| Feature | PostgreSQL | MySQL | SQL Server |
|---|---|---|---|
| Auto-increment ID | SERIAL | AUTO_INCREMENT | IDENTITY |
| Binary JSON | JSONB | JSON | NVARCHAR(MAX) with JSON functions |
| Boolean | BOOLEAN | TINYINT(1) | BIT |
| Timestamp with TZ | TIMESTAMP WITH TIME ZONE | TIMESTAMP (UTC only) | DATETIMEOFFSET |
| Unlimited text | TEXT | TEXT / LONGTEXT | VARCHAR(MAX) |
NULL is one of the most misunderstood concepts in SQL, and it’s responsible for a surprising number of bugs.
NULL doesn’t mean zero. It doesn’t mean an empty string. It means unknown or not applicable. That distinction is important, because NULL behaves very differently from any other value in SQL.
Take the users table as an example. The country column might allow NULL because some users haven’t shared their location. The timezone column might also allow NULL for the same reason.
These aren’t users from “nowhere.” They’re users whose information is simply missing.
The real confusion starts when you use NULL in comparisons or expressions, because it doesn’t behave the way most people expect.
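For instance, an equality check like this looks reasonable at first glance:

```sql
SELECT username
FROM users
WHERE country = NULL;   -- returns zero rows, even though NULL countries exist
```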
This query looks like it should return users without a country, but it returns no rows at all.
What goes wrong here?
Fix: Use IS NULL for NULL checks:
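Against the same users table, the corrected query uses the dedicated NULL test:

```sql
SELECT username
FROM users
WHERE country IS NULL;  -- correctly finds users whose country is unknown
```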
The reason = NULL doesn't work is fundamental to how NULL operates. NULL represents an unknown value. Is an unknown value equal to another unknown value? The answer isn't TRUE or FALSE, it's "we don't know." And in SQL, "we don't know" evaluates to NULL, which is not TRUE, so the WHERE clause filters it out.
Most programming languages work with two outcomes: an expression is either TRUE or FALSE.
SQL adds a third possibility: NULL, which represents unknown.
This means every logical expression in SQL can evaluate to one of three values:
That extra state might seem small, but it changes how conditions behave in subtle ways. Logical operators like AND, OR, and NOT don’t just combine TRUE and FALSE, they also have to account for uncertainty.
As a result, expressions that look straightforward can produce unexpected results when NULL is involved. Understanding this three-valued logic is key to writing correct SQL queries.
Here's how AND, OR, and NOT behave with NULL:
AND with NULL:
| A | B | A AND B |
|---|---|---|
| TRUE | NULL | NULL |
| FALSE | NULL | FALSE |
| NULL | NULL | NULL |
AND returns FALSE only when it knows the result is FALSE (because one side is FALSE). If one side is TRUE and the other is unknown, the overall result is unknown.
OR with NULL:
| A | B | A OR B |
|---|---|---|
| TRUE | NULL | TRUE |
| FALSE | NULL | NULL |
| NULL | NULL | NULL |
OR returns TRUE when it knows the result is TRUE (because one side is TRUE). If one side is FALSE and the other is unknown, the overall result is unknown.
NOT with NULL:
| A | NOT A |
|---|---|
| TRUE | FALSE |
| FALSE | TRUE |
| NULL | NULL |
The NOT of an unknown value is still unknown.
Three-valued logic creates real pitfalls in queries. Consider finding all users who are NOT in the US:
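A natural first attempt (a sketch against the users table):

```sql
SELECT username
FROM users
WHERE country != 'US';   -- silently drops rows where country is NULL
```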
This query excludes users where country is NULL, because NULL != 'US' evaluates to NULL, not TRUE. If you want to include users with unknown countries, you need to handle NULL explicitly:
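One explicit way to do that:

```sql
SELECT username
FROM users
WHERE country != 'US' OR country IS NULL;
```

PostgreSQL also offers IS DISTINCT FROM, which treats NULL as an ordinary comparable value: WHERE country IS DISTINCT FROM 'US' returns the same rows as the query above.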
NULL also affects aggregate functions differently:
COUNT(*) counts all rows regardless of NULL values. COUNT(country) skips rows where country is NULL. COUNT(DISTINCT country) counts unique non-NULL values. If the users table has 1000 rows, 50 with NULL country, and 30 distinct countries, this returns 1000, 950, 30.
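Those three counts side by side, with the results from the example above as comments:

```sql
SELECT
    COUNT(*)                AS total_rows,         -- 1000: every row
    COUNT(country)          AS rows_with_country,  -- 950: NULL countries skipped
    COUNT(DISTINCT country) AS distinct_countries  -- 30: unique non-NULL values
FROM users;
```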
Sorting also has NULL behavior worth knowing. By default, PostgreSQL sorts NULLs last in ascending order and first in descending order. You can override this:
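For example, to put the unknown countries first in an ascending sort:

```sql
SELECT username, country
FROM users
ORDER BY country ASC NULLS FIRST;  -- unknown countries sort before known ones
```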
| Operation | NULL Behavior |
|---|---|
| WHERE x = NULL | Always produces NULL (use IS NULL) |
| WHERE x != value | Excludes NULLs |
| COUNT(*) | Counts all rows, including NULLs |
| COUNT(column) | Skips NULL values |
| SUM, AVG | Ignore NULL values |
| ORDER BY ... ASC | NULLs last (PostgreSQL default) |
| NULL + 5 | Returns NULL |
| NULL = NULL | Returns NULL, not TRUE |
A single table can only take you so far.
Imagine if StreamFlow stored everything in one massive table: users, tracks, playlists, streams, payments, all mixed together. The same user data would be repeated across thousands of rows. If a user updates their email, you’d have to change it everywhere. Miss even one row, and your data is now inconsistent.
Relational databases solve this problem by splitting data into multiple tables and connecting them through shared columns.
That’s what “relational” really means. It’s not just about storing data in tables; it’s about defining clear relationships between those tables so the data stays consistent, maintainable, and easy to work with.
This is the most common relationship type. One row in the parent table connects to many rows in the child table, but each child row belongs to exactly one parent.
In StreamFlow, a user can create many playlists, but each playlist has exactly one creator:
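A sketch of how the child table might be declared (the name column and its length are assumptions):

```sql
CREATE TABLE playlists (
    playlist_id     SERIAL PRIMARY KEY,
    name            VARCHAR(100) NOT NULL,
    creator_user_id INT NOT NULL REFERENCES users (user_id)  -- points to the "one" side
);
```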
The creator_user_id column in playlists points back to user_id in users. This column is called a foreign key: it references a row in another table. We'll cover foreign key mechanics in the next chapter, but the concept matters here: the foreign key is what creates the relationship.
Other one-to-many relationships in StreamFlow:
| Parent | Child | Meaning |
|---|---|---|
| users | playlists | A user creates many playlists |
| users | streams | A user generates many stream events |
| users | payments | A user makes many payments |
| artists | albums | An artist releases many albums |
| albums | tracks | An album contains many tracks |
| genres | albums | A genre categorizes many albums |
| campaigns | ad_impressions | A campaign generates many impressions |
Sometimes both sides of a relationship can have multiple connections. A playlist can contain many tracks, and a single track can appear in many playlists. Neither side "owns" the other.
Relational databases can't represent many-to-many relationships directly between two tables. Instead, they use a third table, often called a junction table (also known as a bridge table or associative table), that sits between the two and holds pairs of references.
In StreamFlow, playlist_tracks is the junction table between playlists and tracks:
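A sketch of its definition (the composite primary key is an assumption; it also prevents the same track from appearing twice in one playlist):

```sql
CREATE TABLE playlist_tracks (
    playlist_id INT NOT NULL REFERENCES playlists (playlist_id),
    track_id    INT NOT NULL REFERENCES tracks (track_id),
    position    INT NOT NULL,                        -- the track's order in the playlist
    added_at    TIMESTAMPTZ NOT NULL DEFAULT now(),  -- when the track was added
    PRIMARY KEY (playlist_id, track_id)
);
```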
The junction table breaks one many-to-many relationship into two one-to-many relationships. Each row in playlist_tracks represents one track in one playlist:
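Reading a playlist back means walking both halves of the relationship, playlists to playlist_tracks and playlist_tracks to tracks. A sketch using joins:

```sql
SELECT p.name, pt.position, t.title
FROM playlists p
JOIN playlist_tracks pt ON pt.playlist_id = p.playlist_id
JOIN tracks t          ON t.track_id     = pt.track_id
ORDER BY p.name, pt.position;
```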
Junction tables often carry their own data beyond the two foreign keys. playlist_tracks has position (the track's order in the playlist) and added_at (when the track was added). The follows table works the same way: it's a junction table between users and users, with followed_at as its own attribute.
Other many-to-many relationships in StreamFlow:
| Table A | Junction Table | Table B | Extra Columns |
|---|---|---|---|
| playlists | playlist_tracks | tracks | position, added_at |
| users | follows | users | followed_at |
| users | artist_follows | artists | followed_at |
A one-to-one relationship means each row in table A connects to at most one row in table B. This is the rarest relationship type because if two things truly have a 1:1 mapping, you can usually just put them in the same table.
One-to-one relationships show up when you want to separate a table for practical reasons: different access patterns, different security requirements, or optional data that applies to only a subset of rows. StreamFlow doesn't have an explicit one-to-one table, but if the platform added a user_profiles table with bio, avatar URL, and preferences, that would be one-to-one with users: each user has at most one profile.
A table can have a relationship with itself. StreamFlow has two examples of this.
The genres table has a parent_genre_id column that references genre_id in the same table, creating a hierarchy:
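A sketch of that definition (the name column and its length are assumptions):

```sql
CREATE TABLE genres (
    genre_id        SERIAL PRIMARY KEY,
    name            VARCHAR(100) NOT NULL,
    parent_genre_id INT REFERENCES genres (genre_id)  -- NULL for top-level genres
);
```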
This lets StreamFlow represent structures like Rock -> Alternative Rock -> Indie Rock without needing separate tables for each level.
The artists table uses the same pattern with manager_artist_id referencing artist_id, representing label or management hierarchies. And playlists has forked_from_playlist_id referencing playlist_id, tracking when a user forks another user's playlist.
Here's a subset of the StreamFlow schema showing how key tables connect:
Solid arrows represent one-to-many relationships (the arrow points from the "one" side to the "many" side). The dotted arrow on genres shows the self-referencing hierarchy. The junction tables (playlist_tracks, streams) sit at the intersection of multiple relationships.