Last Updated: January 12, 2026
If you have ever logged into a website, placed an order online, or checked your bank balance, you have interacted with a relational database.
These systems have been the backbone of data storage for over four decades, and despite the rise of newer database types, they remain the default choice for most applications.
What makes relational databases so enduring?
It comes down to a set of guarantees that no other database type matches as comprehensively.
Your bank cannot afford to lose a transaction. An e-commerce site cannot sell the same inventory item twice. An airline cannot overbook a seat and discover it only when passengers board. Relational databases were designed from the ground up to prevent exactly these kinds of problems.
But relational databases are not just about safety. They are also about expressiveness. The query language they use, SQL, lets you ask complex questions about your data without worrying about how to compute the answer.
In 1970, Edgar F. Codd, a researcher at IBM, published a paper that would change computing forever. He proposed organizing data into "relations," which we now call tables. Each table has rows (records) and columns (attributes), and relationships between tables are expressed through shared values rather than physical pointers.
This might seem obvious today, but it was revolutionary. Before Codd, databases stored data in hierarchical or network structures where you had to know the physical layout to retrieve anything.
Codd's insight was that you could separate the logical structure of data from its physical storage, allowing the database system to optimize storage and retrieval independently.
A relational database organizes data into tables. Each table represents a type of entity: users, orders, products, payments.
Each row in a table represents one instance of that entity. Each column represents an attribute of that entity.
Several important properties emerge from this structure. Every column has a declared type: the id column always contains integers, and email always contains strings. The database enforces these types, rejecting inserts that violate the schema.

Every table needs a way to uniquely identify each row. This is the primary key. In the Users table shown below, the id column serves this purpose. No two rows can have the same primary key value, and the value cannot be NULL.
Primary keys can be natural (an attribute the data already has, such as an email address) or surrogate (a value generated purely to identify the row). Most modern applications use auto-incrementing integers for simplicity or UUIDs when distributed generation is needed.
The power of relational databases comes from connecting tables through relationships. A foreign key is a column (or set of columns) in one table that references the primary key of another table.
Users:

| id | name | email |
|---|---|---|
| 1 | Alice | alice@example.com |
| 2 | Bob | bob@example.com |
Orders:

| id | user_id | total | status |
|---|---|---|---|
| 101 | 1 | $150.00 | shipped |
| 102 | 1 | $75.50 | pending |
| 103 | 2 | $200.00 | delivered |
The user_id column in the Orders table (orders.user_id) is a foreign key referencing the id column in the Users table (users.id). This establishes a one-to-many relationship: one user can have many orders.
Foreign keys let the database enforce referential integrity: if you try to insert an order with user_id = 999 and no such user exists, the database rejects the operation.

Relationships between tables come in three flavors:
One-to-One: Each row in Table A relates to at most one row in Table B, and vice versa. Example: Users and UserProfiles, where profile details are stored separately.
One-to-Many: Each row in Table A can relate to many rows in Table B, but each row in Table B relates to only one row in Table A. Example: Users and Orders.
Many-to-Many: Rows in Table A can relate to multiple rows in Table B, and vice versa. This requires a junction table (also called a join table or bridge table).
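A minimal sketch of a many-to-many schema, using hypothetical Students and Courses tables with an Enrollments junction table between them:

```sql
-- Students and Courses are independent entities; Enrollments links them.
CREATE TABLE students (
    id   INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE courses (
    id    INT PRIMARY KEY,
    title VARCHAR(200) NOT NULL
);

CREATE TABLE enrollments (
    student_id INT NOT NULL REFERENCES students(id),
    course_id  INT NOT NULL REFERENCES courses(id),
    PRIMARY KEY (student_id, course_id)  -- each student/course pairing recorded at most once
);
```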
The Enrollments table has no data of its own other than the two foreign keys. Its sole purpose is to represent which students are enrolled in which courses.
One of the defining characteristics of relational databases is the schema: a formal definition of the structure of your data. Before you can store any data, you must declare what tables exist, what columns they have, and what types those columns accept.
A schema is created using Data Definition Language (DDL):
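A sketch of what such a schema might look like, using PostgreSQL-flavored syntax and illustrative table names:

```sql
CREATE TABLE users (
    id         BIGSERIAL PRIMARY KEY,            -- auto-incrementing unique identifier
    email      VARCHAR(255) NOT NULL UNIQUE,     -- must be present and must not repeat
    created_at TIMESTAMP NOT NULL DEFAULT NOW()  -- filled in when the insert omits it
);

CREATE TABLE orders (
    id      BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users(id),        -- must point at an existing user
    total   NUMERIC(10, 2) NOT NULL CHECK (total >= 0),  -- custom validation rule
    status  VARCHAR(20) NOT NULL DEFAULT 'pending'
);
```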
This schema enforces several constraints automatically:
| Constraint | Purpose | Violation Result |
|---|---|---|
NOT NULL | Column must have a value | Insert/update rejected |
UNIQUE | No duplicate values in column | Insert/update rejected |
PRIMARY KEY | Unique identifier for row | Insert rejected |
REFERENCES (FK) | Value must exist in referenced table | Insert/update rejected |
CHECK | Custom validation rule | Insert/update rejected |
DEFAULT | Value used when not specified | Default applied |
Relational databases use schema-on-write: data is validated against the schema when it is written. If the data does not conform, the write fails.
This is different from many NoSQL databases that use schema-on-read, where any data can be written and structure is interpreted when reading.
The schema-on-write approach means that every row in a relational database conforms to the declared structure. You never have to wonder "does this row have an email field?" because the schema guarantees it.
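For example, with the illustrative users and orders tables from the DDL sketch above, both of these writes would be rejected at write time:

```sql
INSERT INTO users (email) VALUES (NULL);            -- rejected: email is declared NOT NULL
INSERT INTO orders (user_id, total)
VALUES (1, 'not a number');                         -- rejected: total must be numeric
```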
Normalization is the process of organizing data to minimize redundancy. The goal is to ensure that each piece of information is stored in exactly one place.
Consider a denormalized orders table:
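A sketch of such a table, with illustrative column names; note how customer and product details ride along on every order:

```sql
CREATE TABLE orders_denormalized (
    order_id       INT PRIMARY KEY,
    customer_name  VARCHAR(100),
    customer_email VARCHAR(255),
    product_name   VARCHAR(100),
    product_price  NUMERIC(10, 2),
    quantity       INT
);

INSERT INTO orders_denormalized VALUES
    (101, 'Alice', 'alice@example.com', 'Keyboard', 45.00, 1),
    (102, 'Alice', 'alice@example.com', 'Monitor', 250.00, 2);  -- Alice's details repeated
```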
Problems with this structure:

- Alice's email is repeated on every one of her orders; changing it means updating many rows, and missing one leaves the data inconsistent
- You cannot record a customer until they have placed an order
- Deleting a customer's only order also deletes everything you know about that customer
The normalized version separates customers, products, and orders into their own tables:
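One way the normalized schema might look (names are illustrative):

```sql
CREATE TABLE customers (
    id    INT PRIMARY KEY,
    name  VARCHAR(100) NOT NULL,
    email VARCHAR(255) NOT NULL UNIQUE
);

CREATE TABLE products (
    id    INT PRIMARY KEY,
    name  VARCHAR(100) NOT NULL,
    price NUMERIC(10, 2) NOT NULL
);

CREATE TABLE orders (
    id          INT PRIMARY KEY,
    customer_id INT NOT NULL REFERENCES customers(id),  -- who placed the order
    product_id  INT NOT NULL REFERENCES products(id),   -- what was ordered
    quantity    INT NOT NULL DEFAULT 1
);
```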
Now each piece of information lives in one place. Updating Alice's email is a single-row update. You can add customers without orders. You can delete orders without losing customer information.
Normalization has levels (1NF, 2NF, 3NF, BCNF, etc.), but for practical purposes, most applications aim for Third Normal Form (3NF), which eliminates most redundancy without overcomplicating the schema.
Transactions are the mechanism by which relational databases guarantee data consistency even in the face of failures.
ACID is an acronym for the four properties that transactions provide:
Atomicity: A transaction is all-or-nothing. Either all operations in the transaction complete successfully, or none of them do. There is no partial state.
Consider a bank transfer:
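A minimal sketch, assuming an accounts table with id and balance columns:

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- debit account 1
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- credit account 2
COMMIT;
```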
If the second UPDATE fails (maybe account 2 does not exist), the first UPDATE is rolled back. The $100 is never "lost" between accounts. Atomicity guarantees that you will never see a state where account 1 is debited but account 2 is not credited.
Consistency: Transactions move the database from one valid state to another. All constraints (foreign keys, unique constraints, check constraints) are enforced. If any operation would violate a constraint, the entire transaction is rolled back.
Isolation: Concurrent transactions do not interfere with each other. Each transaction sees the database as if it were the only one running. This prevents problems like dirty reads (seeing another transaction's uncommitted changes), non-repeatable reads (the same query returning different results within one transaction), and phantom reads (new rows appearing between two executions of the same query).
Databases offer different isolation levels that trade off between correctness and performance:
| Isolation Level | Dirty Reads | Non-Repeatable Reads | Phantom Reads | Performance |
|---|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible | Fastest |
| Read Committed | Prevented | Possible | Possible | Fast |
| Repeatable Read | Prevented | Prevented | Possible | Moderate |
| Serializable | Prevented | Prevented | Prevented | Slowest |
Most databases default to Read Committed, which prevents the most egregious problems while maintaining good performance.
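When a particular transaction needs stronger guarantees, standard SQL lets you request a stricter level for just that transaction; a sketch:

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;  -- strongest isolation, highest cost
-- ... statements that must not observe concurrent anomalies ...
COMMIT;
```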
Durability: Once a transaction is committed, it stays committed. Even if the server crashes immediately after, the data will be there when the system restarts.
Databases achieve this through write-ahead logging (WAL), where changes are written to a durable log before being applied to the main data files.
The key insight is that the WAL write happens before the transaction is acknowledged. If the system crashes after the acknowledgment but before the changes reach the data files, the WAL is replayed on recovery to complete the operation.
SQL (Structured Query Language) is the standard language for interacting with relational databases. What makes SQL powerful is that it is declarative: you describe what data you want, not how to get it.
Consider finding all orders over $100 for users in California:
Imperative (pseudocode): loop over every order, look up the user who placed it, check the user's state and the order total, and append matches to a result list by hand.

Declarative (SQL): state the conditions and let the database work out how to satisfy them.
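A sketch of the declarative version, assuming a state column on users and the illustrative orders table from earlier:

```sql
-- Orders over $100 placed by users in California; the optimizer chooses the plan.
SELECT o.*
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE u.state = 'CA'
  AND o.total > 100;
```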
The SQL version describes the result set without specifying how to compute it. The database's query optimizer decides whether to scan users or orders first, whether to use an index or read the whole table, and which join algorithm (nested loop, hash, or merge) to apply.
This separation of "what" from "how" is powerful because the optimizer can make different decisions based on table sizes, available indexes, and data distribution.
Aggregations compute values across multiple rows:
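For example, against the illustrative orders table, order counts and total spend per user:

```sql
SELECT user_id,
       COUNT(*)   AS order_count,     -- how many orders each user placed
       SUM(total) AS lifetime_spend   -- how much they spent in total
FROM orders
GROUP BY user_id
HAVING SUM(total) > 500;              -- keep only the bigger spenders
```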
Joins combine data from multiple tables:
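For example, each order next to the name of the user who placed it:

```sql
SELECT u.name, o.id AS order_id, o.total
FROM users u
JOIN orders o ON o.user_id = u.id;  -- match rows through the foreign key
```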
Subqueries embed one query inside another:
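For example, the users who have placed at least one order larger than the average order:

```sql
SELECT name
FROM users
WHERE id IN (
    SELECT user_id
    FROM orders
    WHERE total > (SELECT AVG(total) FROM orders)  -- inner query computes the average
);
```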
Window functions compute values across related rows without collapsing them:
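For example, each order ranked against that user's other orders, without grouping the rows away:

```sql
SELECT id, user_id, total,
       RANK() OVER (PARTITION BY user_id ORDER BY total DESC) AS rank_for_user
FROM orders;
```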
Behind every SQL query is a query optimizer that determines the execution plan. You can examine this plan using EXPLAIN:
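A sketch (the exact output format and keywords vary between databases):

```sql
EXPLAIN
SELECT o.*
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE u.state = 'CA'
  AND o.total > 100;
```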
A good execution plan uses indexes efficiently and avoids full table scans. Understanding how to read execution plans and create appropriate indexes is a key skill for database performance.
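For example, an index on the foreign key column can let the planner look up a user's orders directly instead of scanning the whole table (the index name is illustrative):

```sql
CREATE INDEX idx_orders_user_id ON orders (user_id);
```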
Relational databases were designed in an era when all data fit on a single machine. As applications grew, scaling became a challenge. There are several strategies, each with trade-offs.
The simplest approach: buy a bigger server. More CPU, more RAM, faster disks.
| Metric | Small Server | Large Server | Limit |
|---|---|---|---|
| CPU Cores | 4 | 128 | ~256 |
| RAM | 16 GB | 4 TB | ~12 TB |
| Storage IOPS | 3,000 | 400,000 | ~1M |
| Cost/month | $200 | $50,000 | $ |
Vertical scaling is attractive because it requires no application changes. Your queries work exactly the same. But there are limits: eventually, you hit the ceiling of what a single machine can do, and costs grow non-linearly.
For read-heavy workloads, you can replicate data to multiple servers. Writes go to the primary, and reads can be served by any replica.
Considerations: replication is usually asynchronous, so replicas lag slightly behind the primary and a read immediately after a write may return stale data; and every write still goes through the single primary, so replicas do nothing for write throughput.
Read replicas work well for applications with high read-to-write ratios (90%+ reads). A typical e-commerce site fits this pattern: many people browse products, few actually buy.
Sharding splits data across multiple database servers, each holding a subset. Unlike read replicas where each server has all the data, shards have different data.
Sharding provides true horizontal scale for both reads and writes. But it comes with significant costs: queries that touch multiple shards must be split up and their results merged by the application or a routing layer, transactions that span shards lose the simple ACID guarantees of a single database, and rebalancing data when shards are added is operationally painful.
The sharding key choice is critical. A good sharding key distributes rows evenly across shards, keeps the rows a typical query needs on a single shard, and rarely needs to change once assigned.
Each relational database has strengths that make it suitable for different use cases:
PostgreSQL is often called "the most advanced open-source relational database." It combines rock-solid reliability with extensive features.
When to choose: General-purpose workloads, complex queries, when you need advanced SQL features or JSON support.
MySQL is the most widely deployed open-source database, particularly popular for web applications.
When to choose: Web applications, read-heavy workloads, when operational simplicity matters.
Microsoft's flagship database, tightly integrated with the Windows and .NET ecosystem.
When to choose: Microsoft-centric environments, enterprise applications with compliance requirements.
| Feature | PostgreSQL | MySQL | SQL Server |
|---|---|---|---|
| License | Open source | Open source (GPL) | Commercial |
| JSON support | Excellent (JSONB) | Good (JSON) | Good |
| Full-text search | Built-in | Built-in | Built-in |
| Geospatial | PostGIS extension | Basic | Excellent |
| Replication | Streaming, logical | Multiple topologies | Always On |
| Max DB size | Unlimited | Unlimited | 524 PB |
| Common use | General purpose | Web apps | Enterprise |
Relational databases are the right choice when your data has a clear structure with relationships between entities, when you need ACID transactions (payments, inventory, bookings), and when you expect to ask varied, complex questions of the data.
Relational databases may not be the best fit when the data has little or no fixed structure, when the access pattern is a simple key-value lookup at very large scale, or when write volume must scale horizontally beyond what a single primary and careful sharding can reasonably provide.
Relational databases have earned their place as the default choice for data storage through a combination of powerful features:
| Feature | Benefit |
|---|---|
| Relational model | Natural representation of entities and relationships |
| Schema enforcement | Data quality guaranteed at the database level |
| ACID transactions | Reliable operations even with concurrent access and failures |
| SQL | Declarative queries with optimizer-driven execution |
| Mature ecosystem | Decades of tooling, knowledge, and optimization techniques |
Relational databases are not going away. They remain the foundation of most applications and the starting point for any new project. Understanding them deeply is essential because every other database type is, in some way, a response to their limitations.
The next chapter explores document databases, which emerged as an alternative when the relational model's schema rigidity and join-heavy nature became obstacles to developer velocity and horizontal scaling.