Last Updated: May 1, 2026
Almost every application that works with structured data, from music streaming apps to banking platforms, relies on SQL to communicate with its database.
SQL has been around since the 1970s. Despite decades of new technologies attempting to replace it, it continues to be the standard way to interact with relational databases.
In this chapter, we’ll build a clear mental model of what SQL is, different types of SQL commands, and what actually happens inside a database engine when a query runs.
SQL first appeared in the 1970s. It predates Java, Python, JavaScript, Go, and every other language most engineers use today. And it has outlived every technology that tried to replace it.
In the early 2010s, NoSQL databases like MongoDB and Cassandra gained massive popularity. The pitch was compelling: ditch the rigid table structure, scale horizontally, move fast. Many teams did exactly that. But then something interesting happened.
Over time, many of those NoSQL systems started adding SQL-like interfaces. Apache Cassandra introduced CQL. MongoDB added an aggregation framework that looks a lot like SQL pipelines. Google's Spanner, built for planet-scale distributed storage, speaks SQL natively.
The industry kept circling back to SQL because it solves a fundamental problem well: giving humans a readable, standardized way to ask questions about data.
SQL crosses role boundaries in a way few technologies do. Backend engineers write it to power APIs. Data engineers use it to build ETL pipelines. Data scientists use SQL to query data for feature engineering. Analysts build dashboards with it. Even product managers use it for quick ad-hoc queries to answer business questions.
It shows up in interviews too. SQL is commonly asked in backend roles, full-stack interviews, data engineering positions, and even some frontend take-home projects that involve database schemas.
Before SQL existed, working with data was painful. Applications stored information in flat files, each with its own proprietary format. If you wanted to find all customers in a particular state, you'd write a program that opened the file, read it line by line, parsed the fields, checked the state column, and collected the matches.
If the file format changed, you'd rewrite the program. If the storage system changed, you rewrote it again. Application logic and data storage were tightly coupled, which made even simple tasks slow, fragile, and expensive to maintain.
In 1970, Edgar F. Codd, a researcher at IBM, published a paper that changed everything. His key insight was simple but powerful: separate the logical view of data from its physical storage.
Instead of forcing people to write programs for every query, users could describe the data they wanted using a high-level language, while the system figured out the most efficient way to retrieve it. That idea became the foundation of relational databases, and SQL became the language built to communicate with them.
SQL stands for Structured Query Language. It's a domain-specific language, not a general-purpose programming language. You wouldn’t build a web server with it or write a mobile app in SQL. But for the one job it was designed to do, communicating with relational databases, it is remarkably effective.
Here’s what a SQL query looks like. Say we want to find all users who are based in the US:
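A minimal sketch, assuming a `users` table with a `country` column:

```sql
SELECT *
FROM users
WHERE country = 'US';
```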
Three simple lines. No loops, no file handling, no parsing logic. You describe the result you want, and the database returns it.
Most programming languages are imperative. You write step-by-step instructions: open this file, loop through the rows, check a condition, add matching items to a list, then return the list. You control the entire process.
SQL works differently. It's declarative. You describe the result you want, and the database engine decides how to get it.
Here's a concrete comparison. Suppose we want to find all premium users in a music streaming service.
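A sketch of both versions, assuming a `users` collection with a `subscription_tier` field. First, the imperative approach in pseudocode:

```
premium_users = []
for each row in users:
    if row.subscription_tier == "premium":
        premium_users.append(row)
return premium_users
```

And the declarative SQL equivalent:

```sql
SELECT *
FROM users
WHERE subscription_tier = 'premium';
```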
Both produce the same result. But the SQL version doesn't tell the database how to find those users. Should it scan every row? Use an index on subscription_tier? Read from a cache? That decision belongs to the database engine, not you.
This matters because the database engine has information you don't. It knows how many rows the table has, whether an index exists on the column you're filtering, how the data is distributed across disk, and what's already cached in memory.
When you simply describe what you want, the engine can use all of that context to choose the fastest path. But when you dictate every step, you lock it into your approach, which is often slower.
Think of it like ordering at a restaurant. You say "I'll have the pasta." You don't walk into the kitchen and say "Boil 2 liters of water, wait 8 minutes, add 200 grams of penne, drain, heat olive oil in a pan...".
The chef knows how to make pasta better than you do. SQL works the same way: you order the result, the database engine is the chef.
This is possible because every SQL query passes through an optimizer before it runs. The optimizer evaluates different strategies for getting your result and picks the cheapest one.
SQL isn't just SELECT statements. Every SQL command falls into one of four categories, and each serves a different purpose.
DDL is used to create and modify the structure of your database. Think of it as the architect’s toolkit. It defines which tables exist, what columns they have, and how everything is organized.
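A sketch of the most common DDL command, CREATE TABLE, assuming we are setting up the `users` table from the earlier examples (column names are illustrative):

```sql
CREATE TABLE users (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    country TEXT
);
```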
Other common DDL commands include:
- ALTER TABLE – modify an existing table structure
- DROP TABLE – delete a table completely

DML is used to work with the actual data inside those tables. This is where most developers spend the majority of their time, and it is the most common focus in interviews.
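The everyday DML commands are INSERT and SELECT. A small sketch, assuming the `users` table from above (the row values are illustrative):

```sql
INSERT INTO users (id, name, country)
VALUES (1, 'Alice', 'US');

SELECT * FROM users WHERE country = 'US';
```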
Other common DML commands include:
- UPDATE – modify existing rows
- DELETE – remove rows

DCL manages permissions and access control. It decides who can view data, modify it, or create new objects.
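The core DCL commands are GRANT and REVOKE. A sketch, assuming a role named `analyst` (the role name is illustrative):

```sql
GRANT SELECT ON users TO analyst;
REVOKE SELECT ON users FROM analyst;
```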
TCL manages transactions, which are groups of operations treated as a single unit. Either everything succeeds, or nothing does.
If something fails between BEGIN and COMMIT, you can ROLLBACK to undo everything. This prevents your database from ending up in a half-updated state.
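A sketch of a transaction, assuming an `accounts` table with a `balance` column (both names are illustrative):

```sql
BEGIN;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

COMMIT;  -- or ROLLBACK; to undo both updates
```

If the second UPDATE fails, rolling back restores the first account's balance too, so money is never lost in between.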
Here's how these four categories fit together:

| Category | Full name | Purpose | Example commands |
| --- | --- | --- | --- |
| DDL | Data Definition Language | Define and change structure | CREATE, ALTER, DROP |
| DML | Data Manipulation Language | Query and modify data | SELECT, INSERT, UPDATE, DELETE |
| DCL | Data Control Language | Manage permissions | GRANT, REVOKE |
| TCL | Transaction Control Language | Group operations into transactions | BEGIN, COMMIT, ROLLBACK |
DML is by far the most commonly used category. Most day-to-day database work involves querying and modifying data, which is why the bulk of this course focuses on DML operations.
SQL is not limited to database administration. It shows up across almost every layer of modern software.
Most backend frameworks rely on SQL behind the scenes. Whether you’re using Django, Rails, Spring Boot, or Express, your code often ends up generating SQL queries. When performance issues appear, you usually end up inspecting or writing raw SQL to fix them.
SQL is the backbone of most data pipelines. ETL workflows use it to move and transform data across systems. Modern data warehouses like Snowflake, BigQuery, and Amazon Redshift are built around SQL.
Even tools like Apache Spark and dbt lean heavily on SQL as their primary interface.
Before model training begins, much of the real work happens in SQL. Feature engineering often means querying datasets, joining tables, computing aggregates, and validating data quality.
In practice, SQL shapes the data before it ever reaches Python or a machine learning framework.
Dashboards, reports, and ad-hoc analysis all depend on SQL. Tools like Metabase, Looker, and Tableau generate SQL queries under the hood.
Analysts who can write SQL directly tend to move much faster than those relying only on visual interfaces.
When you run a SQL query, the database does not simply “find the data.” It processes your request through three main stages: parsing, optimization, and execution.
Understanding this flow gives you a much clearer mental model of how SQL works, and why the declarative approach is so powerful.
Let’s walk through what happens step by step with this query:
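A representative query for the walkthrough, reusing the `users` example from earlier:

```sql
SELECT *
FROM users
WHERE country = 'US';
```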
The parser acts like a strict grammar checker.
It reads your SQL query and validates everything: the syntax is correct, the tables and columns you reference actually exist, and you have permission to access them.
If anything is off, the query fails immediately with an error. No data is touched.
If everything looks good, the parser converts your SQL into an internal structure, often called a parse tree. This is a structured representation of your query that the database can actually work with.
This is where the real intelligence comes in, and it's the reason declarative SQL works so well. The optimizer takes the parse tree and figures out the most efficient way to execute your query.
It may consider multiple strategies: scanning the entire table, using an index on the filtered column, or, for multi-table queries, different join orders and join algorithms.
To make that decision, it uses internal statistics such as the number of rows in each table, which indexes exist, and how values are distributed within each column.
Then it chooses the execution plan with the lowest estimated cost.
On a large table, this choice can make a massive difference. Using an index can be thousands of times faster than scanning every row. And the optimizer can make that call because it understands the data better than your application code does.
That is the power of declarative SQL: you describe what you want, and the database figures out how to get it efficiently.
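Most databases let you inspect the plan the optimizer chose. In PostgreSQL, for example, you can prefix a query with EXPLAIN (the exact output format varies by engine):

```sql
EXPLAIN SELECT * FROM users WHERE country = 'US';
```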
Once the optimizer chooses the best plan, the executor takes over.
It actually runs the query: reading the rows from disk or cache, applying the filter (country = 'US'), and assembling the result set. Finally, it sends the results back to your application.
You write a simple query. Behind the scenes, the database parses and validates it, evaluates multiple execution strategies, picks the cheapest plan, and runs it.
All of this happens in milliseconds. That’s the power of SQL. You focus on the question, and the database handles the complexity of execution.