Last Updated: May 1, 2026
Almost every application that works with structured data, from music streaming apps to banking platforms, relies on SQL to communicate with its database.
SQL has been around since the 1970s. Despite decades of new technologies attempting to replace it, it continues to be the standard way to interact with relational databases.
In this chapter, we’ll build a clear mental model of what SQL is, different types of SQL commands, and what actually happens inside a database engine when a query runs.
SQL first appeared in the 1970s. It predates Java, Python, JavaScript, Go, and every other language most engineers use today. And it has outlived every technology that tried to replace it.
In the early 2010s, NoSQL databases like MongoDB and Cassandra gained massive popularity. The pitch was compelling: ditch the rigid table structure, scale horizontally, move fast. Many teams did exactly that. But then something interesting happened.
Over time, many of those NoSQL systems started adding SQL-like interfaces. Apache Cassandra introduced CQL. MongoDB added an aggregation framework that looks a lot like SQL pipelines. Google's Spanner, built for planet-scale distributed storage, speaks SQL natively.
The industry kept circling back to SQL because it solves a fundamental problem well: giving humans a readable, standardized way to ask questions about data.
SQL crosses role boundaries in a way few technologies do. Backend engineers write it to power APIs. Data engineers use it to build ETL pipelines. Data scientists use SQL to query data for feature engineering. Analysts build dashboards with it. Even product managers use it for quick ad-hoc queries to answer business questions.
It shows up in interviews too. SQL is commonly asked in backend roles, full-stack interviews, data engineering positions, and even some frontend take-home projects that involve database schemas.
Before SQL existed, working with data was painful. Applications stored information in flat files, each with its own proprietary format. If you wanted to find all customers in a particular state, you'd write a program that opened the file, read it line by line, parsed the fields, checked the state column, and collected the matches.
If the file format changed, you'd rewrite the program. If the storage system changed, you rewrote it again. Application logic and data storage were tightly coupled, which made even simple tasks slow, fragile, and expensive to maintain.
In 1970, Edgar F. Codd, a researcher at IBM, published a paper that changed everything. His key insight was simple but powerful: separate the logical view of data from its physical storage.
Instead of forcing people to write programs for every query, users could describe the data they wanted using a high-level language, while the system figured out the most efficient way to retrieve it. That idea became the foundation of relational databases, and SQL became the language built to communicate with them.
SQL stands for Structured Query Language. It's a domain-specific language, not a general-purpose programming language. You wouldn’t build a web server with it or write a mobile app in SQL. But for the one job it was designed to do, communicating with relational databases, it is remarkably effective.
Here’s what a SQL query looks like. Say we want to find all users who are based in the US:
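A minimal sketch, assuming a `users` table with a `country` column:

```sql
SELECT *
FROM users
WHERE country = 'US';
```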
Three simple lines. No loops, no file handling, no parsing logic. You describe the result you want, and the database returns it.
Most programming languages are imperative. You write step-by-step instructions: open this file, loop through the rows, check a condition, add matching items to a list, then return the list. You control the entire process.
SQL works differently. It's declarative. You describe the result you want, and the database engine decides how to get it.
Here's a concrete comparison. Suppose we want to find all premium users in a music streaming service.
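A sketch of both versions, assuming a `users` collection with a `subscription_tier` field. First, the imperative approach in pseudocode:

```
premium_users = []
for each row in users:
    if row.subscription_tier == "premium":
        premium_users.append(row)
return premium_users
```

And the declarative SQL equivalent:

```sql
SELECT *
FROM users
WHERE subscription_tier = 'premium';
```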
Both produce the same result. But the SQL version doesn't tell the database how to find those users. Should it scan every row? Use an index on subscription_tier? Read from a cache? That decision belongs to the database engine, not you.
This matters because the database engine has information you don't. It knows how many rows the table has, whether an index exists on the column you're filtering, how the data is distributed across disk, and what's already cached in memory.
When you simply describe what you want, the engine can use all of that context to choose the fastest path. But when you dictate every step, you lock it into your approach, which is often slower.
Think of it like ordering at a restaurant. You say "I'll have the pasta." You don't walk into the kitchen and say "Boil 2 liters of water, wait 8 minutes, add 200 grams of penne, drain, heat olive oil in a pan...".
The chef knows how to make pasta better than you do. SQL works the same way: you order the result, the database engine is the chef.
This is possible because every SQL query passes through an optimizer before it runs. The optimizer evaluates different strategies for getting your result and picks the cheapest one.
SQL isn't just SELECT statements. Every SQL command falls into one of four categories, and each serves a different purpose.
DDL is used to create and modify the structure of your database. Think of it as the architect’s toolkit. It defines which tables exist, what columns they have, and how everything is organized.
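A sketch of the most common DDL command, CREATE TABLE, assuming we are setting up the `users` table from the earlier examples (column names are illustrative):

```sql
CREATE TABLE users (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    country TEXT
);
```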
Other common DDL commands include:
- ALTER TABLE – modify an existing table structure
- DROP TABLE – delete a table completely

DML is used to work with the actual data inside those tables. This is where most developers spend the majority of their time, and it is the most common focus in interviews.
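The everyday DML commands are INSERT and SELECT. A small sketch, assuming the `users` table from above (the row values are illustrative):

```sql
INSERT INTO users (id, name, country)
VALUES (1, 'Alice', 'US');

SELECT * FROM users WHERE country = 'US';
```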
Other common DML commands include:
- UPDATE – modify existing rows
- DELETE – remove rows

DCL manages permissions and access control. It decides who can view data, modify it, or create new objects.
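The core DCL commands are GRANT and REVOKE. A sketch, assuming a role named `analyst` (the role name is illustrative):

```sql
GRANT SELECT ON users TO analyst;
REVOKE SELECT ON users FROM analyst;
```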
TCL manages transactions, which are groups of operations treated as a single unit. Either everything succeeds, or nothing does.
If something fails between BEGIN and COMMIT, you can ROLLBACK to undo everything. This prevents your database from ending up in a half-updated state.
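A sketch of a transaction, assuming an `accounts` table with a `balance` column (both names are illustrative):

```sql
BEGIN;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

COMMIT;  -- or ROLLBACK; to undo both updates
```

If the second UPDATE fails, rolling back restores the first account's balance too, so money is never lost in between.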
Here's how these four categories fit together:

| Category | Full name | Purpose | Example commands |
| --- | --- | --- | --- |
| DDL | Data Definition Language | Define and change structure | CREATE, ALTER, DROP |
| DML | Data Manipulation Language | Query and modify data | SELECT, INSERT, UPDATE, DELETE |
| DCL | Data Control Language | Manage permissions | GRANT, REVOKE |
| TCL | Transaction Control Language | Group operations into transactions | BEGIN, COMMIT, ROLLBACK |
DML is by far the most commonly used category. Most day-to-day database work involves querying and modifying data, which is why the bulk of this course focuses on DML operations.
SQL is not limited to database administration. It shows up across almost every layer of modern software.
Most backend frameworks rely on SQL behind the scenes. Whether you’re using Django, Rails, Spring Boot, or Express, your code often ends up generating SQL queries. When performance issues appear, you usually end up inspecting or writing raw SQL to fix them.
SQL is the backbone of most data pipelines. ETL workflows use it to move and transform data across systems. Modern data warehouses like Snowflake, BigQuery, and Amazon Redshift are built around SQL.
Even tools like Apache Spark and dbt lean heavily on SQL as their primary interface.
Before model training begins, much of the real work happens in SQL. Feature engineering often means querying datasets, joining tables, computing aggregates, and validating data quality.
In practice, SQL shapes the data before it ever reaches Python or a machine learning framework.
Dashboards, reports, and ad-hoc analysis all depend on SQL. Tools like Metabase, Looker, and Tableau generate SQL queries under the hood.
Analysts who can write SQL directly tend to move much faster than those relying only on visual interfaces.
When you run a SQL query, the database does not simply “find the data.” It processes your request through three main stages: parsing, optimization, and execution.
Understanding this flow gives you a much clearer mental model of how SQL works, and why the declarative approach is so powerful.
Let’s walk through what happens step by step with this query:
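A representative query for the walkthrough, reusing the `users` example from earlier:

```sql
SELECT *
FROM users
WHERE country = 'US';
```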
The parser acts like a strict grammar checker.
It reads your SQL query and validates everything: the syntax is correct, the tables and columns you reference actually exist, and you have permission to access them.
If anything is off, the query fails immediately with an error. No data is touched.
If everything looks good, the parser converts your SQL into an internal structure, often called a parse tree. This is a structured representation of your query that the database can actually work with.
This is where the real intelligence comes in, and it's the reason declarative SQL works so well. The optimizer takes the parse tree and figures out the most efficient way to execute your query.
It may consider multiple strategies: scanning the entire table, using an index on the filtered column, or, for multi-table queries, different join orders and join algorithms.
To make that decision, it uses internal statistics such as the number of rows in each table, which indexes exist, and how values are distributed within each column.
Then it chooses the execution plan with the lowest estimated cost.
On a large table, this choice can make a massive difference. Using an index can be thousands of times faster than scanning every row. And the optimizer can make that call because it understands the data better than your application code does.
That is the power of declarative SQL: you describe what you want, and the database figures out how to get it efficiently.
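Most databases let you inspect the plan the optimizer chose. In PostgreSQL, for example, you can prefix a query with EXPLAIN (the exact output format varies by engine):

```sql
EXPLAIN SELECT * FROM users WHERE country = 'US';
```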
Once the optimizer chooses the best plan, the executor takes over.
It actually runs the query: reading the rows from disk or cache, applying the filter (country = 'US'), and assembling the result set. Finally, it sends the results back to your application.
You write a simple query. Behind the scenes, the database parses and validates it, evaluates multiple execution strategies, picks the cheapest plan, and runs it.
All of this happens in milliseconds. That’s the power of SQL. You focus on the question, and the database handles the complexity of execution.