The previous chapter covered DDL, the commands that define table structure. This chapter covers DML (Data Manipulation Language), the commands that add, change, and remove the actual data inside those tables.

INSERT, UPDATE, and DELETE are the three core DML operations, and UPSERT is a pattern that combines INSERT and UPDATE into a single atomic statement.

Sample Data

Insert below data to follow along:

SQL

INSERT

INSERT adds new rows to a table. The basic form lists the target columns and the values to fill them with:

SQL

Always list columns explicitly. You can write INSERT INTO artists VALUES (6, 'Midnight Echo', 'FR', FALSE, NULL), but that relies on column order matching exactly. If someone adds or reorders a column later, the statement breaks silently or inserts data into the wrong column. Explicit column lists make the intent clear and protect against schema changes.

Omitting Columns

Any column you omit from the INSERT gets its default value. Columns with a DEFAULT clause use that default. Columns without one get NULL (if they allow it).

SQL

This inserts artist_id using the SERIAL sequence, sets verified to FALSE (its DEFAULT), and sets manager_artist_id to NULL. If a column is NOT NULL and has no default, omitting it causes an error:

SQL

Multi-Row INSERT

Instead of running five separate INSERT statements, pass multiple value sets in a single statement:

SQL

This is faster than individual inserts because it sends one command to the database instead of three. The performance difference is small for a few rows, but significant when inserting hundreds or thousands. Each separate INSERT requires a round trip to the database, parsing, planning, and execution. A multi-row INSERT does all of that once.

INSERT ... SELECT

When the data you want to insert already exists in another table, use INSERT ... SELECT instead of fetching and re-inserting manually. This is common in ETL pipelines and data migrations.

Suppose you have a staging table with new tracks that need to move into the main tracks table:

SQL

Now insert them into tracks:

SQL

The SELECT can include WHERE clauses, JOINs, aggregations, anything a normal query supports. For example, inserting only tracks from albums released in 2024:

SQL

RETURNING

PostgreSQL's RETURNING clause gives back the rows that were just inserted, which is useful when you need the auto-generated ID without making a second query:

SQL

You can return any column, not just the primary key. RETURNING works with expressions too:

SQL

Dialect Differences

Operation	PostgreSQL	MySQL	SQL Server
Auto-increment retrieval	`RETURNING id`	`LAST_INSERT_ID()`	`SCOPE_IDENTITY()` or `OUTPUT INSERTED.id`
Multi-row INSERT	Supported	Supported	Supported
INSERT ... SELECT	Supported	Supported	Supported
RETURNING clause	`RETURNING col1, col2`	Not supported	`OUTPUT INSERTED.col1`
Default keyword in VALUES	`DEFAULT`	`DEFAULT`	`DEFAULT`

UPDATE

UPDATE modifies existing rows. Every UPDATE needs a WHERE clause to target specific rows, unless you intentionally want to change every row in the table.

SQL

This sets verified to TRUE only for the artist named DJ Pulse. You can update multiple columns in a single statement:

SQL

What goes wrong here?

SQL

No WHERE clause. This sets verified to FALSE for every artist in the table. The database will not warn you. There is no confirmation prompt, no "are you sure?" dialog. The statement runs immediately, and the damage is done. In production, accidental blanket UPDATEs are one of the most common ways to corrupt data.

A good habit: write the WHERE clause first, then fill in the SET clause. Some teams even require wrapping UPDATE statements in a transaction during manual operations so the change can be reviewed before committing.

UPDATE with Subquery

You can use a subquery in the SET clause to compute a value from other tables. This updates each user's subscription_tier to match their most recent subscription:

SQL

The subquery runs once per row in users. It finds the most recent subscription for each user and uses that plan as the new tier. If a user has no subscriptions, the subquery returns NULL, so subscription_tier becomes NULL for that user. To avoid that, add a WHERE clause to the outer UPDATE:

SQL

Now only users who have at least one subscription record get updated.

UPDATE with JOIN

PostgreSQL supports UPDATE ... FROM, which lets you join another table directly instead of using a correlated subquery. This is often cleaner and faster:

SQL

The FROM clause works like a regular JOIN. DISTINCT ON (user_id) picks the most recent subscription per user (the row with the latest start_date after ordering). This approach avoids the correlated subquery, which means the database can use a more efficient join strategy instead of running a separate subquery for each user row.

MySQL uses a different syntax for the same idea:

SQL

RETURNING with UPDATE

Just like INSERT, PostgreSQL's RETURNING clause works with UPDATE:

SQL

This returns every row that was actually modified. It is useful for logging which rows changed, or for passing the updated data to the next step in an application without a separate SELECT.

Dialect Differences

Operation	PostgreSQL	MySQL	SQL Server
UPDATE with JOIN	`UPDATE t SET ... FROM other WHERE t.id = other.id`	`UPDATE t JOIN other ON ... SET ...`	`UPDATE t SET ... FROM t JOIN other ON ...`
RETURNING	`RETURNING col1, col2`	Not supported	`OUTPUT INSERTED.col1`
LIMIT on UPDATE	Not supported (use subquery)	`UPDATE ... LIMIT n`	`UPDATE TOP(n) ...`
Column aliasing in SET	Supported	Supported	Supported

DELETE

DELETE removes rows from a table. Like UPDATE, it almost always needs a WHERE clause.

SQL

This removes one specific stream record. You can delete based on any condition:

SQL

This removes all streams that started before November 2, 2024.

What goes wrong here?

SQL

This deletes every row in streams. The table still exists (that would require DROP TABLE), but it is now empty. Like UPDATE without WHERE, the database executes this immediately without asking for confirmation.

DELETE with Subquery

Delete rows based on data in another table using a subquery. This removes all streams from unverified artists:

SQL

The subquery finds all track IDs belonging to unverified artists, and the outer DELETE removes any streams of those tracks.

You can also use EXISTS for the same logic:

SQL

EXISTS is often preferred over IN when the subquery returns many rows, because the database can stop checking as soon as it finds the first match.

DELETE with JOIN

PostgreSQL supports DELETE ... USING, which works like UPDATE ... FROM:

SQL

This is the same logic as the subquery version, but expressed as a join. The USING clause introduces additional tables, and the WHERE clause ties them to the target table.

MySQL uses a different syntax:

SQL

TRUNCATE vs DELETE

Both remove rows, but they work differently under the hood:

Aspect	DELETE	TRUNCATE
Scope	Can target specific rows with WHERE	Always removes all rows
Speed	Slower (row-by-row logging)	Faster (deallocates entire pages)
Transaction rollback	Fully rollable in all databases	Rollable in PostgreSQL, not in MySQL
Triggers	Fires row-level DELETE triggers	Does not fire row-level triggers
Identity/sequence reset	Does not reset auto-increment	Resets auto-increment to starting value
Foreign keys	Checked per row	Fails if other tables reference this table (unless CASCADE)
RETURNING	Supported in PostgreSQL	Not supported

Use TRUNCATE when you need to empty a table completely and want it done fast. Use DELETE when you need to remove specific rows, need trigger execution, or need to roll back the operation in MySQL.

SQL

TRUNCATE also supports CASCADE in PostgreSQL, which truncates all tables that have foreign key references to the target table:

SQL

This is useful for cleaning up test databases, but dangerous in production. Always verify what CASCADE will affect before running it.

Soft Delete

Sometimes you don't want to physically remove data. Regulations may require keeping records for audit purposes. Users might want to "undo" a deletion. Analytics might need historical data. In these cases, a soft delete marks a row as deleted without actually removing it.

The typical pattern adds a deleted_at timestamp column to the table:

SQL

Instead of DELETE, you UPDATE:

SQL

All queries that read "active" data need to filter out soft-deleted rows:

SQL

The following diagram shows how a soft delete works compared to a hard delete:

Soft deletes add complexity. Every SELECT needs the WHERE deleted_at IS NULL filter, and forgetting it means displaying "deleted" records. Some teams create a view (covered in the next chapter) that applies this filter automatically. Soft deletes also mean the table grows larger over time, since rows are never truly removed.

UPSERT

An upsert is an INSERT that gracefully handles the case where the row already exists. Instead of failing with a duplicate key error, it either skips the insert or updates the existing row.

The problem it solves: imagine you maintain a user_stats summary table that tracks each user's total stream count. When a new stream comes in, you want to either insert a new row for that user (if they have no entry yet) or update the existing count. Without upsert, you'd need to check if the row exists first, then decide whether to INSERT or UPDATE. That creates a race condition: two concurrent requests could both check, both find no row, and both try to INSERT, causing a duplicate key error.

SQL

PostgreSQL: ON CONFLICT

PostgreSQL uses INSERT ... ON CONFLICT, which is the most readable upsert syntax among the major databases:

SQL

ON CONFLICT (user_id) specifies the column (or constraint) that determines a conflict. If user_id 1 already exists, the DO UPDATE clause runs instead of inserting a new row. The EXCLUDED keyword refers to the values that were proposed in the INSERT. So EXCLUDED.last_streamed is the timestamp from the VALUES clause.

If you want to silently skip duplicates without updating anything, use DO NOTHING:

SQL

This is useful for idempotent imports where you want to insert only rows that don't exist yet.

ON CONFLICT also works with multi-row inserts:

SQL

Each row is handled independently. Rows that conflict get updated, rows that don't get inserted.

The following diagram shows the upsert decision flow:

MySQL: ON DUPLICATE KEY UPDATE

MySQL's equivalent uses different syntax but the same idea:

SQL

MySQL uses VALUES(column) instead of EXCLUDED.column to reference the proposed values. The trigger for the conflict check is any UNIQUE or PRIMARY KEY constraint, and you don't specify which one explicitly.

SQL Server and ANSI SQL: MERGE

The ANSI SQL standard defines MERGE for upsert operations. SQL Server implements it fully:

SQL

MERGE is more verbose, but also more flexible. It supports WHEN MATCHED, WHEN NOT MATCHED, and even WHEN NOT MATCHED BY SOURCE (for deleting rows in the target that have no corresponding source row). PostgreSQL does not support MERGE natively (though version 15+ added a basic form), so most PostgreSQL code uses ON CONFLICT instead.

Dialect Comparison

Feature	PostgreSQL	MySQL	SQL Server
Syntax	`ON CONFLICT DO UPDATE`	`ON DUPLICATE KEY UPDATE`	`MERGE`
Skip duplicates	`ON CONFLICT DO NOTHING`	`INSERT IGNORE`	`WHEN MATCHED THEN` (no-op)
Reference proposed values	`EXCLUDED.column`	`VALUES(column)`	`source.column`
Conflict target	Explicit: column or constraint	Implicit: any unique key	Explicit: ON condition
Multi-row	Supported	Supported	Supported (using derived table)

INSERT, UPDATE, DELETE, and UPSERT

Sample Data

INSERT

Omitting Columns

Multi-Row INSERT

INSERT ... SELECT

RETURNING

Dialect Differences

UPDATE

UPDATE with Subquery

UPDATE with JOIN

RETURNING with UPDATE

Dialect Differences

DELETE

DELETE with Subquery

DELETE with JOIN

TRUNCATE vs DELETE

Soft Delete

UPSERT

PostgreSQL: ON CONFLICT

MySQL: ON DUPLICATE KEY UPDATE

SQL Server and ANSI SQL: MERGE

Dialect Comparison

Get Premium