Last Updated: May 3, 2026
The previous chapter covered DDL, the commands that define table structure. This chapter covers DML (Data Manipulation Language), the commands that add, change, and remove the actual data inside those tables.
INSERT, UPDATE, and DELETE are the three core DML operations, and UPSERT is a pattern that combines INSERT and UPDATE into a single atomic statement.
Insert below data to follow along:
INSERT adds new rows to a table. The basic form lists the target columns and the values to fill them with:
Always list columns explicitly. You can write INSERT INTO artists VALUES (6, 'Midnight Echo', 'FR', FALSE, NULL), but that relies on column order matching exactly. If someone adds or reorders a column later, the statement breaks silently or inserts data into the wrong column. Explicit column lists make the intent clear and protect against schema changes.
Any column you omit from the INSERT gets its default value. Columns with a DEFAULT clause use that default. Columns without one get NULL (if they allow it).
This inserts artist_id using the SERIAL sequence, sets verified to FALSE (its DEFAULT), and sets manager_artist_id to NULL. If a column is NOT NULL and has no default, omitting it causes an error:
Instead of running five separate INSERT statements, pass multiple value sets in a single statement:
This is faster than individual inserts because it sends one command to the database instead of three. The performance difference is small for a few rows, but significant when inserting hundreds or thousands. Each separate INSERT requires a round trip to the database, parsing, planning, and execution. A multi-row INSERT does all of that once.
When the data you want to insert already exists in another table, use INSERT ... SELECT instead of fetching and re-inserting manually. This is common in ETL pipelines and data migrations.
Suppose you have a staging table with new tracks that need to move into the main tracks table:
Now insert them into tracks:
The SELECT can include WHERE clauses, JOINs, aggregations, anything a normal query supports. For example, inserting only tracks from albums released in 2024:
PostgreSQL's RETURNING clause gives back the rows that were just inserted, which is useful when you need the auto-generated ID without making a second query:
You can return any column, not just the primary key. RETURNING works with expressions too:
| Operation | PostgreSQL | MySQL | SQL Server |
|---|---|---|---|
| Auto-increment retrieval | RETURNING id | LAST_INSERT_ID() | SCOPE_IDENTITY() or OUTPUT INSERTED.id |
| Multi-row INSERT | Supported | Supported | Supported |
| INSERT ... SELECT | Supported | Supported | Supported |
| RETURNING clause | RETURNING col1, col2 | Not supported | OUTPUT INSERTED.col1 |
| Default keyword in VALUES | DEFAULT | DEFAULT | DEFAULT |
UPDATE modifies existing rows. Every UPDATE needs a WHERE clause to target specific rows, unless you intentionally want to change every row in the table.
This sets verified to TRUE only for the artist named DJ Pulse. You can update multiple columns in a single statement:
What goes wrong here?
No WHERE clause. This sets verified to FALSE for every artist in the table. The database will not warn you. There is no confirmation prompt, no "are you sure?" dialog. The statement runs immediately, and the damage is done. In production, accidental blanket UPDATEs are one of the most common ways to corrupt data.
A good habit: write the WHERE clause first, then fill in the SET clause. Some teams even require wrapping UPDATE statements in a transaction during manual operations so the change can be reviewed before committing.
You can use a subquery in the SET clause to compute a value from other tables. This updates each user's subscription_tier to match their most recent subscription:
The subquery runs once per row in users. It finds the most recent subscription for each user and uses that plan as the new tier. If a user has no subscriptions, the subquery returns NULL, so subscription_tier becomes NULL for that user. To avoid that, add a WHERE clause to the outer UPDATE:
Now only users who have at least one subscription record get updated.
PostgreSQL supports UPDATE ... FROM, which lets you join another table directly instead of using a correlated subquery. This is often cleaner and faster:
The FROM clause works like a regular JOIN. DISTINCT ON (user_id) picks the most recent subscription per user (the row with the latest start_date after ordering). This approach avoids the correlated subquery, which means the database can use a more efficient join strategy instead of running a separate subquery for each user row.
MySQL uses a different syntax for the same idea:
Just like INSERT, PostgreSQL's RETURNING clause works with UPDATE:
This returns every row that was actually modified. It is useful for logging which rows changed, or for passing the updated data to the next step in an application without a separate SELECT.
| Operation | PostgreSQL | MySQL | SQL Server |
|---|---|---|---|
| UPDATE with JOIN | UPDATE t SET ... FROM other WHERE t.id = other.id | UPDATE t JOIN other ON ... SET ... | UPDATE t SET ... FROM t JOIN other ON ... |
| RETURNING | RETURNING col1, col2 | Not supported | OUTPUT INSERTED.col1 |
| LIMIT on UPDATE | Not supported (use subquery) | UPDATE ... LIMIT n | UPDATE TOP(n) ... |
| Column aliasing in SET | Supported | Supported | Supported |
DELETE removes rows from a table. Like UPDATE, it almost always needs a WHERE clause.
This removes one specific stream record. You can delete based on any condition:
This removes all streams that started before November 2, 2024.
What goes wrong here?
This deletes every row in streams. The table still exists (that would require DROP TABLE), but it is now empty. Like UPDATE without WHERE, the database executes this immediately without asking for confirmation.
Delete rows based on data in another table using a subquery. This removes all streams from unverified artists:
The subquery finds all track IDs belonging to unverified artists, and the outer DELETE removes any streams of those tracks.
You can also use EXISTS for the same logic:
EXISTS is often preferred over IN when the subquery returns many rows, because the database can stop checking as soon as it finds the first match.
PostgreSQL supports DELETE ... USING, which works like UPDATE ... FROM:
This is the same logic as the subquery version, but expressed as a join. The USING clause introduces additional tables, and the WHERE clause ties them to the target table.
MySQL uses a different syntax:
Both remove rows, but they work differently under the hood:
| Aspect | DELETE | TRUNCATE |
|---|---|---|
| Scope | Can target specific rows with WHERE | Always removes all rows |
| Speed | Slower (row-by-row logging) | Faster (deallocates entire pages) |
| Transaction rollback | Fully rollable in all databases | Rollable in PostgreSQL, not in MySQL |
| Triggers | Fires row-level DELETE triggers | Does not fire row-level triggers |
| Identity/sequence reset | Does not reset auto-increment | Resets auto-increment to starting value |
| Foreign keys | Checked per row | Fails if other tables reference this table (unless CASCADE) |
| RETURNING | Supported in PostgreSQL | Not supported |
Use TRUNCATE when you need to empty a table completely and want it done fast. Use DELETE when you need to remove specific rows, need trigger execution, or need to roll back the operation in MySQL.
TRUNCATE also supports CASCADE in PostgreSQL, which truncates all tables that have foreign key references to the target table:
This is useful for cleaning up test databases, but dangerous in production. Always verify what CASCADE will affect before running it.
Sometimes you don't want to physically remove data. Regulations may require keeping records for audit purposes. Users might want to "undo" a deletion. Analytics might need historical data. In these cases, a soft delete marks a row as deleted without actually removing it.
The typical pattern adds a deleted_at timestamp column to the table:
Instead of DELETE, you UPDATE:
All queries that read "active" data need to filter out soft-deleted rows:
The following diagram shows how a soft delete works compared to a hard delete:
Soft deletes add complexity. Every SELECT needs the WHERE deleted_at IS NULL filter, and forgetting it means displaying "deleted" records. Some teams create a view (covered in the next chapter) that applies this filter automatically. Soft deletes also mean the table grows larger over time, since rows are never truly removed.
An upsert is an INSERT that gracefully handles the case where the row already exists. Instead of failing with a duplicate key error, it either skips the insert or updates the existing row.
The problem it solves: imagine you maintain a user_stats summary table that tracks each user's total stream count. When a new stream comes in, you want to either insert a new row for that user (if they have no entry yet) or update the existing count. Without upsert, you'd need to check if the row exists first, then decide whether to INSERT or UPDATE. That creates a race condition: two concurrent requests could both check, both find no row, and both try to INSERT, causing a duplicate key error.
PostgreSQL uses INSERT ... ON CONFLICT, which is the most readable upsert syntax among the major databases:
ON CONFLICT (user_id) specifies the column (or constraint) that determines a conflict. If user_id 1 already exists, the DO UPDATE clause runs instead of inserting a new row. The EXCLUDED keyword refers to the values that were proposed in the INSERT. So EXCLUDED.last_streamed is the timestamp from the VALUES clause.
If you want to silently skip duplicates without updating anything, use DO NOTHING:
This is useful for idempotent imports where you want to insert only rows that don't exist yet.
ON CONFLICT also works with multi-row inserts:
Each row is handled independently. Rows that conflict get updated, rows that don't get inserted.
The following diagram shows the upsert decision flow:
MySQL's equivalent uses different syntax but the same idea:
MySQL uses VALUES(column) instead of EXCLUDED.column to reference the proposed values. The trigger for the conflict check is any UNIQUE or PRIMARY KEY constraint, and you don't specify which one explicitly.
The ANSI SQL standard defines MERGE for upsert operations. SQL Server implements it fully:
MERGE is more verbose, but also more flexible. It supports WHEN MATCHED, WHEN NOT MATCHED, and even WHEN NOT MATCHED BY SOURCE (for deleting rows in the target that have no corresponding source row). PostgreSQL does not support MERGE natively (though version 15+ added a basic form), so most PostgreSQL code uses ON CONFLICT instead.
| Feature | PostgreSQL | MySQL | SQL Server |
|---|---|---|---|
| Syntax | ON CONFLICT DO UPDATE | ON DUPLICATE KEY UPDATE | MERGE |
| Skip duplicates | ON CONFLICT DO NOTHING | INSERT IGNORE | WHEN MATCHED THEN (no-op) |
| Reference proposed values | EXCLUDED.column | VALUES(column) | source.column |
| Conflict target | Explicit: column or constraint | Implicit: any unique key | Explicit: ON condition |
| Multi-row | Supported | Supported | Supported (using derived table) |