MERGE
Reference and syntax for the MERGE command.
This feature is in public preview.
MERGE
allows users to perform multiple data modifications (INSERT
, UPDATE
, DELETE
) within a single transaction. MERGE
is useful for common database tasks like removing duplicates, syncing data between tables (upsert), and cleaning out old records.
MERGE
internally performs a join between a target table and a data source, and then for each matching or non-matching row, you can decide whether to add it, update it, or delete it from the target table.
All rows in the join result fall into one of three categories: MATCHED
, NOT MATCHED BY SOURCE
, or NOT MATCHED [BY TARGET]
, meaning the row was either found in both target and source datasets, only in the target dataset, or only in the source dataset. In branches for the MATCHED
category, columns from both target and source datasets will be available for use in expressions and conditions. Given a branch in one of the NOT MATCHED
categories, only columns from the dataset where the row was found will be available.
There can be multiple branches defined for rows within each category. However at most one branch’s action will be executed per row, similar to a CASE
statement. WHEN
clauses are evaluated in order within each category, and the chosen action for each row will come from whichever branch condition matches first (or from the first branch without any condition). If none of the conditions match, no action is performed for that row.
When DO NOTHING
is specified as the merge action, and the branch condition is met, the joined row is explicitly skipped. Since actions are evaluated in their specified order, DO NOTHING
can be handy to skip non-interesting source or target rows before more fine-grained handling.
Any aggregated indexes on the target table will be updated in the same transaction.
Insert On Conflict
Upsert functionality is also available directly via the INSERT
statement. The optional ON CONFLICT
clause provides a way to reconcile or merge new data with existing records by either skipping or overwriting rows that already exist. However, it is available only for a narrow subset of INSERT
statements. See limitations here.
Permissions
To have permission to run MERGE
, you must have the respective INSERT
, UPDATE
, or DELETE
permissions on the target table. You must also have SELECT
permissions on the data source.
Syntax
Parameters
Parameter | Description |
---|---|
<target_table_name> | The table to apply changes to. |
<target_alias> | Optional alias for the target table. |
<data_source> | A table, subquery, or table-valued function (e.g., read_parquet ). |
<source_alias> | Optional alias for the data source. Table-valued functions will default to their own name as their alias. |
<join_condition> | Condition to match source and target rows. |
<when_clause> | A branch to match rows against, with an optional condition and an action (UPDATE, DELETE, INSERT, DO NOTHING) to perform. Multiple clauses can repeat the same match category or same action. |
MATCHED ... [AND <condition>] | Extra conditions per when_clause are optional. To be considered met, they must evaluate to TRUE rather than FALSE or NULL. An unconditional when_clause must not precede a conditional one of the same category, as the latter would be unreachable. |
<settings> | List of query-specific settings overrides. |
Examples
Example 1: ETL Synchronization
When keeping a production table in sync with a staging table, it may be natural to INSERT
brand new entries, UPDATE
rows for which entries already existed, and DELETE
stale entries that are no longer in the source.
UPSERT (INSERT UPDATE)
INSERT IGNORE
If you would like to preserve pre-existing entries rather than overwriting them, replace the UPDATE
clause with a DO NOTHING
clause, or omit the WHEN MATCHED
branch entirely for the same final effect. This would mean INSERT
brand new entries and do nothing for rows for which entries already existed.
Example 2: Data Deduplication
The data source can be any subquery, not just a static table. In this example, the subquery is identifying DISTINCT
rows from the table raw_table
, in order to submit deduplicated results to the destination deduped_table
.
Further USING
examples with subqueries:
Example 3: Conditional Data Migration
You may want to define subsets of the source data to be treated differently during migration, depending on some subset of attributes.
Use the attributes to define if-then-else conditions, and then define different insertion behavior for each subset. In this example, entries with attr
in range (10, 100] should not be inserted. Using the DO NOTHING
clause makes this contract clear, even though the conditions can be rewritten to omit the DO NOTHING
branch by being more restrictive. Note also the ‘catch-all’ clause at the end which does not have a condition. This is generally not necessary - all clauses could have conditions, and not all rows must be caught by one of the branches.
Conditions can also be used to define subsets of the source data that have already been migrated, and are being revisited to update certain attributes.
Conditions for rows no longer found in the source may be useful for deleting entries only after they reach sufficient staleness. For example, marking rows older than 1 week as stale, but waiting for 30 days before actually deleting them.
Example 4: Merge with External Files (or any TVF)
The data source can be any Table Valued Function (TVF), not just a subquery or static table. This is particularly useful for merging with external files, though you can also reference an external table object as the source (e.g. USING my_external_table
). Note that without defining an alias, the data source can be referenced later in the query by using the name of the TVF (e.g. read_parquet
).
Limitations
WITH
clause is not supported, for specifying one or more subqueries (CTE) that can be referenced from inside theMERGE
query.