功能

MERGE INTO 语句用于根据源表或子查询中的值更新目标表中的记录。当源表包含目标表中的新行(待插入)、修改行(待更新)和删除行(待删除)时,可以使用此功能来同步目标表的数据。

语法

MERGE INTO target_table 
   USING source_table
   ON merge_condition
   { WHEN MATCHED [ AND matched_condition ] THEN matched_action |
    WHEN  MATCHED [ AND matched_condition ] THEN matched_action |
    --当WHEN MATCHED语句和WHEN NOT MATCHED 语句同时存在时,MERGE INTO语句中只能存在一个WHEN NOT MATCHED
    WHEN NOT MATCHED [ AND not_matched_by_source_condition ] THEN not_matched_by_source_action } 
-- 参数解释
matched_action ::=
    UPDATE SET <column_name> = <expr> [ , <column_name2> = <expr2> ... ] 
    | DELETE 
not_matched_action ::=
    INSERT [ ( <column_name> [ , ... ] ) ] VALUES ( <expr> [ , ... ] )

参数说明

  • target_table:指定目标表,可以使用别名。
  • source_table:指定要与目标表进行比较的表或子查询,可以使用别名。
  • merge_condition:定义如何将源表中的行与目标表中的行进行匹配,返回布尔类型的表达式。
  • WHEN MATCHED [ AND matched_condition ]:当源行根据 merge_condition 和可选的 matched_condition 与目标表行匹配时,执行 WHEN MATCHED 子句。
    • 如果对于匹配 merge_condition 的源行和目标行对,没有任何 WHEN MATCHED 条件的计算结果为 true,则目标行保持不变。
    • 如果存在多个 WHEN MATCHED 子句,则会按照它们的指定顺序对其进行求值。每个 WHEN MATCHED 子句都必须具有一个 matched_condition
  • WHEN NOT MATCHED [ AND not_matched_condition ]:当源行根据 merge_condition 和可选的 not_matched_condition 与目标表中的任何行都不匹配时,执行 WHEN NOT MATCHED 子句。
    • 只能存在一个WHEN NOT MATCHED语句
  • matched_action
    • DELETE:删除匹配的目标表行。
    • UPDATE:更新匹配的目标表行。使用 UPDATE SET column1 = source.column1 [, column2 = source.column2 ...]
  • not_matched_action
    • INSERT:使用源数据集中的相应列插入目标表,支持指定字段插入。对于未指定的目标列,则插入为 NULL

注意事项

当执行 MERGE INTO 语句可能产生不确定结果时,系统会报错。产生这种情况的原因有:

  1. 如果 ON 子句使得源表中的多于 1 行与目标表中的行匹配,SQL 标准要求引发错误。
  2. 如果 ON 子句使得源表中的多于 1 行与目标表中的行匹配,同时在 AND case_predicate 条件过滤后仍有多条记录。
  3. 当WHEN MATCHED语句和WHEN NOT MATCHED 语句同时存在时,MERGE INTO语句中只能存在一个WHEN NOT MATCHED
  4. 当存在多个 WHEN MATCHED 子句时,UPDATE 语句必须在 DELETE 语句之前。

确定性结果的情况:

  1. 如果 ON 子句使得源表中的 1 行与目标表中的行匹配。
  2. 如果 ON 子句使得源表中的多于 1 行与目标表中的行匹配,同时在 AND case_predicate 条件过滤后只有一条记录。

详细使用示例

案例1:当匹配条件满足时,删除目标表中的数据

场景:删除在源表中存在的所有用户记录(标记删除)

建表语句

-- 目标表:存储用户信息
CREATE TABLE users_target (
  id INT,
  name STRING,
  value INT,
  updated_at STRING,
  marked_for_deletion BOOLEAN
);


-- 源表:待同步的用户数据
CREATE TABLE users_source (
  id INT,
  name STRING,
  value INT,
  updated_at STRING,
  marked_for_deletion BOOLEAN
);

目标表插入的数据:

INSERT INTO users_target VALUES 
(1, 'Alice', 100, '2024-01-01', false),
(2, 'Bob', 200, '2024-01-02', false),
(3, 'Charlie', 300, '2024-01-03', false);
+----+---------+-------+------------+---------------------+
| id | name    | value | updated_at | marked_for_deletion |
+----+---------+-------+------------+---------------------+
| 1  | Alice   | 100   | 2024-01-01 | false               |
+----+---------+-------+------------+---------------------+
| 2  | Bob     | 200   | 2024-01-02 | false               |
+----+---------+-------+------------+---------------------+
| 3  | Charlie | 300   | 2024-01-03 | false               |
+----+---------+-------+------------+---------------------+

源表插入的数据:

INSERT INTO users_source VALUES 
(1, 'Alice', 100, '2024-01-01', false),
(2, 'Bob', 200, '2024-01-02', false);
+----+-------+-------+------------+---------------------+
| id | name  | value | updated_at | marked_for_deletion |
+----+-------+-------+------------+---------------------+
| 1  | Alice | 100   | 2024-01-01 | false               |
+----+-------+-------+------------+---------------------+
| 2  | Bob   | 200   | 2024-01-02 | false               |
+----+-------+-------+------------+---------------------+

MERGE SQL

MERGE INTO users_target t USING users_source s
ON t.id = s.id
WHEN MATCHED THEN DELETE;

执行结果

+----+---------+-------+------------+---------------------+
| id | name    | value | updated_at | marked_for_deletion |
+----+---------+-------+------------+---------------------+
| 3  | Charlie | 300   | 2024-01-03 | false               |
+----+---------+-------+------------+---------------------+

说明:源表中 id 为 1 和 2 的记录在目标表中找到匹配项,这两条记录被删除。id 为 3 的 Charlie 记录因为在源表中没有匹配项,保持不变。

案例2:基于时间戳的条件更新

场景:只在源表数据更新时间比目标表更新时间更新时,才更新目标表的值

建表语句

-- 目标表:产品信息
CREATE TABLE products_target (
  id INT,
  name STRING,
  col1 STRING,
  updated_at STRING
);


-- 源表:新的产品数据
CREATE TABLE products_source (
  id INT,
  name STRING,
  col1 STRING,
  updated_at STRING
);

初始目标表数据:

INSERT INTO products_target VALUES 
(1, 'Product A', 'value_old', '2024-01-01'),
(2, 'Product B', 'value_old', '2024-01-02'),
(3, 'Product C', 'value_old', '2024-01-03');
+----+------------+-----------+------------+
| id | name       | col1      | updated_at |
+----+------------+-----------+------------+
| 1  | Product A  | value_old | 2024-01-01 |
+----+------------+-----------+------------+
| 2  | Product B  | value_old | 2024-01-02 |
+----+------------+-----------+------------+
| 3  | Product C  | value_old | 2024-01-03 |
+----+------------+-----------+------------+

初始源表数据:

INSERT INTO products_source VALUES 
(1, 'Product A', 'value_new', '2024-01-10'),
(2, 'Product B', 'value_old', '2024-01-01'),
(4, 'Product D', 'value_new', '2024-01-04');
+----+------------+-----------+------------+
| id | name       | col1      | updated_at |
+----+------------+-----------+------------+
| 1  | Product A  | value_new | 2024-01-10 |
+----+------------+-----------+------------+
| 2  | Product B  | value_old | 2024-01-01 |
+----+------------+-----------+------------+
| 4  | Product D  | value_new | 2024-01-04 |
+----+------------+-----------+------------+

MERGE SQL

MERGE INTO products_target t USING products_source s
ON t.id = s.id
WHEN MATCHED AND t.updated_at < s.updated_at THEN UPDATE SET t.col1 = s.col1;

执行结果

+----+------------+-----------+------------+
| id | name       | col1      | updated_at |
+----+------------+-----------+------------+
| 1  | Product A  | value_new | 2024-01-01 |
+----+------------+-----------+------------+
| 2  | Product B  | value_old | 2024-01-02 |
+----+------------+-----------+------------+
| 3  | Product C  | value_old | 2024-01-03 |
+----+------------+-----------+------------+

说明

  • Product A (id=1):源表的更新时间 2024-01-10 > 目标表的 2024-01-01,条件满足,col1 被更新为 value_new
  • Product B (id=2):源表的更新时间 2024-01-01 ≤ 目标表的 2024-01-02,条件不满足,保持不变
  • Product C (id=3):源表中没有对应记录(id=3),不进行任何操作,保持不变
  • Product D (id=4):源表新增,由于不存在 ON 条件的 WHEN NOT MATCHED 子句,不插入

案例3:删除标记与条件更新的组合

场景:对用户数据进行操作,标记为删除的用户直接删除,其他匹配的用户更新其状态信息

建表语句

-- 目标表:用户信息
CREATE TABLE users_data_target (
  id INT,
  name STRING,
  marked_for_deletion BOOLEAN,
  updated_at STRING,
  value STRING
);


-- 源表:同步的用户数据
CREATE TABLE users_data_source (
  id INT,
  name STRING,
  marked_for_deletion BOOLEAN,
  updated_at STRING,
  value STRING
);

初始目标表数据:

INSERT INTO users_data_target VALUES 
(1, 'User A', false, '2024-01-01', 'active'),
(2, 'User B', true, '2024-01-02', 'inactive'),
(3, 'User C', false, '2024-01-03', 'active'),
(4, 'User D', true, '2024-01-04', 'inactive');
+----+--------+---------------------+------------+----------+
| id | name   | marked_for_deletion | updated_at | value    |
+----+--------+---------------------+------------+----------+
| 1  | User A | false               | 2024-01-01 | active   |
+----+--------+---------------------+------------+----------+
| 2  | User B | true                | 2024-01-02 | inactive |
+----+--------+---------------------+------------+----------+
| 3  | User C | false               | 2024-01-03 | active   |
+----+--------+---------------------+------------+----------+
| 4  | User D | true                | 2024-01-04 | inactive |
+----+--------+---------------------+------------+----------+

初始源表数据:

INSERT INTO users_data_source VALUES 
(1, 'User A', false, '2024-01-15', 'updated'),
(2, 'User B', true, '2024-01-20', 'to_delete'),
(3, 'User C', false, '2024-01-10', 'updated'),
(5, 'User E', false, '2024-01-05', 'new_user');
+----+--------+---------------------+------------+-----------+
| id | name   | marked_for_deletion | updated_at | value     |
+----+--------+---------------------+------------+-----------+
| 1  | User A | false               | 2024-01-15 | updated   |
+----+--------+---------------------+------------+-----------+
| 2  | User B | true                | 2024-01-20 | to_delete |
+----+--------+---------------------+------------+-----------+
| 3  | User C | false               | 2024-01-10 | updated   |
+----+--------+---------------------+------------+-----------+
| 5  | User E | false               | 2024-01-05 | new_user  |
+----+--------+---------------------+------------+-----------+

MERGE SQL

MERGE INTO users_data_target t USING users_data_source s
ON t.id = s.id
WHEN MATCHED AND t.marked_for_deletion = false THEN UPDATE SET t.updated_at = s.updated_at, t.value = s.value
WHEN MATCHED AND t.marked_for_deletion = true THEN DELETE;

执行结果

+----+--------+---------------------+------------+---------+
| id | name   | marked_for_deletion | updated_at | value   |
+----+--------+---------------------+------------+---------+
| 1  | User A | false               | 2024-01-15 | updated |
+----+--------+---------------------+------------+---------+
| 3  | User C | false               | 2024-01-10 | updated |
+----+--------+---------------------+------------+---------+
| 4  | User D | true                | 2024-01-04 | inactive|
+----+--------+---------------------+------------+---------+

说明

  • User A (id=1)marked_for_deletion=false,满足第一个 WHEN MATCHED 条件,更新 updated_atvalue
  • User B (id=2)marked_for_deletion=true,满足第二个 WHEN MATCHED 条件,被删除
  • User C (id=3)marked_for_deletion=false,满足第一个 WHEN MATCHED 条件,更新 updated_atvalue
  • User D (id=4):源表中没有对应记录,保持不变
  • User E (id=5):源表新增,但没有 WHEN NOT MATCHED 子句,不插入

案例4:插入与更新的完整同步

场景:完整的数据同步操作,既要更新匹配的记录,也要插入新的记录

建表语句

-- 目标表:数据记录
CREATE TABLE data_sync_target (
  id INT,
  col1 STRING,
  col2 STRING
);


-- 源表:新数据
CREATE TABLE data_sync_source (
  id INT,
  col1 STRING,
  col2 STRING
);

初始目标表数据:

INSERT INTO data_sync_target VALUES 
(1, 'data_a', 'value_a'),
(2, 'data_b', 'value_b'),
(3, 'data_c', 'value_c');
+----+--------+---------+
| id | col1   | col2    |
+----+--------+---------+
| 1  | data_a | value_a |
+----+--------+---------+
| 2  | data_b | value_b |
+----+--------+---------+
| 3  | data_c | value_c |
+----+--------+---------+

初始源表数据:

INSERT INTO data_sync_source VALUES 
(1, 'updated_a', 'updated_value_a'),
(2, 'updated_b', 'updated_value_b'),
(4, 'new_data', 'new_value'),
(5, 'new_data2', 'new_value2');
+----+----------+------------------+
| id | col1     | col2             |
+----+----------+------------------+
| 1  | updated_a| updated_value_a  |
+----+----------+------------------+
| 2  | updated_b| updated_value_b  |
+----+----------+------------------+
| 4  | new_data | new_value        |
+----+----------+------------------+
| 5  | new_data2| new_value2       |
+----+----------+------------------+

MERGE SQL

MERGE INTO data_sync_target t USING data_sync_source s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.col1 = s.col1, t.col2 = s.col2
WHEN NOT MATCHED THEN INSERT (id, col1, col2) VALUES (s.id, s.col1, s.col2);

执行结果

+----+----------+------------------+
| id | col1     | col2             |
+----+----------+------------------+
| 1  | updated_a| updated_value_a  |
+----+----------+------------------+
| 2  | updated_b| updated_value_b  |
+----+----------+------------------+
| 3  | data_c   | value_c          |
+----+----------+------------------+
| 4  | new_data | new_value        |
+----+----------+------------------+
| 5  | new_data2| new_value2       |
+----+----------+------------------+

说明

  • id=1, 2:源表中存在匹配记录,执行 UPDATE 操作,col1 和 col2 都被更新为源表的值
  • id=3:源表中没有对应记录,保持原值不变
  • id=4, 5:源表新增的记录,执行 INSERT 操作,插入到目标表中

联系我们
预约咨询
微信咨询
电话咨询