spark-instrumented-optimizer/sql/catalyst
Wenchen Fan d1b8bd7d11 [SPARK-34720][SQL] MERGE ... UPDATE/INSERT * should do by-name resolution
### What changes were proposed in this pull request?

In Spark, we have an extension in the MERGE syntax: INSERT/UPDATE *. This is not from ANSI standard or any other mainstream databases, so we need to define the behaviors by our own.

The behavior today is very weird: assume the source table has `n1` columns, target table has `n2` columns. We generate the assignments by taking the first `min(n1, n2)` columns from source & target tables and pairing them by ordinal.

This PR proposes a more reasonable behavior: take all the columns from target table as keys, and find the corresponding columns from source table by name as values.

### Why are the changes needed?

Fix the MEREG INSERT/UPDATE * to be more user-friendly and easy to do schema evolution.

### Does this PR introduce _any_ user-facing change?

Yes, but MERGE is only supported by very few data sources.

### How was this patch tested?

new tests

Closes #32192 from cloud-fan/merge.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-05-13 12:58:24 +00:00
..
benchmarks [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
src [SPARK-34720][SQL] MERGE ... UPDATE/INSERT * should do by-name resolution 2021-05-13 12:58:24 +00:00
pom.xml [SPARK-33212][BUILD] Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile 2021-01-15 14:06:50 -08:00