spark-instrumented-optimizer

Kent Yao 2da72593c1 [SPARK-32976][SQL] Support column list in INSERT statement

### What changes were proposed in this pull request?

#### JIRA expectations

```
INSERT currently does not support named column lists.

INSERT INTO <table> (col1, col2,…) VALUES( 'val1', 'val2', … )

Note, we assume the column list contains all the column names. Issue an exception if the list is not complete. The column order could be different from the column order defined in the table definition.
```

#### Implementations

In this PR, we add a column list as an optional part of the `INSERT OVERWRITE/INTO` statements:

```
/**
 * {{{
 *   INSERT OVERWRITE TABLE tableIdentifier [partitionSpec [IF NOT EXISTS]]? [identifierList] ...
 *   INSERT INTO [TABLE] tableIdentifier [partitionSpec] [identifierList] ...
 * }}}
 */
```

The column list represents all expected columns, in an explicit order, that you want to insert into the target table. In the current implementation we assume the column list contains all the column names; the statement fails when the list is incomplete.

In the Analyzer, we add a code path to resolve the column list in the `ResolveOutputRelation` rule before it is transformed to a v1 or v2 command. It fails here if the list has any field that does not belong to the target table.

Then, for a v2 command, e.g. `AppendData`, we use the resolved column list and the output of the target table to resolve the output of the source query in the `ResolveOutputRelation` rule. If the list has duplicated columns, we fail. If the list is not empty but the list size does not match the target table, we fail. If no other exceptions occur, we use the column list to map the output of the source query to the output of the target table. The column list is set to Nil once it is resolved, so it will not hit the rule again.

For a v1 command, all of this happens in the `PreprocessTableInsertion` rule.

### Why are the changes needed?

New feature support.

### Does this PR introduce _any_ user-facing change?

Yes, `INSERT INTO`/`INSERT OVERWRITE` on a table now supports specifying a column list.

### How was this patch tested?

New tests.

Closes #29893 from yaooqinn/SPARK-32976.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

2020-11-30 05:23:23 +00:00
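The checks described in the commit message can be illustrated with a small standalone sketch. This is not Spark's code: the function name `map_query_output`, its parameters, and the error messages are hypothetical, the match is case-sensitive for simplicity, and the final length check against the query output is an added assumption. It only mirrors the stated behavior: reject unknown columns, reject duplicates, reject an incomplete list, then reorder the source query's output to the target table's column order.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def map_query_output(
    table_columns: Sequence[str],
    column_list: Sequence[str],
    query_output: Sequence[T],
) -> List[T]:
    """Reorder query_output (given in column_list order) to table order.

    Hypothetical helper, not part of Spark; names are illustrative only.
    """
    # No column list: the query output is taken positionally, unchanged.
    if not column_list:
        return list(query_output)

    # Every listed column must belong to the target table.
    unknown = [c for c in column_list if c not in table_columns]
    if unknown:
        raise ValueError(f"Cannot resolve column name(s): {unknown}")

    # The list must not repeat a column.
    if len(set(column_list)) != len(column_list):
        raise ValueError("Found duplicate column(s) in the column list")

    # The list must currently name every column of the target table.
    if len(column_list) != len(table_columns):
        raise ValueError("Column list does not match the target table")

    # Assumption: the query must produce one value per listed column.
    if len(query_output) != len(column_list):
        raise ValueError("Query output does not match the column list")

    # Map each listed column to its query value, then emit in table order.
    by_name = dict(zip(column_list, query_output))
    return [by_name[c] for c in table_columns]
```

For a table defined as `(a, b, c)`, a statement shaped like `INSERT INTO t (c, a, b) VALUES (3, 1, 2)` corresponds to `map_query_output(["a", "b", "c"], ["c", "a", "b"], [3, 1, 2])`, which yields the values in table order.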
benchmarks	[SPARK-33523][SQL][TEST][FOLLOWUP] Fix benchmark case name in SubExprEliminationBenchmark	2020-11-25 15:22:47 -08:00
src	[SPARK-32976][SQL] Support column list in INSERT statement	2020-11-30 05:23:23 +00:00
pom.xml	[SPARK-33107][BUILD][FOLLOW-UP] Remove com.twitter:parquet-hadoop-bundle:1.6.0 and orc.classifier	2020-10-11 21:54:56 -07:00