1f29f1ba58
### What changes were proposed in this pull request? With SPARK-18107, we will disable the underlying replace(overwrite) and instead do delete in spark side and only do copy in hive side to bypass the performance issue - [HIVE-11940](https://issues.apache.org/jira/browse/HIVE-11940) Conditionally, if the table location and partition location do not belong to the same `FileSystem`, We should not disable hive overwrite. Otherwise, hive will use the `FileSystem` instance belong to the table location to copy files, which will fail in `FileSystem#checkPath` https://github.com/apache/hive/blob/rel/release-2.3.7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1657 In this PR, for Hive 2.0.0 and onwards, as [HIVE-11940](https://issues.apache.org/jira/browse/HIVE-11940) has been fixed, and there is no performance issue anymore. We should leave the overwrite logic to hive to avoid failure in `FileSystem#checkPath` **NOTE THAT** For Hive 2.2.0 and earlier, if the table and partition locations do not belong together, we will still get the same error thrown by hive encryption check due to [HIVE-14380]( https://issues.apache.org/jira/browse/HIVE-14380) which need to fix in another ticket SPARK-31675. ### Why are the changes needed? bugfix. a logic table can be decoupled with the storage layer and may contain data from remote storage systems. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? Currently verified manually. add benchmark tests ```sql -INSERT INTO DYNAMIC 7742 7918 248 0.0 756044.0 1.0X -INSERT INTO HYBRID 1289 1307 26 0.0 125866.3 6.0X -INSERT INTO STATIC 371 393 38 0.0 36219.4 20.9X -INSERT OVERWRITE DYNAMIC 8456 8554 138 0.0 825790.3 0.9X -INSERT OVERWRITE HYBRID 1303 1311 12 0.0 127198.4 5.9X -INSERT OVERWRITE STATIC 434 447 13 0.0 42373.8 17.8X +INSERT INTO DYNAMIC 7382 7456 105 0.0 720904.8 1.0X +INSERT INTO HYBRID 1128 1129 1 0.0 110169.4 6.5X +INSERT INTO STATIC 349 370 39 0.0 34095.4 21.1X +INSERT OVERWRITE DYNAMIC 8149 8362 301 0.0 795821.8 0.9X +INSERT OVERWRITE HYBRID 1317 1318 2 0.0 128616.7 5.6X +INSERT OVERWRITE STATIC 387 408 37 0.0 37804.1 19.1X ``` + for master - for this PR both using hive 2.3.7 Closes #28511 from yaooqinn/SPARK-31684. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> |
||
---|---|---|
.. | ||
InsertIntoHiveTableBenchmark-hive1.2-results.txt | ||
InsertIntoHiveTableBenchmark-hive2.3-results.txt | ||
InsertIntoHiveTableBenchmark-jdk11-hive2.3-results.txt | ||
ObjectHashAggregateExecBenchmark-jdk11-results.txt | ||
ObjectHashAggregateExecBenchmark-results.txt | ||
OrcReadBenchmark-jdk11-results.txt | ||
OrcReadBenchmark-results.txt |