spark-instrumented-optimizer/sql/hive
Liang-Chi Hsieh 5ad03607d1 [SPARK-25271][SQL] Hive ctas commands should use data source if it is convertible
## What changes were proposed in this pull request?

In Spark 2.3.0 and previous versions, Hive CTAS command will convert to use data source to write data into the table when the table is convertible. This behavior is controlled by the configs like HiveUtils.CONVERT_METASTORE_ORC and HiveUtils.CONVERT_METASTORE_PARQUET.

In 2.3.1, we drop this optimization by mistake in the PR [SPARK-22977](https://github.com/apache/spark/pull/20521/files#r217254430). Since that Hive CTAS command only uses Hive Serde to write data.

This patch adds this optimization back to Hive CTAS command. This patch adds OptimizedCreateHiveTableAsSelectCommand which uses data source to write data.

## How was this patch tested?

Added test.

Closes #22514 from viirya/SPARK-25271-2.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2018-12-20 10:47:24 +08:00
..
benchmarks [SPARK-25964][SQL][MINOR] Revise OrcReadBenchmark/DataSourceReadBenchmark case names and execution instructions 2018-11-08 10:08:14 -08:00
compatibility/src/test/scala/org/apache/spark/sql/hive/execution Revert [SPARK-19355][SPARK-25352] 2018-09-20 20:18:31 +08:00
src [SPARK-25271][SQL] Hive ctas commands should use data source if it is convertible 2018-12-20 10:47:24 +08:00
pom.xml [SPARK-25956] Make Scala 2.12 as default Scala version in Spark 3.0 2018-11-14 16:22:23 -08:00