spark-instrumented-optimizer

Wenchen Fan 4494cd9716 [SPARK-18243][SQL] Port Hive writing to use FileFormat interface	2017-01-17 23:37:59 -08:00

## What changes were proposed in this pull request?

Inserting data into Hive tables has its own implementation that is distinct from data sources: `InsertIntoHiveTable`, `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`.

One other major difference is that data source tables write directly to the final destination without using a staging directory, and then Spark itself adds the partitions/tables to the catalog. Hive tables write to a staging directory first, and then call the Hive metastore's `loadPartition`/`loadTable` function to load the data in. So we still need to keep `InsertIntoHiveTable` to hold this special logic. In the future, we should consider writing to the Hive table location directly, so that we don't need to call `loadTable`/`loadPartition` at the end and can remove `InsertIntoHiveTable`.

This PR removes `SparkHiveWriterContainer` and `SparkHiveDynamicPartitionWriterContainer`, and creates a `HiveFileFormat` to implement the write logic. In the future, we should also implement the read logic in `HiveFileFormat`.

## How was this patch tested?

Existing tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #16517 from cloud-fan/insert-hive.
java/org/apache/spark/sql/hive	[SPARK-16736][CORE][SQL] purge superfluous fs calls	2016-08-17 11:43:01 -07:00
resources	[SPARK-19219][SQL] Fix Parquet log output defaults	2017-01-17 12:14:38 +00:00
scala/org/apache/spark/sql	[SPARK-18243][SQL] Port Hive writing to use FileFormat interface	2017-01-17 23:37:59 -08:00