spark-instrumented-optimizer

History

Cheng Lian bc3760d405 [SPARK-14237][SQL] De-duplicate partition value appending logic in various buildReader() implementations ## What changes were proposed in this pull request? Currently, various `FileFormat` data sources share approximately the same code for partition value appending. This PR tries to eliminate this duplication. A new method `buildReaderWithPartitionValues()` is added to `FileFormat` with a default implementation that appends partition values to `InternalRow`s produced by the reader function returned by `buildReader()`. Special data sources like Parquet, which implements partition value appending inside `buildReader()` because of the vectorized reader, and the Text data source, which doesn't support partitioning, override `buildReaderWithPartitionValues()` and simply delegate to `buildReader()`. This PR brings two benefits: 1. Apparently, it de-duplicates partition value appending logic 2. Now the reader function returned by `buildReader()` is only required to produce `InternalRow`s rather than `UnsafeRow`s if the data source doesn't override `buildReaderWithPartitionValues()`. Because the safe-to-unsafe conversion is also performed while appending partition values. This makes 3rd-party data sources (e.g. spark-avro) easier to implement since they no longer need to access private APIs involving `UnsafeRow`. ## How was this patch tested? Existing tests should do the work. Author: Cheng Lian <lian@databricks.com> Closes #12866 from liancheng/spark-14237-simplify-partition-values-appending.		2016-05-04 14:16:57 +08:00
..
java/org/apache/spark/sql	[SPARK-13745] [SQL] Support columnar in memory representation on Big Endian platforms	2016-05-02 13:16:46 -07:00
resources	[SPARK-12902] [SQL] visualization for generated operators	2016-01-25 12:44:20 -08:00
scala/org/apache/spark/sql	[SPARK-14237][SQL] De-duplicate partition value appending logic in various buildReader() implementations	2016-05-04 14:16:57 +08:00