spark-instrumented-optimizer

History

Cheng Lian 7ff16e8abe [SPARK-7567] [SQL] Migrating Parquet data source to FSBasedRelation This PR migrates Parquet data source to the newly introduced `FSBasedRelation`. `FSBasedParquetRelation` is created to replace `ParquetRelation2`. Major differences are: 1. Partition discovery code has been factored out to `FSBasedRelation` 1. `AppendingParquetOutputFormat` is not used now. Instead, an anonymous subclass of `ParquetOutputFormat` is used to handle appending and writing dynamic partitions 1. When scanning partitioned tables, `FSBasedParquetRelation.buildScan` only builds an `RDD[Row]` for a single selected partition 1. `FSBasedParquetRelation` doesn't rely on Catalyst expressions for filter push down, thus it doesn't extend `CatalystScan` anymore After migrating `JSONRelation` (which extends `CatalystScan`), we can remove `CatalystScan`. <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/6090) <!-- Reviewable:end --> Author: Cheng Lian <lian@databricks.com> Closes #6090 from liancheng/parquet-migration and squashes the following commits: 6063f87 [Cheng Lian] Casts to OutputCommitter rather than FileOutputCommtter bfd1cf0 [Cheng Lian] Fixes compilation error introduced while rebasing f9ea56e [Cheng Lian] Adds ParquetRelation2 related classes to MiMa check whitelist 261d8c1 [Cheng Lian] Minor bug fix and more tests db65660 [Cheng Lian] Migrates Parquet data source to FSBasedRelation		2015-05-13 11:04:10 -07:00
..
java/org/apache/spark/sql/hive	[SPARK-7145] [CORE] commons-lang (2.x) classes used instead of commons-lang3 (3.x); commons-io used without dependency	2015-04-27 19:50:55 -04:00
resources	[SPARK-6734] [SQL] Add UDTF.close support in Generate	2015-05-14 00:14:59 +08:00
scala/org/apache/spark/sql	[SPARK-7567] [SQL] Migrating Parquet data source to FSBasedRelation	2015-05-13 11:04:10 -07:00