spark-instrumented-optimizer


Gengliang Wang 71170e74df [SPARK-26871][SQL] File Source V2: avoid creating unnecessary FileIndex in the write path

## What changes were proposed in this pull request?

In https://github.com/apache/spark/pull/23383, the file source V2 framework is implemented. In the PR, `FileIndex` is created as a member of `FileTable`, so that we can implement partition pruning like `0f9fcabb4a` in the future. (As the data source V2 catalog is under development, partition pruning is removed from the PR.)

However, after the write path of file source V2 is implemented, I find that a simple write will create an unnecessary `FileIndex`, which is required by `FileTable`. This is a sort of regression, and there is a warning message when writing to ORC files:

```
WARN InMemoryFileIndex: The directory file:/tmp/foo was not found. Was it deleted very recently?
```

This PR makes `FileIndex` a lazy value in `FileTable`, so that we can avoid creating an unnecessary `FileIndex` in the write path.

## How was this patch tested?

Existing unit test.

Closes #23774 from gengliangwang/moveFileIndexInV2.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

2019-02-15 14:57:23 +08:00
benchmarks	[SPARK-26745][SQL] Revert count optimization in JSON datasource by SPARK-24959	2019-01-31 14:32:31 +08:00
src	[SPARK-26871][SQL] File Source V2: avoid creating unnecessary FileIndex in the write path	2019-02-15 14:57:23 +08:00
pom.xml	[SPARK-25956] Make Scala 2.12 as default Scala version in Spark 3.0	2018-11-14 16:22:23 -08:00
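
The latest commit above describes turning `FileIndex` into a lazy value of `FileTable` so that the write path no longer builds it. A minimal sketch of that idea follows. `ToyFileIndex`, `ToyFileTable`, and the `indexBuilds` counter are hypothetical stand-ins introduced for illustration only; they are not Spark's actual classes or signatures.

```scala
// Hypothetical stand-ins for illustration; these are NOT Spark's real classes.
object LazyFileIndexSketch {
  // Counts how many times the (potentially expensive) index is constructed.
  var indexBuilds = 0

  final class ToyFileIndex(val rootPath: String) {
    // In a real file index, this is roughly where the directory would be listed.
    indexBuilds += 1
  }

  final class ToyFileTable(path: String) {
    // `lazy val` defers construction until the first access, then caches it.
    lazy val fileIndex: ToyFileIndex = new ToyFileIndex(path)

    // Read-style operation: needs the index, so it forces the lazy value.
    def listRoot(): String = fileIndex.rootPath

    // Write-style operation: never touches `fileIndex`, so nothing is built.
    def write(rows: Seq[String]): Int = rows.length
  }

  def main(args: Array[String]): Unit = {
    val table = new ToyFileTable("/tmp/foo")

    table.write(Seq("a", "b"))
    assert(indexBuilds == 0) // the write path did not create the index

    table.listRoot()
    table.listRoot()
    assert(indexBuilds == 1) // built once on first read, then reused
  }
}
```

With a plain `val` in place of `lazy val`, the index would be built as soon as the table is constructed, which matches the regression the commit message describes for simple writes.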