spark-instrumented-optimizer

History

Peter Toth b0cee9605e [SPARK-25062][SQL] Clean up BlockLocations in InMemoryFileIndex ## What changes were proposed in this pull request? `InMemoryFileIndex` contains a cache of `LocatedFileStatus` objects. Each `LocatedFileStatus` object can contain several `BlockLocation`s or some subclass of it. Filling up this cache by listing files happens recursively either on the driver or on the executors, depending on the parallel discovery threshold (`spark.sql.sources.parallelPartitionDiscovery.threshold`). If the listing happens on the executors block location objects are converted to simple `BlockLocation` objects to ensure serialization requirements. If it happens on the driver then there is no conversion and depending on the file system a `BlockLocation` object can be a subclass like `HdfsBlockLocation` and consume more memory. This PR adds the conversion to the latter case and decreases memory consumption. ## How was this patch tested? Added unit test. Closes #22603 from peter-toth/SPARK-25062. Authored-by: Peter Toth <peter.toth@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>		2018-10-06 14:50:03 -07:00
..
benchmarks	[SPARK-25488][SQL][TEST] Refactor MiscBenchmark to use main method	2018-10-06 08:47:43 -07:00
src	[SPARK-25062][SQL] Clean up BlockLocations in InMemoryFileIndex	2018-10-06 14:50:03 -07:00
pom.xml	[SPARK-25592] Setting version to 3.0.0-SNAPSHOT	2018-10-02 08:48:24 -07:00