spark-instrumented-optimizer

shahid 485ae6d181 [SPARK-25474][SQL] Support `spark.sql.statistics.fallBackToHdfs` in data source tables

For a data source table backed by CatalogFileIndex, sizeInBytes always comes back as the default size in bytes, which is 8.0 EB, even when the user sets fallBackToHdfsForStatsEnabled=true. As a result, such a table always prefers SortMergeJoin over BroadcastJoin, even when its size is below the broadcast join threshold.

With this PR, when fallBackToHdfsForStatsEnabled=true is set for a CatalogFileIndex table, computeStatistics reads sizeInBytes from HDFS, so the actual size of the table is used. During a join, a table whose size is below the broadcast threshold then prefers BroadcastHashJoin instead of SortMergeJoin.

Added UT.

Closes #22502 from shahidki31/SPARK-25474.

Authored-by: shahid <shahidki31@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-07-28 15:35:37 -07:00
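The effect described in the commit message can be illustrated with a small toy model. This is a sketch in plain Python, not Spark's actual code: the function names `estimate_size_in_bytes` and `choose_join` are hypothetical, and the logic is simplified to the two decisions the message describes. The constants assume Spark's documented defaults: a default size of `Long.MaxValue` bytes (the 8.0 EB mentioned above) and a broadcast threshold of 10 MB.

```python
from typing import Optional

# Assumed defaults, mirroring Spark's documented values:
# spark.sql.defaultSizeInBytes = Long.MaxValue (shown as 8.0 EB)
DEFAULT_SIZE_IN_BYTES = 2**63 - 1
# spark.sql.autoBroadcastJoinThreshold = 10 MB
BROADCAST_THRESHOLD = 10 * 1024 * 1024


def estimate_size_in_bytes(
    catalog_size: Optional[int],
    hdfs_size: Optional[int],
    fall_back_to_hdfs: bool,
) -> int:
    """Hypothetical model of how a table size estimate is picked.

    catalog_size: size recorded in the catalog, or None if absent.
    hdfs_size: size measured on the file system, or None if unknown.
    fall_back_to_hdfs: models spark.sql.statistics.fallBackToHdfs.
    """
    if catalog_size is not None:
        return catalog_size
    if fall_back_to_hdfs and hdfs_size is not None:
        return hdfs_size
    # No usable statistics: fall back to the very large default.
    return DEFAULT_SIZE_IN_BYTES


def choose_join(size_in_bytes: int, threshold: int = BROADCAST_THRESHOLD) -> str:
    """Hypothetical model of the planner's choice for one join side."""
    if 0 <= size_in_bytes <= threshold:
        return "BroadcastHashJoin"
    return "SortMergeJoin"


if __name__ == "__main__":
    five_mb = 5 * 1024 * 1024
    # Without the fallback, the 5 MB table looks like 8.0 EB.
    print(choose_join(estimate_size_in_bytes(None, five_mb, False)))
    # With the fallback, the measured size is used.
    print(choose_join(estimate_size_in_bytes(None, five_mb, True)))
```

In this model a 5 MB table with no catalog statistics is planned as a SortMergeJoin until the fallback flag is enabled, which matches the behavior the commit message reports for CatalogFileIndex tables before and after the change.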
benchmarks	[SPARK-26584][SQL] Remove `spark.sql.orc.copyBatchToSpark` internal conf	2019-01-10 08:42:23 -08:00
compatibility/src/test/scala/org/apache/spark/sql/hive/execution	[SPARK-28460][SQL][TEST][test-hadoop3.2] Port test from HIVE-11835	2019-07-27 17:04:27 -07:00
src	[SPARK-25474][SQL] Support `spark.sql.statistics.fallBackToHdfs` in data source tables	2019-07-28 15:35:37 -07:00
pom.xml	[SPARK-27831][SQL][TEST] Move Hive test jars to maven dependency	2019-06-02 20:23:08 -07:00