5c67a9a7fa
## What changes were proposed in this pull request?

The incorrect implementation of the hash join metrics added in [SPARK-21052](https://issues.apache.org/jira/browse/SPARK-21052) caused significant performance degradation in TPC-DS. The TPC-DS 1TB-scale results are available [here](https://docs.google.com/spreadsheets/d/18a5BdOlmm8euTaRodyeWum9yu92mbWWu6JbhGXtr7yE/edit#gid=0). So we partially revert SPARK-21052 for now; an illustrative sketch of the per-row metric overhead pattern is included at the end of this description.

**Cluster info:**

| | Master Node | Worker Nodes |
| -- | -- | -- |
| Node | 1x | 4x |
| Processor | Intel(R) Xeon(R) Platinum 8170 CPU 2.10GHz | Intel(R) Xeon(R) Platinum 8180 CPU 2.50GHz |
| Memory | 192 GB | 384 GB |
| Storage Main | 8 x 960G SSD | 8 x 960G SSD |
| Network | 10GbE | |
| Role | CM Management, NameNode, Secondary NameNode, Resource Manager, Hive Metastore Server | DataNode, NodeManager |
| OS Version | CentOS 7.2 | CentOS 7.2 |
| Hadoop | Apache Hadoop 2.7.5 | Apache Hadoop 2.7.5 |
| Hive | Apache Hive 2.2.0 | |
| Spark | Apache Spark 2.1.0 & Apache Spark 2.3.0 | |
| JDK version | 1.8.0_112 | 1.8.0_112 |

**Related parameters setting:**

| Component | Parameter | Value |
| -- | -- | -- |
| Yarn Resource Manager | yarn.scheduler.maximum-allocation-mb | 120GB |
| | yarn.scheduler.minimum-allocation-mb | 1GB |
| | yarn.scheduler.maximum-allocation-vcores | 121 |
| | yarn.resourcemanager.scheduler.class | Fair Scheduler |
| Yarn Node Manager | yarn.nodemanager.resource.memory-mb | 120GB |
| | yarn.nodemanager.resource.cpu-vcores | 121 |
| Spark | spark.executor.memory | 110GB |
| | spark.executor.cores | 50 |

## How was this patch tested?

N/A

Closes #23269 from JkSelf/partial-revert-21052.

Authored-by: jiake <ke.a.jia@intel.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
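For context on why per-row metric bookkeeping can hurt a hash join's probe loop, here is a minimal, self-contained Scala sketch. It is not Spark's actual code and not the exact change in this PR; `sharedProbeMetric`, `probeWithPerRowMetric`, and `probeWithBatchedMetric` are hypothetical names. It only contrasts a shared accumulator updated on every probed row with a local counter flushed once per partition:

```scala
import java.util.concurrent.atomic.LongAdder

object MetricOverheadSketch {
  // Hypothetical stand-in for a driver-visible metric accumulator.
  val sharedProbeMetric = new LongAdder

  // Per-row update: every probe also touches the shared accumulator,
  // adding shared-state work inside the hot loop.
  def probeWithPerRowMetric(keys: Array[Int], table: Set[Int]): Int = {
    var matches = 0
    var i = 0
    while (i < keys.length) {
      if (table.contains(keys(i))) matches += 1
      sharedProbeMetric.increment() // shared-state update on every row
      i += 1
    }
    matches
  }

  // Batched update: keep a local counter and flush it once at the end.
  def probeWithBatchedMetric(keys: Array[Int], table: Set[Int]): Int = {
    var matches = 0
    var localProbes = 0L
    var i = 0
    while (i < keys.length) {
      if (table.contains(keys(i))) matches += 1
      localProbes += 1
      i += 1
    }
    sharedProbeMetric.add(localProbes) // single update per partition/task
    matches
  }

  def main(args: Array[String]): Unit = {
    val table = (0 until 1000 by 3).toSet
    val keys  = Array.tabulate(1000000)(_ % 1000)
    println(probeWithPerRowMetric(keys, table))
    println(probeWithBatchedMetric(keys, table))
    println(s"probes recorded: ${sharedProbeMetric.sum()}")
  }
}
```

The sketch only illustrates the general cost pattern of per-row metric accounting in a tight probe loop, not the specific metrics removed by this partial revert.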
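As a hypothetical illustration of how the Spark-side values from the parameter table above could be applied (the actual benchmark setup may have used `spark-submit` flags or `spark-defaults.conf`; the YARN values belong in `yarn-site.xml`):

```scala
import org.apache.spark.sql.SparkSession

object BenchmarkSessionSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical session setup mirroring the Spark rows of the parameter table.
    val spark = SparkSession.builder()
      .appName("tpcds-1tb-hash-join")          // hypothetical application name
      .config("spark.executor.memory", "110g") // spark.executor.memory = 110GB
      .config("spark.executor.cores", "50")    // spark.executor.cores  = 50
      .getOrCreate()

    spark.stop()
  }
}
```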