spark-instrumented-optimizer/sql/core/benchmarks/MiscBenchmark-results.txt

================================================================================================
filter & aggregate without group
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
range/filter/sum:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
range/filter/sum wholestage off             47752 / 48952         43.9          22.8       1.0X
range/filter/sum wholestage on                3123 / 3558        671.5           1.5      15.3X


================================================================================================
range/limit/sum
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
range/limit/sum:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
range/limit/sum wholestage off                 229 /  236       2288.9           0.4       1.0X
range/limit/sum wholestage on                  257 /  267       2041.0           0.5       0.9X


================================================================================================
sample
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
sample with replacement:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
sample with replacement wholestage off      12908 / 13076         10.2          98.5       1.0X
sample with replacement wholestage on         7334 / 7346         17.9          56.0       1.8X

OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
sample without replacement:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
sample without replacement wholestage off      3082 / 3095         42.5          23.5       1.0X
sample without replacement wholestage on      1125 / 1211        116.5           8.6       2.7X


================================================================================================
collect
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
collect:                                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
collect 1 million                              291 /  311          3.6         277.3       1.0X
collect 2 millions                             552 /  564          1.9         526.6       0.5X
collect 4 millions                            1104 / 1108          0.9        1053.0       0.3X


================================================================================================
collect limit
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
collect limit:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
collect limit 1 million                        311 /  340          3.4         296.2       1.0X
collect limit 2 millions                       581 /  614          1.8         554.4       0.5X


================================================================================================
generate explode
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
generate explode array:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
generate explode array wholestage off       15211 / 15368          1.1         906.6       1.0X
generate explode array wholestage on        10761 / 10776          1.6         641.4       1.4X

OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
generate explode map:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
generate explode map wholestage off         22128 / 22578          0.8        1318.9       1.0X
generate explode map wholestage on          16421 / 16520          1.0         978.8       1.3X

OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
generate posexplode array:               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
generate posexplode array wholestage off    17108 / 18019          1.0        1019.7       1.0X
generate posexplode array wholestage on     11715 / 11804          1.4         698.3       1.5X

OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
generate inline array:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
generate inline array wholestage off        16358 / 16418          1.0         975.0       1.0X
generate inline array wholestage on         11152 / 11472          1.5         664.7       1.5X

OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
generate big struct array:               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
generate big struct array wholestage off       708 /  776          0.1       11803.5       1.0X
generate big struct array wholestage on        535 /  589          0.1        8913.9       1.3X

OpenJDK 64-Bit Server VM 1.8.0_212-b04 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
generate big nested struct array:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
generate big nested struct array wholestage off            540            553          19          0.1        8997.4       1.0X
generate big nested struct array wholestage on            523            554          31          0.1        8725.0       1.0X


================================================================================================
generate regular generator
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
generate stack:                          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
generate stack wholestage off               29082 / 29393          0.6        1733.4       1.0X
generate stack wholestage on                21066 / 21128          0.8        1255.6       1.4X
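
The Relative column in the tables above can be cross-checked from the Best times. The following is a minimal sketch, assuming Relative is the best time of the first row in a table divided by the best time of the given row. The object name `RelativeCheck` and the helper `relative` are hypothetical and are not part of Spark. The input numbers are copied from the tables above.

```scala
import java.util.Locale

// Cross-check of the Relative column using Best times copied from this file.
// Assumption: Relative = (best ms of the baseline row) / (best ms of this row).
object RelativeCheck {
  // Hypothetical helper: formats a speedup with one decimal, like the Relative column.
  def relative(baselineBestMs: Double, caseBestMs: Double): String =
    "%.1fX".formatLocal(Locale.ROOT, baselineBestMs / caseBestMs)

  def main(args: Array[String]): Unit = {
    // (benchmark, best ms with wholestage off, best ms with wholestage on)
    val rows = Seq(
      ("range/filter/sum", 47752.0, 3123.0),
      ("range/limit/sum", 229.0, 257.0),
      ("sample without replacement", 3082.0, 1125.0)
    )
    rows.foreach { case (name, off, on) =>
      println(s"$name wholestage on: ${relative(off, on)}")
    }
  }
}
```

Under that assumption the three rows come out as 15.3X, 0.9X and 2.7X, which agrees with the reported values. For range/limit/sum, a value below 1.0X means the wholestage-on run was slower than the baseline.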

[SPARK-25488][SQL][TEST] Refactor MiscBenchmark to use main method

## What changes were proposed in this pull request?

Refactor `MiscBenchmark` to use main method.

Generate benchmark result:
```sh
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.MiscBenchmark"
```

## How was this patch tested?

manual tests

Closes #22500 from wangyum/SPARK-25488.

Lead-authored-by: Yuming Wang <yumwang@ebay.com>
Co-authored-by: Yuming Wang <wgyumg@gmail.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

2018-10-06 11:47:43 -04:00
[SPARK-27707][SQL] Prune unnecessary nested fields from Generate

## What changes were proposed in this pull request?

A performance issue was found with explode: when a complex field contains a huge array, the field is duplicated once for each exploded array element. For example:

```scala
val df = spark.sparkContext.parallelize(Seq(("1",
  Array.fill(M)({
    val i = math.random
    (i.toString, (i + 1).toString, (i + 2).toString, (i + 3).toString)
  })))).toDF("col", "arr")
  .selectExpr("col", "struct(col, arr) as st")
  .selectExpr("col", "st.col as col1", "explode(st.arr) as arr_col")
```

The explode causes `st` to be duplicated as many times as there are exploded elements.

Benchmark results before the change:
```
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
[info] Intel(R) Core(TM) i7-8750H CPU 2.20GHz
[info] generate big nested struct array:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] generate big nested struct array wholestage off          52668          53162         699          0.0      877803.4       1.0X
[info] generate big nested struct array wholestage on          47261          49093        1125          0.0      787690.2       1.1X
[info]
```

The query plan:
```
== Physical Plan ==
Project [col#508, st#512.col AS col1#515, arr_col#519]
+- Generate explode(st#512.arr), [col#508, st#512], false, [arr_col#519]
   +- Project [_1#503 AS col#508, named_struct(col, _1#503, arr, _2#504) AS st#512]
      +- SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1, true, false) AS _1#503, mapobjects(MapObjects_loopValue84, MapObjects_loopIsNull84, ObjectType(class scala.Tuple4), if (isnull(lambdavariable(MapObjects_loopValue84, MapObjects_loopIsNull84, ObjectType(class scala.Tuple4), true))) null else named_struct(_1, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(lambdavariable(MapObjects_loopValue84, MapObjects_loopIsNull84, ObjectType(class scala.Tuple4), true))._1, true, false), _2, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(lambdavariable(MapObjects_loopValue84, MapObjects_loopIsNull84, ObjectType(class scala.Tuple4), true))._2, true, false), _3, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(lambdavariable(MapObjects_loopValue84, MapObjects_loopIsNull84, ObjectType(class scala.Tuple4), true))._3, true, false), _4, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(lambdavariable(MapObjects_loopValue84, MapObjects_loopIsNull84, ObjectType(class scala.Tuple4), true))._4, true, false)), knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2, None) AS _2#504]
         +- Scan[obj#534]
```

This patch takes a nested column pruning approach to prune unnecessary nested fields. It adds a projection of the needed nested fields as aliases on the child of `Generate`, and substitutes them with alias attributes in the projection on top of `Generate`.

Benchmark results after the change:
```
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
[info] Intel(R) Core(TM) i7-8750H CPU 2.20GHz
[info] generate big nested struct array:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] generate big nested struct array wholestage off            311            331          28          0.2        5188.6       1.0X
[info] generate big nested struct array wholestage on            297            312          15          0.2        4947.3       1.0X
[info]
```

The query plan:
```
== Physical Plan ==
Project [col#592, _gen_alias_608#608 AS col1#599, arr_col#603]
+- Generate explode(st#596.arr), [col#592, _gen_alias_608#608], false, [arr_col#603]
   +- Project [_1#587 AS col#592, named_struct(col, _1#587, arr, _2#588) AS st#596, _1#587 AS _gen_alias_608#608]
      +- SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1, true, false) AS _1#587, mapobjects(MapObjects_loopValue102, MapObjects_loopIsNull102, ObjectType(class scala.Tuple4), if (isnull(lambdavariable(MapObjects_loopValue102, MapObjects_loopIsNull102, ObjectType(class scala.Tuple4), true))) null else named_struct(_1, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(lambdavariable(MapObjects_loopValue102, MapObjects_loopIsNull102, ObjectType(class scala.Tuple4), true))._1, true, false), _2, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(lambdavariable(MapObjects_loopValue102, MapObjects_loopIsNull102, ObjectType(class scala.Tuple4), true))._2, true, false), _3, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(lambdavariable(MapObjects_loopValue102, MapObjects_loopIsNull102, ObjectType(class scala.Tuple4), true))._3, true, false), _4, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(lambdavariable(MapObjects_loopValue102, MapObjects_loopIsNull102, ObjectType(class scala.Tuple4), true))._4, true, false)), knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2, None) AS _2#588]
         +- Scan[obj#586]
```

This behavior is controlled by the SQL config `spark.sql.optimizer.expression.nestedPruning.enabled`.

## How was this patch tested?

Added benchmark.

Closes #24637 from viirya/SPARK-27707.

Lead-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

2019-07-19 02:32:07 -04:00