ebf01ec3c1
### What changes were proposed in this pull request?

https://github.com/apache/spark/pull/32015 added a much easier way to run benchmarks in the same GitHub Actions build. This PR updates the benchmark results using that workflow.

**NOTE**: based on my observations, GitHub Actions appears to use four types of CPU:

- Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz
- Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz
- Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz
- Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz

From my quick research, they seem to perform roughly similarly:

![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png)

I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz, but its performance seems roughly similar given the numbers. So this shouldn't be a big deal, especially given that this approach is much easier, encourages contributors to run benchmarks more often, and guarantees the same number of cores and the same memory with the same software.

### Why are the changes needed?

To establish a baseline for the benchmarks.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

It was generated from:

- [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465)
- [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337)

Closes #32044 from HyukjinKwon/SPARK-34950.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
47 lines
3.8 KiB
Plaintext
================================================================================================
|
|
Dataset Benchmark
|
|
================================================================================================

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
back-to-back map long:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
RDD                                               13660          13836         249          7.3         136.6       1.0X
DataFrame                                          2103           2125          30         47.5          21.0       6.5X
Dataset                                            2899           2910          16         34.5          29.0       4.7X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
back-to-back map:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
RDD                                               14939          14940           2          6.7         149.4       1.0X
DataFrame                                          5377           5529         216         18.6          53.8       2.8X
Dataset                                           15861          15923          88          6.3         158.6       0.9X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
back-to-back filter Long:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
RDD                                                3803           3842          56         26.3          38.0       1.0X
DataFrame                                          1359           1369          14         73.6          13.6       2.8X
Dataset                                            3667           3668           1         27.3          36.7       1.0X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
back-to-back filter:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
RDD                                                4572           4595          33         21.9          45.7       1.0X
DataFrame                                           212            261          45        471.6           2.1      21.6X
Dataset                                            5629           5776         208         17.8          56.3       0.8X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
aggregate:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
RDD sum                                            3528           3563          50         28.3          35.3       1.0X
DataFrame sum                                        81            111          23       1240.3           0.8      43.8X
Dataset sum using Aggregator                       5140           5164          34         19.5          51.4       0.7X
Dataset complex Aggregator                         9815           9921         150         10.2          98.1       0.4X
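
For readers checking the tables by hand, the derived columns appear to follow from the best time alone. The sketch below is not part of the benchmark itself: the function name `derived_columns` is hypothetical, and the row count of 100,000,000 is an assumption inferred from the reported numbers (13660 ms at 136.6 ns per row), not something the results file states. Because the published figures are presumably computed from unrounded timings, recomputing from the rounded "Best Time(ms)" values can differ in the last digit.

```python
# Assumption: every case processes this many rows (inferred, not stated in the file).
NUM_ROWS = 100_000_000


def derived_columns(best_ms, baseline_best_ms, num_rows=NUM_ROWS):
    """Recompute Rate(M/s), Per Row(ns) and Relative from a best time in ms."""
    rate_m_per_s = num_rows / (best_ms / 1000.0) / 1e6  # million rows per second
    per_row_ns = best_ms * 1e6 / num_rows               # nanoseconds per row
    relative = baseline_best_ms / best_ms               # speedup over the first row
    return round(rate_m_per_s, 1), round(per_row_ns, 1), round(relative, 1)


# Best times (ms) copied from the "back-to-back map long" table; RDD is the baseline.
map_long_best_ms = {"RDD": 13660, "DataFrame": 2103, "Dataset": 2899}

for name, best in map_long_best_ms.items():
    rate, per_row, relative = derived_columns(best, map_long_best_ms["RDD"])
    print(f"{name:<10} rate={rate} M/s  per_row={per_row} ns  relative={relative}X")
```

Read this way, the "Relative" column is the baseline row's best time divided by each row's best time, so values below 1.0X (for example Dataset in "back-to-back map") mean that case was slower than the RDD baseline.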