ebf01ec3c1
### What changes were proposed in this pull request?

https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results using that approach.

**NOTE**: based on my observations, GitHub Actions appears to use four types of CPU:

- Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz
- Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz
- Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz
- Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz

From some quick research, they seem to perform roughly the same:

![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png)

I couldn't find enough information about the Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz, but judging by the numbers its performance also seems roughly similar. This shouldn't be a big deal, especially since this approach has several advantages: it is much easier, it encourages contributors to run benchmarks more often, and it guarantees the same number of cores, the same memory, and the same software.

### Why are the changes needed?

To have a baseline for the benchmarks generated this way.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

It was generated from:

- [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465)
- [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337)

Closes #32044 from HyukjinKwon/SPARK-34950.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
68 lines
6.1 KiB
Plaintext
================================================================================================
Benchmark to measure CSV read/write performance
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Parsing quoted values:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
One quoted string                                 43757          44446         765          0.0      875148.4       1.0X

OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Wide rows with 1000 columns:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Select 1000 columns                               96330          99161         NaN          0.0       96329.7       1.0X
Select 100 columns                                41414          42672        1556          0.0       41414.1       2.3X
Select one column                                 35365          36113         662          0.0       35365.4       2.7X
count()                                           18845          18867          26          0.1       18845.0       5.1X
Select 100 columns, one bad input field           68271          68305          51          0.0       68270.7       1.4X
Select 100 columns, corrupt record field          77700          78165         803          0.0       77699.7       1.2X

OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Count a dataset with 10 columns:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Select 10 columns + count()                       18462          18651         175          0.5        1846.2       1.0X
Select 1 column + count()                         11897          12075         199          0.8        1189.7       1.6X
count()                                            4218           4229          10          2.4         421.8       4.4X

OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Write dates and timestamps:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Create a dataset of timestamps                     1680           1699          17          6.0         168.0       1.0X
to_csv(timestamp)                                 13269          13787         456          0.8        1326.9       0.1X
write timestamps to files                         10747          10785          48          0.9        1074.7       0.2X
Create a dataset of dates                          1900           1919          24          5.3         190.0       0.9X
to_csv(date)                                       9207           9223          23          1.1         920.7       0.2X
write dates to files                               6331           6339           7          1.6         633.1       0.3X

OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Read dates and timestamps:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read timestamp text from files                     2355           2382          24          4.2         235.5       1.0X
read timestamps from files                        31297          31331          35          0.3        3129.7       0.1X
infer timestamps from files                       63255          66511         NaN          0.2        6325.5       0.0X
read date text from files                          2139           2160          18          4.7         213.9       1.1X
read date from files                              17027          17090          89          0.6        1702.7       0.1X
infer date from files                             21307          21337          31          0.5        2130.7       0.1X
timestamp strings                                  3661           3699          35          2.7         366.1       0.6X
parse timestamps from Dataset[String]             36355          37714        1180          0.3        3635.5       0.1X
infer timestamps from Dataset[String]             74494          74851         542          0.1        7449.4       0.0X
date strings                                       3753           3756           5          2.7         375.3       0.6X
parse dates from Dataset[String]                  21590          21714         126          0.5        2159.0       0.1X
from_csv(timestamp)                               35419          35459          59          0.3        3541.9       0.1X
from_csv(date)                                    19081          19124          39          0.5        1908.1       0.1X

OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Filters pushdown:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
w/o filters                                       22203          22425         192          0.0      222033.1       1.0X
pushdown disabled                                 22123          22220          89          0.0      221227.6       1.0X
w/ filters                                         1332           1338           9          0.1       13317.7      16.7X
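In every table, the last three columns appear to be derived from Best Time(ms). This relationship is inferred from the printed numbers, not taken from the benchmark harness source:

- Per Row(ns) is the best time spread over the row count.
- Rate(M/s) is millions of rows processed per second.
- Relative is the first case's best time divided by the current case's best time.

The row count is not printed. The 10,000,000 rows used below are an assumption inferred from the Per Row(ns) column of the "Count a dataset with 10 columns" table. `derived_columns` is a hypothetical helper written only for this sanity check.

```python
def derived_columns(best_ms, first_best_ms, num_rows):
    """Hypothetical helper: recompute (Rate(M/s), Per Row(ns), Relative) from Best Time(ms).

    The formulas are inferred from the printed tables, not copied from the benchmark harness.
    """
    # rows per millisecond, divided by 1000, gives millions of rows per second
    rate_m_per_s = num_rows / best_ms / 1000.0
    # milliseconds -> nanoseconds, spread over all rows
    per_row_ns = best_ms * 1_000_000 / num_rows
    # speed-up versus the first case of the same table (values below 1.0 mean slower)
    relative = first_best_ms / best_ms
    return round(rate_m_per_s, 1), round(per_row_ns, 1), round(relative, 1)


# "Count a dataset with 10 columns" table; NUM_ROWS = 10,000,000 is an assumption
NUM_ROWS = 10_000_000
FIRST_BEST_MS = 18462  # best time of "Select 10 columns + count()", the first case

for name, best_ms in [
    ("Select 10 columns + count()", 18462),
    ("Select 1 column + count()", 11897),
    ("count()", 4218),
]:
    rate, per_row, relative = derived_columns(best_ms, FIRST_BEST_MS, NUM_ROWS)
    print(f"{name:<30} {rate:>5.1f} {per_row:>9.1f} {relative:>5.1f}X")
```

These values reproduce the printed 0.5 / 1846.2 / 1.0X, 0.8 / 1189.7 / 1.6X and 2.4 / 421.8 / 4.4X. Recomputing from the rounded Best Time(ms) can be off in the last digit elsewhere, because the per-row figure seems to come from the unrounded time. For example, Select 1000 columns prints 96330 ms but 96329.7 ns per row, over what appears to be 1,000 rows.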