spark-instrumented-optimizer/sql/core/benchmarks/CSVBenchmark-results.txt
HyukjinKwon ebf01ec3c1 [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines
### What changes were proposed in this pull request?

https://github.com/apache/spark/pull/32015 added a much easier way to run benchmarks in the same GitHub Actions build. This PR updates the benchmark results using that workflow.

**NOTE**: based on my observations, GitHub Actions appears to use four types of CPU:

- Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
- Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
- Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
- Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz

From my quick research, they seem to perform roughly similarly:

![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png)

I couldn't find enough information about the Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz, but its performance seems roughly similar given the numbers.

So the CPU variation shouldn't be a big deal, especially given that this approach is much easier, encourages contributors to run benchmarks more often, and guarantees the same number of cores and the same memory with the same software.

### Why are the changes needed?

To establish a consistent baseline for the benchmarks.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

It was generated from:

- [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465)
- [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337)

Closes #32044 from HyukjinKwon/SPARK-34950.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 23:02:56 +03:00


================================================================================================
Benchmark to measure CSV read/write performance
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Parsing quoted values:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
One quoted string                                 43757          44446         765          0.0    875148.4       1.0X
OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Wide rows with 1000 columns:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Select 1000 columns                               96330          99161         NaN          0.0     96329.7       1.0X
Select 100 columns                                41414          42672        1556          0.0     41414.1       2.3X
Select one column                                 35365          36113         662          0.0     35365.4       2.7X
count()                                           18845          18867          26          0.1     18845.0       5.1X
Select 100 columns, one bad input field           68271          68305          51          0.0     68270.7       1.4X
Select 100 columns, corrupt record field          77700          78165         803          0.0     77699.7       1.2X
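
The Relative column in the table above is consistent with dividing the baseline case's best time by each case's best time. That reading is an assumption checked only against these numbers, not taken from the benchmark harness itself. A minimal sketch (the `relative` helper is hypothetical):

```python
# Best Time(ms) values copied from the "Wide rows with 1000 columns" table.
best_ms = {
    "Select 1000 columns": 96330,
    "Select 100 columns": 41414,
    "Select one column": 35365,
    "count()": 18845,
    "Select 100 columns, one bad input field": 68271,
    "Select 100 columns, corrupt record field": 77700,
}


def relative(best_times, baseline):
    """Return baseline_best / case_best per case, rounded to one decimal.

    Assumption: this is how the Relative column is derived; it reproduces
    the values in the table but is not confirmed by the source.
    """
    base = best_times[baseline]
    return {name: round(base / t, 1) for name, t in best_times.items()}


ratios = relative(best_ms, "Select 1000 columns")
for name, ratio in ratios.items():
    print(f"{name:<42}{ratio:.1f}X")
```

Under this reading, a value above 1.0X means the case ran faster than the first row of its table, and a value below 1.0X means it ran slower.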
OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Count a dataset with 10 columns:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Select 10 columns + count()                       18462          18651         175          0.5      1846.2       1.0X
Select 1 column + count()                         11897          12075         199          0.8      1189.7       1.6X
count()                                            4218           4229          10          2.4       421.8       4.4X
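
The file does not state how many rows each benchmark processes, but the Best Time(ms) and Per Row(ns) columns imply it. The sketch below infers the row count for the table above and recomputes Rate(M/s) from it. Both helpers are hypothetical, and the formulas are an assumption that happens to match the printed values.

```python
# (case name, Best Time(ms), Per Row(ns)) copied from the
# "Count a dataset with 10 columns" table.
cases = [
    ("Select 10 columns + count()", 18462, 1846.2),
    ("Select 1 column + count()", 11897, 1189.7),
    ("count()", 4218, 421.8),
]


def implied_rows(best_ms, per_row_ns):
    """Infer the row count: total time in ns divided by time per row in ns."""
    return round(best_ms * 1_000_000 / per_row_ns)


def rate_m_per_s(rows, best_ms):
    """Rows per second in millions, derived from the best time."""
    return rows / (best_ms / 1000) / 1_000_000


for name, best_ms, per_row_ns in cases:
    rows = implied_rows(best_ms, per_row_ns)
    print(f"{name}: rows={rows}, rate={rate_m_per_s(rows, best_ms):.1f} M/s")
```

All three cases imply the same row count, which suggests they share one input dataset.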
OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Write dates and timestamps:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Create a dataset of timestamps                     1680           1699          17          6.0       168.0       1.0X
to_csv(timestamp)                                 13269          13787         456          0.8      1326.9       0.1X
write timestamps to files                         10747          10785          48          0.9      1074.7       0.2X
Create a dataset of dates                          1900           1919          24          5.3       190.0       0.9X
to_csv(date)                                       9207           9223          23          1.1       920.7       0.2X
write dates to files                               6331           6339           7          1.6       633.1       0.3X
OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Read dates and timestamps:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read timestamp text from files                     2355           2382          24          4.2       235.5       1.0X
read timestamps from files                        31297          31331          35          0.3      3129.7       0.1X
infer timestamps from files                       63255          66511         NaN          0.2      6325.5       0.0X
read date text from files                          2139           2160          18          4.7       213.9       1.1X
read date from files                              17027          17090          89          0.6      1702.7       0.1X
infer date from files                             21307          21337          31          0.5      2130.7       0.1X
timestamp strings                                  3661           3699          35          2.7       366.1       0.6X
parse timestamps from Dataset[String]             36355          37714        1180          0.3      3635.5       0.1X
infer timestamps from Dataset[String]             74494          74851         542          0.1      7449.4       0.0X
date strings                                       3753           3756           5          2.7       375.3       0.6X
parse dates from Dataset[String]                  21590          21714         126          0.5      2159.0       0.1X
from_csv(timestamp)                               35419          35459          59          0.3      3541.9       0.1X
from_csv(date)                                    19081          19124          39          0.5      1908.1       0.1X
OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Filters pushdown:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
w/o filters                                       22203          22425         192          0.0    222033.1       1.0X
pushdown disabled                                 22123          22220          89          0.0    221227.6       1.0X
w/ filters                                         1332           1338           9          0.1     13317.7      16.7X
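
To compare these results across runs, it can help to parse each result row into typed fields. The sketch below assumes every data row ends with the same six columns shown in the table headers, and that everything before them is the case name. The `Row` type, its field names, and `parse_row` are hypothetical and not part of Spark.

```python
from typing import NamedTuple


class Row(NamedTuple):
    name: str
    best_ms: int
    avg_ms: int
    stdev_ms: float  # float so that a "NaN" entry can be represented
    rate_m_per_s: float
    per_row_ns: float
    relative: float


def parse_row(line: str) -> Row:
    """Split a result row into its case name and the six trailing columns."""
    parts = line.split()
    if len(parts) < 7:
        raise ValueError(f"not a benchmark result row: {line!r}")
    # The case name may itself contain spaces and digits, so take the
    # last six whitespace-separated tokens as the numeric columns.
    name = " ".join(parts[:-6])
    best, avg, stdev, rate, per_row, rel = parts[-6:]
    return Row(
        name,
        int(best),
        int(avg),
        float(stdev),
        float(rate),
        float(per_row),
        float(rel.rstrip("X")),  # drop the trailing "X" from e.g. "16.7X"
    )


row = parse_row("w/ filters 1332 1338 9 0.1 13317.7 16.7X")
print(row.name, row.best_ms, row.relative)
```

Because splitting is on arbitrary whitespace, the same function works on both column-aligned and single-space-separated rows.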