ebf01ec3c1
### What changes were proposed in this pull request?

https://github.com/apache/spark/pull/32015 added a much easier way to run benchmarks in the same GitHub Actions build. This PR updates the benchmark results using that approach.

**NOTE**: from my observations, GitHub Actions appears to use four types of CPU:

- Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz
- Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz
- Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz
- Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz

From some quick research, they seem to perform roughly similarly:

![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png)

I couldn't find enough information about the Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz, but its performance seems roughly similar given the numbers.

So this shouldn't be a big deal, especially given that this approach is much easier, encourages contributors to run benchmarks more often, and guarantees the same number of cores and the same memory with the same software.

### Why are the changes needed?

To have a baseline for the benchmarks.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

It was generated from:

- [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465)
- [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337)

Closes #32044 from HyukjinKwon/SPARK-34950.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>

================================================================================================
Benchmark for performance of JSON parsing
================================================================================================

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
JSON schema inferring:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
No encoding                                        4180           4300         122          1.2         836.1       1.0X
UTF-8 is set                                       5506           5566          70          0.9        1101.3       0.8X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
count a short column:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
No encoding                                        2878           2926          58          1.7         575.6       1.0X
UTF-8 is set                                       4189           4239          43          1.2         837.8       0.7X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
count a wide column:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
No encoding                                        6729           6876         128          0.1        6728.7       1.0X
UTF-8 is set                                      10313          10402         126          0.1       10312.6       0.7X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
select wide row:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
No encoding                                       15375          15551         201          0.0      307498.9       1.0X
UTF-8 is set                                      18257          18476         190          0.0      365135.8       0.8X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Select a subset of 10 columns:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Select 10 columns                                  2664           2673          11          0.4        2664.2       1.0X
Select 1 column                                    2335           2353          16          0.4        2335.3       1.1X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
creation of JSON parser per line:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Short column without encoding                       845            852           7          1.2         845.0       1.0X
Short column with UTF-8                            1149           1161          12          0.9        1148.8       0.7X
Wide column without encoding                       9971           9991          29          0.1        9971.1       0.1X
Wide column with UTF-8                            14047          14059          14          0.1       14047.3       0.1X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
JSON functions:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Text read                                            90             91           1         11.1          90.4       1.0X
from_json                                          2265           2291          25          0.4        2265.3       0.0X
json_tuple                                         2585           2607          36          0.4        2584.7       0.0X
get_json_object                                    2381           2388          10          0.4        2381.0       0.0X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Dataset of json strings:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Text read                                           397            399           2         12.6          79.4       1.0X
schema inferring                                   3722           3770          43          1.3         744.4       0.1X
parsing                                            3265           3282          21          1.5         653.0       0.1X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Json files in the per-line mode:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Text read                                          1030           1037           9          4.9         206.0       1.0X
Schema inferring                                   4515           4560          78          1.1         902.9       0.2X
Parsing without charset                            3714           3772          64          1.3         742.7       0.3X
Parsing with UTF-8                                 5370           5476          97          0.9        1074.1       0.2X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Write dates and timestamps:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Create a dataset of timestamps                      174            178           5          5.7         174.4       1.0X
to_json(timestamp)                                 1354           1368          12          0.7        1353.8       0.1X
write timestamps to files                          1215           1226          16          0.8        1214.5       0.1X
Create a dataset of dates                           184            188           5          5.4         184.0       0.9X
to_json(date)                                       898            922          24          1.1         898.5       0.2X
write dates to files                                708            716          10          1.4         708.1       0.2X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Read dates and timestamps:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read timestamp text from files                      265            285          23          3.8         265.0       1.0X
read timestamps from files                         3107           3132          23          0.3        3107.1       0.1X
infer timestamps from files                        6316           6365          43          0.2        6315.5       0.0X
read date text from files                           241            259          19          4.2         240.6       1.1X
read date from files                               1259           1278          20          0.8        1259.4       0.2X
timestamp strings                                   290            293           4          3.4         290.3       0.9X
parse timestamps from Dataset[String]              3324           3359          34          0.3        3324.4       0.1X
infer timestamps from Dataset[String]              6868           6979         113          0.1        6867.7       0.0X
date strings                                        380            384           7          2.6         379.6       0.7X
parse dates from Dataset[String]                   1650           1672          20          0.6        1649.8       0.2X
from_json(timestamp)                               4944           4969          33          0.2        4943.7       0.1X
from_json(date)                                    3188           3251          57          0.3        3188.0       0.1X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Filters pushdown:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
w/o filters                                       24601          24817         219          0.0      246012.5       1.0X
pushdown disabled                                 24029          24183         137          0.0      240289.2       1.0X
w/ filters                                          782            794          12          0.1        7822.7      31.4X

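The derived columns in these tables appear to follow from the best time and the number of rows processed. The Python sketch below recomputes them under these assumptions: Rate(M/s) is millions of rows per second, Per Row(ns) is the best time divided by the row count, and Relative is the first row's best time divided by the row's own best time. The helper name `derived_columns` is hypothetical, and the row count of 5,000,000 is not printed in the report; it is inferred from 4180 ms / 836.1 ns in the "JSON schema inferring" table.

```python
def derived_columns(best_ms, rows, baseline_best_ms):
    """Recompute Rate(M/s), Per Row(ns) and Relative from a best time in ms."""
    rate_m_per_s = rows / (best_ms / 1000.0) / 1e6  # millions of rows per second
    per_row_ns = best_ms * 1e6 / rows               # nanoseconds spent per row
    relative = baseline_best_ms / best_ms           # speed relative to the first row
    return rate_m_per_s, per_row_ns, relative


# Assumption: the row count is inferred from the report, not stated in it.
ROWS = 5_000_000

# "JSON schema inferring" table: the first row is the baseline.
no_encoding = derived_columns(4180, ROWS, 4180)
utf8 = derived_columns(5506, ROWS, 4180)

print("No encoding:  %.1f M/s, %.1f ns/row, %.1fX" % no_encoding)
print("UTF-8 is set: %.1f M/s, %.1f ns/row, %.1fX" % utf8)
```

Because the report prints best times rounded to whole milliseconds, the recomputed Per Row(ns) values can differ from the reported ones in the last digit.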