ebf01ec3c1
### What changes were proposed in this pull request?

https://github.com/apache/spark/pull/32015 added a much easier way to run benchmarks in the same GitHub Actions build. This PR updates the benchmark results using that workflow.

**NOTE**: from my observations, GitHub Actions appears to use four types of CPU:

- Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz
- Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz
- Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz
- Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz

From my quick research, they seem to perform roughly similarly:

![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png)

I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz, but its performance seems roughly similar given the numbers.

So the CPU variation shouldn't be a big deal, especially given that this approach is much easier, encourages contributors to run benchmarks more often, and guarantees the same number of cores and the same memory with the same software.

### Why are the changes needed?

To have a baseline for the benchmarks.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

It was generated from:

- [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465)
- [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337)

Closes #32044 from HyukjinKwon/SPARK-34950.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
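As an aid to reading the result tables, here is a minimal Python sketch of how the `Rate(M/s)`, `Per Row(ns)`, and `Relative` columns appear to relate to `Best Time(ms)`. The row count of `15 * 1024 * 1024` is an assumption inferred from the numeric-scan numbers (13405 ms at 852.3 ns per row), not something this file states, and the function names are hypothetical.

```python
# Assumption: 15 * 1024 * 1024 rows per numeric-scan case, inferred from the
# reported numbers rather than stated anywhere in this file.
ASSUMED_ROWS = 15 * 1024 * 1024


def derived_columns(best_ms: float, rows: int) -> tuple:
    """Return (rate in million rows/s, cost per row in ns) for one case."""
    seconds = best_ms / 1000.0
    rate_m_per_s = rows / seconds / 1e6
    per_row_ns = best_ms * 1e6 / rows
    return rate_m_per_s, per_row_ns


def relative(baseline_best_ms: float, best_ms: float) -> float:
    """Speedup of a case over the first (baseline) row of its table."""
    return baseline_best_ms / best_ms


# "SQL CSV" row of the TINYINT table: best time 13405 ms.
rate, per_row = derived_columns(13405, ASSUMED_ROWS)
print(f"{rate:.1f} M/s, {per_row:.1f} ns/row")

# "SQL Parquet Vectorized" (164 ms) against the "SQL CSV" baseline.
print(f"{relative(13405, 164):.1f}X")
```

Because the tables print best times rounded to whole milliseconds, a ratio recomputed this way can differ from the printed `Relative` value in the last digit.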
253 lines
22 KiB
Plaintext
================================================================================================
SQL Single Numeric Column Scan
================================================================================================

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
SQL Single TINYINT Column Scan:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           13405          13422          24          1.2         852.3       1.0X
SQL Json                                          10723          10788          92          1.5         681.7       1.3X
SQL Parquet Vectorized                              164            217          50         95.9          10.4      81.8X
SQL Parquet MR                                     2349           2440         129          6.7         149.3       5.7X
SQL ORC Vectorized                                  312            346          23         50.4          19.8      43.0X
SQL ORC MR                                         1610           1659          69          9.8         102.4       8.3X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Parquet Reader Single TINYINT Column Scan:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
ParquetReader Vectorized                             187            209          20         84.3          11.9       1.0X
ParquetReader Vectorized -> Row                       89             95           5        177.6           5.6       2.1X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
SQL Single SMALLINT Column Scan:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           14214          14549         474          1.1         903.7       1.0X
SQL Json                                          11866          11934          95          1.3         754.4       1.2X
SQL Parquet Vectorized                              294            342          53         53.6          18.7      48.4X
SQL Parquet MR                                     2929           3004         107          5.4         186.2       4.9X
SQL ORC Vectorized                                  312            328          15         50.4          19.8      45.5X
SQL ORC MR                                         2037           2097          84          7.7         129.5       7.0X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Parquet Reader Single SMALLINT Column Scan:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------
ParquetReader Vectorized                              249            266          18         63.1          15.8       1.0X
ParquetReader Vectorized -> Row                       192            247          36         82.1          12.2       1.3X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
SQL Single INT Column Scan:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           15502          15817         446          1.0         985.6       1.0X
SQL Json                                          12638          12646          11          1.2         803.5       1.2X
SQL Parquet Vectorized                              193            256          44         81.7          12.2      80.5X
SQL Parquet MR                                     2943           2953          14          5.3         187.1       5.3X
SQL ORC Vectorized                                  324            370          34         48.5          20.6      47.8X
SQL ORC MR                                         2110           2163          75          7.5         134.1       7.3X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Parquet Reader Single INT Column Scan:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
ParquetReader Vectorized                            276            287          14         57.0          17.6       1.0X
ParquetReader Vectorized -> Row                     309            320           9         50.9          19.6       0.9X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
SQL Single BIGINT Column Scan:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           20156          20694         761          0.8        1281.5       1.0X
SQL Json                                          15228          15380         214          1.0         968.2       1.3X
SQL Parquet Vectorized                              325            346          20         48.4          20.7      62.0X
SQL Parquet MR                                     3144           3228         118          5.0         199.9       6.4X
SQL ORC Vectorized                                  516            526           7         30.5          32.8      39.0X
SQL ORC MR                                         2353           2367          19          6.7         149.6       8.6X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Parquet Reader Single BIGINT Column Scan:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
ParquetReader Vectorized                            372            396          24         42.3          23.6       1.0X
ParquetReader Vectorized -> Row                     437            462          25         36.0          27.8       0.9X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
SQL Single FLOAT Column Scan:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           17413          17599         263          0.9        1107.1       1.0X
SQL Json                                          14416          14453          53          1.1         916.5       1.2X
SQL Parquet Vectorized                              181            225          35         86.8          11.5      96.1X
SQL Parquet MR                                     2940           2996          78          5.3         186.9       5.9X
SQL ORC Vectorized                                  470            494          29         33.5          29.9      37.1X
SQL ORC MR                                         2351           2379          39          6.7         149.5       7.4X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Parquet Reader Single FLOAT Column Scan:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
ParquetReader Vectorized                            268            282          14         58.7          17.0       1.0X
ParquetReader Vectorized -> Row                     298            321          18         52.8          18.9       0.9X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
SQL Single DOUBLE Column Scan:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           21666          21697          43          0.7        1377.5       1.0X
SQL Json                                          18307          18363          79          0.9        1163.9       1.2X
SQL Parquet Vectorized                              310            337          22         50.7          19.7      69.9X
SQL Parquet MR                                     3089           3103          19          5.1         196.4       7.0X
SQL ORC Vectorized                                  589            617          31         26.7          37.5      36.8X
SQL ORC MR                                         2307           2377          98          6.8         146.7       9.4X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Parquet Reader Single DOUBLE Column Scan:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
ParquetReader Vectorized                            400            415          18         39.3          25.4       1.0X
ParquetReader Vectorized -> Row                     393            406          11         40.1          25.0       1.0X


================================================================================================
Int and String Scan
================================================================================================

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Int and String Scan:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           17703          17719          22          0.6        1688.3       1.0X
SQL Json                                          13095          13168         103          0.8        1248.9       1.4X
SQL Parquet Vectorized                             2253           2266          19          4.7         214.8       7.9X
SQL Parquet MR                                     4913           4977          91          2.1         468.5       3.6X
SQL ORC Vectorized                                 2457           2467          14          4.3         234.3       7.2X
SQL ORC MR                                         4433           4464          44          2.4         422.8       4.0X


================================================================================================
Repeated String Scan
================================================================================================

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Repeated String:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                            9741           9804          89          1.1         929.0       1.0X
SQL Json                                           8230           8401         241          1.3         784.9       1.2X
SQL Parquet Vectorized                              618            650          31         17.0          58.9      15.8X
SQL Parquet MR                                     2258           2311          75          4.6         215.4       4.3X
SQL ORC Vectorized                                  608            629          15         17.3          58.0      16.0X
SQL ORC MR                                         2466           2479          18          4.3         235.2       4.0X


================================================================================================
Partitioned Table Scan
================================================================================================

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Partitioned Table:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Data column - CSV                                 24195          24573         534          0.7        1538.3       1.0X
Data column - Json                                14746          14883         194          1.1         937.5       1.6X
Data column - Parquet Vectorized                    352            385          34         44.7          22.4      68.7X
Data column - Parquet MR                           3674           3694          27          4.3         233.6       6.6X
Data column - ORC Vectorized                        480            505          26         32.8          30.5      50.4X
Data column - ORC MR                               2913           3004         128          5.4         185.2       8.3X
Partition column - CSV                             7527           7544          23          2.1         478.6       3.2X
Partition column - Json                           11955          12051         135          1.3         760.1       2.0X
Partition column - Parquet Vectorized                65             92          29        242.5           4.1     373.0X
Partition column - Parquet MR                      1614           1628          21          9.7         102.6      15.0X
Partition column - ORC Vectorized                    71             99          29        220.1           4.5     338.5X
Partition column - ORC MR                          1761           1769          11          8.9         112.0      13.7X
Both columns - CSV                                24077          24127          70          0.7        1530.8       1.0X
Both columns - Json                               15286          15479         273          1.0         971.9       1.6X
Both columns - Parquet Vectorized                   376            412          40         41.9          23.9      64.4X
Both columns - Parquet MR                          3808           3826          26          4.1         242.1       6.4X
Both columns - ORC Vectorized                       560            604          42         28.1          35.6      43.2X
Both columns - ORC MR                              3046           3080          49          5.2         193.7       7.9X


================================================================================================
String with Nulls Scan
================================================================================================

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
String with Nulls Scan (0.0%):            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           11805          12021         306          0.9        1125.8       1.0X
SQL Json                                          12051          12105          77          0.9        1149.3       1.0X
SQL Parquet Vectorized                             1474           1545         100          7.1         140.6       8.0X
SQL Parquet MR                                     4488           4492           4          2.3         428.1       2.6X
ParquetReader Vectorized                           1140           1140           1          9.2         108.7      10.4X
SQL ORC Vectorized                                 1164           1178          20          9.0         111.0      10.1X
SQL ORC MR                                         3745           3817         102          2.8         357.1       3.2X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
String with Nulls Scan (50.0%):           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                            9814           9837          33          1.1         936.0       1.0X
SQL Json                                           9317           9445         182          1.1         888.5       1.1X
SQL Parquet Vectorized                             1117           1155          52          9.4         106.6       8.8X
SQL Parquet MR                                     3463           3538         106          3.0         330.3       2.8X
ParquetReader Vectorized                           1033           1039           8         10.1          98.6       9.5X
SQL ORC Vectorized                                 1307           1353          65          8.0         124.7       7.5X
SQL ORC MR                                         3644           3690          65          2.9         347.5       2.7X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
String with Nulls Scan (95.0%):           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                            8145           8270         176          1.3         776.8       1.0X
SQL Json                                           5714           5764          71          1.8         544.9       1.4X
SQL Parquet Vectorized                              235            264          15         44.6          22.4      34.7X
SQL Parquet MR                                     2398           2412          19          4.4         228.7       3.4X
ParquetReader Vectorized                            248            262          11         42.3          23.6      32.9X
SQL ORC Vectorized                                  430            462          37         24.4          41.0      18.9X
SQL ORC MR                                         1983           1993          14          5.3         189.1       4.1X


================================================================================================
Single Column Scan From Wide Columns
================================================================================================

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Single Column Scan from 10 columns:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                            2448           2461          18          0.4        2334.3       1.0X
SQL Json                                           3332           3370          53          0.3        3177.6       0.7X
SQL Parquet Vectorized                               51             87          25         20.7          48.2      48.4X
SQL Parquet MR                                      239            278          35          4.4         227.5      10.3X
SQL ORC Vectorized                                   60             82          19         17.5          57.3      40.8X
SQL ORC MR                                          197            219          26          5.3         188.3      12.4X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Single Column Scan from 50 columns:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                            6034           6061          39          0.2        5754.0       1.0X
SQL Json                                          12232          12315         118          0.1       11665.4       0.5X
SQL Parquet Vectorized                               73            120          30         14.4          69.6      82.6X
SQL Parquet MR                                      316            368          44          3.3         301.1      19.1X
SQL ORC Vectorized                                   76            122          36         13.7          72.9      79.0X
SQL ORC MR                                          206            261          47          5.1         196.5      29.3X

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Single Column Scan from 100 columns:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           10307          10309           4          0.1        9829.0       1.0X
SQL Json                                          23412          23539         180          0.0       22327.7       0.4X
SQL Parquet Vectorized                              105            151          23         10.0          99.9      98.4X
SQL Parquet MR                                      295            325          29          3.6         281.5      34.9X
SQL ORC Vectorized                                   85            112          31         12.4          81.0     121.4X
SQL ORC MR                                          212            255          66          4.9         202.3      48.6X