spark-instrumented-optimizer/sql/core/benchmarks/DataSourceReadBenchmark-jdk11-results.txt
Dongjoon Hyun 854a0f752e [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1)
### What changes were proposed in this pull request?

This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison.

**A. EXPECTED CASES(JDK11 is faster in general)**
- [x] BloomFilterBenchmark (JDK11 is faster except one case)
- [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC)
- [x] CSVBenchmark (JDK11 is faster except five cases)
- [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`)
- [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type)
- [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases)
- [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS)
- [x] HashedRelationMetricsBenchmark (JDK11 is faster)
- [x] JSONBenchmark (JDK11 is much faster except eight cases)
- [x] JoinBenchmark (JDK11 is faster except five cases)
- [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases)
- [x] PrimitiveArrayBenchmark (JDK11 is faster)
- [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case)
- [x] UDFBenchmark (N/A, values are too small)
- [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case)
- [x] WideTableBenchmark (JDK11 is faster except two cases)

**B. CASES WE NEED TO INVESTIGATE MORE LATER**
- [x] AggregateBenchmark (JDK11 is slower in general)
- [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`)
- [x] DataSourceReadBenchmark (JDK11 is slower in general)
- [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`)
- [x] MakeDateTimeBenchmark (JDK11 is slower except two cases)
- [x] MiscBenchmark (JDK11 is slower except ten cases)
- [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower)
- [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases)
- [x] RangeBenchmark (JDK11 is slower except one case)

`FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer.

### Why are the changes needed?

According to the result, there are some difference between JDK8/JDK11.
This will be a baseline for the future improvement and comparison. Also, as a reproducible  environment, the following environment is used.
- Instance: `r3.xlarge`
- OS: `CentOS Linux release 7.5.1804 (Core)`
- JDK:
  - `OpenJDK Runtime Environment (build 1.8.0_222-b10)`
  - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)`

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

This is a test-only PR. We need to run benchmark.

Closes #26003 from dongjoon-hyun/SPARK-29320.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 08:58:25 -07:00

253 lines
22 KiB
Plaintext

================================================================================================
SQL Single Numeric Column Scan
================================================================================================
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
SQL Single TINYINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV 27115 27169 76 0.6 1723.9 1.0X
SQL Json 9061 9124 89 1.7 576.1 3.0X
SQL Parquet Vectorized 196 232 39 80.4 12.4 138.5X
SQL Parquet MR 2187 2216 40 7.2 139.1 12.4X
SQL ORC Vectorized 335 344 5 46.9 21.3 80.9X
SQL ORC MR 1757 1786 42 9.0 111.7 15.4X
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Parquet Reader Single TINYINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
ParquetReader Vectorized 201 205 5 78.3 12.8 1.0X
ParquetReader Vectorized -> Row 91 92 1 173.2 5.8 2.2X
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
SQL Single SMALLINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV 27969 27972 4 0.6 1778.2 1.0X
SQL Json 10328 10389 87 1.5 656.6 2.7X
SQL Parquet Vectorized 217 237 24 72.5 13.8 128.8X
SQL Parquet MR 2494 2567 103 6.3 158.6 11.2X
SQL ORC Vectorized 310 321 10 50.8 19.7 90.3X
SQL ORC MR 1901 1907 9 8.3 120.9 14.7X
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Parquet Reader Single SMALLINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
ParquetReader Vectorized 272 280 10 57.8 17.3 1.0X
ParquetReader Vectorized -> Row 144 185 68 109.3 9.1 1.9X
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
SQL Single INT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV 29507 29532 34 0.5 1876.0 1.0X
SQL Json 10463 10474 16 1.5 665.2 2.8X
SQL Parquet Vectorized 193 204 10 81.3 12.3 152.6X
SQL Parquet MR 2948 2954 7 5.3 187.5 10.0X
SQL ORC Vectorized 268 277 9 58.7 17.0 110.1X
SQL ORC MR 1910 1950 57 8.2 121.4 15.5X
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Parquet Reader Single INT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
ParquetReader Vectorized 263 278 38 59.7 16.7 1.0X
ParquetReader Vectorized -> Row 259 266 9 60.7 16.5 1.0X
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
SQL Single BIGINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV 36696 36771 106 0.4 2333.0 1.0X
SQL Json 13496 13520 34 1.2 858.0 2.7X
SQL Parquet Vectorized 282 292 9 55.7 17.9 130.0X
SQL Parquet MR 3358 3383 36 4.7 213.5 10.9X
SQL ORC Vectorized 409 414 5 38.5 26.0 89.7X
SQL ORC MR 2250 2275 35 7.0 143.1 16.3X
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Parquet Reader Single BIGINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
ParquetReader Vectorized 360 372 15 43.6 22.9 1.0X
ParquetReader Vectorized -> Row 354 357 5 44.4 22.5 1.0X
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
SQL Single FLOAT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV 30462 30466 5 0.5 1936.7 1.0X
SQL Json 12916 12948 45 1.2 821.2 2.4X
SQL Parquet Vectorized 181 185 5 86.7 11.5 168.0X
SQL Parquet MR 2810 2820 14 5.6 178.7 10.8X
SQL ORC Vectorized 426 430 4 36.9 27.1 71.6X
SQL ORC MR 2106 2112 9 7.5 133.9 14.5X
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Parquet Reader Single FLOAT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
ParquetReader Vectorized 255 261 7 61.6 16.2 1.0X
ParquetReader Vectorized -> Row 285 288 5 55.1 18.1 0.9X
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
SQL Single DOUBLE Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV 36950 36979 41 0.4 2349.2 1.0X
SQL Json 18794 18795 2 0.8 1194.9 2.0X
SQL Parquet Vectorized 279 295 17 56.3 17.8 132.3X
SQL Parquet MR 3933 4025 130 4.0 250.0 9.4X
SQL ORC Vectorized 521 527 6 30.2 33.2 70.9X
SQL ORC MR 2290 2326 51 6.9 145.6 16.1X
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Parquet Reader Single DOUBLE Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
ParquetReader Vectorized 356 365 12 44.2 22.6 1.0X
ParquetReader Vectorized -> Row 350 352 2 45.0 22.2 1.0X
================================================================================================
Int and String Scan
================================================================================================
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Int and String Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV 26764 26810 65 0.4 2552.4 1.0X
SQL Json 12107 12195 124 0.9 1154.6 2.2X
SQL Parquet Vectorized 2202 2210 10 4.8 210.0 12.2X
SQL Parquet MR 5297 5302 6 2.0 505.2 5.1X
SQL ORC Vectorized 2356 2372 23 4.5 224.7 11.4X
SQL ORC MR 4370 4419 70 2.4 416.8 6.1X
================================================================================================
Repeated String Scan
================================================================================================
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Repeated String: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV 19953 19966 18 0.5 1902.8 1.0X
SQL Json 7151 7220 98 1.5 681.9 2.8X
SQL Parquet Vectorized 692 695 3 15.1 66.0 28.8X
SQL Parquet MR 2859 2943 118 3.7 272.6 7.0X
SQL ORC Vectorized 535 540 5 19.6 51.0 37.3X
SQL ORC MR 2157 2162 8 4.9 205.7 9.3X
================================================================================================
Partitioned Table Scan
================================================================================================
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Partitioned Table: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Data column - CSV 46775 46785 13 0.3 2973.9 1.0X
Data column - Json 13891 13893 2 1.1 883.2 3.4X
Data column - Parquet Vectorized 301 306 7 52.3 19.1 155.6X
Data column - Parquet MR 3565 3572 10 4.4 226.7 13.1X
Data column - ORC Vectorized 434 458 36 36.2 27.6 107.7X
Data column - ORC MR 2337 2354 24 6.7 148.6 20.0X
Partition column - CSV 10645 10688 61 1.5 676.8 4.4X
Partition column - Json 10912 10973 87 1.4 693.7 4.3X
Partition column - Parquet Vectorized 93 103 9 169.4 5.9 503.8X
Partition column - Parquet MR 1588 1597 13 9.9 100.9 29.5X
Partition column - ORC Vectorized 92 99 11 170.7 5.9 507.6X
Partition column - ORC MR 1714 1716 3 9.2 109.0 27.3X
Both columns - CSV 46199 46222 32 0.3 2937.3 1.0X
Both columns - Json 17279 17291 18 0.9 1098.6 2.7X
Both columns - Parquet Vectorized 346 355 13 45.4 22.0 135.0X
Both columns - Parquet MR 3883 3908 35 4.1 246.9 12.0X
Both columns - ORC Vectorized 577 618 57 27.3 36.7 81.1X
Both columns - ORC MR 2967 3024 80 5.3 188.7 15.8X
================================================================================================
String with Nulls Scan
================================================================================================
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
String with Nulls Scan (0.0%): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV 23623 23731 153 0.4 2252.9 1.0X
SQL Json 13299 13432 187 0.8 1268.3 1.8X
SQL Parquet Vectorized 1464 1466 4 7.2 139.6 16.1X
SQL Parquet MR 7602 7628 37 1.4 724.9 3.1X
ParquetReader Vectorized 1032 1043 15 10.2 98.4 22.9X
SQL ORC Vectorized 1206 1211 7 8.7 115.0 19.6X
SQL ORC MR 4726 4991 374 2.2 450.7 5.0X
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
String with Nulls Scan (50.0%): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV 23715 24152 619 0.4 2261.6 1.0X
SQL Json 10120 10280 226 1.0 965.1 2.3X
SQL Parquet Vectorized 1063 1072 13 9.9 101.4 22.3X
SQL Parquet MR 5460 5464 5 1.9 520.8 4.3X
ParquetReader Vectorized 934 936 4 11.2 89.0 25.4X
SQL ORC Vectorized 1094 1094 0 9.6 104.3 21.7X
SQL ORC MR 3964 4401 618 2.6 378.0 6.0X
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
String with Nulls Scan (95.0%): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV 21348 21472 175 0.5 2035.9 1.0X
SQL Json 5877 5956 112 1.8 560.5 3.6X
SQL Parquet Vectorized 244 256 22 43.0 23.2 87.6X
SQL Parquet MR 3139 3371 328 3.3 299.4 6.8X
ParquetReader Vectorized 238 245 9 44.1 22.7 89.7X
SQL ORC Vectorized 378 383 7 27.7 36.0 56.5X
SQL ORC MR 2234 2315 115 4.7 213.0 9.6X
================================================================================================
Single Column Scan From Wide Columns
================================================================================================
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Single Column Scan from 10 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV 4053 4064 16 0.3 3865.4 1.0X
SQL Json 4115 4118 4 0.3 3924.6 1.0X
SQL Parquet Vectorized 72 82 11 14.5 69.0 56.0X
SQL Parquet MR 314 325 18 3.3 299.3 12.9X
SQL ORC Vectorized 80 86 8 13.1 76.2 50.7X
SQL ORC MR 250 253 2 4.2 238.5 16.2X
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Single Column Scan from 50 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV 7802 7849 66 0.1 7440.8 1.0X
SQL Json 16640 17481 1190 0.1 15868.8 0.5X
SQL Parquet Vectorized 106 126 31 9.9 101.0 73.7X
SQL Parquet MR 349 358 7 3.0 332.8 22.4X
SQL ORC Vectorized 108 115 10 9.7 102.7 72.5X
SQL ORC MR 284 298 20 3.7 270.5 27.5X
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Single Column Scan from 100 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV 12639 12672 47 0.1 12053.5 1.0X
SQL Json 30613 30688 106 0.0 29194.8 0.4X
SQL Parquet Vectorized 145 165 21 7.2 138.3 87.2X
SQL Parquet MR 384 393 9 2.7 366.4 32.9X
SQL ORC Vectorized 129 134 5 8.1 123.2 97.8X
SQL ORC MR 280 319 66 3.7 266.9 45.2X