spark-instrumented-optimizer/sql/core/benchmarks/DataSourceReadBenchmark-results.txt
Dongjoon Hyun 854a0f752e [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1)
### What changes were proposed in this pull request?

This PR regenerates the `sql/core` benchmarks on JDK8 and JDK11 to compare the results. Benchmarks are normally compared by ratio rather than by absolute time; this PR compares the average time instead, so it should be treated as a rough comparison.
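
As a rough sketch of the average-time comparison described above, the following Python parses result rows in the format used by this file and computes a per-case JDK11/JDK8 ratio. The regular expression and the names `parse_rows` and `avg_time_ratio` are illustrative assumptions, not part of Spark; only the JDK8 rows shown here come from this file, and a JDK11 results file would have to be supplied separately.

```python
import re

# One result row: a case name, then best/avg/stdev times (ms), rate (M/s),
# per-row time (ns), and the relative speed with a trailing "X".
ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<best>\d+)\s+(?P<avg>\d+)\s+(?P<stdev>\d+)"
    r"\s+(?P<rate>[\d.]+)\s+(?P<per_row>[\d.]+)\s+(?P<relative>[\d.]+)X$"
)

def parse_rows(text):
    """Map each case name to its average time in ms; non-row lines are skipped."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows[m.group("name")] = int(m.group("avg"))
    return rows

def avg_time_ratio(jdk8, jdk11):
    """Return jdk11_avg / jdk8_avg per case; values above 1.0 mean JDK11 is slower."""
    return {name: jdk11[name] / jdk8[name] for name in jdk8 if name in jdk11}

# Three rows from the JDK8 "SQL Single TINYINT Column Scan" table in this file.
jdk8_text = """\
SQL CSV 23037 23172 191 0.7 1464.7 1.0X
SQL Json 8682 8686 5 1.8 552.0 2.7X
SQL Parquet Vectorized 183 205 32 85.9 11.6 125.8X
"""
jdk8 = parse_rows(jdk8_text)
print(jdk8)
```

Case names such as `SQL CSV` repeat across tables, so a comparison over a whole file would need to key on the table title as well as the case name.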

**A. EXPECTED CASES (JDK11 is faster in general)**
- [x] BloomFilterBenchmark (JDK11 is faster except one case)
- [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC)
- [x] CSVBenchmark (JDK11 is faster except five cases)
- [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`)
- [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type)
- [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases)
- [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS)
- [x] HashedRelationMetricsBenchmark (JDK11 is faster)
- [x] JSONBenchmark (JDK11 is much faster except eight cases)
- [x] JoinBenchmark (JDK11 is faster except five cases)
- [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases)
- [x] PrimitiveArrayBenchmark (JDK11 is faster)
- [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case)
- [x] UDFBenchmark (N/A, values are too small)
- [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case)
- [x] WideTableBenchmark (JDK11 is faster except two cases)

**B. CASES WE NEED TO INVESTIGATE MORE LATER**
- [x] AggregateBenchmark (JDK11 is slower in general)
- [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`)
- [x] DataSourceReadBenchmark (JDK11 is slower in general)
- [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`)
- [x] MakeDateTimeBenchmark (JDK11 is slower except two cases)
- [x] MiscBenchmark (JDK11 is slower except ten cases)
- [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower)
- [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases)
- [x] RangeBenchmark (JDK11 is slower except one case)

`FilterPushdownBenchmark`, `InExpressionBenchmark`, and `WideSchemaBenchmark` will be compared later because they take a long time to run.

### Why are the changes needed?

According to the results, there are some differences between JDK8 and JDK11.
These results will serve as a baseline for future improvement and comparison. For reproducibility, the following environment is used.
- Instance: `r3.xlarge`
- OS: `CentOS Linux release 7.5.1804 (Core)`
- JDK:
  - `OpenJDK Runtime Environment (build 1.8.0_222-b10)`
  - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)`

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

This is a test-only PR. The benchmarks need to be run to regenerate the results.

Closes #26003 from dongjoon-hyun/SPARK-29320.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 08:58:25 -07:00

================================================================================================
SQL Single Numeric Column Scan
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
SQL Single TINYINT Column Scan:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           23037          23172         191          0.7        1464.7       1.0X
SQL Json                                           8682           8686           5          1.8         552.0       2.7X
SQL Parquet Vectorized                              183            205          32         85.9          11.6     125.8X
SQL Parquet MR                                     2189           2200          15          7.2         139.2      10.5X
SQL ORC Vectorized                                  296            306           5         53.1          18.8      77.7X
SQL ORC MR                                         1705           1717          18          9.2         108.4      13.5X
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Parquet Reader Single TINYINT Column Scan:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
ParquetReader Vectorized                                195            200           7         80.9          12.4       1.0X
ParquetReader Vectorized -> Row                          96             97           2        163.0           6.1       2.0X
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
SQL Single SMALLINT Column Scan:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           25126          25265         196          0.6        1597.5       1.0X
SQL Json                                           9442           9445           4          1.7         600.3       2.7X
SQL Parquet Vectorized                              228            240           7         69.1          14.5     110.4X
SQL Parquet MR                                     2432           2445          19          6.5         154.6      10.3X
SQL ORC Vectorized                                  315            319           6         49.9          20.0      79.8X
SQL ORC MR                                         1901           1916          21          8.3         120.9      13.2X
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Parquet Reader Single SMALLINT Column Scan:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
ParquetReader Vectorized                                293            302           9         53.6          18.7       1.0X
ParquetReader Vectorized -> Row                         264            266           2         59.7          16.8       1.1X
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
SQL Single INT Column Scan:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           27419          27443          34          0.6        1743.3       1.0X
SQL Json                                           9831           9836           8          1.6         625.0       2.8X
SQL Parquet Vectorized                              192            198           9         81.8          12.2     142.7X
SQL Parquet MR                                     2696           2740          62          5.8         171.4      10.2X
SQL ORC Vectorized                                  329            335           8         47.9          20.9      83.4X
SQL ORC MR                                         1932           2006         105          8.1         122.8      14.2X
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Parquet Reader Single INT Column Scan:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
ParquetReader Vectorized                                248            253           6         63.5          15.8       1.0X
ParquetReader Vectorized -> Row                         250            256           7         62.9          15.9       1.0X
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
SQL Single BIGINT Column Scan:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           34898          34907          14          0.5        2218.7       1.0X
SQL Json                                          12760          12764           5          1.2         811.3       2.7X
SQL Parquet Vectorized                              283            289           5         55.6          18.0     123.3X
SQL Parquet MR                                     3238           3240           3          4.9         205.9      10.8X
SQL ORC Vectorized                                  401            405           7         39.2          25.5      87.0X
SQL ORC MR                                         2274           2290          23          6.9         144.6      15.3X
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Parquet Reader Single BIGINT Column Scan:     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
ParquetReader Vectorized                                339            351          16         46.5          21.5       1.0X
ParquetReader Vectorized -> Row                         342            348          13         46.0          21.8       1.0X
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
SQL Single FLOAT Column Scan:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           28872          28886          20          0.5        1835.6       1.0X
SQL Json                                          13360          13377          24          1.2         849.4       2.2X
SQL Parquet Vectorized                              181            185           6         86.8          11.5     159.3X
SQL Parquet MR                                     2645           2651           8          5.9         168.2      10.9X
SQL ORC Vectorized                                  456            459           5         34.5          29.0      63.4X
SQL ORC MR                                         2047           2066          26          7.7         130.2      14.1X
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Parquet Reader Single FLOAT Column Scan:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
ParquetReader Vectorized                                240            246          10         65.5          15.3       1.0X
ParquetReader Vectorized -> Row                         245            246           2         64.2          15.6       1.0X
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
SQL Single DOUBLE Column Scan:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           36298          36305          10          0.4        2307.7       1.0X
SQL Json                                          18250          18276          36          0.9        1160.3       2.0X
SQL Parquet Vectorized                              278            285           7         56.5          17.7     130.4X
SQL Parquet MR                                     3144           3146           4          5.0         199.9      11.5X
SQL ORC Vectorized                                  533            546          16         29.5          33.9      68.1X
SQL ORC MR                                         2265           2302          53          6.9         144.0      16.0X
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Parquet Reader Single DOUBLE Column Scan:     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
ParquetReader Vectorized                                338            346          12         46.6          21.5       1.0X
ParquetReader Vectorized -> Row                         338            344           9         46.5          21.5       1.0X
================================================================================================
Int and String Scan
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Int and String Scan:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           24839          25273         613          0.4        2368.9       1.0X
SQL Json                                          11861          11869          11          0.9        1131.2       2.1X
SQL Parquet Vectorized                             2298           2305           9          4.6         219.2      10.8X
SQL Parquet MR                                     5045           5053          10          2.1         481.2       4.9X
SQL ORC Vectorized                                 2391           2405          21          4.4         228.0      10.4X
SQL ORC MR                                         4561           4645         118          2.3         435.0       5.4X
================================================================================================
Repeated String Scan
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Repeated String:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           14147          14244         137          0.7        1349.1       1.0X
SQL Json                                           7289           7306          23          1.4         695.1       1.9X
SQL Parquet Vectorized                              818            821           4         12.8          78.0      17.3X
SQL Parquet MR                                     2562           2570          11          4.1         244.4       5.5X
SQL ORC Vectorized                                  571            579           8         18.3          54.5      24.8X
SQL ORC MR                                         2143           2164          31          4.9         204.3       6.6X
================================================================================================
Partitioned Table Scan
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Partitioned Table:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Data column - CSV                                 38652          38680          40          0.4        2457.4       1.0X
Data column - Json                                12756          12760           5          1.2         811.0       3.0X
Data column - Parquet Vectorized                    304            314           9         51.7          19.3     127.2X
Data column - Parquet MR                           3387           3393           9          4.6         215.3      11.4X
Data column - ORC Vectorized                        425            436          10         37.0          27.0      91.0X
Data column - ORC MR                               2303           2330          38          6.8         146.4      16.8X
Partition column - CSV                            11239          11249          14          1.4         714.5       3.4X
Partition column - Json                           10477          10479           3          1.5         666.1       3.7X
Partition column - Parquet Vectorized                95            102           9        165.5           6.0     406.7X
Partition column - Parquet MR                      1574           1575           1         10.0         100.1      24.6X
Partition column - ORC Vectorized                    95            106          20        166.3           6.0     408.5X
Partition column - ORC MR                          1682           1693          15          9.4         106.9      23.0X
Both columns - CSV                                39146          39203          81          0.4        2488.8       1.0X
Both columns - Json                               14675          14691          23          1.1         933.0       2.6X
Both columns - Parquet Vectorized                   347            351           3         45.3          22.1     111.4X
Both columns - Parquet MR                          3680           3717          52          4.3         234.0      10.5X
Both columns - ORC Vectorized                       556            565           8         28.3          35.3      69.6X
Both columns - ORC MR                              2909           2923          20          5.4         184.9      13.3X
================================================================================================
String with Nulls Scan
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
String with Nulls Scan (0.0%):            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           17457          17740         401          0.6        1664.9       1.0X
SQL Json                                          12276          12287          16          0.9        1170.7       1.4X
SQL Parquet Vectorized                             1525           1539          20          6.9         145.4      11.5X
SQL Parquet MR                                     5051           5098          66          2.1         481.7       3.5X
ParquetReader Vectorized                           1115           1123          12          9.4         106.3      15.7X
SQL ORC Vectorized                                 1269           1294          37          8.3         121.0      13.8X
SQL ORC MR                                         3938           3951          17          2.7         375.6       4.4X
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
String with Nulls Scan (50.0%):           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           18086          18119          47          0.6        1724.8       1.0X
SQL Json                                           8484           8851         520          1.2         809.1       2.1X
SQL Parquet Vectorized                             1127           1131           5          9.3         107.5      16.0X
SQL Parquet MR                                     4120           4131          15          2.5         392.9       4.4X
ParquetReader Vectorized                            984           1019          49         10.7          93.9      18.4X
SQL ORC Vectorized                                 1208           1211           4          8.7         115.2      15.0X
SQL ORC MR                                         3401           3410          13          3.1         324.4       5.3X
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
String with Nulls Scan (95.0%):           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           24825          24970         205          0.4        2367.5       1.0X
SQL Json                                           9847           9857          14          1.1         939.1       2.5X
SQL Parquet Vectorized                              258            261           6         40.7          24.6      96.3X
SQL Parquet MR                                     3182           3242          85          3.3         303.4       7.8X
ParquetReader Vectorized                            241            242           2         43.6          22.9     103.2X
SQL ORC Vectorized                                  453            456           4         23.1          43.2      54.8X
SQL ORC MR                                         1917           1927          13          5.5         182.8      12.9X
================================================================================================
Single Column Scan From Wide Columns
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Single Column Scan from 10 columns:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                            5163           5174          16          0.2        4923.5       1.0X
SQL Json                                           4459           4538         111          0.2        4252.7       1.2X
SQL Parquet Vectorized                               78             84           8         13.4          74.7      65.9X
SQL Parquet MR                                      511            519           9          2.1         486.9      10.1X
SQL ORC Vectorized                                   86             93          11         12.2          82.1      60.0X
SQL ORC MR                                          350            359           7          3.0         333.6      14.8X
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Single Column Scan from 50 columns:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                            9839           9842           4          0.1        9383.4       1.0X
SQL Json                                          15887          15889           4          0.1       15150.7       0.6X
SQL Parquet Vectorized                              115            125          11          9.1         109.9      85.4X
SQL Parquet MR                                      666            671           8          1.6         635.4      14.8X
SQL ORC Vectorized                                  115            120           6          9.1         110.1      85.2X
SQL ORC MR                                          455            458           3          2.3         433.7      21.6X
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Single Column Scan from 100 columns:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
SQL CSV                                           15858          15891          46          0.1       15123.5       1.0X
SQL Json                                          30200          30256          80          0.0       28800.6       0.5X
SQL Parquet Vectorized                              160            165           7          6.5         153.0      98.8X
SQL Parquet MR                                      682            690           7          1.5         650.3      23.3X
SQL ORC Vectorized                                  143            150          10          7.4         136.0     111.2X
SQL ORC MR                                          494            509          15          2.1         471.4      32.1X
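
The derived columns in the tables above appear consistent with being computed from Best Time: Rate is rows per microsecond, Per Row is nanoseconds per row, and Relative is the first row's best time divided by the current row's. The Python sketch below recomputes them for two rows of the TINYINT table. The row count of 15 * 1024 * 1024 is an assumption inferred from the printed numbers, not something this file states, and `derived_columns` is an illustrative name.

```python
# Assumption: 15 * 1024 * 1024 rows per scan, inferred from the ratio of
# Best Time to Per Row in the TINYINT table; the file does not state it.
NUM_ROWS = 15 * 1024 * 1024

def derived_columns(best_ms, baseline_best_ms, num_rows=NUM_ROWS):
    """Recompute Rate(M/s), Per Row(ns) and Relative from best times in ms."""
    rate_m_per_s = num_rows / (best_ms * 1000.0)       # rows per microsecond
    per_row_ns = best_ms * 1_000_000.0 / num_rows      # nanoseconds per row
    relative = baseline_best_ms / best_ms              # speed-up over the first row
    return round(rate_m_per_s, 1), round(per_row_ns, 1), round(relative, 1)

# SQL CSV (the baseline) and SQL Json rows of the TINYINT table.
print(derived_columns(23037, 23037))  # (0.7, 1464.7, 1.0)
print(derived_columns(8682, 23037))   # (1.8, 552.0, 2.7)
```

Because the printed best times are rounded to whole milliseconds, recomputed values can differ from the table in the last digit. Other sections seem to use different row counts (the Per Row figures in "Int and String Scan" are consistent with 10 * 1024 * 1024 rows), so `num_rows` would need to be adjusted per section.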