spark-instrumented-optimizer/sql/core/benchmarks/DatasetBenchmark-results.txt
Dongjoon Hyun 854a0f752e [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1)
### What changes were proposed in this pull request?

This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison.

**A. EXPECTED CASES(JDK11 is faster in general)**
- [x] BloomFilterBenchmark (JDK11 is faster except one case)
- [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC)
- [x] CSVBenchmark (JDK11 is faster except five cases)
- [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`)
- [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type)
- [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases)
- [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS)
- [x] HashedRelationMetricsBenchmark (JDK11 is faster)
- [x] JSONBenchmark (JDK11 is much faster except eight cases)
- [x] JoinBenchmark (JDK11 is faster except five cases)
- [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases)
- [x] PrimitiveArrayBenchmark (JDK11 is faster)
- [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case)
- [x] UDFBenchmark (N/A, values are too small)
- [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case)
- [x] WideTableBenchmark (JDK11 is faster except two cases)

**B. CASES WE NEED TO INVESTIGATE MORE LATER**
- [x] AggregateBenchmark (JDK11 is slower in general)
- [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`)
- [x] DataSourceReadBenchmark (JDK11 is slower in general)
- [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`)
- [x] MakeDateTimeBenchmark (JDK11 is slower except two cases)
- [x] MiscBenchmark (JDK11 is slower except ten cases)
- [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower)
- [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases)
- [x] RangeBenchmark (JDK11 is slower except one case)

`FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer.

### Why are the changes needed?

According to the result, there are some difference between JDK8/JDK11.
This will be a baseline for the future improvement and comparison. Also, as a reproducible  environment, the following environment is used.
- Instance: `r3.xlarge`
- OS: `CentOS Linux release 7.5.1804 (Core)`
- JDK:
  - `OpenJDK Runtime Environment (build 1.8.0_222-b10)`
  - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)`

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

This is a test-only PR. We need to run benchmark.

Closes #26003 from dongjoon-hyun/SPARK-29320.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 08:58:25 -07:00

47 lines
3.9 KiB
Plaintext

================================================================================================
Dataset Benchmark
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
back-to-back map long: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
RDD 12720 12777 80 7.9 127.2 1.0X
DataFrame 2242 2501 366 44.6 22.4 5.7X
Dataset 3040 3174 189 32.9 30.4 4.2X
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
back-to-back map: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
RDD 15865 15922 82 6.3 158.6 1.0X
DataFrame 8423 8476 75 11.9 84.2 1.9X
Dataset 17180 18142 1361 5.8 171.8 0.9X
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
back-to-back filter Long: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
RDD 2928 3009 114 34.1 29.3 1.0X
DataFrame 1386 1427 59 72.2 13.9 2.1X
Dataset 3448 3451 5 29.0 34.5 0.8X
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
back-to-back filter: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
RDD 5476 5483 10 18.3 54.8 1.0X
DataFrame 209 235 23 479.1 2.1 26.2X
Dataset 9433 9549 163 10.6 94.3 0.6X
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
aggregate: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
RDD sum 5146 5239 132 19.4 51.5 1.0X
DataFrame sum 84 99 15 1196.9 0.8 61.6X
Dataset sum using Aggregator 8944 9021 109 11.2 89.4 0.6X
Dataset complex Aggregator 12832 13141 436 7.8 128.3 0.4X