spark-instrumented-optimizer/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt

462 lines
37 KiB
Plaintext
Raw Normal View History

================================================================================================
datetime +/- interval
================================================================================================
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
datetime +/- interval: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
date + interval(m) 1644 1661 24 6.1 164.4 1.0X
date + interval(m, d) 1571 1577 9 6.4 157.1 1.0X
date + interval(m, d, ms) 6292 6343 72 1.6 629.2 0.3X
date - interval(m) 1490 1493 5 6.7 149.0 1.1X
date - interval(m, d) 1551 1563 16 6.4 155.1 1.1X
date - interval(m, d, ms) 6432 6458 36 1.6 643.2 0.3X
timestamp + interval(m) 3198 3252 76 3.1 319.8 0.5X
timestamp + interval(m, d) 3421 3435 20 2.9 342.1 0.5X
timestamp + interval(m, d, ms) 3685 3715 43 2.7 368.5 0.4X
timestamp - interval(m) 3249 3269 28 3.1 324.9 0.5X
timestamp - interval(m, d) 3375 3383 12 3.0 337.5 0.5X
timestamp - interval(m, d, ms) 3696 3706 15 2.7 369.6 0.4X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
================================================================================================
Extract components
================================================================================================
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
cast to timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
cast to timestamp wholestage off 279 293 20 35.8 27.9 1.0X
cast to timestamp wholestage on 274 286 9 36.4 27.4 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
year of timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
year of timestamp wholestage off 1182 1184 3 8.5 118.2 1.0X
year of timestamp wholestage on 1239 1251 10 8.1 123.9 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
quarter of timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
quarter of timestamp wholestage off 1238 1243 8 8.1 123.8 1.0X
quarter of timestamp wholestage on 1269 1291 14 7.9 126.9 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
month of timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
month of timestamp wholestage off 1178 1185 10 8.5 117.8 1.0X
month of timestamp wholestage on 1211 1230 12 8.3 121.1 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
weekofyear of timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
weekofyear of timestamp wholestage off 1734 1740 10 5.8 173.4 1.0X
weekofyear of timestamp wholestage on 1810 1826 16 5.5 181.0 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
day of timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
day of timestamp wholestage off 1195 1199 4 8.4 119.5 1.0X
day of timestamp wholestage on 1214 1231 16 8.2 121.4 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
dayofyear of timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
dayofyear of timestamp wholestage off 1283 1286 5 7.8 128.3 1.0X
dayofyear of timestamp wholestage on 1269 1286 11 7.9 126.9 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
dayofmonth of timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
dayofmonth of timestamp wholestage off 1216 1239 32 8.2 121.6 1.0X
dayofmonth of timestamp wholestage on 1201 1262 73 8.3 120.1 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
dayofweek of timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
dayofweek of timestamp wholestage off 1333 1343 14 7.5 133.3 1.0X
dayofweek of timestamp wholestage on 1380 1399 18 7.2 138.0 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
weekday of timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
weekday of timestamp wholestage off 1294 1314 28 7.7 129.4 1.0X
weekday of timestamp wholestage on 1361 1379 12 7.3 136.1 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
hour of timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
hour of timestamp wholestage off 946 1039 132 10.6 94.6 1.0X
hour of timestamp wholestage on 995 1026 18 10.1 99.5 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
minute of timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
minute of timestamp wholestage off 958 979 28 10.4 95.8 1.0X
minute of timestamp wholestage on 952 964 12 10.5 95.2 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
second of timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
second of timestamp wholestage off 947 977 42 10.6 94.7 1.0X
second of timestamp wholestage on 987 1029 62 10.1 98.7 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
================================================================================================
Current date and time
================================================================================================
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
current_date: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
current_date wholestage off 254 254 1 39.4 25.4 1.0X
current_date wholestage on 268 275 7 37.3 26.8 0.9X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
current_timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
current_timestamp wholestage off 263 296 47 38.1 26.3 1.0X
current_timestamp wholestage on 274 338 99 36.5 27.4 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
================================================================================================
Date arithmetic
================================================================================================
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
cast to date: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
cast to date wholestage off 1053 1068 20 9.5 105.3 1.0X
cast to date wholestage on 1133 1151 19 8.8 113.3 0.9X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
last_day: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
last_day wholestage off 1178 1179 1 8.5 117.8 1.0X
last_day wholestage on 1372 1390 19 7.3 137.2 0.9X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
next_day: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
next_day wholestage off 1114 1131 24 9.0 111.4 1.0X
next_day wholestage on 1129 1147 13 8.9 112.9 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
date_add: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
date_add wholestage off 1039 1041 3 9.6 103.9 1.0X
date_add wholestage on 1036 1055 12 9.7 103.6 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
date_sub: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
date_sub wholestage off 989 1010 29 10.1 98.9 1.0X
date_sub wholestage on 1033 1045 14 9.7 103.3 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
add_months: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
add_months wholestage off 1328 1410 116 7.5 132.8 1.0X
add_months wholestage on 1439 1478 39 7.0 143.9 0.9X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
================================================================================================
Formatting dates
================================================================================================
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
format date: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
format date wholestage off 5802 5830 40 1.7 580.2 1.0X
format date wholestage on 5802 5837 40 1.7 580.2 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
================================================================================================
Formatting timestamps
================================================================================================
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
from_unixtime: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
from_unixtime wholestage off 6634 6669 49 1.5 663.4 1.0X
from_unixtime wholestage on 6724 6806 51 1.5 672.4 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
================================================================================================
Convert timestamps
================================================================================================
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
from_utc_timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
from_utc_timestamp wholestage off 1389 1405 23 7.2 138.9 1.0X
from_utc_timestamp wholestage on 1363 1381 16 7.3 136.3 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
to_utc_timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
to_utc_timestamp wholestage off 1828 1837 12 5.5 182.8 1.0X
to_utc_timestamp wholestage on 1818 1838 23 5.5 181.8 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
================================================================================================
Intervals
================================================================================================
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
cast interval: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
cast interval wholestage off 288 289 1 34.7 28.8 1.0X
cast interval wholestage on 269 281 10 37.2 26.9 1.1X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
datediff: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
datediff wholestage off 2024 2028 5 4.9 202.4 1.0X
datediff wholestage on 2002 2044 32 5.0 200.2 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
months_between: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
months_between wholestage off 6378 6398 28 1.6 637.8 1.0X
months_between wholestage on 6400 6467 65 1.6 640.0 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
window: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
window wholestage off 1834 1837 5 0.5 1833.7 1.0X
window wholestage on 20763 21129 229 0.0 20763.3 0.1X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
================================================================================================
Truncation
================================================================================================
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
date_trunc YEAR: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
date_trunc YEAR wholestage off 2786 2817 43 3.6 278.6 1.0X
date_trunc YEAR wholestage on 2805 2852 31 3.6 280.5 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
date_trunc YYYY: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
date_trunc YYYY wholestage off 2772 2813 58 3.6 277.2 1.0X
date_trunc YYYY wholestage on 2860 2894 39 3.5 286.0 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
date_trunc YY: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
date_trunc YY wholestage off 2902 2907 6 3.4 290.2 1.0X
date_trunc YY wholestage on 2801 2842 37 3.6 280.1 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
date_trunc MON: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
date_trunc MON wholestage off 2863 2876 18 3.5 286.3 1.0X
date_trunc MON wholestage on 2870 2905 22 3.5 287.0 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
date_trunc MONTH: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
date_trunc MONTH wholestage off 2847 2866 27 3.5 284.7 1.0X
date_trunc MONTH wholestage on 2799 2851 40 3.6 279.9 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
date_trunc MM: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
date_trunc MM wholestage off 2833 2857 33 3.5 283.3 1.0X
date_trunc MM wholestage on 2890 2907 20 3.5 289.0 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
date_trunc DAY: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
date_trunc DAY wholestage off 2331 2387 79 4.3 233.1 1.0X
date_trunc DAY wholestage on 2324 2371 51 4.3 232.4 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
date_trunc DD: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
date_trunc DD wholestage off 2314 2352 53 4.3 231.4 1.0X
date_trunc DD wholestage on 2351 2392 28 4.3 235.1 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
date_trunc HOUR: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
date_trunc HOUR wholestage off 2278 2319 58 4.4 227.8 1.0X
date_trunc HOUR wholestage on 2346 2373 32 4.3 234.6 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
date_trunc MINUTE: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
date_trunc MINUTE wholestage off 2326 2346 28 4.3 232.6 1.0X
date_trunc MINUTE wholestage on 2310 2348 33 4.3 231.0 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
date_trunc SECOND: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
date_trunc SECOND wholestage off 368 373 8 27.2 36.8 1.0X
date_trunc SECOND wholestage on 370 380 9 27.0 37.0 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
date_trunc WEEK: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
date_trunc WEEK wholestage off 2749 2773 35 3.6 274.9 1.0X
date_trunc WEEK wholestage on 2761 2835 49 3.6 276.1 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
date_trunc QUARTER: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
date_trunc QUARTER wholestage off 3513 3574 86 2.8 351.3 1.0X
date_trunc QUARTER wholestage on 3492 3544 38 2.9 349.2 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
trunc year: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
trunc year wholestage off 272 273 2 36.8 27.2 1.0X
trunc year wholestage on 314 323 6 31.9 31.4 0.9X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
trunc yyyy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
trunc yyyy wholestage off 270 272 4 37.1 27.0 1.0X
trunc yyyy wholestage on 315 325 8 31.8 31.5 0.9X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
trunc yy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
trunc yy wholestage off 273 275 3 36.7 27.3 1.0X
trunc yy wholestage on 324 327 4 30.9 32.4 0.8X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
trunc mon: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
trunc mon wholestage off 284 285 2 35.2 28.4 1.0X
trunc mon wholestage on 318 327 7 31.4 31.8 0.9X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
trunc month: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
trunc month wholestage off 272 274 3 36.8 27.2 1.0X
trunc month wholestage on 313 321 5 31.9 31.3 0.9X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
trunc mm: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
trunc mm wholestage off 270 276 8 37.0 27.0 1.0X
trunc mm wholestage on 313 320 7 31.9 31.3 0.9X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
================================================================================================
Parsing
================================================================================================
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
to timestamp str: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
to timestamp str wholestage off 145 151 8 6.9 145.2 1.0X
to timestamp str wholestage on 147 158 19 6.8 146.5 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
to_timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
to_timestamp wholestage off 1205 1209 6 0.8 1205.2 1.0X
to_timestamp wholestage on 1217 1261 64 0.8 1217.1 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
to_unix_timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
to_unix_timestamp wholestage off 1438 1440 2 0.7 1438.0 1.0X
to_unix_timestamp wholestage on 1194 1207 12 0.8 1193.6 1.2X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
to date str: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
to date str wholestage off 192 199 10 5.2 191.6 1.0X
to date str wholestage on 192 202 16 5.2 191.9 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
to_date: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
to_date wholestage off 2021 2034 18 0.5 2021.5 1.0X
to_date wholestage on 1965 1976 15 0.5 1965.1 1.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
================================================================================================
Conversion from/to external types
================================================================================================
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
[SPARK-31439][SQL] Fix perf regression of fromJavaDate ### What changes were proposed in this pull request? In the PR, I propose to re-use optimized implementation of days rebase function `rebaseJulianToGregorianDays()` introduced by the PR #28067 in conversion of `java.sql.Date` values to Catalyst's `DATE` values. The function `fromJavaDate` in `DateTimeUtils` was re-written by taking the implementation from Spark 2.4, and by rebasing the final results via `rebaseJulianToGregorianDays()`. Also I updated `DateTimeBenchmark`, and added a benchmark for conversion from `java.sql.Date`. ### Why are the changes needed? The PR fixes the regression of parallelizing a collection of `java.sql.Date` values, and improves performance of converting external values to Catalyst's `DATE` values: - x4 on the master branch - 30% against Spark 2.4.6-SNAPSHOT Spark 2.4.6-SNAPSHOT: ``` To/from java.sql.Timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ From java.sql.Date 614 655 43 8.1 122.8 1.0X ``` Before the changes: ``` To/from java.sql.Timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ From java.sql.Date 1154 1206 46 4.3 230.9 1.0X ``` After: ``` To/from java.sql.Timestamp: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ From java.sql.Date 427 434 7 11.7 85.3 1.0X ``` ### Does this PR introduce any user-facing change? No ### How was this patch tested? - By existing tests suites, in particular, `DateTimeUtilsSuite`, `RebaseDateTimeSuite`, `DateFunctionsSuite`, `DateExpressionsSuite`. - Re-run `DateTimeBenchmark` in the environment: | Item | Description | | ---- | ----| | Region | us-west-2 (Oregon) | | Instance | r3.xlarge | | AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) | | Java | OpenJDK 64-Bit Server VM 1.8.0_242 and OpenJDK 64-Bit Server VM 11.0.6+10 | Closes #28205 from MaxGekk/optimize-fromJavaDate. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-04-14 10:44:00 -04:00
To/from Java's date-time: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00
------------------------------------------------------------------------------------------------------------------------
[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way. **NOTE** that looks like GitHub Actions use four types of CPU given my observations: - Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz - Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz - Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz - Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz Given my quick research, seems like they perform roughly similarly: ![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png) I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers. So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares. ### Why are the changes needed? To have a base line of the benchmarks accordingly. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It was generated from: - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465) - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337) Closes #32044 from HyukjinKwon/SPARK-34950. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-03 16:02:56 -04:00
From java.sql.Date 496 516 22 10.1 99.3 1.0X
From java.time.LocalDate 380 384 4 13.2 76.0 1.3X
Collect java.sql.Date 2476 2837 357 2.0 495.3 0.2X
Collect java.time.LocalDate 2281 2314 28 2.2 456.3 0.2X
From java.sql.Timestamp 268 289 30 18.7 53.6 1.9X
From java.time.Instant 335 351 24 14.9 67.1 1.5X
Collect longs 1981 2184 188 2.5 396.2 0.3X
Collect java.sql.Timestamp 2137 2320 171 2.3 427.4 0.2X
Collect java.time.Instant 2291 2412 155 2.2 458.1 0.2X
java.sql.Date to Hive string 9522 9998 674 0.5 1904.3 0.1X
java.time.LocalDate to Hive string 8841 9082 407 0.6 1768.2 0.1X
java.sql.Timestamp to Hive string 9385 9733 521 0.5 1876.9 0.1X
java.time.Instant to Hive string 11756 11927 252 0.4 2351.2 0.0X
[SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) ### What changes were proposed in this pull request? This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison. **A. EXPECTED CASES(JDK11 is faster in general)** - [x] BloomFilterBenchmark (JDK11 is faster except one case) - [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC) - [x] CSVBenchmark (JDK11 is faster except five cases) - [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`) - [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type) - [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases) - [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS) - [x] HashedRelationMetricsBenchmark (JDK11 is faster) - [x] JSONBenchmark (JDK11 is much faster except eight cases) - [x] JoinBenchmark (JDK11 is faster except five cases) - [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases) - [x] PrimitiveArrayBenchmark (JDK11 is faster) - [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case) - [x] UDFBenchmark (N/A, values are too small) - [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case) - [x] WideTableBenchmark (JDK11 is faster except two cases) **B. CASES WE NEED TO INVESTIGATE MORE LATER** - [x] AggregateBenchmark (JDK11 is slower in general) - [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`) - [x] DataSourceReadBenchmark (JDK11 is slower in general) - [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`) - [x] MakeDateTimeBenchmark (JDK11 is slower except two cases) - [x] MiscBenchmark (JDK11 is slower except ten cases) - [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower) - [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases) - [x] RangeBenchmark (JDK11 is slower except one case) `FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer. ### Why are the changes needed? According to the result, there are some difference between JDK8/JDK11. This will be a baseline for the future improvement and comparison. Also, as a reproducible environment, the following environment is used. - Instance: `r3.xlarge` - OS: `CentOS Linux release 7.5.1804 (Core)` - JDK: - `OpenJDK Runtime Environment (build 1.8.0_222-b10)` - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a test-only PR. We need to run benchmark. Closes #26003 from dongjoon-hyun/SPARK-29320. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 11:58:25 -04:00