spark-instrumented-optimizer/sql/core/benchmarks/JSONBenchmark-jdk11-results.txt
Dongjoon Hyun 854a0f752e [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1)
### What changes were proposed in this pull request?

This PR regenerates the `sql/core` benchmarks in JDK8/11 to compare the result. In general, we compare the ratio instead of the time. However, in this PR, the average time is compared. This PR should be considered as a rough comparison.

**A. EXPECTED CASES(JDK11 is faster in general)**
- [x] BloomFilterBenchmark (JDK11 is faster except one case)
- [x] BuiltInDataSourceWriteBenchmark (JDK11 is faster at CSV/ORC)
- [x] CSVBenchmark (JDK11 is faster except five cases)
- [x] ColumnarBatchBenchmark (JDK11 is faster at `boolean`/`string` and some cases in `int`/`array`)
- [x] DatasetBenchmark (JDK11 is faster with `string`, but is slower for `long` type)
- [x] ExternalAppendOnlyUnsafeRowArrayBenchmark (JDK11 is faster except two cases)
- [x] ExtractBenchmark (JDK11 is faster except HOUR/MINUTE/SECOND/MILLISECONDS/MICROSECONDS)
- [x] HashedRelationMetricsBenchmark (JDK11 is faster)
- [x] JSONBenchmark (JDK11 is much faster except eight cases)
- [x] JoinBenchmark (JDK11 is faster except five cases)
- [x] OrcNestedSchemaPruningBenchmark (JDK11 is faster in nine cases)
- [x] PrimitiveArrayBenchmark (JDK11 is faster)
- [x] SortBenchmark (JDK11 is faster except `Arrays.sort` case)
- [x] UDFBenchmark (N/A, values are too small)
- [x] UnsafeArrayDataBenchmark (JDK11 is faster except one case)
- [x] WideTableBenchmark (JDK11 is faster except two cases)

**B. CASES WE NEED TO INVESTIGATE MORE LATER**
- [x] AggregateBenchmark (JDK11 is slower in general)
- [x] CompressionSchemeBenchmark (JDK11 is slower in general except `string`)
- [x] DataSourceReadBenchmark (JDK11 is slower in general)
- [x] DateTimeBenchmark (JDK11 is slightly slower in general except `parsing`)
- [x] MakeDateTimeBenchmark (JDK11 is slower except two cases)
- [x] MiscBenchmark (JDK11 is slower except ten cases)
- [x] OrcV2NestedSchemaPruningBenchmark (JDK11 is slower)
- [x] ParquetNestedSchemaPruningBenchmark (JDK11 is slower except six cases)
- [x] RangeBenchmark (JDK11 is slower except one case)

`FilterPushdownBenchmark/InExpressionBenchmark/WideSchemaBenchmark` will be compared later because it took long timer.

### Why are the changes needed?

According to the result, there are some difference between JDK8/JDK11.
This will be a baseline for the future improvement and comparison. Also, as a reproducible  environment, the following environment is used.
- Instance: `r3.xlarge`
- OS: `CentOS Linux release 7.5.1804 (Core)`
- JDK:
  - `OpenJDK Runtime Environment (build 1.8.0_222-b10)`
  - `OpenJDK Runtime Environment 18.9 (build 11.0.4+11-LTS)`

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

This is a test-only PR. We need to run benchmark.

Closes #26003 from dongjoon-hyun/SPARK-29320.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-10-03 08:58:25 -07:00

113 lines
9.5 KiB
Plaintext

================================================================================================
Benchmark for performance of JSON parsing
================================================================================================
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
JSON schema inferring: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
No encoding 70863 71147 417 1.4 708.6 1.0X
UTF-8 is set 133520 133658 131 0.7 1335.2 0.5X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
count a short column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
No encoding 44462 44655 313 2.2 444.6 1.0X
UTF-8 is set 89342 89386 53 1.1 893.4 0.5X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
count a wide column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
No encoding 58617 59071 541 0.2 5861.7 1.0X
UTF-8 is set 102650 102948 493 0.1 10265.0 0.6X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
select wide row: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
No encoding 168197 168256 61 0.0 336393.0 1.0X
UTF-8 is set 191539 191724 186 0.0 383078.2 0.9X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Select a subset of 10 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Select 10 columns 18983 18998 22 0.5 1898.3 1.0X
Select 1 column 23544 23575 32 0.4 2354.4 0.8X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
creation of JSON parser per line: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Short column without encoding 9336 9402 66 1.1 933.6 1.0X
Short column with UTF-8 13310 13870 489 0.8 1331.0 0.7X
Wide column without encoding 112821 113012 202 0.1 11282.1 0.1X
Wide column with UTF-8 156408 156910 776 0.1 15640.8 0.1X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
JSON functions: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Text read 743 744 1 13.5 74.3 1.0X
from_json 22379 22566 279 0.4 2237.9 0.0X
json_tuple 32201 32271 89 0.3 3220.1 0.0X
get_json_object 21973 22405 375 0.5 2197.3 0.0X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Dataset of json strings: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Text read 3771 3780 8 13.3 75.4 1.0X
schema inferring 34808 35213 506 1.4 696.2 0.1X
parsing 36917 38889 1757 1.4 738.3 0.1X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Json files in the per-line mode: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Text read 11928 11955 38 4.2 238.6 1.0X
Schema inferring 44716 44921 220 1.1 894.3 0.3X
Parsing without charset 35749 35838 77 1.4 715.0 0.3X
Parsing with UTF-8 50521 50530 12 1.0 1010.4 0.2X
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Write dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Create a dataset of timestamps 2473 2487 18 4.0 247.3 1.0X
to_json(timestamp) 16100 16284 236 0.6 1610.0 0.2X
write timestamps to files 13226 13852 676 0.8 1322.6 0.2X
Create a dataset of dates 2661 2680 16 3.8 266.1 0.9X
to_json(date) 10696 10730 55 0.9 1069.6 0.2X
write dates to files 8543 8926 384 1.2 854.3 0.3X
OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Read dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
read timestamp text from files 2822 2829 8 3.5 282.2 1.0X
read timestamps from files 34884 34900 23 0.3 3488.4 0.1X
infer timestamps from files 70310 71551 1079 0.1 7031.0 0.0X
read date text from files 2649 2655 6 3.8 264.9 1.1X
read date from files 22875 22904 33 0.4 2287.5 0.1X
timestamp strings 4025 4059 31 2.5 402.5 0.7X
parse timestamps from Dataset[String] 43040 43157 115 0.2 4304.0 0.1X
infer timestamps from Dataset[String] 81762 81927 144 0.1 8176.2 0.0X
date strings 4899 4913 21 2.0 489.9 0.6X
parse dates from Dataset[String] 33530 33560 27 0.3 3353.0 0.1X
from_json(timestamp) 60762 60877 142 0.2 6076.2 0.0X
from_json(date) 52482 52620 150 0.2 5248.2 0.1X