spark-instrumented-optimizer/sql/core/benchmarks/JsonBenchmark-jdk11-results.txt
Max Gekk 42f01e314b [SPARK-32130][SQL][FOLLOWUP] Enable timestamps inference in JsonBenchmark
### What changes were proposed in this pull request?
Set the JSON option `inferTimestamp` to `true` for the cases that measure perf of timestamp inference.
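
For readers unfamiliar with the option, here is a minimal sketch of turning it on from user code. It is written in PySpark for brevity, whereas the benchmark itself is Scala. The helper name `read_json_inferring_timestamps` and its `path` argument are hypothetical, and `spark` is assumed to be an existing `SparkSession`.

```python
# Reader options; "inferTimestamp" is the JSON option named in this commit.
JSON_OPTIONS = {"inferTimestamp": "true"}


def read_json_inferring_timestamps(spark, path):
    """Read JSON from `path` with timestamp inference switched on.

    `spark` is assumed to be a pyspark.sql.SparkSession. The function name
    and `path` are illustrative only and do not come from the benchmark.
    """
    # DataFrameReader.options(**kwargs) sets several options at once;
    # .json(path) then loads the data, inferring the schema when none is given.
    return spark.read.options(**JSON_OPTIONS).json(path)
```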

### Why are the changes needed?
The PR https://github.com/apache/spark/pull/28966 disabled timestamp inference by default. As a consequence, some benchmarks no longer measure the perf of timestamp inference from JSON fields. This PR explicitly enables such inference.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By re-generating results of `JsonBenchmark`.

Closes #28981 from MaxGekk/json-inferTimestamps-disable-by-default-followup.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-02 13:26:57 -07:00

================================================================================================
Benchmark for performance of JSON parsing
================================================================================================
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
JSON schema inferring:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
No encoding                                       73307          73400         141          1.4         733.1       1.0X
UTF-8 is set                                     143834         143925         152          0.7        1438.3       0.5X
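
The columns in these tables can be cross-checked against one another. The sketch below assumes that `Relative` is the first row's best time divided by the current row's best time, and that `Per Row(ns)` is the best time divided by the number of rows processed. Both formulas are inferred from the numbers in the table above rather than quoted from the benchmark harness, and the function names are hypothetical.

```python
def relative(baseline_best_ms, best_ms):
    """Speed relative to the first (baseline) row; below 1.0 means slower."""
    return baseline_best_ms / best_ms


def rows_processed(best_ms, per_row_ns):
    """Approximate row count implied by the best time and the per-row cost."""
    best_ns = best_ms * 1_000_000  # milliseconds -> nanoseconds
    return best_ns / per_row_ns


def rate_millions_per_sec(rows, best_ms):
    """Throughput in millions of rows per second, based on the best time."""
    return rows / (best_ms / 1000.0) / 1_000_000


# Values copied from the "JSON schema inferring" table.
no_encoding_best_ms, no_encoding_per_row_ns = 73307, 733.1
utf8_best_ms = 143834

rows = rows_processed(no_encoding_best_ms, no_encoding_per_row_ns)
print(f"Relative for 'UTF-8 is set': {relative(no_encoding_best_ms, utf8_best_ms):.1f}X")
print(f"Implied rows: about {rows:,.0f}")
print(f"Rate for 'No encoding': {rate_millions_per_sec(rows, no_encoding_best_ms):.1f} M/s")
```

Because the printed columns are rounded, the implied row count is only approximate (close to 100 million for this table).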
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
count a short column:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
No encoding                                       50894          51065         292          2.0         508.9       1.0X
UTF-8 is set                                      98462          99455        1173          1.0         984.6       0.5X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
count a wide column:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
No encoding                                       64011          64969        1001          0.2        6401.1       1.0X
UTF-8 is set                                     102757         102984         311          0.1       10275.7       0.6X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
select wide row:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
No encoding                                      132559         133561        1010          0.0      265117.3       1.0X
UTF-8 is set                                     151458         152129         611          0.0      302915.4       0.9X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Select a subset of 10 columns:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Select 10 columns                                 21148          21202          87          0.5        2114.8       1.0X
Select 1 column                                   24701          24724          21          0.4        2470.1       0.9X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
creation of JSON parser per line:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Short column without encoding                      6945           6998          59          1.4         694.5       1.0X
Short column with UTF-8                           11510          11569          51          0.9        1151.0       0.6X
Wide column without encoding                      95004          95795         790          0.1        9500.4       0.1X
Wide column with UTF-8                           149223         149409         276          0.1       14922.3       0.0X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
JSON functions:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Text read                                           649            652           3         15.4          64.9       1.0X
from_json                                         22284          22393          99          0.4        2228.4       0.0X
json_tuple                                        32310          32824         484          0.3        3231.0       0.0X
get_json_object                                   22111          22751         568          0.5        2211.1       0.0X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Dataset of json strings:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Text read                                          2894           2903           8         17.3          57.9       1.0X
schema inferring                                  26724          26785          62          1.9         534.5       0.1X
parsing                                           37502          37632         131          1.3         750.0       0.1X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Json files in the per-line mode:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Text read                                         10994          11010          16          4.5         219.9       1.0X
Schema inferring                                  45654          45677          37          1.1         913.1       0.2X
Parsing without charset                           34476          34559          73          1.5         689.5       0.3X
Parsing with UTF-8                                56987          57002          13          0.9        1139.7       0.2X
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Write dates and timestamps:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Create a dataset of timestamps                     2150           2188          35          4.7         215.0       1.0X
to_json(timestamp)                                17874          18080         294          0.6        1787.4       0.1X
write timestamps to files                         12518          12538          34          0.8        1251.8       0.2X
Create a dataset of dates                          2298           2310          18          4.4         229.8       0.9X
to_json(date)                                     11673          11703          27          0.9        1167.3       0.2X
write dates to files                               7121           7135          12          1.4         712.1       0.3X
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Read dates and timestamps:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read timestamp text from files                     2616           2641          34          3.8         261.6       1.0X
read timestamps from files                        37481          37517          58          0.3        3748.1       0.1X
infer timestamps from files                       84774          84964         201          0.1        8477.4       0.0X
read date text from files                          2362           2365           3          4.2         236.2       1.1X
read date from files                              16583          16612          29          0.6        1658.3       0.2X
timestamp strings                                  3927           3963          40          2.5         392.7       0.7X
parse timestamps from Dataset[String]             52827          53004         243          0.2        5282.7       0.0X
infer timestamps from Dataset[String]            101108         101644         769          0.1       10110.8       0.0X
date strings                                       4886           4906          26          2.0         488.6       0.5X
parse dates from Dataset[String]                  27623          27694          62          0.4        2762.3       0.1X
from_json(timestamp)                              71764          71887         124          0.1        7176.4       0.0X
from_json(date)                                   46200          46314          99          0.2        4620.0       0.1X
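
The `Relative` column rounds both timestamp-inference rows to `0.0X` because the baseline is the plain text read, which hides how the inference rows compare with their non-inferring counterparts. A small sketch of that comparison uses the best times from the table above. The pairing of rows is an assumption made for illustration, and the ratio does not isolate inference from anything else those rows may differ in.

```python
# Best times in milliseconds, copied from "Read dates and timestamps".
best_ms = {
    "read timestamps from files": 37481,
    "infer timestamps from files": 84774,
    "parse timestamps from Dataset[String]": 52827,
    "infer timestamps from Dataset[String]": 101108,
}


def slowdown(slow_case, fast_case):
    """How many times longer `slow_case` took than `fast_case`."""
    return best_ms[slow_case] / best_ms[fast_case]


files = slowdown("infer timestamps from files", "read timestamps from files")
dataset = slowdown(
    "infer timestamps from Dataset[String]",
    "parse timestamps from Dataset[String]",
)
print(f"files: {files:.2f}x, Dataset[String]: {dataset:.2f}x")
# prints "files: 2.26x, Dataset[String]: 1.91x"
```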