42f01e314b
### What changes were proposed in this pull request? Set the JSON option `inferTimestamp` to `true` for the cases that measure perf of timestamp inference. ### Why are the changes needed? The PR https://github.com/apache/spark/pull/28966 disabled timestamp inference by default. As a consequence, some benchmarks don't measure perf of timestamp inference from JSON fields. This PR explicitly enable such inference. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By re-generating results of `JsonBenchmark`. Closes #28981 from MaxGekk/json-inferTimestamps-disable-by-default-followup. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
113 lines
9.6 KiB
Plaintext
113 lines
9.6 KiB
Plaintext
================================================================================================
|
|
Benchmark for performance of JSON parsing
|
|
================================================================================================
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
JSON schema inferring: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
No encoding 73307 73400 141 1.4 733.1 1.0X
|
|
UTF-8 is set 143834 143925 152 0.7 1438.3 0.5X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
count a short column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
No encoding 50894 51065 292 2.0 508.9 1.0X
|
|
UTF-8 is set 98462 99455 1173 1.0 984.6 0.5X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
count a wide column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
No encoding 64011 64969 1001 0.2 6401.1 1.0X
|
|
UTF-8 is set 102757 102984 311 0.1 10275.7 0.6X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
select wide row: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
No encoding 132559 133561 1010 0.0 265117.3 1.0X
|
|
UTF-8 is set 151458 152129 611 0.0 302915.4 0.9X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Select a subset of 10 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Select 10 columns 21148 21202 87 0.5 2114.8 1.0X
|
|
Select 1 column 24701 24724 21 0.4 2470.1 0.9X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
creation of JSON parser per line: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Short column without encoding 6945 6998 59 1.4 694.5 1.0X
|
|
Short column with UTF-8 11510 11569 51 0.9 1151.0 0.6X
|
|
Wide column without encoding 95004 95795 790 0.1 9500.4 0.1X
|
|
Wide column with UTF-8 149223 149409 276 0.1 14922.3 0.0X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
JSON functions: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Text read 649 652 3 15.4 64.9 1.0X
|
|
from_json 22284 22393 99 0.4 2228.4 0.0X
|
|
json_tuple 32310 32824 484 0.3 3231.0 0.0X
|
|
get_json_object 22111 22751 568 0.5 2211.1 0.0X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Dataset of json strings: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Text read 2894 2903 8 17.3 57.9 1.0X
|
|
schema inferring 26724 26785 62 1.9 534.5 0.1X
|
|
parsing 37502 37632 131 1.3 750.0 0.1X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Json files in the per-line mode: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Text read 10994 11010 16 4.5 219.9 1.0X
|
|
Schema inferring 45654 45677 37 1.1 913.1 0.2X
|
|
Parsing without charset 34476 34559 73 1.5 689.5 0.3X
|
|
Parsing with UTF-8 56987 57002 13 0.9 1139.7 0.2X
|
|
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Write dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Create a dataset of timestamps 2150 2188 35 4.7 215.0 1.0X
|
|
to_json(timestamp) 17874 18080 294 0.6 1787.4 0.1X
|
|
write timestamps to files 12518 12538 34 0.8 1251.8 0.2X
|
|
Create a dataset of dates 2298 2310 18 4.4 229.8 0.9X
|
|
to_json(date) 11673 11703 27 0.9 1167.3 0.2X
|
|
write dates to files 7121 7135 12 1.4 712.1 0.3X
|
|
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Read dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
read timestamp text from files 2616 2641 34 3.8 261.6 1.0X
|
|
read timestamps from files 37481 37517 58 0.3 3748.1 0.1X
|
|
infer timestamps from files 84774 84964 201 0.1 8477.4 0.0X
|
|
read date text from files 2362 2365 3 4.2 236.2 1.1X
|
|
read date from files 16583 16612 29 0.6 1658.3 0.2X
|
|
timestamp strings 3927 3963 40 2.5 392.7 0.7X
|
|
parse timestamps from Dataset[String] 52827 53004 243 0.2 5282.7 0.0X
|
|
infer timestamps from Dataset[String] 101108 101644 769 0.1 10110.8 0.0X
|
|
date strings 4886 4906 26 2.0 488.6 0.5X
|
|
parse dates from Dataset[String] 27623 27694 62 0.4 2762.3 0.1X
|
|
from_json(timestamp) 71764 71887 124 0.1 7176.4 0.0X
|
|
from_json(date) 46200 46314 99 0.2 4620.0 0.1X
|
|
|
|
|