bcf23307f4
### What changes were proposed in this pull request? Set the JSON option `inferTimestamp` to `false` if an user don't pass it as datasource option. ### Why are the changes needed? To prevent perf regression while inferring schemas from JSON with potential timestamps fields. ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? - Modified existing tests in `JsonSuite` and `JsonInferSchemaSuite`. - Regenerated results of `JsonBenchmark` in the environment: | Item | Description | | ---- | ----| | Region | us-west-2 (Oregon) | | Instance | r3.xlarge | | AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) | | Java | OpenJDK 64-Bit Server VM 1.8.0_252 and OpenJDK 64-Bit Server VM 11.0.7+10 | Closes #28966 from MaxGekk/json-inferTimestamps-disable-by-default. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
113 lines
9.6 KiB
Plaintext
113 lines
9.6 KiB
Plaintext
================================================================================================
|
|
Benchmark for performance of JSON parsing
|
|
================================================================================================
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
JSON schema inferring: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
No encoding 69219 69342 116 1.4 692.2 1.0X
|
|
UTF-8 is set 143950 143986 55 0.7 1439.5 0.5X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
count a short column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
No encoding 57828 57913 136 1.7 578.3 1.0X
|
|
UTF-8 is set 83649 83711 60 1.2 836.5 0.7X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
count a wide column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
No encoding 64560 65193 1023 0.2 6456.0 1.0X
|
|
UTF-8 is set 102925 103174 216 0.1 10292.5 0.6X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
select wide row: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
No encoding 131002 132316 1160 0.0 262003.1 1.0X
|
|
UTF-8 is set 152128 152371 332 0.0 304256.5 0.9X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Select a subset of 10 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Select 10 columns 19376 19514 160 0.5 1937.6 1.0X
|
|
Select 1 column 24089 24156 58 0.4 2408.9 0.8X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
creation of JSON parser per line: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Short column without encoding 8131 8219 103 1.2 813.1 1.0X
|
|
Short column with UTF-8 13464 13508 44 0.7 1346.4 0.6X
|
|
Wide column without encoding 108012 108598 914 0.1 10801.2 0.1X
|
|
Wide column with UTF-8 150988 151369 412 0.1 15098.8 0.1X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
JSON functions: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Text read 753 765 18 13.3 75.3 1.0X
|
|
from_json 23182 23446 230 0.4 2318.2 0.0X
|
|
json_tuple 31129 31304 181 0.3 3112.9 0.0X
|
|
get_json_object 22821 23073 225 0.4 2282.1 0.0X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Dataset of json strings: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Text read 3078 3101 26 16.2 61.6 1.0X
|
|
schema inferring 30225 30434 333 1.7 604.5 0.1X
|
|
parsing 32237 32308 63 1.6 644.7 0.1X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Json files in the per-line mode: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Text read 10835 10900 86 4.6 216.7 1.0X
|
|
Schema inferring 37720 37805 110 1.3 754.4 0.3X
|
|
Parsing without charset 35464 35538 100 1.4 709.3 0.3X
|
|
Parsing with UTF-8 67311 67738 381 0.7 1346.2 0.2X
|
|
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Write dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Create a dataset of timestamps 2208 2222 14 4.5 220.8 1.0X
|
|
to_json(timestamp) 14299 14570 285 0.7 1429.9 0.2X
|
|
write timestamps to files 12955 12969 13 0.8 1295.5 0.2X
|
|
Create a dataset of dates 2297 2323 30 4.4 229.7 1.0X
|
|
to_json(date) 8509 8561 74 1.2 850.9 0.3X
|
|
write dates to files 6786 6827 45 1.5 678.6 0.3X
|
|
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Read dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
read timestamp text from files 2598 2613 18 3.8 259.8 1.0X
|
|
read timestamps from files 42007 42028 19 0.2 4200.7 0.1X
|
|
infer timestamps from files 18102 18120 28 0.6 1810.2 0.1X
|
|
read date text from files 2355 2360 5 4.2 235.5 1.1X
|
|
read date from files 17420 17458 33 0.6 1742.0 0.1X
|
|
timestamp strings 3099 3101 3 3.2 309.9 0.8X
|
|
parse timestamps from Dataset[String] 48188 48215 25 0.2 4818.8 0.1X
|
|
infer timestamps from Dataset[String] 22929 22988 102 0.4 2292.9 0.1X
|
|
date strings 4090 4103 11 2.4 409.0 0.6X
|
|
parse dates from Dataset[String] 24952 25068 139 0.4 2495.2 0.1X
|
|
from_json(timestamp) 66038 66352 413 0.2 6603.8 0.0X
|
|
from_json(date) 43755 43782 27 0.2 4375.5 0.1X
|
|
|
|
|