spark-instrumented-optimizer/sql/core/benchmarks/JsonBenchmark-jdk11-results.txt
Max Gekk bcf23307f4 [SPARK-32130][SQL] Disable the JSON option inferTimestamp by default
### What changes were proposed in this pull request?
Set the JSON option `inferTimestamp` to `false` if a user doesn't pass it as a datasource option.
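
As a rough illustration of the new default, here is a minimal Python sketch. It is not Spark's actual implementation: the function name `resolve_infer_timestamp` and the plain-dict options are assumptions made for this example. An unset option resolves to `false`, and timestamp inference is only enabled when the caller opts in explicitly:

```python
def resolve_infer_timestamp(options):
    """Return the effective value of the JSON option `inferTimestamp`.

    Hypothetical helper for illustration only: `options` is a plain dict of
    string keys to string values, standing in for datasource options.
    """
    raw = options.get("inferTimestamp")
    if raw is None:
        # Option not passed by the user: timestamp inference stays off.
        return False
    value = str(raw).strip().lower()
    if value not in ("true", "false"):
        raise ValueError(f"inferTimestamp must be true or false, got {raw!r}")
    return value == "true"


print(resolve_infer_timestamp({}))                          # option not set
print(resolve_infer_timestamp({"inferTimestamp": "true"}))  # explicit opt-in
```

Under this model, code that relied on timestamp columns being inferred would need to set `inferTimestamp` to `true` explicitly.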

### Why are the changes needed?
To prevent a performance regression while inferring schemas from JSON with potential timestamp fields.

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
- Modified existing tests in `JsonSuite` and `JsonInferSchemaSuite`.
- Regenerated results of `JsonBenchmark` in the environment:

| Item | Description |
| ---- | ----|
| Region | us-west-2 (Oregon) |
| Instance | r3.xlarge |
| AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) |
| Java | OpenJDK 64-Bit Server VM 1.8.0_252 and OpenJDK 64-Bit Server VM 11.0.7+10 |

Closes #28966 from MaxGekk/json-inferTimestamps-disable-by-default.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-01 15:45:39 -07:00

================================================================================================
Benchmark for performance of JSON parsing
================================================================================================
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
JSON schema inferring:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
No encoding                                       69219          69342         116          1.4         692.2       1.0X
UTF-8 is set                                     143950         143986          55          0.7        1439.5       0.5X
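
The Rate, Per Row, and Relative columns can be reproduced from Best Time and the number of rows processed. The following is a minimal Python sketch of that arithmetic. The row count of 100,000,000 is an assumption, inferred here from Best Time divided by Per Row rather than stated in the output, and the helper name `derived_columns` is hypothetical:

```python
def derived_columns(best_ms, num_rows, baseline_best_ms):
    """Recompute (Rate in M rows/s, Per Row in ns, Relative) for one table row."""
    per_row_ns = best_ms * 1_000_000 / num_rows               # ms -> ns, per row
    rate_m_per_s = num_rows / (best_ms / 1000) / 1_000_000    # rows/s -> M rows/s
    relative = baseline_best_ms / best_ms                     # speed vs. first row
    return round(rate_m_per_s, 1), round(per_row_ns, 1), round(relative, 1)


ROWS = 100_000_000  # assumed row count, inferred from the table itself

# "No encoding" is the baseline row; "UTF-8 is set" is compared against it.
print(derived_columns(69219, ROWS, 69219))
print(derived_columns(143950, ROWS, 69219))
```

A Relative value below 1.0X therefore means the case is slower than the first row of its table.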
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
count a short column:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
No encoding                                       57828          57913         136          1.7         578.3       1.0X
UTF-8 is set                                      83649          83711          60          1.2         836.5       0.7X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
count a wide column:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
No encoding                                       64560          65193        1023          0.2        6456.0       1.0X
UTF-8 is set                                     102925         103174         216          0.1       10292.5       0.6X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
select wide row:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
No encoding                                      131002         132316        1160          0.0      262003.1       1.0X
UTF-8 is set                                     152128         152371         332          0.0      304256.5       0.9X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Select a subset of 10 columns:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Select 10 columns                                 19376          19514         160          0.5        1937.6       1.0X
Select 1 column                                   24089          24156          58          0.4        2408.9       0.8X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
creation of JSON parser per line:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Short column without encoding                      8131           8219         103          1.2         813.1       1.0X
Short column with UTF-8                           13464          13508          44          0.7        1346.4       0.6X
Wide column without encoding                     108012         108598         914          0.1       10801.2       0.1X
Wide column with UTF-8                           150988         151369         412          0.1       15098.8       0.1X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
JSON functions:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Text read                                           753            765          18         13.3          75.3       1.0X
from_json                                         23182          23446         230          0.4        2318.2       0.0X
json_tuple                                        31129          31304         181          0.3        3112.9       0.0X
get_json_object                                   22821          23073         225          0.4        2282.1       0.0X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Dataset of json strings:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Text read                                          3078           3101          26         16.2          61.6       1.0X
schema inferring                                  30225          30434         333          1.7         604.5       0.1X
parsing                                           32237          32308          63          1.6         644.7       0.1X
Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Json files in the per-line mode:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Text read                                         10835          10900          86          4.6         216.7       1.0X
Schema inferring                                  37720          37805         110          1.3         754.4       0.3X
Parsing without charset                           35464          35538         100          1.4         709.3       0.3X
Parsing with UTF-8                                67311          67738         381          0.7        1346.2       0.2X
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Write dates and timestamps:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Create a dataset of timestamps                     2208           2222          14          4.5         220.8       1.0X
to_json(timestamp)                                14299          14570         285          0.7        1429.9       0.2X
write timestamps to files                         12955          12969          13          0.8        1295.5       0.2X
Create a dataset of dates                          2297           2323          30          4.4         229.7       1.0X
to_json(date)                                      8509           8561          74          1.2         850.9       0.3X
write dates to files                               6786           6827          45          1.5         678.6       0.3X
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Read dates and timestamps:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read timestamp text from files                     2598           2613          18          3.8         259.8       1.0X
read timestamps from files                        42007          42028          19          0.2        4200.7       0.1X
infer timestamps from files                       18102          18120          28          0.6        1810.2       0.1X
read date text from files                          2355           2360           5          4.2         235.5       1.1X
read date from files                              17420          17458          33          0.6        1742.0       0.1X
timestamp strings                                  3099           3101           3          3.2         309.9       0.8X
parse timestamps from Dataset[String]             48188          48215          25          0.2        4818.8       0.1X
infer timestamps from Dataset[String]             22929          22988         102          0.4        2292.9       0.1X
date strings                                       4090           4103          11          2.4         409.0       0.6X
parse dates from Dataset[String]                  24952          25068         139          0.4        2495.2       0.1X
from_json(timestamp)                              66038          66352         413          0.2        6603.8       0.0X
from_json(date)                                   43755          43782          27          0.2        4375.5       0.1X