92685c0148
### What changes were proposed in this pull request? Re-generate results of: - DateTimeBenchmark - CSVBenchmark - JsonBenchmark in the environment: | Item | Description | | ---- | ----| | Region | us-west-2 (Oregon) | | Instance | r3.xlarge | | AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) | | Java | OpenJDK 64-Bit Server VM 1.8.0_242 and OpenJDK 64-Bit Server VM 11.0.6+10 | ### Why are the changes needed? 1. The PR https://github.com/apache/spark/pull/28576 changed date-time parser. The `DateTimeBenchmark` should confirm that the PR didn't slow down date/timestamp parsing. 2. CSV/JSON datasources are affected by the above PR too. This PR updates the benchmark results in the same environment as other benchmarks to have a base line for future optimizations. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running benchmarks via the script: ```python #!/usr/bin/env python3 import os from sparktestsupport.shellutils import run_cmd benchmarks = [ ['sql/test', 'org.apache.spark.sql.execution.benchmark.DateTimeBenchmark'], ['sql/test', 'org.apache.spark.sql.execution.datasources.csv.CSVBenchmark'], ['sql/test', 'org.apache.spark.sql.execution.datasources.json.JsonBenchmark'] ] print('Set SPARK_GENERATE_BENCHMARK_FILES=1') os.environ['SPARK_GENERATE_BENCHMARK_FILES'] = '1' for b in benchmarks: print("Run benchmark: %s" % b[1]) run_cmd(['build/sbt', '%s:runMain %s' % (b[0], b[1])]) ``` Closes #28613 from MaxGekk/missing-hour-year-benchmarks. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
113 lines
9.6 KiB
Plaintext
113 lines
9.6 KiB
Plaintext
================================================================================================
|
|
Benchmark for performance of JSON parsing
|
|
================================================================================================
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
JSON schema inferring: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
No encoding 63981 64044 56 1.6 639.8 1.0X
|
|
UTF-8 is set 112672 113350 962 0.9 1126.7 0.6X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
count a short column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
No encoding 51256 51449 180 2.0 512.6 1.0X
|
|
UTF-8 is set 83694 83859 148 1.2 836.9 0.6X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
count a wide column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
No encoding 58440 59097 569 0.2 5844.0 1.0X
|
|
UTF-8 is set 102746 102883 198 0.1 10274.6 0.6X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
select wide row: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
No encoding 128982 129304 356 0.0 257965.0 1.0X
|
|
UTF-8 is set 147247 147415 231 0.0 294494.1 0.9X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Select a subset of 10 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Select 10 columns 18837 19048 331 0.5 1883.7 1.0X
|
|
Select 1 column 24707 24723 14 0.4 2470.7 0.8X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
creation of JSON parser per line: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Short column without encoding 8218 8234 17 1.2 821.8 1.0X
|
|
Short column with UTF-8 12374 12438 107 0.8 1237.4 0.7X
|
|
Wide column without encoding 136918 137298 345 0.1 13691.8 0.1X
|
|
Wide column with UTF-8 176961 177142 257 0.1 17696.1 0.0X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
JSON functions: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Text read 1268 1278 12 7.9 126.8 1.0X
|
|
from_json 23348 23479 176 0.4 2334.8 0.1X
|
|
json_tuple 29606 30221 1024 0.3 2960.6 0.0X
|
|
get_json_object 21898 22148 226 0.5 2189.8 0.1X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Dataset of json strings: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Text read 5887 5944 49 8.5 117.7 1.0X
|
|
schema inferring 46696 47054 312 1.1 933.9 0.1X
|
|
parsing 32336 32450 129 1.5 646.7 0.2X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Json files in the per-line mode: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Text read 9756 9769 11 5.1 195.1 1.0X
|
|
Schema inferring 51318 51433 108 1.0 1026.4 0.2X
|
|
Parsing without charset 43609 43743 118 1.1 872.2 0.2X
|
|
Parsing with UTF-8 60775 60844 106 0.8 1215.5 0.2X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Write dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Create a dataset of timestamps 1998 2015 17 5.0 199.8 1.0X
|
|
to_json(timestamp) 18156 18317 263 0.6 1815.6 0.1X
|
|
write timestamps to files 12912 12917 5 0.8 1291.2 0.2X
|
|
Create a dataset of dates 2209 2270 53 4.5 220.9 0.9X
|
|
to_json(date) 9433 9489 90 1.1 943.3 0.2X
|
|
write dates to files 6915 6923 8 1.4 691.5 0.3X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Read dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
read timestamp text from files 2395 2412 17 4.2 239.5 1.0X
|
|
read timestamps from files 47269 47334 89 0.2 4726.9 0.1X
|
|
infer timestamps from files 91806 91851 67 0.1 9180.6 0.0X
|
|
read date text from files 2118 2133 13 4.7 211.8 1.1X
|
|
read date from files 17267 17340 115 0.6 1726.7 0.1X
|
|
timestamp strings 3906 3935 26 2.6 390.6 0.6X
|
|
parse timestamps from Dataset[String] 52244 52534 279 0.2 5224.4 0.0X
|
|
infer timestamps from Dataset[String] 100488 100714 198 0.1 10048.8 0.0X
|
|
date strings 4572 4584 12 2.2 457.2 0.5X
|
|
parse dates from Dataset[String] 26749 26768 17 0.4 2674.9 0.1X
|
|
from_json(timestamp) 71414 71867 556 0.1 7141.4 0.0X
|
|
from_json(date) 45322 45549 250 0.2 4532.2 0.1X
|
|
|
|
|