6115a5e1a0
## What changes were proposed in this pull request? Added new benchmarks for: 1. JSON functions: `from_json`, `json_tuple` and `get_json_object` 2. Parsing `Dataset[String]` with JSON records 3. Comparing just splitting input text by lines with schema inferring, per-line parsing when encoding is set and not set. Also existing benchmarks were refactored to use the `NoOp` datasource to eliminate overhead of triggers like `.filter((_: Row) => true).count()`. ## How was this patch tested? By running `JSONBenchmark` locally. Closes #24252 from MaxGekk/json-benchmark-func. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
84 lines
6.6 KiB
Plaintext
84 lines
6.6 KiB
Plaintext
================================================================================================
|
|
Benchmark for performance of JSON parsing
|
|
================================================================================================
|
|
|
|
Preparing data for benchmarking ...
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
|
|
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
|
|
JSON schema inferring: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
No encoding 51280 51722 420 2.0 512.8 1.0X
|
|
UTF-8 is set 75009 77276 1963 1.3 750.1 0.7X
|
|
|
|
Preparing data for benchmarking ...
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
|
|
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
|
|
count a short column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
No encoding 39675 39738 83 2.5 396.7 1.0X
|
|
UTF-8 is set 62755 64399 1436 1.6 627.5 0.6X
|
|
|
|
Preparing data for benchmarking ...
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
|
|
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
|
|
count a wide column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
No encoding 56429 56468 65 0.2 5642.9 1.0X
|
|
UTF-8 is set 81078 81454 374 0.1 8107.8 0.7X
|
|
|
|
Preparing data for benchmarking ...
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
|
|
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
|
|
select wide row: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
No encoding 95329 95557 265 0.0 190658.2 1.0X
|
|
UTF-8 is set 102827 102967 166 0.0 205654.2 0.9X
|
|
|
|
Preparing data for benchmarking ...
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
|
|
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
|
|
Select a subset of 10 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Select 10 columns 14102 14136 52 0.7 1410.2 1.0X
|
|
Select 1 column 17487 17537 51 0.6 1748.7 0.8X
|
|
|
|
Preparing data for benchmarking ...
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
|
|
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
|
|
creation of JSON parser per line: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Short column without encoding 6013 6066 70 1.7 601.3 1.0X
|
|
Short column with UTF-8 8031 8079 45 1.2 803.1 0.7X
|
|
Wide column without encoding 107093 108539 NaN 0.1 10709.3 0.1X
|
|
Wide column with UTF-8 130983 132518 1346 0.1 13098.3 0.0X
|
|
|
|
Preparing data for benchmarking ...
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
|
|
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
|
|
JSON functions: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Text read 939 950 11 10.6 93.9 1.0X
|
|
from_json 12924 12944 26 0.8 1292.4 0.1X
|
|
json_tuple 15312 15771 432 0.7 1531.2 0.1X
|
|
get_json_object 13049 13475 714 0.8 1304.9 0.1X
|
|
|
|
Preparing data for benchmarking ...
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
|
|
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
|
|
Dataset of json strings: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Text read 4556 4630 108 11.0 91.1 1.0X
|
|
schema inferring 23624 24338 626 2.1 472.5 0.2X
|
|
parsing 22342 22420 81 2.2 446.8 0.2X
|
|
|
|
Preparing data for benchmarking ...
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
|
|
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
|
|
Json files in the per-line mode: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Text read 7537 7556 26 6.6 150.7 1.0X
|
|
Schema inferring 27875 28306 499 1.8 557.5 0.3X
|
|
Parsing without charset 26030 26083 67 1.9 520.6 0.3X
|
|
Parsing with UTF-8 37115 37480 392 1.3 742.3 0.2X
|
|
|