spark-instrumented-optimizer/sql/core/benchmarks/JSONBenchmark-results.txt
Maxim Gekk 6115a5e1a0 [SPARK-27327][SQL] New JSON benchmarks: functions, Dataset[String]
## What changes were proposed in this pull request?

Added new benchmarks for:
1. JSON functions: `from_json`, `json_tuple` and `get_json_object`
2. Parsing `Dataset[String]` with JSON records
3. Comparing just splitting input text by lines with schema inferring, per-line parsing when encoding is set and not set.

Also existing benchmarks were refactored to use the `NoOp` datasource to eliminate overhead of triggers like `.filter((_: Row) => true).count()`.

## How was this patch tested?

By running `JSONBenchmark` locally.

Closes #24252 from MaxGekk/json-benchmark-func.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2019-04-01 08:33:16 +09:00

84 lines
6.6 KiB
Plaintext

================================================================================================
Benchmark for performance of JSON parsing
================================================================================================
Preparing data for benchmarking ...
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
JSON schema inferring: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
No encoding 51280 51722 420 2.0 512.8 1.0X
UTF-8 is set 75009 77276 1963 1.3 750.1 0.7X
Preparing data for benchmarking ...
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
count a short column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
No encoding 39675 39738 83 2.5 396.7 1.0X
UTF-8 is set 62755 64399 1436 1.6 627.5 0.6X
Preparing data for benchmarking ...
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
count a wide column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
No encoding 56429 56468 65 0.2 5642.9 1.0X
UTF-8 is set 81078 81454 374 0.1 8107.8 0.7X
Preparing data for benchmarking ...
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
select wide row: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
No encoding 95329 95557 265 0.0 190658.2 1.0X
UTF-8 is set 102827 102967 166 0.0 205654.2 0.9X
Preparing data for benchmarking ...
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
Select a subset of 10 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Select 10 columns 14102 14136 52 0.7 1410.2 1.0X
Select 1 column 17487 17537 51 0.6 1748.7 0.8X
Preparing data for benchmarking ...
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
creation of JSON parser per line: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Short column without encoding 6013 6066 70 1.7 601.3 1.0X
Short column with UTF-8 8031 8079 45 1.2 803.1 0.7X
Wide column without encoding 107093 108539 NaN 0.1 10709.3 0.1X
Wide column with UTF-8 130983 132518 1346 0.1 13098.3 0.0X
Preparing data for benchmarking ...
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
JSON functions: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Text read 939 950 11 10.6 93.9 1.0X
from_json 12924 12944 26 0.8 1292.4 0.1X
json_tuple 15312 15771 432 0.7 1531.2 0.1X
get_json_object 13049 13475 714 0.8 1304.9 0.1X
Preparing data for benchmarking ...
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
Dataset of json strings: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Text read 4556 4630 108 11.0 91.1 1.0X
schema inferring 23624 24338 626 2.1 472.5 0.2X
parsing 22342 22420 81 2.2 446.8 0.2X
Preparing data for benchmarking ...
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
Json files in the per-line mode: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Text read 7537 7556 26 6.6 150.7 1.0X
Schema inferring 27875 28306 499 1.8 557.5 0.3X
Parsing without charset 26030 26083 67 1.9 520.6 0.3X
Parsing with UTF-8 37115 37480 392 1.3 742.3 0.2X