================================================================================================
Benchmark for performance of JSON parsing
================================================================================================

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
JSON schema inferring:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
No encoding                                       63981          64044          56          1.6         639.8       1.0X
UTF-8 is set                                     112672         113350         962          0.9        1126.7       0.6X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
count a short column:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
No encoding                                       51256          51449         180          2.0         512.6       1.0X
UTF-8 is set                                      83694          83859         148          1.2         836.9       0.6X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
count a wide column:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
No encoding                                       58440          59097         569          0.2        5844.0       1.0X
UTF-8 is set                                     102746         102883         198          0.1       10274.6       0.6X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
select wide row:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
No encoding                                      128982         129304         356          0.0      257965.0       1.0X
UTF-8 is set                                     147247         147415         231          0.0      294494.1       0.9X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Select a subset of 10 columns:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Select 10 columns                                 18837          19048         331          0.5        1883.7       1.0X
Select 1 column                                   24707          24723          14          0.4        2470.7       0.8X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
creation of JSON parser per line:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Short column without encoding                      8218           8234          17          1.2         821.8       1.0X
Short column with UTF-8                           12374          12438         107          0.8        1237.4       0.7X
Wide column without encoding                     136918         137298         345          0.1       13691.8       0.1X
Wide column with UTF-8                           176961         177142         257          0.1       17696.1       0.0X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
JSON functions:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Text read                                          1268           1278          12          7.9         126.8       1.0X
from_json                                         23348          23479         176          0.4        2334.8       0.1X
json_tuple                                        29606          30221        1024          0.3        2960.6       0.0X
get_json_object                                   21898          22148         226          0.5        2189.8       0.1X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Dataset of json strings:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Text read                                          5887           5944          49          8.5         117.7       1.0X
schema inferring                                  46696          47054         312          1.1         933.9       0.1X
parsing                                           32336          32450         129          1.5         646.7       0.2X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Json files in the per-line mode:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Text read                                          9756           9769          11          5.1         195.1       1.0X
Schema inferring                                  51318          51433         108          1.0        1026.4       0.2X
Parsing without charset                           43609          43743         118          1.1         872.2       0.2X
Parsing with UTF-8                                60775          60844         106          0.8        1215.5       0.2X

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Write dates and timestamps:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Create a dataset of timestamps                     1998           2015          17          5.0         199.8       1.0X
to_json(timestamp)                                18156          18317         263          0.6        1815.6       0.1X
write timestamps to files                         12912          12917           5          0.8        1291.2       0.2X
Create a dataset of dates                          2209           2270          53          4.5         220.9       0.9X
to_json(date)                                      9433           9489          90          1.1         943.3       0.2X
write dates to files                               6915           6923           8          1.4         691.5       0.3X

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Read dates and timestamps:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read timestamp text from files                     2395           2412          17          4.2         239.5       1.0X
read timestamps from files                        47269          47334          89          0.2        4726.9       0.1X
infer timestamps from files                       91806          91851          67          0.1        9180.6       0.0X
read date text from files                          2118           2133          13          4.7         211.8       1.1X
read date from files                              17267          17340         115          0.6        1726.7       0.1X
timestamp strings                                  3906           3935          26          2.6         390.6       0.6X
parse timestamps from Dataset[String]             52244          52534         279          0.2        5224.4       0.0X
infer timestamps from Dataset[String]            100488         100714         198          0.1       10048.8       0.0X
date strings                                       4572           4584          12          2.2         457.2       0.5X
parse dates from Dataset[String]                  26749          26768          17          0.4        2674.9       0.1X
from_json(timestamp)                              71414          71867         556          0.1        7141.4       0.0X
from_json(date)                                   45322          45549         250          0.2        4532.2       0.1X