================================================================================================
Benchmark for performance of JSON parsing
================================================================================================

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
JSON schema inferring:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
No encoding                                       61888          61918          27          1.6         618.9       1.0X
UTF-8 is set                                     109057         113663         NaN          0.9        1090.6       0.6X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
count a short column:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
No encoding                                       44517          44535          29          2.2         445.2       1.0X
UTF-8 is set                                      75722          75840         111          1.3         757.2       0.6X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
count a wide column:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
No encoding                                       63677          64090         633          0.2        6367.7       1.0X
UTF-8 is set                                      99424          99615         185          0.1        9942.4       0.6X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
select wide row:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
No encoding                                      174052         174251         174          0.0      348104.1       1.0X
UTF-8 is set                                     189000         189098         113          0.0      378000.9       0.9X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Select a subset of 10 columns:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Select 10 columns                                 18387          18473         142          0.5        1838.7       1.0X
Select 1 column                                   25560          25571          13          0.4        2556.0       0.7X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
creation of JSON parser per line:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Short column without encoding                      9323           9384          58          1.1         932.3       1.0X
Short column with UTF-8                           14016          14058          55          0.7        1401.6       0.7X
Wide column without encoding                     133258         133532         382          0.1       13325.8       0.1X
Wide column with UTF-8                           181212         181283          61          0.1       18121.2       0.1X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
JSON functions:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Text read                                          1168           1174           5          8.6         116.8       1.0X
from_json                                         22604          23571         883          0.4        2260.4       0.1X
json_tuple                                        29979          30053          91          0.3        2997.9       0.0X
get_json_object                                   21987          22263         241          0.5        2198.7       0.1X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Dataset of json strings:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Text read                                          5831           5842          14          8.6         116.6       1.0X
schema inferring                                  31372          31456          73          1.6         627.4       0.2X
parsing                                           35911          36191         254          1.4         718.2       0.2X

Preparing data for benchmarking ...
OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Json files in the per-line mode:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Text read                                         10249          10314          77          4.9         205.0       1.0X
Schema inferring                                  35403          35436          40          1.4         708.1       0.3X
Parsing without charset                           32875          32879           4          1.5         657.5       0.3X
Parsing with UTF-8                                53444          53519         100          0.9        1068.9       0.2X

OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Write dates and timestamps:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Create a dataset of timestamps                     1909           1924          17          5.2         190.9       1.0X
to_json(timestamp)                                18956          19122         208          0.5        1895.6       0.1X
write timestamps to files                         13446          13472          43          0.7        1344.6       0.1X
Create a dataset of dates                          2180           2200          28          4.6         218.0       0.9X
to_json(date)                                     12780          12899         109          0.8        1278.0       0.1X
write dates to files                               7835           7865          29          1.3         783.5       0.2X

OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09 on Linux 4.15.0-1044-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Read dates and timestamps:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read timestamp text from files                     2467           2477           9          4.1         246.7       1.0X
read timestamps from files                        40186          40342         135          0.2        4018.6       0.1X
infer timestamps from files                       82005          82079          71          0.1        8200.5       0.0X
read date text from files                          2243           2264          22          4.5         224.3       1.1X
read date from files                              24852          24863          19          0.4        2485.2       0.1X
timestamp strings                                  3836           3854          16          2.6         383.6       0.6X
parse timestamps from Dataset[String]             51521          51697         242          0.2        5152.1       0.0X
infer timestamps from Dataset[String]             97300          97398         133          0.1        9730.0       0.0X
date strings                                       4488           4491           5          2.2         448.8       0.5X
parse dates from Dataset[String]                  37918          37976          68          0.3        3791.8       0.1X
from_json(timestamp)                              69611          69632          36          0.1        6961.1       0.0X
from_json(date)                                   56598          56974         347          0.2        5659.8       0.0X