================================================================================================ Benchmark to measure CSV read/write performance ================================================================================================ Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.4 Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz Parsing quoted values: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ One quoted string 24073 24109 33 0.0 481463.5 1.0X Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.4 Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz Wide rows with 1000 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ Select 1000 columns 58415 59611 2071 0.0 58414.8 1.0X Select 100 columns 22568 23020 594 0.0 22568.0 2.6X Select one column 18995 19058 99 0.1 18995.0 3.1X count() 5301 5332 30 0.2 5300.9 11.0X Select 100 columns, one bad input field 39736 40153 361 0.0 39736.1 1.5X Select 100 columns, corrupt record field 47195 47826 590 0.0 47195.2 1.2X Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.4 Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz Count a dataset with 10 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ Select 10 columns + count() 9884 9904 25 1.0 988.4 1.0X Select 1 column + count() 6794 6835 46 1.5 679.4 1.5X count() 2060 2065 5 4.9 206.0 4.8X Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.4 Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz Write dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ Create a dataset of timestamps 717 732 18 14.0 71.7 1.0X to_csv(timestamp) 6994 7100 121 1.4 699.4 0.1X write timestamps to files 6417 6435 27 1.6 641.7 0.1X Create a dataset of dates 827 855 24 12.1 82.7 0.9X to_csv(date) 4408 4438 32 2.3 440.8 0.2X write dates to files 3738 3758 28 2.7 373.8 0.2X Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.4 Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz Read dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ read timestamp text from files 1121 1176 52 8.9 112.1 1.0X read timestamps from files 21298 21366 105 0.5 2129.8 0.1X infer timestamps from files 41008 41051 39 0.2 4100.8 0.0X read date text from files 962 967 5 10.4 96.2 1.2X read date from files 11749 11772 22 0.9 1174.9 0.1X infer date from files 12426 12459 29 0.8 1242.6 0.1X timestamp strings 1508 1519 9 6.6 150.8 0.7X parse timestamps from Dataset[String] 21674 21997 455 0.5 2167.4 0.1X infer timestamps from Dataset[String] 42141 42230 105 0.2 4214.1 0.0X date strings 1694 1701 8 5.9 169.4 0.7X parse dates from Dataset[String] 12929 12951 25 0.8 1292.9 0.1X from_csv(timestamp) 20603 20786 166 0.5 2060.3 0.1X from_csv(date) 12325 12338 12 0.8 1232.5 0.1X Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.4 Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz Filters pushdown: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ w/o filters 12455 12474 22 0.0 124553.8 1.0X pushdown disabled 12462 12486 29 0.0 124624.9 1.0X w/ filters 1073 1092 18 0.1 10727.6 11.6X