================================================================================================ Benchmark to measure CSV read/write performance ================================================================================================ OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Parsing quoted values: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ One quoted string 45457 45731 344 0.0 909136.8 1.0X OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Wide rows with 1000 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ Select 1000 columns 129646 130527 1412 0.0 129646.3 1.0X Select 100 columns 42444 42551 119 0.0 42444.0 3.1X Select one column 35415 35428 20 0.0 35414.6 3.7X count() 11114 11128 16 0.1 11113.6 11.7X Select 100 columns, one bad input field 93353 93670 275 0.0 93352.6 1.4X Select 100 columns, corrupt record field 113569 113952 373 0.0 113568.8 1.1X OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Count a dataset with 10 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ Select 10 columns + count() 18498 18589 87 0.5 1849.8 1.0X Select 1 column + count() 11078 11095 27 0.9 1107.8 1.7X count() 3928 3950 22 2.5 392.8 4.7X OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Write dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ Create a dataset of timestamps 1933 1940 11 5.2 193.3 1.0X to_csv(timestamp) 18078 18243 255 0.6 1807.8 0.1X write timestamps to files 12668 12786 134 0.8 1266.8 0.2X Create a dataset of dates 2196 2201 5 4.6 219.6 0.9X to_csv(date) 9583 9597 21 1.0 958.3 0.2X write dates to files 7091 7110 20 1.4 709.1 0.3X OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Read dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ read timestamp text from files 2166 2177 10 4.6 216.6 1.0X read timestamps from files 53212 53402 281 0.2 5321.2 0.0X infer timestamps from files 109788 110372 570 0.1 10978.8 0.0X read date text from files 1921 1929 8 5.2 192.1 1.1X read date from files 25470 25499 25 0.4 2547.0 0.1X infer date from files 27201 27342 134 0.4 2720.1 0.1X timestamp strings 3638 3653 19 2.7 363.8 0.6X parse timestamps from Dataset[String] 61894 62532 555 0.2 6189.4 0.0X infer timestamps from Dataset[String] 125171 125430 236 0.1 12517.1 0.0X date strings 3736 3749 14 2.7 373.6 0.6X parse dates from Dataset[String] 30787 30829 43 0.3 3078.7 0.1X from_csv(timestamp) 60842 61035 209 0.2 6084.2 0.0X from_csv(date) 30123 30196 95 0.3 3012.3 0.1X OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Filters pushdown: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ w/o filters 28985 29042 80 0.0 289852.9 1.0X pushdown disabled 29080 29146 58 0.0 290799.4 1.0X w/ filters 2072 2084 17 0.0 20722.3 14.0X