================================================================================================ Benchmark to measure CSV read/write performance ================================================================================================ OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Parsing quoted values: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ One quoted string 56894 57106 184 0.0 1137889.9 1.0X OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Wide rows with 1000 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ Select 1000 columns 220825 222234 2018 0.0 220825.5 1.0X Select 100 columns 50507 50723 278 0.0 50506.6 4.4X Select one column 38629 38642 16 0.0 38628.6 5.7X count() 8549 8597 51 0.1 8549.2 25.8X Select 100 columns, one bad input field 68309 68474 182 0.0 68309.2 3.2X Select 100 columns, corrupt record field 74551 74701 136 0.0 74551.5 3.0X OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Count a dataset with 10 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ Select 10 columns + count() 27745 28050 276 0.4 2774.5 1.0X Select 1 column + count() 19989 20315 319 0.5 1998.9 1.4X count() 6091 6109 25 1.6 609.1 4.6X OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Write dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ Create a dataset of timestamps 2235 2301 59 4.5 223.5 1.0X to_csv(timestamp) 16033 16205 153 0.6 1603.3 0.1X write timestamps to files 13556 13685 167 0.7 1355.6 0.2X Create a dataset of dates 2262 2290 44 4.4 226.2 1.0X to_csv(date) 11122 11160 33 0.9 1112.2 0.2X write dates to files 8436 8486 76 1.2 843.6 0.3X OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Read dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ read timestamp text from files 2617 2644 26 3.8 261.7 1.0X read timestamps from files 53245 53381 149 0.2 5324.5 0.0X infer timestamps from files 103797 104026 257 0.1 10379.7 0.0X read date text from files 2371 2378 7 4.2 237.1 1.1X read date from files 41808 41929 177 0.2 4180.8 0.1X infer date from files 35069 35336 458 0.3 3506.9 0.1X timestamp strings 3104 3127 21 3.2 310.4 0.8X parse timestamps from Dataset[String] 61888 62132 342 0.2 6188.8 0.0X infer timestamps from Dataset[String] 112494 114609 1949 0.1 11249.4 0.0X date strings 3558 3603 41 2.8 355.8 0.7X parse dates from Dataset[String] 45871 46000 120 0.2 4587.1 0.1X from_csv(timestamp) 56975 57035 53 0.2 5697.5 0.0X from_csv(date) 43711 43795 74 0.2 4371.1 0.1X