2019-10-03 11:58:25 -04:00
|
|
|
================================================================================================
|
|
|
|
Benchmark to measure CSV read/write performance
|
|
|
|
================================================================================================
|
|
|
|
|
2020-05-25 11:00:11 -04:00
|
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
2019-10-03 11:58:25 -04:00
|
|
|
Parsing quoted values: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
|
|
------------------------------------------------------------------------------------------------------------------------
|
2020-05-25 11:00:11 -04:00
|
|
|
One quoted string 46568 46683 198 0.0 931358.6 1.0X
|
2019-10-03 11:58:25 -04:00
|
|
|
|
2020-05-25 11:00:11 -04:00
|
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
2019-10-03 11:58:25 -04:00
|
|
|
Wide rows with 1000 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
|
|
------------------------------------------------------------------------------------------------------------------------
|
2020-05-25 11:00:11 -04:00
|
|
|
Select 1000 columns 129836 130796 1404 0.0 129836.0 1.0X
|
|
|
|
Select 100 columns 40444 40679 261 0.0 40443.5 3.2X
|
|
|
|
Select one column 33429 33475 73 0.0 33428.6 3.9X
|
|
|
|
count() 7967 8047 73 0.1 7966.7 16.3X
|
|
|
|
Select 100 columns, one bad input field 90639 90832 266 0.0 90638.6 1.4X
|
|
|
|
Select 100 columns, corrupt record field 109023 109084 74 0.0 109023.3 1.2X
|
2019-10-03 11:58:25 -04:00
|
|
|
|
2020-05-25 11:00:11 -04:00
|
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
2019-10-03 11:58:25 -04:00
|
|
|
Count a dataset with 10 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
|
|
------------------------------------------------------------------------------------------------------------------------
|
2020-05-25 11:00:11 -04:00
|
|
|
Select 10 columns + count() 20685 20707 35 0.5 2068.5 1.0X
|
|
|
|
Select 1 column + count() 13096 13149 49 0.8 1309.6 1.6X
|
|
|
|
count() 3994 4001 7 2.5 399.4 5.2X
|
2019-10-03 11:58:25 -04:00
|
|
|
|
2020-05-25 11:00:11 -04:00
|
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
2019-10-03 11:58:25 -04:00
|
|
|
Write dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
|
|
------------------------------------------------------------------------------------------------------------------------
|
2020-05-25 11:00:11 -04:00
|
|
|
Create a dataset of timestamps 2169 2203 32 4.6 216.9 1.0X
|
|
|
|
to_csv(timestamp) 14401 14591 168 0.7 1440.1 0.2X
|
|
|
|
write timestamps to files 13209 13276 59 0.8 1320.9 0.2X
|
|
|
|
Create a dataset of dates 2231 2248 17 4.5 223.1 1.0X
|
|
|
|
to_csv(date) 10406 10473 68 1.0 1040.6 0.2X
|
|
|
|
write dates to files 7970 7976 9 1.3 797.0 0.3X
|
2019-10-03 11:58:25 -04:00
|
|
|
|
2020-05-25 11:00:11 -04:00
|
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
2019-10-03 11:58:25 -04:00
|
|
|
Read dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
|
|
------------------------------------------------------------------------------------------------------------------------
|
2020-05-25 11:00:11 -04:00
|
|
|
read timestamp text from files 2387 2391 6 4.2 238.7 1.0X
|
|
|
|
read timestamps from files 53503 53593 124 0.2 5350.3 0.0X
|
|
|
|
infer timestamps from files 107988 108668 647 0.1 10798.8 0.0X
|
|
|
|
read date text from files 2121 2133 12 4.7 212.1 1.1X
|
|
|
|
read date from files 29983 30039 48 0.3 2998.3 0.1X
|
|
|
|
infer date from files 30196 30436 218 0.3 3019.6 0.1X
|
|
|
|
timestamp strings 3098 3109 10 3.2 309.8 0.8X
|
|
|
|
parse timestamps from Dataset[String] 63331 63426 84 0.2 6333.1 0.0X
|
|
|
|
infer timestamps from Dataset[String] 124003 124463 490 0.1 12400.3 0.0X
|
|
|
|
date strings 3423 3429 11 2.9 342.3 0.7X
|
|
|
|
parse dates from Dataset[String] 34235 34314 76 0.3 3423.5 0.1X
|
|
|
|
from_csv(timestamp) 60829 61600 668 0.2 6082.9 0.0X
|
|
|
|
from_csv(date) 33047 33173 139 0.3 3304.7 0.1X
|
2019-10-03 11:58:25 -04:00
|
|
|
|
2020-05-25 11:00:11 -04:00
|
|
|
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
|
|
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
2020-01-15 23:10:08 -05:00
|
|
|
Filters pushdown: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
|
|
------------------------------------------------------------------------------------------------------------------------
|
2020-05-25 11:00:11 -04:00
|
|
|
w/o filters 28752 28765 16 0.0 287516.5 1.0X
|
|
|
|
pushdown disabled 28856 28880 22 0.0 288556.3 1.0X
|
|
|
|
w/ filters 1714 1731 15 0.1 17137.3 16.8X
|
2020-01-15 23:10:08 -05:00
|
|
|
|
2019-10-03 11:58:25 -04:00
|
|
|
|