ec70467d4d
### What changes were proposed in this pull request? This PR updates CSVBenchmark especially we have a fix like https://github.com/apache/spark/pull/31858 that could potentially improve the performance. ### Why are the changes needed? To have the updated benchmark results. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually ran the benchmark Closes #31917 from HyukjinKwon/SPARK-34815. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
68 lines
6.1 KiB
Plaintext
68 lines
6.1 KiB
Plaintext
================================================================================================
|
|
Benchmark to measure CSV read/write performance
|
|
================================================================================================
|
|
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.15.7
|
|
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
|
|
Parsing quoted values: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
One quoted string 24185 24195 10 0.0 483694.2 1.0X
|
|
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.15.7
|
|
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
|
|
Wide rows with 1000 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Select 1000 columns 61793 62388 532 0.0 61793.4 1.0X
|
|
Select 100 columns 21958 21993 34 0.0 21957.9 2.8X
|
|
Select one column 18215 18515 505 0.1 18215.0 3.4X
|
|
count() 5865 6168 296 0.2 5865.1 10.5X
|
|
Select 100 columns, one bad input field 39638 39739 124 0.0 39637.5 1.6X
|
|
Select 100 columns, corrupt record field 47290 48133 741 0.0 47290.0 1.3X
|
|
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.15.7
|
|
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
|
|
Count a dataset with 10 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Select 10 columns + count() 9935 10460 461 1.0 993.5 1.0X
|
|
Select 1 column + count() 6786 7179 342 1.5 678.6 1.5X
|
|
count() 2281 2458 165 4.4 228.1 4.4X
|
|
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.15.7
|
|
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
|
|
Write dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
Create a dataset of timestamps 812 826 14 12.3 81.2 1.0X
|
|
to_csv(timestamp) 7548 7764 192 1.3 754.8 0.1X
|
|
write timestamps to files 7052 7193 141 1.4 705.2 0.1X
|
|
Create a dataset of dates 897 909 13 11.1 89.7 0.9X
|
|
to_csv(date) 4778 4787 10 2.1 477.8 0.2X
|
|
write dates to files 3853 3891 33 2.6 385.3 0.2X
|
|
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.15.7
|
|
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
|
|
Read dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
read timestamp text from files 1259 1262 4 7.9 125.9 1.0X
|
|
read timestamps from files 20030 20105 80 0.5 2003.0 0.1X
|
|
infer timestamps from files 39621 39674 61 0.3 3962.1 0.0X
|
|
read date text from files 1039 1068 40 9.6 103.9 1.2X
|
|
read date from files 9352 9363 10 1.1 935.2 0.1X
|
|
infer date from files 11465 11485 23 0.9 1146.5 0.1X
|
|
timestamp strings 1759 1812 59 5.7 175.9 0.7X
|
|
parse timestamps from Dataset[String] 20806 20858 75 0.5 2080.6 0.1X
|
|
infer timestamps from Dataset[String] 40537 40821 258 0.2 4053.7 0.0X
|
|
date strings 1808 1816 12 5.5 180.8 0.7X
|
|
parse dates from Dataset[String] 12080 12311 245 0.8 1208.0 0.1X
|
|
from_csv(timestamp) 20120 21503 1224 0.5 2012.0 0.1X
|
|
from_csv(date) 10607 10768 246 0.9 1060.7 0.1X
|
|
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.15.7
|
|
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
|
|
Filters pushdown: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
w/o filters 13109 13249 151 0.0 131086.4 1.0X
|
|
pushdown disabled 12951 12994 63 0.0 129509.7 1.0X
|
|
w/ filters 1095 1113 15 0.1 10953.7 12.0X
|
|
|
|
|