ec70467d4d
### What changes were proposed in this pull request?

This PR updates the CSVBenchmark results, especially now that a fix such as https://github.com/apache/spark/pull/31858 could potentially improve performance.

### Why are the changes needed?

To have up-to-date benchmark results.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually ran the benchmark.

Closes #31917 from HyukjinKwon/SPARK-34815.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
68 lines
6.1 KiB
Plaintext
================================================================================================
Benchmark to measure CSV read/write performance
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 11.0.3+12-LTS on Mac OS X 10.15.7
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Parsing quoted values:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
One quoted string                                 21212          21537         327          0.0      424244.5       1.0X

Java HotSpot(TM) 64-Bit Server VM 11.0.3+12-LTS on Mac OS X 10.15.7
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Wide rows with 1000 columns:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Select 1000 columns                               73744          74898        1930          0.0       73743.8       1.0X
Select 100 columns                                22704          22860         236          0.0       22704.4       3.2X
Select one column                                 17837          17977         121          0.1       17837.2       4.1X
count()                                            4304           4320          27          0.2        4304.0      17.1X
Select 100 columns, one bad input field           42060          42280         378          0.0       42059.8       1.8X
Select 100 columns, corrupt record field          46633          47520         773          0.0       46632.5       1.6X

Java HotSpot(TM) 64-Bit Server VM 11.0.3+12-LTS on Mac OS X 10.15.7
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Count a dataset with 10 columns:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Select 10 columns + count()                        9906          10132         246          1.0         990.6       1.0X
Select 1 column + count()                          6497           6616         104          1.5         649.7       1.5X
count()                                            2285           2322          32          4.4         228.5       4.3X

Java HotSpot(TM) 64-Bit Server VM 11.0.3+12-LTS on Mac OS X 10.15.7
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Write dates and timestamps:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Create a dataset of timestamps                      902            932          30         11.1          90.2       1.0X
to_csv(timestamp)                                  8537           8851         274          1.2         853.7       0.1X
write timestamps to files                          7810           8000         238          1.3         781.0       0.1X
Create a dataset of dates                           929            931           2         10.8          92.9       1.0X
to_csv(date)                                       5170           5237          62          1.9         517.0       0.2X
write dates to files                               4163           4220          49          2.4         416.3       0.2X

Java HotSpot(TM) 64-Bit Server VM 11.0.3+12-LTS on Mac OS X 10.15.7
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Read dates and timestamps:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read timestamp text from files                     1475           1497          33          6.8         147.5       1.0X
read timestamps from files                        18596          18811         343          0.5        1859.6       0.1X
infer timestamps from files                       37182          37511         342          0.3        3718.2       0.0X
read date text from files                          1183           1210          31          8.5         118.3       1.2X
read date from files                               8797           9099         283          1.1         879.7       0.2X
infer date from files                             11296          11427         218          0.9        1129.6       0.1X
timestamp strings                                  1379           1382           4          7.3         137.9       1.1X
parse timestamps from Dataset[String]             18243          19000         721          0.5        1824.3       0.1X
infer timestamps from Dataset[String]             38253          39096         731          0.3        3825.3       0.0X
date strings                                       1686           1721          35          5.9         168.6       0.9X
parse dates from Dataset[String]                  10474          10680         184          1.0        1047.4       0.1X
from_csv(timestamp)                               18643          18965         350          0.5        1864.3       0.1X
from_csv(date)                                     9814          10018         188          1.0         981.4       0.2X

Java HotSpot(TM) 64-Bit Server VM 11.0.3+12-LTS on Mac OS X 10.15.7
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Filters pushdown:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
w/o filters                                       11243          11535         287          0.0      112433.9       1.0X
pushdown disabled                                 11093          11117          34          0.0      110931.9       1.0X
w/ filters                                          794            800           5          0.1        7942.1      14.2X
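The derived columns in the tables above can be cross-checked from Best Time(ms) alone. The sketch below is not part of the benchmark output. It assumes that Relative is the first row's best time divided by the case's best time, that Per Row(ns) is the best time divided by the row count, and that Rate(M/s) is millions of rows per second. The function name `derived_columns` is hypothetical, and the row count of 100,000 for the "Filters pushdown" table is inferred from its Per Row(ns) values rather than stated in the file.

```python
# Sketch: recompute Rate(M/s), Per Row(ns) and Relative from Best Time(ms).
# The formulas are assumptions that happen to agree with the printed tables;
# they are not taken from the benchmark source.

def derived_columns(best_ms, baseline_ms, num_rows):
    """Return (rate in M rows/s, ns per row, speedup relative to the baseline)."""
    rate_m_per_s = num_rows / (best_ms / 1e3) / 1e6
    per_row_ns = best_ms * 1e6 / num_rows
    relative = baseline_ms / best_ms
    return rate_m_per_s, per_row_ns, relative


# "Filters pushdown" table: baseline "w/o filters" = 11243 ms, "w/ filters" = 794 ms.
# ASSUMED_ROWS is inferred (11243 ms / 112433.9 ns per row is roughly 100,000 rows).
ASSUMED_ROWS = 100_000
rate, per_row, relative = derived_columns(794, 11243, ASSUMED_ROWS)

print(f"{rate:.1f} M/s, {relative:.1f}X")  # 0.1 M/s, 14.2X
print(f"{per_row:.1f} ns per row")         # close to the table's 7942.1; the input ms is rounded
```

Because the table prints best times rounded to whole milliseconds, the recomputed Per Row(ns) matches the printed value only approximately, while the one-decimal Rate and Relative values match exactly.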