78221bda95
### What changes were proposed in this pull request? In `ApplyFunctionExpression`, move `zipWithIndex` out of the loop for each input row. ### Why are the changes needed? When the `ScalarFunction` is trivial, `zipWithIndex` could incur significant costs, as shown below: <img width="899" alt="Screen Shot 2021-05-11 at 10 03 42 AM" src="https://user-images.githubusercontent.com/506679/117866421-fb19de80-b24b-11eb-8c94-d5e8c8b1eda9.png"> By removing it out of the loop, I'm seeing sometimes 2x speedup from `V2FunctionBenchmark`. For instance: Before: ``` scalar function (long + long) -> long, result_nullable = false codegen = false: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative native_long_add 32437 32896 434 15.4 64.9 1.0X java_long_add_default 85675 97045 NaN 5.8 171.3 0.4X ``` After: ``` scalar function (long + long) -> long, result_nullable = false codegen = false: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative native_long_add 30182 30387 279 16.6 60.4 1.0X java_long_add_default 42862 43009 209 11.7 85.7 0.7X ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests Closes #32507 from sunchao/SPARK-35361. Authored-by: Chao Sun <sunchao@apple.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
45 lines
5.4 KiB
Plaintext
45 lines
5.4 KiB
Plaintext
OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.4.0-1046-azure
|
|
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
|
|
scalar function (long + long) -> long, result_nullable = true codegen = true: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------------------------------------------
|
|
native_long_add 9723 11619 1643 51.4 19.4 1.0X
|
|
java_long_add_default 38003 38591 513 13.2 76.0 0.3X
|
|
java_long_add_magic 12398 13007 792 40.3 24.8 0.8X
|
|
java_long_add_static_magic 11551 11711 138 43.3 23.1 0.8X
|
|
scala_long_add_default 39482 39762 275 12.7 79.0 0.2X
|
|
scala_long_add_magic 12794 12830 33 39.1 25.6 0.8X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.4.0-1046-azure
|
|
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
|
|
scalar function (long + long) -> long, result_nullable = false codegen = true: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
-------------------------------------------------------------------------------------------------------------------------------------------------------------
|
|
native_long_add 9984 10285 303 50.1 20.0 1.0X
|
|
java_long_add_default 36510 36989 570 13.7 73.0 0.3X
|
|
java_long_add_magic 13391 13764 332 37.3 26.8 0.7X
|
|
java_long_add_static_magic 10033 10462 388 49.8 20.1 1.0X
|
|
scala_long_add_default 35104 35480 375 14.2 70.2 0.3X
|
|
scala_long_add_magic 13587 13899 366 36.8 27.2 0.7X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.4.0-1046-azure
|
|
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
|
|
scalar function (long + long) -> long, result_nullable = true codegen = false: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
-------------------------------------------------------------------------------------------------------------------------------------------------------------
|
|
native_long_add 32473 32622 247 15.4 64.9 1.0X
|
|
java_long_add_default 44108 44120 11 11.3 88.2 0.7X
|
|
java_long_add_magic 166139 167629 1828 3.0 332.3 0.2X
|
|
java_long_add_static_magic 181452 183355 1668 2.8 362.9 0.2X
|
|
scala_long_add_default 42405 42652 330 11.8 84.8 0.8X
|
|
scala_long_add_magic 196868 198003 1033 2.5 393.7 0.2X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.4.0-1046-azure
|
|
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
|
|
scalar function (long + long) -> long, result_nullable = false codegen = false: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
--------------------------------------------------------------------------------------------------------------------------------------------------------------
|
|
native_long_add 30182 30387 279 16.6 60.4 1.0X
|
|
java_long_add_default 42862 43009 209 11.7 85.7 0.7X
|
|
java_long_add_magic 218295 219387 1078 2.3 436.6 0.1X
|
|
java_long_add_static_magic 211812 213150 1898 2.4 423.6 0.1X
|
|
scala_long_add_default 42401 42642 234 11.8 84.8 0.7X
|
|
scala_long_add_magic 214497 214760 307 2.3 429.0 0.1X
|
|
|