[SPARK-35361][SQL] Improve performance for ApplyFunctionExpression

### What changes were proposed in this pull request?

In `ApplyFunctionExpression`, move `zipWithIndex` out of the per-row loop in `eval`, so that the indexed children are computed once instead of for every input row.
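
As a rough illustration of the idea, here is a minimal, self-contained sketch that does not depend on Spark. `Expr`, `Column`, `ApplyPerRow` and `ApplyHoisted` are hypothetical stand-ins invented for this example, and a plain `Array[Any]` plays the role of `InternalRow`; only the `zipWithIndex` hoisting mirrors the actual change.

```scala
// Hypothetical stand-in for a Spark expression; not the real Catalyst API.
trait Expr { def eval(input: Array[Any]): Any }

// Reads one field of the input row by position.
final case class Column(ordinal: Int) extends Expr {
  def eval(input: Array[Any]): Any = input(ordinal)
}

// Before: zipWithIndex builds a fresh sequence of (expr, index) pairs on every call.
final class ApplyPerRow(children: Seq[Expr]) {
  private val reusedRow = new Array[Any](children.size)
  def eval(input: Array[Any]): Array[Any] = {
    children.zipWithIndex.foreach { case (expr, pos) =>
      reusedRow(pos) = expr.eval(input)
    }
    reusedRow
  }
}

// After: the pairs are built once, on first use, and reused for every row.
final class ApplyHoisted(children: Seq[Expr]) {
  private val reusedRow = new Array[Any](children.size)
  private lazy val childrenWithIndex = children.zipWithIndex
  def eval(input: Array[Any]): Array[Any] = {
    childrenWithIndex.foreach { case (expr, pos) =>
      reusedRow(pos) = expr.eval(input)
    }
    reusedRow
  }
}

object HoistDemo {
  def main(args: Array[String]): Unit = {
    val children = Seq(Column(1), Column(0))
    val row = Array[Any](10L, 20L)
    val perRow = new ApplyPerRow(children).eval(row).toList
    val hoisted = new ApplyHoisted(children).eval(row).toList
    // Both variants should fill the reused row with the same values.
    println(perRow == hoisted)
  }
}
```

Both classes produce the same row; the second simply avoids rebuilding the pairs for each input, which matters most when the function itself does very little work.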

### Why are the changes needed?

When the `ScalarFunction` is trivial, `zipWithIndex` could incur significant costs, as shown below:

<img width="899" alt="Screen Shot 2021-05-11 at 10 03 42 AM" src="https://user-images.githubusercontent.com/506679/117866421-fb19de80-b24b-11eb-8c94-d5e8c8b1eda9.png">

By moving it out of the loop, I sometimes see a 2x speedup in `V2FunctionBenchmark`. For instance:

Before:
```
scalar function (long + long) -> long, result_nullable = false codegen = false:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
native_long_add                                                                         32437          32896         434         15.4          64.9       1.0X
java_long_add_default                                                                   85675          97045         NaN          5.8         171.3       0.4X
```

After:
```
scalar function (long + long) -> long, result_nullable = false codegen = false:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
native_long_add                                                                         30182          30387         279         16.6          60.4       1.0X
java_long_add_default                                                                   42862          43009         209         11.7          85.7       0.7X
```
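
The speedup can be checked from the Best Time figures in the two tables above. The sketch below is only arithmetic on those numbers; the object and method names are made up for this example, and the reading of the `Relative` column as "baseline best time divided by the case's best time" is an assumption that happens to match the printed values.

```scala
object SpeedupCheck {
  // Best Time (ms) values copied from the Before/After tables above.
  val nativeBefore = 32437.0
  val defaultBefore = 85675.0
  val nativeAfter = 30182.0
  val defaultAfter = 42862.0

  // Speedup of one case across the change: old best time over new best time.
  def speedup(before: Double, after: Double): Double = before / after

  // Assumed meaning of "Relative": baseline best time over the case's best time.
  def relative(baseline: Double, candidate: Double): Double = baseline / candidate

  def main(args: Array[String]): Unit = {
    println(f"java_long_add_default speedup: ${speedup(defaultBefore, defaultAfter)}%.2f")
    println(f"relative to native, before: ${relative(nativeBefore, defaultBefore)}%.1f")
    println(f"relative to native, after: ${relative(nativeAfter, defaultAfter)}%.1f")
  }
}
```

With these inputs the speedup of `java_long_add_default` comes out at roughly 2.0, and the relative figures round to about 0.4 and 0.7, in line with the `Relative` column.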

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests

Closes #32507 from sunchao/SPARK-35361.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Chao Sun 2021-05-12 10:16:35 +09:00 committed by Hyukjin Kwon
parent af0d99cce6
commit 78221bda95
3 changed files with 54 additions and 53 deletions


@@ -30,10 +30,11 @@ case class ApplyFunctionExpression(
   override def dataType: DataType = function.resultType()
   private lazy val reusedRow = new GenericInternalRow(children.size)
+  private lazy val childrenWithIndex = children.zipWithIndex
   /** Returns the result of evaluating this expression on a given input Row */
   override def eval(input: InternalRow): Any = {
-    children.zipWithIndex.foreach {
+    childrenWithIndex.foreach {
       case (expr, pos) =>
         reusedRow.update(pos, expr.eval(input))
     }


@@ -2,43 +2,43 @@ OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1046-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
scalar function (long + long) -> long, result_nullable = true codegen = true: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------------------------------------------
-native_long_add 18526 18837 403 27.0 37.1 1.0X
-java_long_add_default 71274 71488 189 7.0 142.5 0.3X
-java_long_add_magic 18467 18712 378 27.1 36.9 1.0X
-java_long_add_static_magic 18376 18387 11 27.2 36.8 1.0X
-scala_long_add_default 70770 70888 123 7.1 141.5 0.3X
-scala_long_add_magic 18492 18545 55 27.0 37.0 1.0X
+native_long_add 16079 16684 619 31.1 32.2 1.0X
+java_long_add_default 45512 48772 NaN 11.0 91.0 0.4X
+java_long_add_magic 19506 19672 262 25.6 39.0 0.8X
+java_long_add_static_magic 18770 18901 164 26.6 37.5 0.9X
+scala_long_add_default 46895 47662 1136 10.7 93.8 0.3X
+scala_long_add_magic 19520 19667 188 25.6 39.0 0.8X
OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1046-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
scalar function (long + long) -> long, result_nullable = false codegen = true: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
-------------------------------------------------------------------------------------------------------------------------------------------------------------
-native_long_add 16658 16805 223 30.0 33.3 1.0X
-java_long_add_default 69215 69370 145 7.2 138.4 0.2X
-java_long_add_magic 18488 18610 139 27.0 37.0 0.9X
-java_long_add_static_magic 16505 16534 27 30.3 33.0 1.0X
-scala_long_add_default 69036 69121 74 7.2 138.1 0.2X
-scala_long_add_magic 18414 18463 44 27.2 36.8 0.9X
+native_long_add 17363 17424 67 28.8 34.7 1.0X
+java_long_add_default 43884 44592 615 11.4 87.8 0.4X
+java_long_add_magic 18927 19100 206 26.4 37.9 0.9X
+java_long_add_static_magic 16854 16918 76 29.7 33.7 1.0X
+scala_long_add_default 43741 44016 288 11.4 87.5 0.4X
+scala_long_add_magic 18770 19022 317 26.6 37.5 0.9X
OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1046-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
scalar function (long + long) -> long, result_nullable = true codegen = false: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
-------------------------------------------------------------------------------------------------------------------------------------------------------------
-native_long_add 40877 41045 166 12.2 81.8 1.0X
-java_long_add_default 80090 80112 23 6.2 160.2 0.5X
-java_long_add_magic 123386 123485 92 4.1 246.8 0.3X
-java_long_add_static_magic 120648 120764 184 4.1 241.3 0.3X
-scala_long_add_default 80140 80776 1051 6.2 160.3 0.5X
-scala_long_add_magic 122739 122909 148 4.1 245.5 0.3X
+native_long_add 41825 42237 668 12.0 83.6 1.0X
+java_long_add_default 53779 53969 175 9.3 107.6 0.8X
+java_long_add_magic 131478 133225 NaN 3.8 263.0 0.3X
+java_long_add_static_magic 129304 129754 398 3.9 258.6 0.3X
+scala_long_add_default 54602 54986 344 9.2 109.2 0.8X
+scala_long_add_magic 132066 132243 159 3.8 264.1 0.3X
OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1046-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
scalar function (long + long) -> long, result_nullable = false codegen = false: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
--------------------------------------------------------------------------------------------------------------------------------------------------------------
-native_long_add 37374 37746 502 13.4 74.7 1.0X
-java_long_add_default 75753 75961 211 6.6 151.5 0.5X
-java_long_add_magic 117556 118129 988 4.3 235.1 0.3X
-java_long_add_static_magic 115822 116904 1002 4.3 231.6 0.3X
-scala_long_add_default 76098 76332 213 6.6 152.2 0.5X
-scala_long_add_magic 117451 118082 875 4.3 234.9 0.3X
+native_long_add 40817 41093 388 12.2 81.6 1.0X
+java_long_add_default 54090 54563 425 9.2 108.2 0.8X
+java_long_add_magic 129341 129867 613 3.9 258.7 0.3X
+java_long_add_static_magic 127292 127432 218 3.9 254.6 0.3X
+scala_long_add_default 53397 53670 328 9.4 106.8 0.8X
+scala_long_add_magic 128455 128541 138 3.9 256.9 0.3X


@@ -1,44 +1,44 @@
OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.4.0-1046-azure
-Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
scalar function (long + long) -> long, result_nullable = true codegen = true: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------------------------------------------
-native_long_add 10680 12281 1537 46.8 21.4 1.0X
-java_long_add_default 132934 133550 909 3.8 265.9 0.1X
-java_long_add_magic 14108 14513 388 35.4 28.2 0.8X
-java_long_add_static_magic 11701 11860 163 42.7 23.4 0.9X
-scala_long_add_default 131935 132358 531 3.8 263.9 0.1X
-scala_long_add_magic 13762 14071 268 36.3 27.5 0.8X
+native_long_add 9723 11619 1643 51.4 19.4 1.0X
+java_long_add_default 38003 38591 513 13.2 76.0 0.3X
+java_long_add_magic 12398 13007 792 40.3 24.8 0.8X
+java_long_add_static_magic 11551 11711 138 43.3 23.1 0.8X
+scala_long_add_default 39482 39762 275 12.7 79.0 0.2X
+scala_long_add_magic 12794 12830 33 39.1 25.6 0.8X
OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.4.0-1046-azure
-Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
scalar function (long + long) -> long, result_nullable = false codegen = true: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
-------------------------------------------------------------------------------------------------------------------------------------------------------------
-native_long_add 10649 10802 168 47.0 21.3 1.0X
-java_long_add_default 130644 131830 1034 3.8 261.3 0.1X
-java_long_add_magic 14195 14376 254 35.2 28.4 0.8X
-java_long_add_static_magic 10998 11045 42 45.5 22.0 1.0X
-scala_long_add_default 133295 136396 NaN 3.8 266.6 0.1X
-scala_long_add_magic 14017 14055 34 35.7 28.0 0.8X
+native_long_add 9984 10285 303 50.1 20.0 1.0X
+java_long_add_default 36510 36989 570 13.7 73.0 0.3X
+java_long_add_magic 13391 13764 332 37.3 26.8 0.7X
+java_long_add_static_magic 10033 10462 388 49.8 20.1 1.0X
+scala_long_add_default 35104 35480 375 14.2 70.2 0.3X
+scala_long_add_magic 13587 13899 366 36.8 27.2 0.7X
OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.4.0-1046-azure
-Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
scalar function (long + long) -> long, result_nullable = true codegen = false: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
-------------------------------------------------------------------------------------------------------------------------------------------------------------
-native_long_add 35847 36138 264 13.9 71.7 1.0X
-java_long_add_default 79210 79686 525 6.3 158.4 0.5X
-java_long_add_magic 253904 255356 1275 2.0 507.8 0.1X
-java_long_add_static_magic 258790 264585 980 1.9 517.6 0.1X
-scala_long_add_default 103844 104310 514 4.8 207.7 0.3X
-scala_long_add_magic 269234 270824 NaN 1.9 538.5 0.1X
+native_long_add 32473 32622 247 15.4 64.9 1.0X
+java_long_add_default 44108 44120 11 11.3 88.2 0.7X
+java_long_add_magic 166139 167629 1828 3.0 332.3 0.2X
+java_long_add_static_magic 181452 183355 1668 2.8 362.9 0.2X
+scala_long_add_default 42405 42652 330 11.8 84.8 0.8X
+scala_long_add_magic 196868 198003 1033 2.5 393.7 0.2X
OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.4.0-1046-azure
-Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
scalar function (long + long) -> long, result_nullable = false codegen = false: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
--------------------------------------------------------------------------------------------------------------------------------------------------------------
-native_long_add 32437 32896 434 15.4 64.9 1.0X
-java_long_add_default 85675 97045 NaN 5.8 171.3 0.4X
-java_long_add_magic 273730 276053 2111 1.8 547.5 0.1X
-java_long_add_static_magic 277269 278847 1478 1.8 554.5 0.1X
-scala_long_add_default 106925 107298 323 4.7 213.9 0.3X
-scala_long_add_magic 280643 281611 847 1.8 561.3 0.1X
+native_long_add 30182 30387 279 16.6 60.4 1.0X
+java_long_add_default 42862 43009 209 11.7 85.7 0.7X
+java_long_add_magic 218295 219387 1078 2.3 436.6 0.1X
+java_long_add_static_magic 211812 213150 1898 2.4 423.6 0.1X
+scala_long_add_default 42401 42642 234 11.8 84.8 0.7X
+scala_long_add_magic 214497 214760 307 2.3 429.0 0.1X