spark-instrumented-optimizer/sql/core/benchmarks
Chao Sun 78221bda95 [SPARK-35361][SQL] Improve performance for ApplyFunctionExpression
### What changes were proposed in this pull request?

In `ApplyFunctionExpression`, move `zipWithIndex` out of the loop for each input row.
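
The idea can be sketched as follows. This is a minimal illustration, not the actual patch: `Row`, `Arg`, `PerRowZip`, and `HoistedZip` are hypothetical stand-ins for Spark's row and expression classes, and the real `ApplyFunctionExpression` code may differ in detail.

```scala
// Hypothetical stand-ins for illustration only; these are not Spark classes.
object ZipWithIndexHoisting {
  type Row = Array[Any]

  // An "argument expression" is modelled as a function from a row to a value.
  final case class Arg(eval: Row => Any)

  // Before: zipWithIndex runs on every call, so each input row allocates
  // a fresh collection of (Arg, Int) pairs.
  final class PerRowZip(args: Seq[Arg]) {
    private val buffer = new Array[Any](args.length)

    def fill(row: Row): Array[Any] = {
      args.zipWithIndex.foreach { case (arg, i) => buffer(i) = arg.eval(row) }
      buffer
    }
  }

  // After: the (Arg, Int) pairs are built once, when the object is created,
  // and reused for every row.
  final class HoistedZip(args: Seq[Arg]) {
    private val buffer = new Array[Any](args.length)
    private val indexedArgs: Seq[(Arg, Int)] = args.zipWithIndex

    def fill(row: Row): Array[Any] = {
      indexedArgs.foreach { case (arg, i) => buffer(i) = arg.eval(row) }
      buffer
    }
  }
}
```

Both variants fill the buffer with the same values; only the per-row allocation differs, which matters most when the function body itself is cheap.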

### Why are the changes needed?

When the `ScalarFunction` is trivial, `zipWithIndex` could incur significant costs, as shown below:

<img width="899" alt="Screen Shot 2021-05-11 at 10 03 42 AM" src="https://user-images.githubusercontent.com/506679/117866421-fb19de80-b24b-11eb-8c94-d5e8c8b1eda9.png">

By moving it out of the loop, I sometimes see about a 2x speedup in `V2FunctionBenchmark`. For instance:

Before:
```
scalar function (long + long) -> long, result_nullable = false codegen = false:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
native_long_add                                                                         32437          32896         434         15.4          64.9       1.0X
java_long_add_default                                                                   85675          97045         NaN          5.8         171.3       0.4X
```

After:
```
scalar function (long + long) -> long, result_nullable = false codegen = false:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
native_long_add                                                                         30182          30387         279         16.6          60.4       1.0X
java_long_add_default                                                                   42862          43009         209         11.7          85.7       0.7X
```
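
As a quick check of the numbers above, the speedup can be recomputed from the "Best Time(ms)" column. The sketch below assumes the `Relative` column is the baseline row's best time divided by the given row's best time, which is consistent with the values shown; the object and helper names are made up for illustration.

```scala
// Recomputes ratios from the "Best Time(ms)" values in the two tables above.
object SpeedupCheck {
  val nativeBefore = 32437.0 // native_long_add, before
  val javaBefore   = 85675.0 // java_long_add_default, before
  val nativeAfter  = 30182.0 // native_long_add, after
  val javaAfter    = 42862.0 // java_long_add_default, after

  // How many times faster the same case became after the change.
  def speedup(before: Double, after: Double): Double = before / after

  // Assumed definition of the "Relative" column: baseline time / candidate time.
  def relative(baseline: Double, candidate: Double): Double = baseline / candidate

  def main(args: Array[String]): Unit = {
    println(s"java_long_add_default speedup: ${speedup(javaBefore, javaAfter)}")
    println(s"relative to native, before:    ${relative(nativeBefore, javaBefore)}")
    println(s"relative to native, after:     ${relative(nativeAfter, javaAfter)}")
  }
}
```

The first ratio comes out just under 2, and the other two round to the 0.4X and 0.7X reported in the tables.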

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests

Closes #32507 from sunchao/SPARK-35361.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-05-12 10:16:35 +09:00
AggregateBenchmark-jdk11-results.txt [SPARK-35235][SQL][TEST] Add row-based hash map into aggregate benchmark 2021-04-27 06:53:42 +00:00
AggregateBenchmark-results.txt [SPARK-35235][SQL][TEST] Add row-based hash map into aggregate benchmark 2021-04-27 06:53:42 +00:00
BloomFilterBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
BloomFilterBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
BuiltInDataSourceWriteBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
BuiltInDataSourceWriteBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
CharVarcharBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
ColumnarBatchBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
ColumnarBatchBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
CompressionSchemeBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
CompressionSchemeBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
CSVBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
CSVBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
DatasetBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
DatasetBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
DataSourceReadBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
DataSourceReadBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
DateTimeBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
DateTimeBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
DateTimeRebaseBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
DateTimeRebaseBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
ExternalAppendOnlyUnsafeRowArrayBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
ExternalAppendOnlyUnsafeRowArrayBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
ExtractBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
ExtractBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
FilterPushdownBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
FilterPushdownBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
GenerateExecBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
GenerateExecBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
HashedRelationMetricsBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
HashedRelationMetricsBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
InExpressionBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
InExpressionBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
InsertTableWithDynamicPartitionsBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
InsertTableWithDynamicPartitionsBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
IntervalBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
IntervalBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
JoinBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
JoinBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
JsonBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
JsonBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
MakeDateTimeBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
MakeDateTimeBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
MetricsAggregationBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
MetricsAggregationBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
MiscBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
MiscBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
OrcNestedSchemaPruningBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
OrcNestedSchemaPruningBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
OrcV2NestedSchemaPruningBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
OrcV2NestedSchemaPruningBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
ParquetNestedPredicatePushDownBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
ParquetNestedPredicatePushDownBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
ParquetNestedSchemaPruningBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
ParquetNestedSchemaPruningBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
PrimitiveArrayBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
PrimitiveArrayBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
RangeBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
RangeBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
SortBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
SortBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
SubExprEliminationBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
SubExprEliminationBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
TPCDSQueryBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
TPCDSQueryBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
UDFBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
UDFBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
UnsafeArrayDataBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
UnsafeArrayDataBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
UpdateFieldsBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
V2FunctionBenchmark-jdk11-results.txt [SPARK-35361][SQL] Improve performance for ApplyFunctionExpression 2021-05-12 10:16:35 +09:00
V2FunctionBenchmark-results.txt [SPARK-35361][SQL] Improve performance for ApplyFunctionExpression 2021-05-12 10:16:35 +09:00
WideSchemaBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
WideSchemaBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
WideTableBenchmark-jdk11-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00
WideTableBenchmark-results.txt [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines 2021-04-03 23:02:56 +03:00