9643eab53e
### What changes were proposed in this pull request? This patch proposes to support subexpression elimination for interpreted predicate. ### Why are the changes needed? Similar to interpreted projection, there are use cases when codegen predicate is not able to work, e.g. too complex schema, non-codegen expression, etc. When there are frequently occurring expressions (subexpressions) among predicate expression, the performance is quite bad as we need to re-compute same expressions. We should be able to support subexpression elimination for interpreted predicate like interpreted projection. ### Does this PR introduce _any_ user-facing change? No, this doesn't change user behavior. ### How was this patch tested? Unit test and benchmark. Closes #30497 from viirya/SPARK-33540. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
26 lines
1.9 KiB
Plaintext
26 lines
1.9 KiB
Plaintext
================================================================================================
|
|
Benchmark for performance of subexpression elimination
|
|
================================================================================================
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.6
|
|
Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
|
|
from_json as subExpr in Project: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
subExprElimination false, codegen: true 23094 23763 585 0.0 230939301.2 1.0X
|
|
subExprElimination false, codegen: false 23161 24087 844 0.0 231611379.8 1.0X
|
|
subExprElimination true, codegen: true 1492 1517 30 0.0 14921022.9 15.5X
|
|
subExprElimination true, codegen: false 1300 1361 93 0.0 12996167.7 17.8X
|
|
|
|
Preparing data for benchmarking ...
|
|
OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.6
|
|
Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
|
|
from_json as subExpr in Filter: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
subexpressionElimination off, codegen on 37069 37767 985 0.0 370694301.5 1.0X
|
|
subexpressionElimination off, codegen on 37095 37970 1008 0.0 370945081.6 1.0X
|
|
subexpressionElimination off, codegen on 20618 21443 715 0.0 206175173.8 1.8X
|
|
subexpressionElimination off, codegen on 21563 21887 307 0.0 215626274.7 1.7X
|
|
|
|
|