1f02871489
### What changes were proposed in this pull request?

This patch stops predicates on PythonUDFs from being pushed down through Aggregate.

### Why are the changes needed?

Predicates on PythonUDFs cannot be pushed down through Aggregate. A pushed-down predicate cannot be evaluated, because PythonUDFs cannot be evaluated in a Filter, which causes an error like:

```
Caused by: java.lang.UnsupportedOperationException: Cannot generate code for expression: mean(input[1, struct<bar:bigint>, true].bar)
  at org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode(Expression.scala:304)
  at org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode$(Expression.scala:303)
  at org.apache.spark.sql.catalyst.expressions.PythonUDF.doGenCode(PythonUDF.scala:52)
  at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:146)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:141)
  at org.apache.spark.sql.catalyst.expressions.CastBase.doGenCode(Cast.scala:821)
  at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:146)
  at scala.Option.getOrElse(Option.scala:189)
```

### Does this PR introduce any user-facing change?

Yes. Previously, predicates on PythonUDFs were pushed down through Aggregate, which could cause an error. After this change, the query works.

### How was this patch tested?

Unit test.

Closes #28089 from viirya/SPARK-30921.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
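The idea in the commit message above can be pictured with a toy model. The sketch below is not Spark's optimizer code: `Predicate`, `has_python_udf`, and `split_pushdown` are hypothetical names invented for illustration, and the pushdown condition (a predicate may move below an aggregate only if it reads grouping columns alone) is a simplifying assumption. It only shows the shape of the fix: a predicate that contains a Python UDF stays above the aggregate even when it would otherwise qualify.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Predicate:
    """Hypothetical stand-in for a filter condition sitting above an aggregate."""
    name: str
    references: FrozenSet[str]    # columns the condition reads
    has_python_udf: bool = False  # True if the expression contains a Python UDF


def split_pushdown(
    predicates: List[Predicate], grouping_columns: FrozenSet[str]
) -> Tuple[List[Predicate], List[Predicate]]:
    """Partition predicates into (pushed below the aggregate, kept above it)."""
    pushed: List[Predicate] = []
    kept: List[Predicate] = []
    for pred in predicates:
        pushable = (
            bool(pred.references)                      # must read at least one column
            and pred.references <= grouping_columns    # assumed rule: grouping columns only
            and not pred.has_python_udf                # the skip this patch describes
        )
        (pushed if pushable else kept).append(pred)
    return pushed, kept


if __name__ == "__main__":
    preds = [
        Predicate("key > 1", frozenset({"key"})),
        Predicate("mean_udf(v) > 0", frozenset({"key"}), has_python_udf=True),
    ]
    pushed, kept = split_pushdown(preds, frozenset({"key"}))
    print([p.name for p in pushed], [p.name for p in kept])
```

In this toy model, dropping the `has_python_udf` check would push the UDF predicate below the aggregate, which corresponds to the situation the commit message reports as failing.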
| Name |
|---|
| avro |
| pandas |
| tests |
| __init__.py |
| catalog.py |
| column.py |
| conf.py |
| context.py |
| dataframe.py |
| functions.py |
| group.py |
| readwriter.py |
| session.py |
| streaming.py |
| types.py |
| udf.py |
| utils.py |
| window.py |