004aea8155
### What changes were proposed in this pull request? This PR proposes to use `ExpressionEncoder` for the return type of ScalaUDF to convert to the catalyst type, instead of using `CatalystTypeConverters`. Note, this change only takes effect for typed Scala UDF since its the only case where we know the type tag of the raw type. ### Why are the changes needed? Users now could register a UDF with `Instant`/`LocalDate` as return types even with `spark.sql.datetime.java8API.enabled=false`. However, the UDF can not really be used. For example, if we try: ```scala scala> sql("set spark.sql.datetime.java8API.enabled=false") scala> spark.udf.register("buildDate", udf{ d: String => java.time.LocalDate.parse(d) }) scala> Seq("2020-07-02").toDF("d").selectExpr("CAST(buildDate(d) AS STRING)").show ``` Then, we will hit the error: ```scala java.lang.ClassCastException: java.time.LocalDate cannot be cast to java.sql.Date at org.apache.spark.sql.catalyst.CatalystTypeConverters$DateConverter$.toCatalystImpl(CatalystTypeConverters.scala:304) at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:107) at org.apache.spark.sql.catalyst.CatalystTypeConverters$.$anonfun$createToCatalystConverter$2(CatalystTypeConverters.scala:425) at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1169) ... ``` as it actually requires enabling `spark.sql.datetime.java8API.enabled` when using the UDF. And I think this could make users get confused. This happens because when registering the UDF, Spark actually uses `ExpressionEncoder` to ser/deser types. However, when using UDF, Spark uses `CatalystTypeConverters`, which is under control of `spark.sql.datetime.java8API.enabled`, to ser/deser types. Therefore, Spark would fail to convert the Java8 date time types. If we could also use `ExpressionEncoder` to ser/deser types for the return type, similar to what we do for the input parameter types, then, UDF could support Instant/LocalDate, event other combined complex types as well. ### Does this PR introduce _any_ user-facing change? Yes. Before this PR, if users run the demo above, they would hit the error. After this PR, the demo will run successfully. ### How was this patch tested? Updated 2 tests and added a new one for combined types of `Instant` and `LocalDate`. Closes #28979 from Ngone51/udf-return-encoder. Authored-by: yi.wu <yi.wu@databricks.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org> |
||
---|---|---|
.. | ||
benchmarks | ||
src | ||
v1.2/src | ||
v2.3/src | ||
pom.xml |