spark-instrumented-optimizer

History

yi.wu 82ce4753aa [SPARK-26580][SQL][ML][FOLLOW-UP] Throw exception when use untyped UDF by default ### What changes were proposed in this pull request? This PR proposes to throw exception by default when user use untyped UDF(a.k.a `org.apache.spark.sql.functions.udf(AnyRef, DataType)`). And user could still use it by setting `spark.sql.legacy.useUnTypedUdf.enabled` to `true`. ### Why are the changes needed? According to #23498, since Spark 3.0, the untyped UDF will return the default value of the Java type if the input value is null. For example, `val f = udf((x: Int) => x, IntegerType)`, `f($"x")` will return 0 in Spark 3.0 but null in Spark 2.4. And the behavior change is introduced due to Spark3.0 is built with Scala 2.12 by default. As a result, this might change data silently and may cause correctness issue if user still expect `null` in some cases. Thus, we'd better to encourage user to use typed UDF to avoid this problem. ### Does this PR introduce any user-facing change? Yeah. User will hit exception now when use untyped UDF. ### How was this patch tested? Added test and updated some tests. Closes #27488 from Ngone51/spark_26580_followup. Lead-authored-by: yi.wu <yi.wu@databricks.com> Co-authored-by: wuyi <yi.wu@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>		2020-02-21 14:46:54 +08:00
..
benchmarks	[SPARK-30413][SQL] Avoid WrappedArray roundtrip in GenericArrayData constructor, plus related optimization in ParquetMapConverter	2020-01-19 19:12:19 -08:00
src	[SPARK-26580][SQL][ML][FOLLOW-UP] Throw exception when use untyped UDF by default	2020-02-21 14:46:54 +08:00
pom.xml	[INFRA] Reverts commit `56dcd79` and `c216ef1`	2019-12-16 19:57:44 -07:00