6cb23c163c
### What changes were proposed in this pull request?
Fix SimplifyConditionalsInPredicate to be null-safe.
To reproduce:
```
import org.apache.spark.sql.types.{StructField, BooleanType, StructType}
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.col
val schema = List(
  StructField("b", BooleanType, true)
)
val data = Seq(
  Row(true),
  Row(false),
  Row(null)
)
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(data),
  StructType(schema)
)
// cartesian product of true / false / null
val df2 = df.select(col("b") as "cond").crossJoin(df.select(col("b") as "falseVal"))
df2.createOrReplaceTempView("df2")
spark.sql("SELECT * FROM df2 WHERE IF(cond, FALSE, falseVal)").show()
// actual (rule enabled; the row with cond = null, falseVal = true is missing):
// +-----+--------+
// | cond|falseVal|
// +-----+--------+
// |false| true|
// +-----+--------+
spark.sql("SET spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.SimplifyConditionalsInPredicate")
spark.sql("SELECT * FROM df2 WHERE IF(cond, FALSE, falseVal)").show()
// expected (the output once the rule is excluded):
// +-----+--------+
// | cond|falseVal|
// +-----+--------+
// |false| true|
// | null| true|
// +-----+--------+
```
### Why are the changes needed?
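This is a regression that leads to incorrect results: in the reproduction above, the optimized query drops the row where cond is null and falseVal is true.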
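
As a rough illustration of why null handling matters here, the sketch below models SQL three-valued logic in plain Scala, with `None` standing in for NULL. It assumes (this is not stated in the description above) that the rule rewrites `IF(cond, FALSE, falseVal)` into `NOT(cond) AND falseVal`, and that a null-safe variant wraps the condition as `COALESCE(cond, FALSE)`. The names `NullSafeIfSketch`, `unsafeRewrite` and `nullSafeRewrite` are hypothetical; none of this is Spark code.

```scala
// Sketch of SQL three-valued logic: None stands for NULL.
object NullSafeIfSketch {
  type SqlBool = Option[Boolean]

  // SQL NOT: NULL stays NULL.
  def not(a: SqlBool): SqlBool = a.map(!_)

  // SQL AND: FALSE dominates; otherwise any NULL makes the result NULL.
  def and(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // IF(cond, FALSE, falseVal): a NULL condition selects the else branch.
  def ifFalseElse(cond: SqlBool, falseVal: SqlBool): SqlBool =
    if (cond.contains(true)) Some(false) else falseVal

  // Assumed unsafe rewrite: NOT(cond) AND falseVal.
  def unsafeRewrite(cond: SqlBool, falseVal: SqlBool): SqlBool =
    and(not(cond), falseVal)

  // Assumed null-safe rewrite: NOT(COALESCE(cond, FALSE)) AND falseVal.
  def nullSafeRewrite(cond: SqlBool, falseVal: SqlBool): SqlBool =
    and(not(cond.orElse(Some(false))), falseVal)

  // A WHERE clause keeps a row only when the predicate evaluates to TRUE.
  def keeps(p: SqlBool): Boolean = p.contains(true)

  val values: Seq[SqlBool] = Seq(Some(true), Some(false), None)

  // (cond, falseVal) pairs for which a rewrite filters differently from IF.
  def mismatches(rewrite: (SqlBool, SqlBool) => SqlBool): Seq[(SqlBool, SqlBool)] =
    for {
      c <- values
      f <- values
      if keeps(rewrite(c, f)) != keeps(ifFalseElse(c, f))
    } yield (c, f)

  def main(args: Array[String]): Unit = {
    println(mismatches(unsafeRewrite))   // pairs where the unsafe form disagrees with IF
    println(mismatches(nullSafeRewrite)) // pairs where the null-safe form disagrees with IF
  }
}
```

Under these assumptions the only pair on which the unsafe form disagrees with `IF` is cond = NULL, falseVal = TRUE, which matches the row missing from the "actual" output; the null-safe form agrees with `IF` on all nine combinations.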
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests.
Closes #33928 from hypercubestart/fix-SimplifyConditionalsInPredicate.
Authored-by: Andrew Liu <andrewlliu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit