[SPARK-33267][SQL] Fix NPE issue on 'In' filter when one of values contains null

### What changes were proposed in this pull request?

This PR fixes an NPE in the `In` filter when one of its values is null. In practice, the issue is triggered when a filter of the form `in (..., null)` is pushed down to a V2 source table. `DataSourceStrategy` caches the mapping (filter instance -> expression) in a HashMap, which calls `hashCode` on the key; `In.hashCode` called `hashCode()` on every value without a null check, hence the NPE.
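The failure mode can be reproduced outside Spark with a minimal sketch. `UnsafeIn`, `SafeIn`, and `InHashDemo` are hypothetical stand-ins written for illustration, not Spark's actual classes, and the `mutable.HashMap` here only approximates the cache kept by `DataSourceStrategy`:

```scala
import scala.collection.mutable
import scala.util.Try

// Hypothetical stand-in mirroring the pre-fix hashCode of the `In` filter.
final class UnsafeIn(val attribute: String, val values: Array[Any]) {
  override def hashCode(): Int = {
    var h = attribute.hashCode
    values.foreach { v =>
      h *= 41
      h += v.hashCode() // throws NullPointerException when v is null
    }
    h
  }
}

// Hypothetical stand-in mirroring the post-fix hashCode: null contributes 0.
final class SafeIn(val attribute: String, val values: Array[Any]) {
  override def hashCode(): Int = {
    var h = attribute.hashCode
    values.foreach { v =>
      h *= 41
      h += (if (v != null) v.hashCode() else 0)
    }
    h
  }
}

object InHashDemo {
  def main(args: Array[String]): Unit = {
    // A hash map hashes its keys on insertion, so a throwing hashCode surfaces here.
    val cache = mutable.HashMap.empty[AnyRef, String]

    val unsafe = Try(cache.put(new UnsafeIn("i", Array[Any](1, null)), "i IN (1, NULL)"))
    println(s"pre-fix insert failed: ${unsafe.isFailure}")

    val safe = Try(cache.put(new SafeIn("i", Array[Any](1, null)), "i IN (1, NULL)"))
    println(s"post-fix insert succeeded: ${safe.isSuccess}")
  }
}
```

The sketch only exercises hashing; it does not model `equals`, filter translation, or the rest of the pushdown path.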

### Why are the changes needed?

This is an obvious bug: the `In` filter doesn't handle null values when calculating its hash code.

### Does this PR introduce _any_ user-facing change?

Yes. Previously, a query with `null` in an "in" condition against a data source V2 table that supports filter pushdown failed with an NPE; after this PR, the query no longer fails.

### How was this patch tested?

UT added. The new UT fails without the PR and passes with the PR.

Closes #30170 from HeartSaVioR/SPARK-33267.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Jungtaek Lim (HeartSaVioR) 2020-10-28 10:00:29 -07:00 committed by Dongjoon Hyun
parent a6216e2446
commit a744fea3be
2 changed files with 11 additions and 1 deletions


@@ -164,7 +164,7 @@ case class In(attribute: String, values: Array[Any]) extends Filter {
     var h = attribute.hashCode
     values.foreach { v =>
       h *= 41
-      h += v.hashCode()
+      h += (if (v != null) v.hashCode() else 0)
     }
     h
   }


@@ -413,6 +413,16 @@ class DataSourceV2Suite extends QueryTest with SharedSparkSession with AdaptiveS
       }
     }
   }
+  test("SPARK-33267: push down with condition 'in (..., null)' should not throw NPE") {
+    Seq(classOf[AdvancedDataSourceV2], classOf[JavaAdvancedDataSourceV2]).foreach { cls =>
+      withClue(cls.getName) {
+        val df = spark.read.format(cls.getName).load()
+        // before SPARK-33267 below query just threw NPE
+        df.select('i).where("i in (1, null)").collect()
+      }
+    }
+  }
 }