spark-instrumented-optimizer


Angerszhuuuu a472612eb8 [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan		2021-09-14 18:26:02 +08:00

### What changes were proposed in this pull request?
For the query
```
select array_union(array(cast('nan' as double), cast('nan' as double)), array())
```
the result is [NaN, NaN], but it should be [NaN].
This issue arises because `OpenHashSet` cannot handle `Double.NaN` or `Float.NaN` either.
This PR adds a wrapper around `OpenHashSet` that handles `null`, `Double.NaN`, and `Float.NaN` together.

### Why are the changes needed?
Fix bug

### Does this PR introduce _any_ user-facing change?
ArrayUnion no longer returns duplicated `NaN` values.

### How was this patch tested?
Added UT

Closes #33955 from AngersZhuuuu/SPARK-36702-WrapOpenHashSet.

Lead-authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Co-authored-by: AngersZhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit `f71f37755d`)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
benchmarks	[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines	2021-04-03 23:02:56 +03:00
src	[SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan	2021-09-14 18:26:02 +08:00
pom.xml	[SPARK-36712][BUILD] Make scala-parallel-collections in 2.13 POM a direct dependency (not in maven profile)	2021-09-13 11:06:58 -05:00
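
The NaN duplication described in the SPARK-36702 commit message above can be illustrated outside Spark. The following is a minimal Python sketch, not Spark's Scala implementation: `NaNAwareSet` and `array_union` are hypothetical names invented for this example. It assumes only the idea stated in the commit message, namely that a hash set relying on value equality cannot deduplicate NaN (because NaN is not equal to itself), so a wrapper tracks `null` and NaN with separate flags.

```python
import math


class NaNAwareSet:
    """Hypothetical wrapper that tracks None and NaN outside the inner set."""

    def __init__(self):
        self._values = set()    # ordinary values, deduplicated by equality
        self._order = []        # first-seen order of distinct elements
        self._has_nan = False   # NaN != NaN, so remember it with a flag
        self._has_none = False  # None (SQL null) is tracked the same way

    def add(self, value):
        if value is None:
            if not self._has_none:
                self._has_none = True
                self._order.append(None)
        elif isinstance(value, float) and math.isnan(value):
            if not self._has_nan:
                self._has_nan = True
                self._order.append(value)
        elif value not in self._values:
            self._values.add(value)
            self._order.append(value)

    def to_list(self):
        return list(self._order)


def array_union(left, right):
    """Union of two lists without duplicates, treating all NaNs as equal."""
    seen = NaNAwareSet()
    for value in list(left) + list(right):
        seen.add(value)
    return seen.to_list()


# A plain set keeps both NaNs, because the two objects never compare equal.
naive = {float("nan"), float("nan")}

# The wrapper collapses them into a single NaN.
result = array_union([float("nan"), float("nan")], [])
```

Here `naive` ends up with two elements, mirroring the [NaN, NaN] result reported in the commit message, while `result` holds a single NaN. Separate flags for the special values keep the inner set's equality-based lookup untouched, which appears to be the spirit of the wrapper the commit describes.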