spark-instrumented-optimizer/sql/hive
ulysses-you 999d3b89b6 [SPARK-34515][SQL] Fix NPE if InSet contains null value during getPartitionsByFilter
### What changes were proposed in this pull request?

Skip null value during rewrite `InSet` to `>= and <=` at getPartitionsByFilter.

### Why are the changes needed?

Spark will convert `InSet` to `>= and <=` if it's values size over `spark.sql.hive.metastorePartitionPruningInSetThreshold` during pruning partition . At this case, if values contain a null, we will get such exception 
 
```
java.lang.NullPointerException
 at org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:1389)
 at org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:50)
 at scala.math.LowPriorityOrderingImplicits$$anon$3.compare(Ordering.scala:153)
 at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
 at java.util.TimSort.sort(TimSort.java:220)
 at java.util.Arrays.sort(Arrays.java:1438)
 at scala.collection.SeqLike.sorted(SeqLike.scala:659)
 at scala.collection.SeqLike.sorted$(SeqLike.scala:647)
 at scala.collection.AbstractSeq.sorted(Seq.scala:45)
 at org.apache.spark.sql.hive.client.Shim_v0_13.convert$1(HiveShim.scala:772)
 at org.apache.spark.sql.hive.client.Shim_v0_13.$anonfun$convertFilters$4(HiveShim.scala:826)
 at scala.collection.immutable.Stream.flatMap(Stream.scala:489)
 at org.apache.spark.sql.hive.client.Shim_v0_13.convertFilters(HiveShim.scala:826)
 at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:848)
 at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getPartitionsByFilter$1(HiveClientImpl.scala:750)
```

### Does this PR introduce _any_ user-facing change?

Yes, bug fix.

### How was this patch tested?

Add test.

Closes #31632 from ulysses-you/SPARK-34515.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-02-24 21:32:19 +08:00
..
benchmarks [SPARK-20202][BUILD][SQL] Remove references to org.spark-project.hive (Hive 1.2.1) 2020-10-05 15:29:56 -07:00
compatibility/src/test/scala/org/apache/spark/sql/hive/execution [SPARK-33428][SQL] Conv UDF use BigInt to avoid Long value overflow 2020-12-14 14:32:08 +00:00
src [SPARK-34515][SQL] Fix NPE if InSet contains null value during getPartitionsByFilter 2021-02-24 21:32:19 +08:00
pom.xml [SPARK-27733][CORE] Upgrade Avro to version 1.10.1 2021-01-20 15:42:27 -08:00