spark-instrumented-optimizer/sql/hive
Max Gekk 157b72ac9f [SPARK-33591][SQL] Recognize null in partition spec values
### What changes were proposed in this pull request?
1. Recognize `null` while parsing partition specs, and put `null` instead of `"null"` as partition values.
2. For V1 catalog: replace `null` by `__HIVE_DEFAULT_PARTITION__`.
3. For V2 catalogs: pass `null` AS IS, and let catalog implementations to decide how to handle `null`s as partition values in spec.

### Why are the changes needed?
Currently, `null` in partition specs is recognized as the `"null"` string which could lead to incorrect results, for example:
```sql
spark-sql> CREATE TABLE tbl5 (col1 INT, p1 STRING) USING PARQUET PARTITIONED BY (p1);
spark-sql> INSERT INTO TABLE tbl5 PARTITION (p1 = null) SELECT 0;
spark-sql> SELECT isnull(p1) FROM tbl5;
false
```
Even we inserted a row to the partition with the `null` value, **the resulted table doesn't contain `null`**.

### Does this PR introduce _any_ user-facing change?
Yes. After the changes, the example above works as expected:
```sql
spark-sql> SELECT isnull(p1) FROM tbl5;
true
```

### How was this patch tested?
1. By running the affected test suites `SQLQuerySuite`, `AlterTablePartitionV2SQLSuite` and `v1/ShowPartitionsSuite`.
2. Compiling by Scala 2.13:
```
$  ./dev/change-scala-version.sh 2.13
$ ./build/sbt -Pscala-2.13 compile
```

Closes #30538 from MaxGekk/partition-spec-value-null.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-01-08 14:14:27 +00:00
..
benchmarks [SPARK-20202][BUILD][SQL] Remove references to org.spark-project.hive (Hive 1.2.1) 2020-10-05 15:29:56 -07:00
compatibility/src/test/scala/org/apache/spark/sql/hive/execution [SPARK-33428][SQL] Conv UDF use BigInt to avoid Long value overflow 2020-12-14 14:32:08 +00:00
src [SPARK-33591][SQL] Recognize null in partition spec values 2021-01-08 14:14:27 +00:00
pom.xml [SPARK-33662][BUILD] Setting version to 3.2.0-SNAPSHOT 2020-12-04 14:10:42 -08:00