3a437ed22b
### What changes were proposed in this pull request?

Convert `NULL` elements of maps, structs and arrays to the `"null"` string when converting map/struct/array values to strings. The SQL config `spark.sql.legacy.omitNestedNullInCast.enabled` controls the behaviour. When it is `true`, `NULL` elements of structs/maps/arrays are omitted; when it is `false`, `NULL` elements are converted to `"null"`.

### Why are the changes needed?

1. It is impossible to distinguish an empty string from null, for instance:

```scala
scala> Seq(Seq(""), Seq(null)).toDF().show
+-----+
|value|
+-----+
|   []|
|   []|
+-----+
```

2. NULL conversions are inconsistent between top-level values and nested columns, for instance:

```scala
scala> sql("select named_struct('c', null), null").show
+---------------------+----+
|named_struct(c, NULL)|NULL|
+---------------------+----+
|                   []|null|
+---------------------+----+
```

3. `.show()` differs from conversions to Hive strings, and as a consequence its output differs from that of `spark-sql` (sql tests):

```sql
spark-sql> select named_struct('c', null) as struct;
{"c":null}
```

```scala
scala> sql("select named_struct('c', null) as struct").show
+------+
|struct|
+------+
|    []|
+------+
```

4. It is impossible to distinguish an empty struct/array from a struct/array containing null in the current implementation:

```scala
scala> Seq[Seq[String]](Seq(), Seq(null)).toDF.show()
+-----+
|value|
+-----+
|   []|
|   []|
+-----+
```

### Does this PR introduce _any_ user-facing change?

Yes, before:

```scala
scala> Seq(Seq(""), Seq(null)).toDF().show
+-----+
|value|
+-----+
|   []|
|   []|
+-----+
```

After:

```scala
scala> Seq(Seq(""), Seq(null)).toDF().show
+------+
| value|
+------+
|    []|
|[null]|
+------+
```

### How was this patch tested?

By the existing test suite `CastSuite`.

Closes #29311 from MaxGekk/nested-null-to-string.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
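The effect of the flag can be illustrated with a small plain-Scala model. This is only a sketch and not Spark's actual `Cast` implementation: the object `NestedNullToString`, the function `arrayToString`, and the parameter `omitNestedNull` are hypothetical names standing in for the behaviour selected by `spark.sql.legacy.omitNestedNullInCast.enabled`. It covers only single-element arrays of strings, as in the examples above, and does not attempt to model how separators are handled in multi-element arrays.

```scala
// Hypothetical, simplified model of array-to-string conversion.
// It is NOT Spark's Cast code; all names are illustrative only.
object NestedNullToString {
  // omitNestedNull stands in for spark.sql.legacy.omitNestedNullInCast.enabled.
  def arrayToString(elems: Seq[String], omitNestedNull: Boolean): String = {
    // Legacy mode drops null elements; the new mode keeps them.
    val kept = if (omitNestedNull) elems.filter(_ != null) else elems
    // Any remaining null is rendered as the literal string "null".
    kept.map(e => if (e == null) "null" else e).mkString("[", ", ", "]")
  }

  def main(args: Array[String]): Unit = {
    // An empty string and a null element are indistinguishable in legacy mode.
    println(arrayToString(Seq(""), omitNestedNull = true))
    println(arrayToString(Seq(null), omitNestedNull = true))
    // With the flag off, the null element becomes visible.
    println(arrayToString(Seq(""), omitNestedNull = false))
    println(arrayToString(Seq(null), omitNestedNull = false))
  }
}
```

Under this model, the legacy setting renders both inputs as `[]`, while the new default renders the null-containing array as `[null]`, matching the before/after output shown in the description.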
||
---|---|---|
benchmarks | ||
src | ||
pom.xml |