815c7929c2
### What changes were proposed in this pull request?

This PR proposes two things:

1. Convert `null` to `string` type during schema inference of `schema_of_json`, as the JSON datasource does. This is also a bug fix: `null` is not a proper DDL type string, so the SQL parser cannot recognise it. Matching the JSON datasource and returning a string type lets `schema_of_json` return a proper DDL formatted string.
2. Make `schema_of_json` respect the `dropFieldIfAllNull` option during schema inference.

### Why are the changes needed?

To let `schema_of_json` return a proper DDL formatted string, and to respect the `dropFieldIfAllNull` option.

### Does this PR introduce any user-facing change?

Yes, it does.

```scala
import collection.JavaConverters._
import org.apache.spark.sql.functions._

spark.range(1).select(schema_of_json(lit("""{"id": ""}"""))).show()
spark.range(1).select(schema_of_json(
  lit("""{"id": "a", "drop": {"drop": null}}"""),
  Map("dropFieldIfAllNull" -> "true").asJava)).show(false)
```

**Before:**

```
struct<id:null>
struct<drop:struct<drop:null>,id:string>
```

**After:**

```
struct<id:string>
struct<id:string>
```

### How was this patch tested?

Manually tested, and unit tests were added.

Closes #27854 from HyukjinKwon/SPARK-31065.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
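The bug-fix aspect can be illustrated with a DDL round trip. The sketch below is illustrative only (it assumes Spark on the classpath; it is not code from this PR): a schema string containing `null` as a type cannot be parsed back, while the string type produced after this change can.

```scala
// Hedged sketch: why "null" is not a usable DDL type string.
// Assumes a Spark 3.x dependency; StructType.fromDDL is the standard
// API for parsing a DDL-formatted schema string.
import org.apache.spark.sql.types.StructType

// After this fix, schema_of_json("""{"id": null}""") yields
// "struct<id:string>", which round-trips through the DDL parser:
val parsed = StructType.fromDDL("struct<id:string>")

// Before the fix it yielded "struct<id:null>", which the SQL parser
// rejects, because "null" is not a recognised type name:
// StructType.fromDDL("struct<id:null>")  // throws ParseException
```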