spark-instrumented-optimizer

History

HyukjinKwon 815c7929c2 [SPARK-31065][SQL] Match schema_of_json to the schema inference of JSON data source ### What changes were proposed in this pull request? This PR proposes two things: 1. Convert `null` to `string` type during schema inference of `schema_of_json` as JSON datasource does. This is a bug fix as well because `null` string is not the proper DDL formatted string and it is unable for SQL parser to recognise it as a type string. We should match it to JSON datasource and return a string type so `schema_of_json` returns a proper DDL formatted string. 2. Let `schema_of_json` respect `dropFieldIfAllNull` option during schema inference. ### Why are the changes needed? To let `schema_of_json` return a proper DDL formatted string, and respect `dropFieldIfAllNull` option. ### Does this PR introduce any user-facing change? Yes, it does. ```scala import collection.JavaConverters._ import org.apache.spark.sql.functions._ spark.range(1).select(schema_of_json(lit("""{"id": ""}"""))).show() spark.range(1).select(schema_of_json(lit("""{"id": "a", "drop": {"drop": null}}"""), Map("dropFieldIfAllNull" -> "true").asJava)).show(false) ``` Before: ``` struct<id:null> struct<drop:struct<drop:null>,id:string> ``` After: ``` struct<id:string> struct<id:string> ``` ### How was this patch tested? Manually tested, and unittests were added. Closes #27854 from HyukjinKwon/SPARK-31065. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>		2020-03-10 00:33:32 -07:00
..
benchmarks	[SPARK-30413][SQL] Avoid WrappedArray roundtrip in GenericArrayData constructor, plus related optimization in ParquetMapConverter	2020-01-19 19:12:19 -08:00
src	[SPARK-31065][SQL] Match schema_of_json to the schema inference of JSON data source	2020-03-10 00:33:32 -07:00
pom.xml	[SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT	2020-02-25 19:44:31 -08:00