spark-instrumented-optimizer/sql/core
HyukjinKwon 815c7929c2
[SPARK-31065][SQL] Match schema_of_json to the schema inference of JSON data source
### What changes were proposed in this pull request?

This PR proposes two things:

1. Convert `null` to `string` type during schema inference of `schema_of_json` as JSON datasource does. This is a bug fix as well because `null` string is not the proper DDL formatted string and it is unable for SQL parser to recognise it as a type string. We should match it to JSON datasource and return a string type so `schema_of_json` returns a proper DDL formatted string.

2. Let `schema_of_json` respect `dropFieldIfAllNull` option during schema inference.

### Why are the changes needed?

To let `schema_of_json` return a proper DDL formatted string, and respect `dropFieldIfAllNull` option.

### Does this PR introduce any user-facing change?
Yes, it does.

```scala
import collection.JavaConverters._
import org.apache.spark.sql.functions._

spark.range(1).select(schema_of_json(lit("""{"id": ""}"""))).show()
spark.range(1).select(schema_of_json(lit("""{"id": "a", "drop": {"drop": null}}"""), Map("dropFieldIfAllNull" -> "true").asJava)).show(false)
```

**Before:**

```
struct<id:null>
struct<drop:struct<drop:null>,id:string>
```

**After:**

```
struct<id:string>
struct<id:string>
```

### How was this patch tested?

Manually tested, and unittests were added.

Closes #27854 from HyukjinKwon/SPARK-31065.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-03-10 00:33:32 -07:00
..
benchmarks [SPARK-30843][SQL] Fix getting of time components before 1582 year 2020-02-17 13:59:21 +08:00
src [SPARK-31065][SQL] Match schema_of_json to the schema inference of JSON data source 2020-03-10 00:33:32 -07:00
v1.2/src [SPARK-31058][SQL][TEST-HIVE1.2] Consolidate the implementation of quoteIfNeeded 2020-03-06 00:13:57 +00:00
v2.3/src [SPARK-31058][SQL][TEST-HIVE1.2] Consolidate the implementation of quoteIfNeeded 2020-03-06 00:13:57 +00:00
pom.xml [SPARK-30984][SS] Add UI test for Structured Streaming UI 2020-03-04 13:55:34 +08:00