### What changes were proposed in this pull request?
Rework of [PR #33212](https://github.com/apache/spark/pull/33212), incorporating the review suggestions.
This PR makes `spark.read.json()` behave the same as the Datasource API `spark.read.format("json").load("path")`: by default, Spark should turn a non-nullable user-specified schema into a nullable one when `spark.read.json()` is used.
Here is an example:
```scala
val schema = StructType(Seq(StructField("value",
  StructType(Seq(
    StructField("x", IntegerType, nullable = false),
    StructField("y", IntegerType, nullable = false)
  )),
  nullable = true
)))
val testDS = Seq("""{"value":{"x":1}}""").toDS
spark.read
  .schema(schema)
  .json(testDS)
  .printSchema()
spark.read
  .schema(schema)
  .format("json")
  .load("/tmp/json/t1")
  .printSchema()
// root
// |-- value: struct (nullable = true)
// |    |-- x: integer (nullable = true)
// |    |-- y: integer (nullable = true)
```
Before this PR:
```
// output of spark.read.json()
root
|-- value: struct (nullable = true)
| |-- x: integer (nullable = false)
| |-- y: integer (nullable = false)
```
After this PR:
```
// output of spark.read.json()
root
|-- value: struct (nullable = true)
| |-- x: integer (nullable = true)
| |-- y: integer (nullable = true)
```
- `spark.read.csv()` has the same problem.
- The Datasource API `spark.read.format("json").load("path")` already applies this logic when resolving the relation.
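
The nullability coercion described above can be sketched as a recursive transform over the schema. This is a minimal illustration, not Spark's internal implementation; the helper name `toNullable` is hypothetical:

```scala
import org.apache.spark.sql.types._

// Hypothetical helper: recursively marks every field of a schema as nullable,
// mirroring what the resolve-relation path does to a user-specified schema.
def toNullable(dt: DataType): DataType = dt match {
  case st: StructType =>
    StructType(st.fields.map(f =>
      f.copy(dataType = toNullable(f.dataType), nullable = true)))
  case at: ArrayType =>
    at.copy(elementType = toNullable(at.elementType), containsNull = true)
  case mt: MapType =>
    mt.copy(valueType = toNullable(mt.valueType), valueContainsNull = true)
  case other => other
}
```

Applied to the `schema` from the example above, this turns both `x` and `y` into nullable fields, which matches the schema printed by the Datasource API.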