spark-instrumented-optimizer

History

Yin Huai 3efd8bb6cf [SPARK-6052][SQL]In JSON schema inference, we should always set containsNull of an ArrayType to true

Always set `containsNull = true` when inferring the schema of JSON datasets. If we set `containsNull` based on the records we scanned, we may miss arrays with null values when sampling. Also, because future data can have arrays with null values, always setting `containsNull = true` is the more robust choice if we convert JSON data to Parquet.

JIRA: https://issues.apache.org/jira/browse/SPARK-6052

Author: Yin Huai <yhuai@databricks.com>

Closes #4806 from yhuai/jsonArrayContainsNull and squashes the following commits:

05eab9d [Yin Huai] Change containsNull to true.

2015-03-02 23:18:07 +08:00
..
java/org/apache/spark/sql	[SPARK-5166][SPARK-5247][SPARK-5258][SQL] API Cleanup / Documentation	2015-02-17 10:21:17 -08:00
scala/org/apache/spark/sql	[SPARK-6052][SQL]In JSON schema inference, we should always set containsNull of an ArrayType to true	2015-03-02 23:18:07 +08:00
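
The reasoning in the SPARK-6052 commit message (a sampled scan can miss arrays that contain nulls) can be illustrated with a small toy model. This is only a sketch and not Spark's implementation: `infer_contains_null` and the sample records are hypothetical names and data invented for illustration.

```python
# Toy model of the reasoning in SPARK-6052; this is NOT Spark's code.
# infer_contains_null and the records below are hypothetical.

def infer_contains_null(records, field):
    """Return True only if some scanned record has a null inside the array."""
    return any(None in (rec.get(field) or []) for rec in records)

dataset = [
    {"tags": ["a", "b"]},
    {"tags": ["c"]},
    {"tags": ["d", None]},  # the only record with a null array element
]

# A sample that happens to skip the last record.
sample = dataset[:2]

sampled_flag = infer_contains_null(sample, "tags")   # inferred from the sample
full_flag = infer_contains_null(dataset, "tags")     # inferred from all records

# The sampled schema would claim "no nulls" although the full data has one.
schema_is_wrong = (not sampled_flag) and full_flag

# The approach the commit describes: always report containsNull = true,
# regardless of what the scanned records happened to contain.
ALWAYS_CONTAINS_NULL = True
```

Under these assumptions, the flag inferred from the sample disagrees with the full dataset, which is the failure mode that always setting `containsNull = true` avoids.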
