spark-instrumented-optimizer/python/pyspark/sql
Max Gekk 9d5e48ea95 [SPARK-33270][SQL] Return SQL schema instead of Catalog string from the SchemaOfJson expression
### What changes were proposed in this pull request?
Return the schema in SQL format, instead of as a catalog string, from the `SchemaOfJson` expression.

### Why are the changes needed?
In some cases, `from_json()` cannot parse the schema returned by `schema_of_json`. One example is JSON field names that contain spaces. After this change, such field names are quoted with backticks, so `from_json()` can parse them.

Here is an example:
```scala
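import org.apache.spark.sql.functions.{from_json, schema_of_json}
import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell
val in = Seq("""{"a b": 1}""").toDS()
in.select(from_json('value, schema_of_json("""{"a b": 100}""")) as "parsed")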
```
raises the exception:
```
== SQL ==
struct<a b:bigint>
------^^^

	at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:263)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:130)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseTableSchema(ParseDriver.scala:76)
	at org.apache.spark.sql.types.DataType$.fromDDL(DataType.scala:131)
	at org.apache.spark.sql.catalyst.expressions.ExprUtils$.evalTypeExpr(ExprUtils.scala:33)
	at org.apache.spark.sql.catalyst.expressions.JsonToStructs.<init>(jsonExpressions.scala:537)
	at org.apache.spark.sql.functions$.from_json(functions.scala:4141)
```
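To see why quoting matters, the sketch below renders the same one-field struct in both styles. It is a self-contained illustration, not Spark's code. `SchemaRendering`, `catalogString` and `sqlString` are hypothetical names. `quoteIdentifier` only assumes the usual backtick convention: wrap the name in backticks and double any embedded backtick. Type names are passed in as plain strings.

```scala
// Hypothetical sketch contrasting the old catalog-string rendering with the SQL rendering.
object SchemaRendering {
  // Assumed backtick convention: wrap the name, double any embedded backtick.
  def quoteIdentifier(name: String): String = "`" + name.replace("`", "``") + "`"

  // Old style: field names are emitted verbatim, so a space inside a name is ambiguous.
  def catalogString(fields: Seq[(String, String)]): String =
    fields.map { case (name, tpe) => s"$name:$tpe" }.mkString("struct<", ",", ">")

  // New style: every field name is quoted, so spaces survive a round trip through a parser.
  def sqlString(fields: Seq[(String, String)]): String =
    fields.map { case (name, tpe) => s"${quoteIdentifier(name)}: $tpe" }.mkString("STRUCT<", ", ", ">")

  def main(args: Array[String]): Unit = {
    println(catalogString(Seq("a b" -> "bigint"))) // struct<a b:bigint>
    println(sqlString(Seq("a b" -> "BIGINT")))     // STRUCT<`a b`: BIGINT>
    println(sqlString(Seq("col" -> "BIGINT")))     // STRUCT<`col`: BIGINT>
  }
}
```

The unquoted form is exactly the string the parser rejects in the trace above. The quoted form delimits the name explicitly.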

### Does this PR introduce _any_ user-facing change?
Yes. The output format of `schema_of_json` changes. For example, for the input `{"col":0}`:

Before: `struct<col:bigint>`
After: ``STRUCT<`col`: BIGINT>``
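Quoting is what makes the new output parseable. Once a name is wrapped in backticks, a reader can take everything up to the closing backtick as the name, spaces included. The hypothetical `QuotedIdentifier.parse` below sketches that single step under the same assumed backtick convention, where a doubled backtick stands for a literal one. It is not Spark's ANTLR-based parser, which handles the full DDL grammar.

```scala
// Hypothetical sketch: read one backtick-quoted identifier from the start of `s`.
// Returns the unquoted name and the remaining input, or None when `s` does not
// begin with a well-formed quoted identifier.
object QuotedIdentifier {
  def parse(s: String): Option[(String, String)] = {
    if (!s.startsWith("`")) return None
    val sb = new StringBuilder
    var i = 1
    while (i < s.length) {
      if (s.charAt(i) == '`') {
        if (i + 1 < s.length && s.charAt(i + 1) == '`') {
          sb.append('`') // doubled backtick -> literal backtick in the name
          i += 2
        } else {
          return Some((sb.toString, s.substring(i + 1))) // closing backtick found
        }
      } else {
        sb.append(s.charAt(i)) // ordinary character, including spaces
        i += 1
      }
    }
    None // unterminated quote
  }

  def main(args: Array[String]): Unit = {
    println(parse("`a b`: BIGINT")) // Some((a b,: BIGINT))
    println(parse("`x``y`: INT"))   // Some((x`y,: INT))
    println(parse("a b:bigint"))    // None
  }
}
```

The scan stops only at an unpaired backtick, so `a b` comes back intact. The unquoted catalog form offers no such delimiter.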

### How was this patch tested?
By existing test suites `JsonFunctionsSuite` and `JsonExpressionsSuite`.

Closes #30172 from MaxGekk/schema_of_json-sql-schema.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-10-29 10:30:41 +09:00
avro [SPARK-33002][PYTHON] Remove non-API annotations 2020-10-07 19:53:59 +09:00
pandas [SPARK-31964][PYTHON][FOLLOW-UP] Use is_categorical_dtype instead of deprecated is_categorical 2020-10-21 14:46:47 -07:00
tests [SPARK-33268][SQL][PYTHON] Fix bugs for casting data from/to PythonUserDefinedType 2020-10-28 08:33:02 -07:00
__init__.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
__init__.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
_typing.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
catalog.py [SPARK-31000][PYTHON][SQL] Add ability to set table description via Catalog.createTable() 2020-08-25 13:42:31 +09:00
catalog.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
column.py [SPARK-32511][FOLLOW-UP][SQL][R][PYTHON] Add dropFields to SparkR and PySpark 2020-10-08 10:37:42 +09:00
column.pyi [SPARK-32511][FOLLOW-UP][SQL][R][PYTHON] Add dropFields to SparkR and PySpark 2020-10-08 10:37:42 +09:00
conf.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
conf.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
context.py [SPARK-32897][PYTHON] Don't show a deprecation warning at SparkSession.builder.getOrCreate 2020-09-16 10:13:47 -07:00
context.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
dataframe.py [MINOR][PYTHON] Fix the typo in the docstring of method agg() 2020-10-15 17:24:22 -07:00
dataframe.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
functions.py [SPARK-33270][SQL] Return SQL schema instead of Catalog string from the SchemaOfJson expression 2020-10-29 10:30:41 +09:00
functions.pyi [SPARK-32084][PYTHON][SQL] Expand dictionary functions 2020-10-27 11:05:53 +09:00
group.py [SPARK-32719][PYTHON] Add Flake8 check missing imports 2020-08-31 11:23:31 +09:00
group.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
readwriter.py [SPARK-32888][DOCS] Add user document about header flag and RDD as path for reading CSV 2020-09-16 20:16:15 +09:00
readwriter.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
session.py [SPARK-33139][SQL][FOLLOW-UP] Avoid using reflect call on session.py 2020-10-19 16:40:48 +09:00
session.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
streaming.py [SPARK-32933][PYTHON] Use keyword-only syntax for keyword_only methods 2020-09-23 09:28:33 +09:00
streaming.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
types.py [SPARK-32814][PYTHON] Replace __metaclass__ field with metaclass keyword 2020-09-16 20:22:11 +09:00
types.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
udf.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
udf.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
utils.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
window.py [SPARK-30188][SQL] Resolve the failed unit tests when enable AQE 2020-01-13 22:55:19 +08:00
window.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00