2537fe8cba
### What changes were proposed in this pull request?

Currently, schema inference for nested dicts always uses `MapType`. This causes a problem because the map's value type is inferred from the first field of the dict, so values of a different type are lost, as shown below:

```python
# `spark` is an active SparkSession (e.g., in the PySpark shell)
data = [{"inside_struct": {"payment": 100.5, "name": "Lee"}}]
df = spark.createDataFrame(data)
df.show(truncate=False)
# +--------------------------------+
# |inside_struct                   |
# +--------------------------------+
# |{name -> null, payment -> 100.5}|
# +--------------------------------+
```

The `name` value became `null`, but it should have been `"Lee"`. In this case, the schema needs to be inferred as a `StructType` instead of a `MapType`.

Therefore, this PR proposes adding a new configuration, `spark.sql.pyspark.inferNestedDictAsStruct.enabled`, that controls which type is used when inferring nested dicts:

- When `spark.sql.pyspark.inferNestedDictAsStruct.enabled` is `false` (the default), nested dicts are inferred as `MapType`.
- When `spark.sql.pyspark.inferNestedDictAsStruct.enabled` is `true`, nested dicts are inferred as `StructType`.

### Why are the changes needed?

Always inferring nested dicts as `MapType` does not work correctly in some cases, for example when the values of a dict have different types, as in the example above.

### Does this PR introduce _any_ user-facing change?

Yes. A new configuration, `spark.sql.pyspark.inferNestedDictAsStruct.enabled`, is added.

### How was this patch tested?

Added a unit test.

Closes #33214 from itholic/SPARK-35929.

Lead-authored-by: itholic <haejoon.lee@databricks.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
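The difference between the two inference modes can be sketched without a Spark cluster. The following is a simplified, hypothetical model in plain Python, not PySpark's implementation: `infer_type` and `coerce` are illustrative names, types are represented as plain strings and tuples, and a value that does not match its inferred type is modeled as becoming `None`, mirroring the `null` in the example above.

```python
# Simplified, hypothetical model of nested-dict schema inference.
# This is NOT PySpark's code; types are modeled as strings and tuples.


def infer_type(value, dict_as_struct=False):
    """Infer a simplified type for value."""
    if isinstance(value, dict):
        if dict_as_struct:
            # Struct-like: every key becomes a field with its own type.
            return ("struct", [(k, infer_type(v, True)) for k, v in value.items()])
        # Map-like: one key type and one value type, both taken from
        # the first pair whose key and value are not None.
        for k, v in value.items():
            if k is not None and v is not None:
                return ("map", infer_type(k), infer_type(v))
        return ("map", "null", "null")
    if isinstance(value, bool):  # check bool before int (bool subclasses int)
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    return "null"


def coerce(value, data_type):
    """Return value shaped by data_type; mismatched atomic values become None."""
    if isinstance(data_type, tuple):
        if not isinstance(value, dict):
            return None
        if data_type[0] == "map":
            _, key_type, value_type = data_type
            # Every map value is forced into the single inferred value type.
            return {coerce(k, key_type): coerce(v, value_type) for k, v in value.items()}
        # Struct: each field keeps its own inferred type.
        return {name: coerce(value.get(name), t) for name, t in data_type[1]}
    expected = {"boolean": bool, "bigint": int, "double": float, "string": str}.get(data_type)
    if expected is None or type(value) is not expected:
        return None
    return value


nested = {"payment": 100.5, "name": "Lee"}

as_map = infer_type(nested)
print(as_map)                     # ('map', 'string', 'double')
print(coerce(nested, as_map))     # {'payment': 100.5, 'name': None}

as_struct = infer_type(nested, dict_as_struct=True)
print(as_struct)                  # ('struct', [('payment', 'double'), ('name', 'string')])
print(coerce(nested, as_struct))  # {'payment': 100.5, 'name': 'Lee'}
```

In this model, the map mode loses `"Lee"` because the single value type is fixed to `double` by the first field, while the struct mode types each field independently. In PySpark itself, the flag is a SQL configuration, so it can be set on a running session with `spark.conf.set("spark.sql.pyspark.inferNestedDictAsStruct.enabled", "true")` before calling `spark.createDataFrame(data)`.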
- avro
- pandas
- tests
- __init__.py
- __init__.pyi
- _typing.pyi
- catalog.py
- catalog.pyi
- column.py
- column.pyi
- conf.py
- conf.pyi
- context.py
- context.pyi
- dataframe.py
- dataframe.pyi
- functions.py
- functions.pyi
- group.py
- group.pyi
- readwriter.py
- readwriter.pyi
- session.py
- session.pyi
- streaming.py
- streaming.pyi
- types.py
- types.pyi
- udf.py
- udf.pyi
- utils.py
- window.py
- window.pyi