spark-instrumented-optimizer/python/pyspark/sql/tests
itholic 2537fe8cba [SPARK-35929][PYTHON] Support to infer nested dict as a struct when creating a DataFrame
### What changes were proposed in this pull request?

Currently, nested dicts are always inferred as `MapType`.

This causes a problem because the map's value type is inferred from the first field of the nested dict, so values of any other type are lost, as shown below:

```python
# Assumes an active SparkSession named `spark`.
data = [{"inside_struct": {"payment": 100.5, "name": "Lee"}}]
df = spark.createDataFrame(data)
df.show(truncate=False)
# +--------------------------------+
# |inside_struct                   |
# +--------------------------------+
# |{name -> null, payment -> 100.5}|
# +--------------------------------+
```

The "name" became `null`, but it should've been `"Lee"`.

In this case, we need to be able to infer the schema with a `StructType` instead of a `MapType`.
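
The difference between the two inference strategies can be sketched in plain Python. This is a toy model only, not PySpark's actual implementation: the function names `infer_as_map` and `infer_as_struct` are hypothetical, and Python types stand in for Spark SQL types. It assumes, as described above, that a map takes its single value type from the first field and that values of any other type become null.

```python
from typing import Any, Dict, Optional


def infer_as_map(d: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Toy map-style inference: one value type for the whole dict."""
    # The value type comes from the first non-None value (hypothetical model).
    value_type = next((type(v) for v in d.values() if v is not None), type(None))
    # Values that do not match the single value type are lost (become None).
    return {k: (v if isinstance(v, value_type) else None) for k, v in d.items()}


def infer_as_struct(d: Dict[str, Any]) -> Dict[str, str]:
    """Toy struct-style inference: every field keeps its own type."""
    return {k: type(v).__name__ for k, v in d.items()}


nested = {"payment": 100.5, "name": "Lee"}
print(infer_as_map(nested))     # "name" is lost because it is not a float
print(infer_as_struct(nested))  # each field has its own type
```

In this model the map-style result drops `"Lee"` because only one value type is allowed, while the struct-style result records a separate type per field.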

Therefore, this PR proposes adding a new configuration, `spark.sql.pyspark.inferNestedDictAsStruct.enabled`, to control which type is used when inferring nested dicts.
- When `spark.sql.pyspark.inferNestedDictAsStruct.enabled` is `false` (the default), nested dicts are inferred as `MapType`.
- When `spark.sql.pyspark.inferNestedDictAsStruct.enabled` is `true`, nested dicts are inferred as `StructType`.
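
A minimal sketch of how the flag might be toggled before creating a DataFrame. The configuration key is the one introduced by this PR; the helper name `create_df` is hypothetical, and the sketch assumes the flag can be set at runtime through `spark.conf.set` on an active `SparkSession`.

```python
# Configuration key introduced by this PR.
CONF_KEY = "spark.sql.pyspark.inferNestedDictAsStruct.enabled"


def create_df(spark, data, as_struct):
    """Hypothetical helper: set the flag, then let PySpark infer the schema.

    `spark` is expected to be an active SparkSession (an assumption of this
    sketch); `data` is a list of dicts as in the example above.
    """
    # Session configuration values are passed as strings.
    spark.conf.set(CONF_KEY, "true" if as_struct else "false")
    return spark.createDataFrame(data)
```

With `as_struct=True`, calling `create_df(spark, data, True).printSchema()` on the example data should report `inside_struct` as a struct rather than a map. Note that the helper leaves the flag set for the rest of the session.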

### Why are the changes needed?

Because always inferring nested dicts as `MapType` does not work properly in some cases, such as a nested dict whose values have different types.

### Does this PR introduce _any_ user-facing change?

Yes. A new configuration, `spark.sql.pyspark.inferNestedDictAsStruct.enabled`, is added.

### How was this patch tested?

Added a unit test.

Closes #33214 from itholic/SPARK-35929.

Lead-authored-by: itholic <haejoon.lee@databricks.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-07-07 15:14:18 +09:00
__init__.py [SPARK-26032][PYTHON] Break large sql/tests.py files into smaller files 2018-11-14 14:51:11 +08:00
test_arrow.py [SPARK-32194][PYTHON] Use proper exception classes instead of plain Exception 2021-05-26 11:54:40 +09:00
test_catalog.py [SPARK-33613][PYTHON][TESTS] Replace deprecated APIs in pyspark tests 2020-12-01 10:34:40 +09:00
test_column.py [SPARK-34306][SQL][PYTHON][R] Use Snake naming rule across the function APIs 2021-02-02 09:29:40 +09:00
test_conf.py [SPARK-33613][PYTHON][TESTS] Replace deprecated APIs in pyspark tests 2020-12-01 10:34:40 +09:00
test_context.py [SPARK-33021][PYTHON][TESTS] Move functions related test cases into test_functions.py 2020-09-28 21:54:00 -07:00
test_dataframe.py [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing 2021-07-02 16:07:31 +08:00
test_datasources.py [SPARK-33613][PYTHON][TESTS] Replace deprecated APIs in pyspark tests 2020-12-01 10:34:40 +09:00
test_functions.py [SPARK-35382][PYTHON] Fix lambda variable name issues in nested DataFrame functions in Python APIs 2021-05-13 14:58:01 +09:00
test_group.py [SPARK-34306][SQL][PYTHON][R] Use Snake naming rule across the function APIs 2021-02-02 09:29:40 +09:00
test_pandas_cogrouped_map.py [SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas 2021-02-02 16:25:32 +09:00
test_pandas_grouped_map.py [SPARK-33489][PYSPARK] Add NullType support for Arrow executions 2021-01-25 11:34:47 +09:00
test_pandas_map.py [SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas 2021-02-02 16:25:32 +09:00
test_pandas_udf.py [SPARK-33613][PYTHON][TESTS] Replace deprecated APIs in pyspark tests 2020-12-01 10:34:40 +09:00
test_pandas_udf_grouped_agg.py [SPARK-34610][PYTHON][TEST] Fix Python UDF used in GroupedAggPandasUDFTests 2021-03-04 10:03:54 +09:00
test_pandas_udf_scalar.py [SPARK-33613][PYTHON][TESTS] Replace deprecated APIs in pyspark tests 2020-12-01 10:34:40 +09:00
test_pandas_udf_typehints.py [SPARK-33613][PYTHON][TESTS] Replace deprecated APIs in pyspark tests 2020-12-01 10:34:40 +09:00
test_pandas_udf_window.py [SPARK-34610][PYTHON][TEST] Fix Python UDF used in GroupedAggPandasUDFTests 2021-03-04 10:03:54 +09:00
test_readwriter.py [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
test_serde.py [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
test_session.py [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
test_streaming.py [SPARK-32194][PYTHON] Use proper exception classes instead of plain Exception 2021-05-26 11:54:40 +09:00
test_types.py [SPARK-35929][PYTHON] Support to infer nested dict as a struct when creating a DataFrame 2021-07-07 15:14:18 +09:00
test_udf.py [SPARK-34545][SQL] Fix issues with valueCompare feature of pyrolite 2021-03-07 19:12:42 -06:00
test_utils.py [SPARK-34872][SQL] quoteIfNeeded should quote a name which contains non-word characters 2021-03-29 09:31:24 +00:00