spark-instrumented-optimizer/python/pyspark
Nicholas Chammas 41cf1d093f [SPARK-32686][PYTHON] Un-deprecate inferring DataFrame schema from list of dict
### What changes were proposed in this pull request?

As discussed in https://github.com/apache/spark/pull/29491#discussion_r474451282 and in SPARK-32686, this PR un-deprecates Spark's ability to infer a DataFrame schema from a list of dictionaries. The ability is Pythonic and matches functionality offered by Pandas.

### Why are the changes needed?

This change clarifies to users that this behavior is supported and is not going away in the near future.

### Does this PR introduce _any_ user-facing change?

Yes. There used to be a `UserWarning` for this, but now there isn't.

### How was this patch tested?

I tested this manually.

Before:

```python
>>> spark.createDataFrame(spark.sparkContext.parallelize([{'a': 5}]))
/Users/nchamm/Documents/GitHub/nchammas/spark/python/pyspark/sql/session.py:388: UserWarning: Using RDD of dict to inferSchema is deprecated. Use pyspark.sql.Row instead
  warnings.warn("Using RDD of dict to inferSchema is deprecated. "
DataFrame[a: bigint]

>>> spark.createDataFrame([{'a': 5}])
.../python/pyspark/sql/session.py:378: UserWarning: inferring schema from dict is deprecated,please use pyspark.sql.Row instead
  warnings.warn("inferring schema from dict is deprecated,"
DataFrame[a: bigint]
```

After:

```python
>>> spark.createDataFrame(spark.sparkContext.parallelize([{'a': 5}]))
DataFrame[a: bigint]

>>> spark.createDataFrame([{'a': 5}])
DataFrame[a: bigint]
```

Closes #29510 from nchammas/SPARK-32686-df-dict-infer-schema.

Authored-by: Nicholas Chammas <nicholas.chammas@liveramp.com>
Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
2020-08-24 14:55:11 -07:00
..
cloudpickle [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0 2020-07-17 11:49:18 +09:00
ml [SPARK-32092][ML][PYSPARK] Fix parameters not being copied in CrossValidatorModel.copy(), read() and write() 2020-08-22 09:27:31 -05:00
mllib [SPARK-32319][PYSPARK] Disallow the use of unused imports 2020-08-08 08:51:57 -07:00
resource [SPARK-32319][PYSPARK] Disallow the use of unused imports 2020-08-08 08:51:57 -07:00
sql [SPARK-32686][PYTHON] Un-deprecate inferring DataFrame schema from list of dict 2020-08-24 14:55:11 -07:00
streaming [SPARK-32319][PYSPARK] Disallow the use of unused imports 2020-08-08 08:51:57 -07:00
testing [SPARK-32319][PYSPARK] Disallow the use of unused imports 2020-08-08 08:51:57 -07:00
tests [SPARK-32319][PYSPARK] Disallow the use of unused imports 2020-08-08 08:51:57 -07:00
__init__.py [SPARK-32319][PYSPARK] Disallow the use of unused imports 2020-08-08 08:51:57 -07:00
_globals.py [SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary 2018-02-09 14:21:10 +08:00
accumulators.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
broadcast.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
conf.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
context.py [SPARK-32160][CORE][PYSPARK][FOLLOWUP] Change the config name to switch allow/disallow SparkContext in executors 2020-08-04 12:45:06 +09:00
daemon.py [SPARK-26175][PYTHON] Redirect the standard input of the forked child to devnull in daemon 2019-07-31 09:10:24 +09:00
files.py [SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation 2019-07-05 10:08:22 -07:00
find_spark_home.py [SPARK-29802][BUILD] Use python3 in build scripts 2020-07-19 11:02:37 +09:00
java_gateway.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
join.py [SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo… 2016-03-28 14:51:36 -07:00
profiler.py [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis 2019-01-17 19:40:39 -06:00
rdd.py [SPARK-32319][PYSPARK] Disallow the use of unused imports 2020-08-08 08:51:57 -07:00
rddsampler.py [SPARK-4897] [PySpark] Python 3 support 2015-04-16 16:20:57 -07:00
resultiterable.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
serializers.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
shell.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
shuffle.py [SPARK-32435][PYTHON] Remove heapq3 port from Python 3 2020-07-27 20:10:13 +09:00
statcounter.py [SPARK-6919] [PYSPARK] Add asDict method to StatCounter 2015-09-29 13:38:15 -07:00
status.py [SPARK-4172] [PySpark] Progress API in Python 2015-02-17 13:36:43 -08:00
storagelevel.py [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3 2020-08-10 07:33:06 -07:00
taskcontext.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
traceback_utils.py [SPARK-1087] Move python traceback utilities into new traceback_utils.py file. 2014-09-15 19:28:17 -07:00
util.py [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode 2020-07-30 10:15:25 +09:00
version.py [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT 2020-02-25 19:44:31 -08:00
worker.py [MINOR][PYTHON] Fix spacing in error message 2020-07-28 11:22:18 +09:00