41cf1d093f

### What changes were proposed in this pull request?

As discussed in https://github.com/apache/spark/pull/29491#discussion_r474451282 and in SPARK-32686, this PR un-deprecates Spark's ability to infer a DataFrame schema from a list of dictionaries. The ability is Pythonic and matches functionality offered by Pandas.

### Why are the changes needed?

This change clarifies to users that this behavior is supported and is not going away in the near future.

### Does this PR introduce _any_ user-facing change?

Yes. There used to be a `UserWarning` for this, but now there isn't.

### How was this patch tested?

I tested this manually.

Before:

```python
>>> spark.createDataFrame(spark.sparkContext.parallelize([{'a': 5}]))
/Users/nchamm/Documents/GitHub/nchammas/spark/python/pyspark/sql/session.py:388: UserWarning: Using RDD of dict to inferSchema is deprecated. Use pyspark.sql.Row instead
  warnings.warn("Using RDD of dict to inferSchema is deprecated. "
DataFrame[a: bigint]

>>> spark.createDataFrame([{'a': 5}])
.../python/pyspark/sql/session.py:378: UserWarning: inferring schema from dict is deprecated,please use pyspark.sql.Row instead
  warnings.warn("inferring schema from dict is deprecated,"
DataFrame[a: bigint]
```

After:

```python
>>> spark.createDataFrame(spark.sparkContext.parallelize([{'a': 5}]))
DataFrame[a: bigint]

>>> spark.createDataFrame([{'a': 5}])
DataFrame[a: bigint]
```

Closes #29510 from nchammas/SPARK-32686-df-dict-infer-schema.

Authored-by: Nicholas Chammas <nicholas.chammas@liveramp.com>
Signed-off-by: Bryan Cutler <cutlerb@gmail.com>

||
---|---|---|
cloudpickle | ||
ml | ||
mllib | ||
resource | ||
sql | ||
streaming | ||
testing | ||
tests | ||
__init__.py | ||
_globals.py | ||
accumulators.py | ||
broadcast.py | ||
conf.py | ||
context.py | ||
daemon.py | ||
files.py | ||
find_spark_home.py | ||
java_gateway.py | ||
join.py | ||
profiler.py | ||
rdd.py | ||
rddsampler.py | ||
resultiterable.py | ||
serializers.py | ||
shell.py | ||
shuffle.py | ||
statcounter.py | ||
status.py | ||
storagelevel.py | ||
taskcontext.py | ||
traceback_utils.py | ||
util.py | ||
version.py | ||
worker.py |