68ea290b3a
## What changes were proposed in this pull request? It seems allowed to not set a key and value for a dict to represent the value is `None` or missing as below: ``` python spark.createDataFrame([{"x": 1}, {"y": 2}]).show() ``` ``` +----+----+ | x| y| +----+----+ | 1|null| |null| 2| +----+----+ ``` However, it seems it is not for `Row` as below: ``` python spark.createDataFrame([Row(x=1), Row(y=2)]).show() ``` ``` scala 16/06/19 16:25:56 ERROR Executor: Exception in task 6.0 in stage 66.0 (TID 316) java.lang.IllegalStateException: Input row doesn't have expected number of values required by the schema. 2 fields are required while 1 values are provided. at org.apache.spark.sql.execution.python.EvaluatePython$.fromJava(EvaluatePython.scala:147) at org.apache.spark.sql.SparkSession$$anonfun$7.apply(SparkSession.scala:656) at org.apache.spark.sql.SparkSession$$anonfun$7.apply(SparkSession.scala:656) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:247) at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:780) ``` The behaviour seems right but it seems it might confuse users just like this JIRA was reported. This PR adds the explanation for `Row` class. ## How was this patch tested? N/A Author: hyukjinkwon <gurwls223@gmail.com> Closes #13771 from HyukjinKwon/SPARK-13748. |
||
---|---|---|
.. | ||
ml | ||
mllib | ||
sql | ||
streaming | ||
__init__.py | ||
accumulators.py | ||
broadcast.py | ||
cloudpickle.py | ||
conf.py | ||
context.py | ||
daemon.py | ||
files.py | ||
find_spark_home.py | ||
heapq3.py | ||
java_gateway.py | ||
join.py | ||
profiler.py | ||
rdd.py | ||
rddsampler.py | ||
resultiterable.py | ||
serializers.py | ||
shell.py | ||
shuffle.py | ||
statcounter.py | ||
status.py | ||
storagelevel.py | ||
taskcontext.py | ||
tests.py | ||
traceback_utils.py | ||
version.py | ||
worker.py |