spark-instrumented-optimizer/python/pyspark/sql
Liang-Chi Hsieh 146001a9ff [SPARK-16062] [SPARK-15989] [SQL] Fix two bugs of Python-only UDTs
## What changes were proposed in this pull request?

There are two related bugs in Python-only UDTs. Because the test case for the second one also needs the first fix, I put them into one PR. If this is not appropriate, please let me know.

### First bug: When MapObjects works on Python-only UDTs

`RowEncoder` uses `PythonUserDefinedType.sqlType` for its deserializer expression. If the SQL type is `ArrayType`, `MapObjects` operates on it. However, `MapObjects` does not handle `PythonUserDefinedType` as an input data type, which causes an error like:

    import pyspark.sql.group
    from pyspark.sql.tests import PythonOnlyPoint, PythonOnlyUDT
    from pyspark.sql.types import *

    schema = StructType().add("key", LongType()).add("val", PythonOnlyUDT())
    df = spark.createDataFrame([(i % 3, PythonOnlyPoint(float(i), float(i))) for i in range(10)], schema=schema)
    df.show()

    File "/home/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o36.showString.
    : java.lang.RuntimeException: Error while decoding: scala.MatchError: org.apache.spark.sql.types.PythonUserDefinedType@f4ceede8 (of class org.apache.spark.sql.types.PythonUserDefinedType)
    ...
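
The failure mode in the trace above can be illustrated without Spark. The following is a minimal plain-Python sketch, not Spark's implementation: `DoubleType`, `ArrayType`, `PythonOnlyUDT`, `element_converter_buggy`, and `element_converter_fixed` are all hypothetical stand-ins. It shows a dispatcher that matches on storage types only, so a UDT wrapper falls through (analogous to the `scala.MatchError`), and one possible remedy of unwrapping the UDT to its SQL type first. Whether the patch takes exactly this approach is an assumption.

```python
# Hypothetical stand-ins for illustration; these are NOT Spark's classes.
class DoubleType:
    pass


class ArrayType:
    def __init__(self, element_type):
        self.element_type = element_type


class PythonOnlyUDT:
    """A user-defined type whose storage (SQL) type is an array of doubles."""

    def sql_type(self):
        return ArrayType(DoubleType())


def element_converter_buggy(data_type):
    # Dispatches on storage types only. A UDT wrapper matches no branch,
    # which mirrors the unmatched case reported in the trace.
    if isinstance(data_type, ArrayType):
        return lambda values: list(values)
    if isinstance(data_type, DoubleType):
        return float
    raise TypeError("no match for " + type(data_type).__name__)


def element_converter_fixed(data_type):
    # Assumed remedy: unwrap the UDT to its storage type before dispatching.
    if isinstance(data_type, PythonOnlyUDT):
        data_type = data_type.sql_type()
    return element_converter_buggy(data_type)
```

With these stand-ins, `element_converter_buggy(PythonOnlyUDT())` raises `TypeError`, while `element_converter_fixed(PythonOnlyUDT())` returns a converter for the underlying array.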

### Second bug: When a Python-only UDT is the element type of ArrayType

    import pyspark.sql.group
    from pyspark.sql.tests import PythonOnlyPoint, PythonOnlyUDT
    from pyspark.sql.types import *

    schema = StructType().add("key", LongType()).add("val", ArrayType(PythonOnlyUDT()))
    df = spark.createDataFrame([(i % 3, [PythonOnlyPoint(float(i), float(i))]) for i in range(10)], schema=schema)
    df.show()
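
The commit message does not spell out how this case fails, so the following is only a sketch of what nesting a UDT inside an array requires: the conversion has to recurse into the element type and apply the UDT's own serialization to each element. It is plain Python with hypothetical names (`PointUDT`, `ArrayOf`, `to_internal`, `from_internal`) and does not reflect Spark's actual code.

```python
# Hypothetical stand-ins for illustration; these are NOT Spark's classes.
class PointUDT:
    """Stores a 2-D point (x, y) as a list of two floats."""

    def serialize(self, point):
        return [float(point[0]), float(point[1])]

    def deserialize(self, datum):
        return (datum[0], datum[1])


class ArrayOf:
    def __init__(self, element_type):
        self.element_type = element_type


def to_internal(data_type, value):
    # Recurse through arrays so that nested UDT elements are serialized too.
    if isinstance(data_type, ArrayOf):
        return [to_internal(data_type.element_type, v) for v in value]
    if isinstance(data_type, PointUDT):
        return data_type.serialize(value)
    return value


def from_internal(data_type, datum):
    # Inverse of to_internal: rebuild user objects from the stored form.
    if isinstance(data_type, ArrayOf):
        return [from_internal(data_type.element_type, d) for d in datum]
    if isinstance(data_type, PointUDT):
        return data_type.deserialize(datum)
    return datum
```

A round trip through `to_internal` and `from_internal` with `ArrayOf(PointUDT())` should return the original list of points; a converter that handles only a top-level UDT would leave the nested elements untouched.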

## How was this patch tested?
PySpark's sql tests.

Author: Liang-Chi Hsieh <simonh@tw.ibm.com>

Closes #13778 from viirya/fix-pyudt.
2016-08-02 10:08:18 -07:00
__init__.py [SPARK-14945][PYTHON] SparkSession Python API 2016-04-28 10:55:48 -07:00
catalog.py [SPARK-16772] Correct API doc references to PySpark classes + formatting fixes 2016-07-28 14:57:15 -07:00
column.py [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code 2016-05-23 18:14:48 -07:00
conf.py [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code 2016-05-23 18:14:48 -07:00
context.py [SPARK-16772][PYTHON][DOCS] Restore "datatype string" to Python API docstrings 2016-07-29 14:07:03 -07:00
dataframe.py [SPARK-16772] Correct API doc references to PySpark classes + formatting fixes 2016-07-28 14:57:15 -07:00
functions.py [SPARK-16772] Correct API doc references to PySpark classes + formatting fixes 2016-07-28 14:57:15 -07:00
group.py [MINOR][PYSPARK][DOC] Fix wrongly formatted examples in PySpark documentation 2016-07-06 10:45:51 -07:00
readwriter.py [SPARK-16772] Correct API doc references to PySpark classes + formatting fixes 2016-07-28 14:57:15 -07:00
session.py [SPARK-16772][PYTHON][DOCS] Restore "datatype string" to Python API docstrings 2016-07-29 14:07:03 -07:00
streaming.py [SPARK-16772] Correct API doc references to PySpark classes + formatting fixes 2016-07-28 14:57:15 -07:00
tests.py [SPARK-16062] [SPARK-15989] [SQL] Fix two bugs of Python-only UDTs 2016-08-02 10:08:18 -07:00
types.py [SPARK-16772] Correct API doc references to PySpark classes + formatting fixes 2016-07-28 14:57:15 -07:00
utils.py [SPARK-15953][WIP][STREAMING] Renamed ContinuousQuery to StreamingQuery 2016-06-15 10:46:07 -07:00
window.py [SPARK-14058][PYTHON] Incorrect docstring in Window.order 2016-03-21 23:52:33 -07:00