a6216e2446
### What changes were proposed in this pull request?

This PR fixes bugs in casting data from/to `PythonUserDefinedType`. The following sequence of queries reproduces the issue:

```
>>> from pyspark.sql import Row
>>> from pyspark.sql.functions import col
>>> from pyspark.sql.types import *
>>> from pyspark.testing.sqlutils import *
>>>
>>> row = Row(point=ExamplePoint(1.0, 2.0))
>>> df = spark.createDataFrame([row])
>>> df.select(col("point").cast(PythonOnlyUDT()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/maropu/Repositories/spark/spark-master/python/pyspark/sql/dataframe.py", line 1402, in select
    jdf = self._jdf.select(self._jcols(*cols))
  File "/Users/maropu/Repositories/spark/spark-master/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/Users/maropu/Repositories/spark/spark-master/python/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/Users/maropu/Repositories/spark/spark-master/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o44.select.
: java.lang.NullPointerException
	at org.apache.spark.sql.types.UserDefinedType.acceptsType(UserDefinedType.scala:84)
	at org.apache.spark.sql.catalyst.expressions.Cast$.canCast(Cast.scala:96)
	at org.apache.spark.sql.catalyst.expressions.CastBase.checkInputDataTypes(Cast.scala:267)
	at org.apache.spark.sql.catalyst.expressions.CastBase.resolved$lzycompute(Cast.scala:290)
	at org.apache.spark.sql.catalyst.expressions.CastBase.resolved(Cast.scala:290)
```

The root cause is that `PythonUserDefinedType#userClass` is always null, so the `isAssignableFrom` call in `UserDefinedType#acceptsType` throws a `NullPointerException`. To fix it, this PR defines `acceptsType` in `PythonUserDefinedType` and filters out the null case in `UserDefinedType#acceptsType`.

### Why are the changes needed?

Bug fixes.
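The shape of the fix can be modeled in plain Python as a rough sketch. This is not the Scala patch itself: the classes `UDT` and `PythonUDT` and the attributes `user_class` and `py_udt` are hypothetical stand-ins for `UserDefinedType`, `PythonUserDefinedType`, and their fields, and `issubclass` plays the role of Java's `Class.isAssignableFrom` (which throws `NullPointerException` on a null receiver or argument, much as `issubclass(None, ...)` raises `TypeError`). Comparing Python UDTs by class name is an assumption made for illustration.

```python
class UDT:
    """Hypothetical stand-in for Scala's UserDefinedType."""

    def __init__(self, user_class):
        # For a Python-only UDT the JVM side has no user class, so this is None.
        self.user_class = user_class

    def accepts_type(self, other):
        # Guard against a missing user class on either side before the
        # subclass check; without it, issubclass() would raise TypeError,
        # the Python analogue of the NullPointerException in the traceback.
        if (isinstance(other, UDT)
                and self.user_class is not None
                and other.user_class is not None):
            return type(self) is type(other) or issubclass(other.user_class, self.user_class)
        return False


class PythonUDT(UDT):
    """Hypothetical stand-in for PythonUserDefinedType."""

    def __init__(self, py_udt):
        super().__init__(user_class=None)
        # Assumed: fully qualified name of the Python UDT class.
        self.py_udt = py_udt

    def accepts_type(self, other):
        # No user class to compare, so compare the Python UDT names instead.
        return isinstance(other, PythonUDT) and self.py_udt == other.py_udt


# Usage: a UDT backed by a (hypothetical) user class vs. a Python-only UDT.
class ExamplePoint: ...

scala_udt = UDT(user_class=ExamplePoint)
py_only_udt = PythonUDT(py_udt="pyspark.testing.sqlutils.PythonOnlyUDT")
print(py_only_udt.accepts_type(scala_udt))  # False: "cannot cast" instead of a crash
```

With both guards in place, a mismatch between a JVM-backed UDT and a Python-only UDT yields `False` in either direction rather than an exception.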
### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added tests.

Closes #30169 from maropu/FixPythonUDTCast.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
| Name |
|---|
| cloudpickle |
| ml |
| mllib |
| resource |
| sql |
| streaming |
| testing |
| tests |
| __init__.py |
| __init__.pyi |
| _globals.py |
| _typing.pyi |
| accumulators.py |
| accumulators.pyi |
| broadcast.py |
| broadcast.pyi |
| conf.py |
| conf.pyi |
| context.py |
| context.pyi |
| daemon.py |
| files.py |
| files.pyi |
| find_spark_home.py |
| install.py |
| java_gateway.py |
| join.py |
| profiler.py |
| profiler.pyi |
| py.typed |
| rdd.py |
| rdd.pyi |
| rddsampler.py |
| resultiterable.py |
| resultiterable.pyi |
| serializers.py |
| shell.py |
| shuffle.py |
| statcounter.py |
| statcounter.pyi |
| status.py |
| status.pyi |
| storagelevel.py |
| storagelevel.pyi |
| taskcontext.py |
| taskcontext.pyi |
| traceback_utils.py |
| util.py |
| version.py |
| version.pyi |
| worker.py |