4fafdcd63b
### What changes were proposed in this pull request? This PR proposes to improve the error message from Scalar iterator pandas UDF. ### Why are the changes needed? To show the correct error messages. ### Does this PR introduce any user-facing change? Yes, but only in unreleased branches. ```python import pandas as pd from pyspark.sql.functions import pandas_udf, PandasUDFType pandas_udf('long', PandasUDFType.SCALAR_ITER) def pandas_plus_one(iterator): for _ in iterator: yield pd.Series(1) spark.range(10).repartition(1).select(pandas_plus_one("id")).show() ``` ```python import pandas as pd from pyspark.sql.functions import pandas_udf, PandasUDFType pandas_udf('long', PandasUDFType.SCALAR_ITER) def pandas_plus_one(iterator): for _ in iterator: yield pd.Series(list(range(20))) spark.range(10).repartition(1).select(pandas_plus_one("id")).show() ``` **Before:** ``` RuntimeError: The number of output rows of pandas iterator UDF should be the same with input rows. The input rows number is 10 but the output rows number is 1. ``` ``` AssertionError: Pandas MAP_ITER UDF outputted more rows than input rows. ``` **After:** ``` RuntimeError: The length of output in Scalar iterator pandas UDF should be the same with the input's; however, the length of output was 1 and the length of input was 10. ``` ``` AssertionError: Pandas SCALAR_ITER UDF outputted more rows than input rows. ``` ### How was this patch tested? Unittests were fixed accordingly. Closes #28135 from HyukjinKwon/SPARK-26412-followup. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org> |
||
---|---|---|
.. | ||
avro | ||
pandas | ||
tests | ||
__init__.py | ||
catalog.py | ||
column.py | ||
conf.py | ||
context.py | ||
dataframe.py | ||
functions.py | ||
group.py | ||
readwriter.py | ||
session.py | ||
streaming.py | ||
types.py | ||
udf.py | ||
utils.py | ||
window.py |