4fafdcd63b
### What changes were proposed in this pull request?

This PR proposes to improve the error messages from the Scalar iterator pandas UDF.

### Why are the changes needed?

To show the correct error messages.

### Does this PR introduce any user-facing change?

Yes, but only in unreleased branches.

```python
import pandas as pd
from pyspark.sql.functions import pandas_udf, PandasUDFType

# Yields a 1-row Series per input batch, so it outputs fewer rows than it receives.
@pandas_udf('long', PandasUDFType.SCALAR_ITER)
def pandas_plus_one(iterator):
    for _ in iterator:
        yield pd.Series(1)

# `spark` is an existing SparkSession, e.g. the one the pyspark shell provides.
spark.range(10).repartition(1).select(pandas_plus_one("id")).show()
```

```python
import pandas as pd
from pyspark.sql.functions import pandas_udf, PandasUDFType

# Yields a 20-row Series per input batch, so it outputs more rows than it receives.
@pandas_udf('long', PandasUDFType.SCALAR_ITER)
def pandas_plus_one(iterator):
    for _ in iterator:
        yield pd.Series(list(range(20)))

spark.range(10).repartition(1).select(pandas_plus_one("id")).show()
```

**Before:**

```
RuntimeError: The number of output rows of pandas iterator UDF should be the same with input rows. The input rows number is 10 but the output rows number is 1.
```

```
AssertionError: Pandas MAP_ITER UDF outputted more rows than input rows.
```

**After:**

```
RuntimeError: The length of output in Scalar iterator pandas UDF should be the same with the input's; however, the length of output was 1 and the length of input was 10.
```

```
AssertionError: Pandas SCALAR_ITER UDF outputted more rows than input rows.
```

### How was this patch tested?

Unit tests were updated accordingly.

Closes #28135 from HyukjinKwon/SPARK-26412-followup.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
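The two messages enforce one contract. Over a partition, a Scalar iterator UDF must emit exactly as many rows as it receives, and it can be stopped early once it has emitted more rows than it has consumed so far. The sketch below models that contract under stated assumptions. Plain Python lists stand in for pandas Series because only `len()` is used, and `run_scalar_iter_udf` is a hypothetical helper. It is not a PySpark API, and this is a simplified illustration, not Spark's worker code.

```python
from typing import Callable, Iterable, Iterator, List, Sized


def run_scalar_iter_udf(
    udf: Callable[[Iterator[Sized]], Iterator[Sized]],
    batches: Iterable[Sized],
) -> List[Sized]:
    """Hypothetical driver: feed input batches to a SCALAR_ITER-style function
    and check that the total output rows equal the total input rows."""
    num_input_rows = 0

    def counting(it: Iterable[Sized]) -> Iterator[Sized]:
        # Count input rows as the UDF pulls each batch.
        nonlocal num_input_rows
        for batch in it:
            num_input_rows += len(batch)
            yield batch

    source = counting(batches)
    outputs: List[Sized] = []
    num_output_rows = 0
    for result in udf(source):
        num_output_rows += len(result)
        # Fail fast: output must never run ahead of the input consumed so far.
        if num_output_rows > num_input_rows:
            raise AssertionError(
                "Pandas SCALAR_ITER UDF outputted more rows than input rows.")
        outputs.append(result)

    # Count any batches the UDF never pulled, so skipped input is not ignored.
    for _ in source:
        pass

    if num_output_rows != num_input_rows:
        raise RuntimeError(
            "The length of output in Scalar iterator pandas UDF should be "
            "the same with the input's; however, the length of output was %d "
            "and the length of input was %d." % (num_output_rows, num_input_rows))
    return outputs


# Mirrors the first example: one 10-row batch in, a 1-row batch out.
def one_row(it):
    for _ in it:
        yield [1]

try:
    run_scalar_iter_udf(one_row, [list(range(10))])
except RuntimeError as e:
    print(e)
```

With a single 10-row input batch, `one_row` reproduces the "output was 1 ... input was 10" case. A function that yields 20 rows per batch trips the fail-fast check on the first batch, which matches the second example.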
- `__init__.py`
- `test_arrow.py`
- `test_catalog.py`
- `test_column.py`
- `test_conf.py`
- `test_context.py`
- `test_dataframe.py`
- `test_datasources.py`
- `test_functions.py`
- `test_group.py`
- `test_pandas_cogrouped_map.py`
- `test_pandas_grouped_map.py`
- `test_pandas_map.py`
- `test_pandas_udf.py`
- `test_pandas_udf_grouped_agg.py`
- `test_pandas_udf_scalar.py`
- `test_pandas_udf_typehints.py`
- `test_pandas_udf_window.py`
- `test_readwriter.py`
- `test_serde.py`
- `test_session.py`
- `test_streaming.py`
- `test_types.py`
- `test_udf.py`
- `test_utils.py`