spark-instrumented-optimizer/python/pyspark/sql/tests
HyukjinKwon 4fafdcd63b [SPARK-26412][PYTHON][FOLLOW-UP] Improve error messages in Scala iterator pandas UDF
### What changes were proposed in this pull request?

This PR proposes to improve the error message from Scalar iterator pandas UDF.

### Why are the changes needed?

To show the correct error messages.

### Does this PR introduce any user-facing change?

Yes, but only in unreleased branches.

```python
import pandas as pd
from pyspark.sql.functions import pandas_udf, PandasUDFType

pandas_udf('long', PandasUDFType.SCALAR_ITER)
def pandas_plus_one(iterator):
      for _ in iterator:
            yield pd.Series(1)

spark.range(10).repartition(1).select(pandas_plus_one("id")).show()
```
```python
import pandas as pd
from pyspark.sql.functions import pandas_udf, PandasUDFType

pandas_udf('long', PandasUDFType.SCALAR_ITER)
def pandas_plus_one(iterator):
      for _ in iterator:
            yield pd.Series(list(range(20)))

spark.range(10).repartition(1).select(pandas_plus_one("id")).show()
```

**Before:**

```
RuntimeError: The number of output rows of pandas iterator UDF should
be the same with input rows. The input rows number is 10 but the output
rows number is 1.
```
```
AssertionError: Pandas MAP_ITER UDF outputted more rows than input rows.
```

**After:**

```
RuntimeError: The length of output in Scalar iterator pandas UDF should be
the same with the input's; however, the length of output was 1 and the length
of input was 10.
```
```
AssertionError: Pandas SCALAR_ITER UDF outputted more rows than input rows.
```

### How was this patch tested?

Unittests were fixed accordingly.

Closes #28135 from HyukjinKwon/SPARK-26412-followup.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-04-09 13:14:41 +09:00
..
__init__.py [SPARK-26032][PYTHON] Break large sql/tests.py files into smaller files 2018-11-14 14:51:11 +08:00
test_arrow.py [SPARK-30777][PYTHON][TESTS] Fix test failures for Pandas >= 1.0.0 2020-02-11 10:03:01 +09:00
test_catalog.py [SPARK-28130][PYTHON] Print pretty messages for skipped tests when xmlrunner is available in PySpark 2019-06-24 09:58:17 +09:00
test_column.py [SPARK-29664][PYTHON][SQL] Column.getItem behavior is not consistent with Scala 2019-11-01 12:25:48 +09:00
test_conf.py [SPARK-28130][PYTHON] Print pretty messages for skipped tests when xmlrunner is available in PySpark 2019-06-24 09:58:17 +09:00
test_context.py [SPARK-30856][SQL][PYSPARK] Fix SQLContext.getOrCreate() when SparkContext is restarted 2020-02-20 12:21:24 +09:00
test_dataframe.py [SPARK-31186][PYSPARK][SQL] toPandas should not fail on duplicate column names 2020-03-27 12:10:30 +09:00
test_datasources.py [SPARK-28130][PYTHON] Print pretty messages for skipped tests when xmlrunner is available in PySpark 2019-06-24 09:58:17 +09:00
test_functions.py [SPARK-30569][SQL][PYSPARK][SPARKR] Add percentile_approx DSL functions 2020-03-17 10:44:21 +09:00
test_group.py [SPARK-28130][PYTHON] Print pretty messages for skipped tests when xmlrunner is available in PySpark 2019-06-24 09:58:17 +09:00
test_pandas_cogrouped_map.py [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types 2020-01-22 15:32:58 +09:00
test_pandas_grouped_map.py [SPARK-30777][PYTHON][TESTS] Fix test failures for Pandas >= 1.0.0 2020-02-11 10:03:01 +09:00
test_pandas_map.py [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types 2020-01-22 15:32:58 +09:00
test_pandas_udf.py [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy 2020-02-18 20:39:50 +08:00
test_pandas_udf_grouped_agg.py [SPARK-30921][PYSPARK] Predicates on python udf should not be pushdown through Aggregate 2020-04-06 09:36:20 +09:00
test_pandas_udf_scalar.py [SPARK-26412][PYTHON][FOLLOW-UP] Improve error messages in Scala iterator pandas UDF 2020-04-09 13:14:41 +09:00
test_pandas_udf_typehints.py [SPARK-31287][PYTHON][SQL] Ignore type hints in groupby.(cogroup.)applyInPandas and mapInPandas 2020-03-29 13:59:18 +09:00
test_pandas_udf_window.py [SPARK-28130][PYTHON] Print pretty messages for skipped tests when xmlrunner is available in PySpark 2019-06-24 09:58:17 +09:00
test_readwriter.py [SPARK-28411][PYTHON][SQL] InsertInto with overwrite is not honored 2019-07-18 13:37:59 +09:00
test_serde.py [SPARK-29041][PYTHON] Allows createDataFrame to accept bytes as binary type 2019-09-12 08:52:25 +09:00
test_session.py [SPARK-30856][SQL][PYSPARK] Fix SQLContext.getOrCreate() when SparkContext is restarted 2020-02-20 12:21:24 +09:00
test_streaming.py [SPARK-28130][PYTHON] Print pretty messages for skipped tests when xmlrunner is available in PySpark 2019-06-24 09:58:17 +09:00
test_types.py [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy 2020-02-18 20:39:50 +08:00
test_udf.py [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types 2020-01-22 15:32:58 +09:00
test_utils.py [SPARK-19926][PYSPARK] make captured exception from JVM side user friendly 2019-09-18 23:32:10 +09:00