spark-instrumented-optimizer/python/pyspark/sql
hyukjinkwon cd9f49a2ae [SPARK-22980][PYTHON][SQL] Clarify the length of each series is of each batch within scalar Pandas UDF
## What changes were proposed in this pull request?

This PR proposes to add a note saying that the length of a scalar Pandas UDF's `Series` is that of the batch, not of the whole input column.

A group map UDF is fine because its usage differs from that of a typical UDF, but a scalar Pandas UDF can be confused with a normal UDF.

For example, consider the following:

```python
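from pyspark.sql.functions import pandas_udf, col, lit
from pyspark.sql.types import LongType

df = spark.range(1)
# Scalar Pandas UDF: x is a pandas.Series holding the batch, so len(x) is the batch length (1 here).
f = pandas_udf(lambda x, y: len(x) + y, LongType())
df.select(f(lit('text'), col('id'))).show()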
```

```
+------------------+
|<lambda>(text, id)|
+------------------+
|                 1|
+------------------+
```

```python
from pyspark.sql.functions import udf, col, lit

df = spark.range(1)
# Normal UDF: x is the string 'text' for each row, so len(x) is 4.
f = udf(lambda x, y: len(x) + y, "long")
df.select(f(lit('text'), col('id'))).show()
```

```
+------------------+
|<lambda>(text, id)|
+------------------+
|                 4|
+------------------+
```
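
The difference between the two results can be modeled without Spark. The following is only a sketch in plain Python: lists stand in for `pandas.Series`, and `run_row_udf`, `run_scalar_pandas_udf`, and `batch_size` are hypothetical names introduced for illustration, not part of the PySpark API.

```python
# Plain-Python model of how the two UDF flavours are invoked (no Spark needed).
# Lists stand in for pandas.Series; all helper names here are hypothetical.

def run_row_udf(func, *columns):
    """Call func once per row, as a normal udf does."""
    return [func(*row) for row in zip(*columns)]


def run_scalar_pandas_udf(func, *columns, batch_size):
    """Call func once per batch, passing each column slice as a whole."""
    n_rows = len(columns[0])
    out = []
    for start in range(0, n_rows, batch_size):
        batch = [col[start:start + batch_size] for col in columns]
        out.extend(func(*batch))
    return out


text = ["text"] * 5
ids = list(range(5))

# Row-at-a-time: len(x) is the length of the string "text".
row_result = run_row_udf(lambda x, y: len(x) + y, text, ids)

# Batch-at-a-time: len(xs) is the number of rows in the current batch.
batch_result = run_scalar_pandas_udf(
    lambda xs, ys: [len(xs) + y for y in ys], text, ids, batch_size=2
)

# Taking the length element-wise inside the batch matches the row-at-a-time result.
fixed_result = run_scalar_pandas_udf(
    lambda xs, ys: [len(x) + y for x, y in zip(xs, ys)], text, ids, batch_size=2
)
```

In this model the batch-at-a-time result depends on how the rows happen to be split into batches, which is the confusion the note is meant to prevent. In a real scalar Pandas UDF, the element-wise string length would presumably be written with pandas' `Series.str.len()` rather than `len(x)`.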

## How was this patch tested?

Manually built the doc and checked the output.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #20237 from HyukjinKwon/SPARK-22980.
2018-01-13 16:13:44 +09:00
__init__.py [SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark 2017-11-02 15:22:52 +01:00
catalog.py [SPARK-22939][PYSPARK] Support Spark UDF in registerFunction 2018-01-04 21:07:31 +08:00
column.py [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as arguments should validate input types for column 2017-08-24 20:29:03 +09:00
conf.py [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code 2016-05-23 18:14:48 -07:00
context.py [SPARK-22939][PYSPARK] Support Spark UDF in registerFunction 2018-01-04 21:07:31 +08:00
dataframe.py [SPARK-22874][PYSPARK][SQL] Modify checking pandas version to use LooseVersion. 2017-12-22 20:09:51 +09:00
functions.py [SPARK-22980][PYTHON][SQL] Clarify the length of each series is of each batch within scalar Pandas UDF 2018-01-13 16:13:44 +09:00
group.py [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0 2017-12-21 20:43:56 +09:00
readwriter.py [SPARK-22818][SQL] csv escape of quote escape 2017-12-29 07:30:06 +08:00
session.py [SPARK-23009][PYTHON] Fix for non-str col names to createDataFrame from Pandas 2018-01-10 14:55:24 +09:00
streaming.py [SPARK-22933][SPARKR] R Structured Streaming API for withWatermark, trigger, partitionBy 2018-01-03 21:43:14 -08:00
tests.py [SPARK-23009][PYTHON] Fix for non-str col names to createDataFrame from Pandas 2018-01-10 14:55:24 +09:00
types.py [SPARK-22566][PYTHON] Better error message for _merge_type in Pandas to Spark DF conversion 2018-01-08 14:32:05 +09:00
udf.py [SPARK-22901][PYTHON][FOLLOWUP] Adds the doc for asNondeterministic for wrapped UDF function 2018-01-06 23:08:26 +08:00
utils.py [SPARK-22874][PYSPARK][SQL][FOLLOW-UP] Modify error messages to show actual versions. 2017-12-25 20:29:10 +09:00
window.py [SPARK-18690][PYTHON][SQL] Backward compatibility of unbounded frames 2016-12-02 17:39:28 -08:00