spark-instrumented-optimizer/python/pyspark/sql
Liang-Chi Hsieh 3eb8429699 [SPARK-25461][PYSPARK][SQL] Add document for mismatch between return type of Pandas.Series and return type of pandas udf
## What changes were proposed in this pull request?

For Pandas UDFs, we derive the Arrow type from the Catalyst return data type declared for the UDF, and we use this Arrow type to serialize the data. If the declared return data type does not match the actual type of the pandas.Series returned by the UDF, incorrect data may be returned from the Python side.

Currently we have no reliable way to check whether the data conversion is safe. For now, we add documentation to warn users about this. Once a PyArrow upgrade that lets us perform this check is available, we should add an option to enable it.
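
The risk described above can be illustrated with a plain-Python analogy. This is only a sketch and does not use Spark, pandas, or PyArrow: `cast_unchecked` and `cast_checked` are hypothetical helpers, and `ctypes.c_int32` stands in for a declared 32-bit integer return type. It shows how forcing values into a declared type without a safety check can silently change them, and what a checked conversion might reject instead.

```python
import ctypes


def cast_unchecked(values):
    """Force each value into a 32-bit integer with no safety check.

    int() drops any fractional part, and ctypes.c_int32 performs no
    overflow checking, so out-of-range values wrap around silently.
    """
    return [ctypes.c_int32(int(v)).value for v in values]


def cast_checked(values, bits=32):
    """Convert to integers, raising if a value would be altered."""
    low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    result = []
    for v in values:
        if v != int(v) or not low <= v <= high:
            raise ValueError(f"unsafe conversion of {v!r} to int{bits}")
        result.append(int(v))
    return result


# Hypothetical values "returned by a UDF" whose declared type is int32.
returned = [1.9, 2.0, 2 ** 31 + 5]

unchecked = cast_unchecked(returned)  # the data silently changes

try:
    cast_checked(returned)
    checked_error = None
except ValueError as exc:
    checked_error = str(exc)  # the mismatch is reported instead
```

Here `unchecked` holds values that differ from what the function returned (the fraction is dropped and the large value wraps), while `cast_checked` raises on the first mismatch. In the absence of such a check, keeping the declared return type consistent with the data actually returned is the user's responsibility.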

## How was this patch tested?

Documentation change only.

Closes #22610 from viirya/SPARK-25461.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
2018-10-07 23:18:46 +08:00
..
__init__.py [SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark 2017-11-02 15:22:52 +01:00
catalog.py [SPARK-24665][PYSPARK][FOLLOWUP] Use SQLConf in PySpark to manage all sql configs 2018-08-17 10:18:08 +08:00
column.py [SPARK-23847][PYTHON][SQL] Add asc_nulls_first, asc_nulls_last to PySpark 2018-04-08 12:09:06 +08:00
conf.py [SPARK-23698][PYTHON] Resolve undefined names in Python 3 2018-08-22 10:06:59 -07:00
context.py [SPARK-25540][SQL][PYSPARK] Make HiveContext in PySpark behave as the same as Scala. 2018-09-27 09:51:20 +08:00
dataframe.py [SPARK-25592] Setting version to 3.0.0-SNAPSHOT 2018-10-02 08:48:24 -07:00
functions.py [SPARK-25461][PYSPARK][SQL] Add document for mismatch between return type of Pandas.Series and return type of pandas udf 2018-10-07 23:18:46 +08:00
group.py [SPARK-24722][SQL] pivot() with Column type argument 2018-08-04 14:17:32 +08:00
readwriter.py [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4 2018-09-13 11:19:43 +08:00
session.py [SPARK-25525][SQL][PYSPARK] Do not update conf for existing SparkContext in SparkSession.getOrCreate. 2018-09-27 12:37:03 +08:00
streaming.py [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4 2018-09-13 11:19:43 +08:00
tests.py [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vectorized UDFs for SQL Statement 2018-10-04 09:36:23 +08:00
types.py [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4 2018-09-13 11:19:43 +08:00
udf.py [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vectorized UDFs for SQL Statement 2018-10-04 09:36:23 +08:00
utils.py [SPARK-24721][SQL] Exclude Python UDFs filters in FileSourceStrategy 2018-08-28 10:57:13 +08:00
window.py [SPARK-23861][SQL][DOC] Clarify default window frame with and without orderBy clause 2018-04-07 00:15:54 +08:00