spark-instrumented-optimizer/python/pyspark/sql
Davies Liu a7a93a116d [SPARK-14215] [SQL] [PYSPARK] Support chained Python UDFs
## What changes were proposed in this pull request?

This PR adds support for chained Python UDFs, for example:

```sql
select udf1(udf2(a))
select udf1(udf2(a) + 3)
select udf1(udf2(a) + udf3(b))
```
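To make the three query shapes concrete, the sketch below mirrors them with plain Python functions applied row by row. The bodies of `udf1`, `udf2`, and `udf3`, the sample rows, and the `query1`..`query3` helpers are hypothetical stand-ins chosen for illustration; the PR does not define them, and this is not how Spark itself evaluates the queries.

```python
# Hypothetical UDF bodies; the PR does not specify what udf1/udf2/udf3 compute.
def udf1(x):
    return x * 2


def udf2(x):
    return x + 1


def udf3(x):
    return x * x


# Hypothetical input rows with columns a and b.
rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]


def query1(row):
    # select udf1(udf2(a)): a UDF applied directly to another UDF's result
    return udf1(udf2(row["a"]))


def query2(row):
    # select udf1(udf2(a) + 3): a non-UDF expression sits between the two UDFs
    return udf1(udf2(row["a"]) + 3)


def query3(row):
    # select udf1(udf2(a) + udf3(b)): the outer UDF depends on two inner UDFs
    return udf1(udf2(row["a"]) + udf3(row["b"]))


results = [(query1(r), query2(r), query3(r)) for r in rows]
print(results)  # [(4, 10, 12), (8, 14, 40)]
```

Only the first shape nests one UDF call directly inside another; the other two place an arithmetic expression between the calls.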

Directly chained unary Python UDFs are evaluated in a single batch of Python UDFs; other combinations may require multiple batches.

For example:
```python
>>> sqlContext.sql("select double(double(1))").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [pythonUDF#10 AS double(double(1))#9]
:     +- INPUT
+- !BatchPythonEvaluation double(double(1)), [pythonUDF#10]
   +- Scan OneRowRelation[]
>>> sqlContext.sql("select double(double(1) + double(2))").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [pythonUDF#19 AS double((double(1) + double(2)))#16]
:     +- INPUT
+- !BatchPythonEvaluation double((pythonUDF#17 + pythonUDF#18)), [pythonUDF#17,pythonUDF#18,pythonUDF#19]
   +- !BatchPythonEvaluation double(2), [pythonUDF#17,pythonUDF#18]
      +- !BatchPythonEvaluation double(1), [pythonUDF#17]
         +- Scan OneRowRelation[]
```
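The batching seen in the two plans above can be reproduced with a small toy model. It assumes that a unary UDF whose argument is itself a UDF call shares a batch with that call, and that every other UDF call starts its own batch. The `Lit`, `Add`, and `Udf` classes and `count_batches` are hypothetical names for this sketch; the rule is inferred from the plans shown and is not Spark's actual planner code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Lit:
    """A literal value such as 1 or 2."""
    value: object


@dataclass(frozen=True)
class Add:
    """A non-UDF binary expression (left + right)."""
    left: object
    right: object


@dataclass(frozen=True)
class Udf:
    """A Python UDF call with its argument expressions."""
    name: str
    args: Tuple[object, ...]


def count_batches(expr):
    """Count the batches this toy rule would need for an expression tree."""
    if isinstance(expr, Lit):
        return 0
    if isinstance(expr, Add):
        return count_batches(expr.left) + count_batches(expr.right)
    # expr is a Udf: first count the batches needed by its arguments.
    below = sum(count_batches(arg) for arg in expr.args)
    # Assumption: a unary UDF applied directly to another UDF joins its batch.
    directly_chained = len(expr.args) == 1 and isinstance(expr.args[0], Udf)
    return below if directly_chained else below + 1


# double(double(1))
chained = Udf("double", (Udf("double", (Lit(1),)),))
# double(double(1) + double(2))
mixed = Udf("double", (Add(Udf("double", (Lit(1),)), Udf("double", (Lit(2),))),))

print(count_batches(chained))  # 1
print(count_batches(mixed))    # 3
```

The counts match the number of `BatchPythonEvaluation` nodes in each plan above: one for the directly chained case and three for the case with an addition in between.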

TODO: support multiple unrelated Python UDFs in one batch (in a separate PR).

## How was this patch tested?

Added new unit tests for chained UDFs.

Author: Davies Liu <davies@databricks.com>

Closes #12014 from davies/py_udfs.
2016-03-29 15:06:29 -07:00

| File | Last commit | Date |
| --- | --- | --- |
| `__init__.py` | [SPARK-12600][SQL] Remove deprecated methods in Spark SQL | 2016-01-04 18:02:38 -08:00 |
| `column.py` | [SPARK-14088][SQL] Some Dataset API touch-up | 2016-03-22 23:43:09 -07:00 |
| `context.py` | [SPARK-14014][SQL] Integrate session catalog (attempt #2) | 2016-03-24 22:59:35 -07:00 |
| `dataframe.py` | [SPARK-14142][SQL] Replace internal use of unionAll with union | 2016-03-24 22:34:55 -07:00 |
| `functions.py` | [SPARK-14215] [SQL] [PYSPARK] Support chained Python UDFs | 2016-03-29 15:06:29 -07:00 |
| `group.py` | [SPARK-12756][SQL] use hash expression in Exchange | 2016-01-13 22:43:28 -08:00 |
| `readwriter.py` | [SPARK-13953][SQL] Specifying the field name for corrupted record via option at JSON datasource | 2016-03-22 20:30:48 +08:00 |
| `tests.py` | [SPARK-14215] [SQL] [PYSPARK] Support chained Python UDFs | 2016-03-29 15:06:29 -07:00 |
| `types.py` | [SPARK-13593] [SQL] improve the createDataFrame to accept data type string and verify the data | 2016-03-08 14:00:03 -08:00 |
| `utils.py` | [SPARK-13713][SQL] Migrate parser from ANTLR3 to ANTLR4 | 2016-03-28 12:31:12 -07:00 |
| `window.py` | [SPARK-14058][PYTHON] Incorrect docstring in Window.order | 2016-03-21 23:52:33 -07:00 |