spark-instrumented-optimizer

Davies Liu a7a93a116d [SPARK-14215] [SQL] [PYSPARK] Support chained Python UDFs

## What changes were proposed in this pull request?

This PR brings the support for chained Python UDFs, for example

```sql
select udf1(udf2(a))
select udf1(udf2(a) + 3)
select udf1(udf2(a) + udf3(b))
```

Also directly chained unary Python UDFs are put in single batch of Python UDFs, others may require multiple batches.

For example,

```python
>>> sqlContext.sql("select double(double(1))").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [pythonUDF#10 AS double(double(1))#9]
:     +- INPUT
+- !BatchPythonEvaluation double(double(1)), [pythonUDF#10]
   +- Scan OneRowRelation[]
>>> sqlContext.sql("select double(double(1) + double(2))").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [pythonUDF#19 AS double((double(1) + double(2)))#16]
:     +- INPUT
+- !BatchPythonEvaluation double((pythonUDF#17 + pythonUDF#18)), [pythonUDF#17,pythonUDF#18,pythonUDF#19]
   +- !BatchPythonEvaluation double(2), [pythonUDF#17,pythonUDF#18]
      +- !BatchPythonEvaluation double(1), [pythonUDF#17]
         +- Scan OneRowRelation[]
```

TODO: will support multiple unrelated Python UDFs in one batch (another PR).

## How was this patch tested?

Added new unit tests for chained UDFs.

Author: Davies Liu <davies@databricks.com>

Closes #12014 from davies/py_udfs.

2016-03-29 15:06:29 -07:00
__init__.py	[SPARK-12600][SQL] Remove deprecated methods in Spark SQL	2016-01-04 18:02:38 -08:00
column.py	[SPARK-14088][SQL] Some Dataset API touch-up	2016-03-22 23:43:09 -07:00
context.py	[SPARK-14014][SQL] Integrate session catalog (attempt #2)	2016-03-24 22:59:35 -07:00
dataframe.py	[SPARK-14142][SQL] Replace internal use of unionAll with union	2016-03-24 22:34:55 -07:00
functions.py	[SPARK-14215] [SQL] [PYSPARK] Support chained Python UDFs	2016-03-29 15:06:29 -07:00
group.py	[SPARK-12756][SQL] use hash expression in Exchange	2016-01-13 22:43:28 -08:00
readwriter.py	[SPARK-13953][SQL] Specifying the field name for corrupted record via option at JSON datasource	2016-03-22 20:30:48 +08:00
tests.py	[SPARK-14215] [SQL] [PYSPARK] Support chained Python UDFs	2016-03-29 15:06:29 -07:00
types.py	[SPARK-13593] [SQL] improve the `createDataFrame` to accept data type string and verify the data	2016-03-08 14:00:03 -08:00
utils.py	[SPARK-13713][SQL] Migrate parser from ANTLR3 to ANTLR4	2016-03-28 12:31:12 -07:00
window.py	[SPARK-14058][PYTHON] Incorrect docstring in Window.order	2016-03-21 23:52:33 -07:00