spark-instrumented-optimizer/python/pyspark
Davies Liu 5743c6476d [SPARK-12981] [SQL] extract Pyhton UDF in physical plan
## What changes were proposed in this pull request?

Currently we extract Python UDFs into a special logical plan EvaluatePython in analyzer, But EvaluatePython is not part of catalyst, many rules have no knowledge of it , which will break many things (for example, filter push down or column pruning).

We should treat Python UDFs as normal expressions, until we want to evaluate in physical plan, we could extract them in end of optimizer, or physical plan.

This PR extract Python UDFs in physical plan.

Closes #10935

## How was this patch tested?

Added regression tests.

Author: Davies Liu <davies@databricks.com>

Closes #12127 from davies/py_udf.
2016-04-04 10:56:26 -07:00
..
ml [SPARK-14305][ML][PYSPARK] PySpark ml.clustering BisectingKMeans support export/import 2016-04-01 12:53:39 -07:00
mllib [SPARK-13672][ML] Add python examples of BisectingKMeans in ML and MLLIB 2016-03-11 09:21:12 +02:00
sql [SPARK-12981] [SQL] extract Pyhton UDF in physical plan 2016-04-04 10:56:26 -07:00
streaming [SPARK-14073][STREAMING][TEST-MAVEN] Move flume back to Spark 2016-03-25 17:37:16 -07:00
__init__.py [SPARK-10380][SQL] Fix confusing documentation examples for astype/drop_duplicates. 2016-03-14 19:25:49 -07:00
accumulators.py [SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod() 2015-06-26 08:12:22 -07:00
broadcast.py [SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod() 2015-06-26 08:12:22 -07:00
cloudpickle.py [SPARK-13697] [PYSPARK] Fix the missing module name of TransformFunctionSerializer.loads 2016-03-06 08:57:01 -08:00
conf.py [SPARK-4897] [PySpark] Python 3 support 2015-04-16 16:20:57 -07:00
context.py [SPARK-12617][PYSPARK] Move Py4jCallbackConnectionCleaner to Streaming 2016-01-06 12:03:01 -08:00
daemon.py [SPARK-4897] [PySpark] Python 3 support 2015-04-16 16:20:57 -07:00
files.py [SPARK-3309] [PySpark] Put all public API in __all__ 2014-09-03 11:49:45 -07:00
heapq3.py [SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod() 2015-06-26 08:12:22 -07:00
java_gateway.py [SPARK-9700] Pick default page size more intelligently. 2015-08-06 23:18:29 -07:00
join.py [SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo… 2016-03-28 14:51:36 -07:00
profiler.py [SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod() 2015-06-26 08:12:22 -07:00
rdd.py [SPARK-13467] [PYSPARK] abstract python function to simplify pyspark code 2016-02-24 12:44:54 -08:00
rddsampler.py [SPARK-4897] [PySpark] Python 3 support 2015-04-16 16:20:57 -07:00
resultiterable.py [SPARK-3074] [PySpark] support groupByKey() with single huge key 2015-04-09 17:07:23 -07:00
serializers.py [SPARK-10542] [PYSPARK] fix serialize namedtuple 2015-09-14 19:46:34 -07:00
shell.py [SPARK-12993][PYSPARK] Remove usage of ADD_FILES in pyspark 2016-01-26 14:58:39 -08:00
shuffle.py [SPARK-10710] Remove ability to disable spilling in core and SQL 2015-09-19 21:40:21 -07:00
statcounter.py [SPARK-6919] [PYSPARK] Add asDict method to StatCounter 2015-09-29 13:38:15 -07:00
status.py [SPARK-4172] [PySpark] Progress API in Python 2015-02-17 13:36:43 -08:00
storagelevel.py [SPARK-14342][CORE][DOCS][TESTS] Remove straggler references to Tachyon 2016-04-02 17:55:46 -07:00
tests.py [SPARK-13697] [PYSPARK] Fix the missing module name of TransformFunctionSerializer.loads 2016-03-06 08:57:01 -08:00
traceback_utils.py [SPARK-1087] Move python traceback utilities into new traceback_utils.py file. 2014-09-15 19:28:17 -07:00
worker.py [SPARK-14267] [SQL] [PYSPARK] execute multiple Python UDFs within single batch 2016-03-31 16:40:20 -07:00