spark-instrumented-optimizer/python/pyspark/sql
Davies Liu ba8c86d06f [SPARK-13671] [SPARK-13311] [SQL] Use different physical plans for RDD and data sources
## What changes were proposed in this pull request?

This PR splits PhysicalRDD into two classes, PhysicalRDD and PhysicalScan. PhysicalRDD is used for DataFrames that are created from an existing RDD. PhysicalScan is used for DataFrames that are created from data sources. This enables us to apply different optimizations to each of them.

Also fixes the problem with sameResult() on two DataSourceScans.

Also fixes the equality check and toString for `In`. It would be better to use Seq there, but we can't break this public API (sad).

## How was this patch tested?

Existing tests. Manually tested with TPCDS queries Q59 and Q64: all the duplicated exchanges can now be reused, and there is a 40+% performance improvement (saving half of the scan).

Author: Davies Liu <davies@databricks.com>

Closes #11514 from davies/existing_rdd.
2016-03-12 00:48:36 -08:00
__init__.py [SPARK-12600][SQL] Remove deprecated methods in Spark SQL 2016-01-04 18:02:38 -08:00
column.py [SPARK-12799] Simplify various string output for expressions 2016-02-21 22:53:15 +08:00
context.py [SPARK-13593] [SQL] improve the createDataFrame to accept data type string and verify the data 2016-03-08 14:00:03 -08:00
dataframe.py [SPARK-13671] [SPARK-13311] [SQL] Use different physical plans for RDD and data sources 2016-03-12 00:48:36 -08:00
functions.py [MINOR] Fix typo in 'hypot' docstring 2016-03-09 18:05:03 -08:00
group.py [SPARK-12756][SQL] use hash expression in Exchange 2016-01-13 22:43:28 -08:00
readwriter.py [SPARK-13543][SQL] Support for specifying compression codec for Parquet/ORC via option() 2016-03-03 10:30:55 -08:00
tests.py [SPARK-13593] [SQL] improve the createDataFrame to accept data type string and verify the data 2016-03-08 14:00:03 -08:00
types.py [SPARK-13593] [SQL] improve the createDataFrame to accept data type string and verify the data 2016-03-08 14:00:03 -08:00
utils.py [SPARK-11804] [PYSPARK] Exception raise when using Jdbc predicates opt… 2015-11-18 08:18:54 -08:00
window.py [SPARK-10373] [PYSPARK] move @since into pyspark from sql 2015-09-08 20:56:22 -07:00