spark-instrumented-optimizer/python/pyspark
Nong Li 5a7af9e7ac [SPARK-13250] [SQL] Update PhysicallRDD to convert to UnsafeRow if using the vectorized scanner.
Some parts of the engine rely on UnsafeRow which the vectorized parquet scanner does not want
to produce. This add a conversion in Physical RDD. In the case where codegen is used (and the
scan is the start of the pipeline), there is no requirement to use UnsafeRow. This patch adds
update PhysicallRDD to support codegen, which eliminates the need for the UnsafeRow conversion
in all cases.

The result of these changes for TPCDS-Q19 at the 10gb sf reduces the query time from 9.5 seconds
to 6.5 seconds.

Author: Nong Li <nong@databricks.com>

Closes #11141 from nongli/spark-13250.
2016-02-24 17:16:45 -08:00
..
ml [SPARK-12153][SPARK-7617][MLLIB] add support of arbitrary length sentence and other tuning for Word2Vec 2016-02-22 09:47:36 +00:00
mllib [SPARK-13429][MLLIB] Unify Logistic Regression convergence tolerance of ML & MLlib 2016-02-22 23:37:09 -08:00
sql [SPARK-13250] [SQL] Update PhysicallRDD to convert to UnsafeRow if using the vectorized scanner. 2016-02-24 17:16:45 -08:00
streaming [SPARK-13339][DOCS] Clarify commutative / associative operator requirements for reduce, fold 2016-02-19 10:26:38 +00:00
__init__.py [SPARK-12600][SQL] Remove deprecated methods in Spark SQL 2016-01-04 18:02:38 -08:00
accumulators.py [SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod() 2015-06-26 08:12:22 -07:00
broadcast.py [SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod() 2015-06-26 08:12:22 -07:00
cloudpickle.py [SPARK-10542] [PYSPARK] fix serialize namedtuple 2015-09-14 19:46:34 -07:00
conf.py [SPARK-4897] [PySpark] Python 3 support 2015-04-16 16:20:57 -07:00
context.py [SPARK-12617][PYSPARK] Move Py4jCallbackConnectionCleaner to Streaming 2016-01-06 12:03:01 -08:00
daemon.py [SPARK-4897] [PySpark] Python 3 support 2015-04-16 16:20:57 -07:00
files.py [SPARK-3309] [PySpark] Put all public API in __all__ 2014-09-03 11:49:45 -07:00
heapq3.py [SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod() 2015-06-26 08:12:22 -07:00
java_gateway.py [SPARK-9700] Pick default page size more intelligently. 2015-08-06 23:18:29 -07:00
join.py [SPARK-4897] [PySpark] Python 3 support 2015-04-16 16:20:57 -07:00
profiler.py [SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod() 2015-06-26 08:12:22 -07:00
rdd.py [SPARK-13467] [PYSPARK] abstract python function to simplify pyspark code 2016-02-24 12:44:54 -08:00
rddsampler.py [SPARK-4897] [PySpark] Python 3 support 2015-04-16 16:20:57 -07:00
resultiterable.py [SPARK-3074] [PySpark] support groupByKey() with single huge key 2015-04-09 17:07:23 -07:00
serializers.py [SPARK-10542] [PYSPARK] fix serialize namedtuple 2015-09-14 19:46:34 -07:00
shell.py [SPARK-12993][PYSPARK] Remove usage of ADD_FILES in pyspark 2016-01-26 14:58:39 -08:00
shuffle.py [SPARK-10710] Remove ability to disable spilling in core and SQL 2015-09-19 21:40:21 -07:00
statcounter.py [SPARK-6919] [PYSPARK] Add asDict method to StatCounter 2015-09-29 13:38:15 -07:00
status.py [SPARK-4172] [PySpark] Progress API in Python 2015-02-17 13:36:43 -08:00
storagelevel.py [SPARK-12091] [PYSPARK] Deprecate the JAVA-specific deserialized storage levels 2015-12-18 20:06:05 -08:00
tests.py [SPARK-11295][PYSPARK] Add packages to JUnit output for Python tests 2016-01-20 11:11:10 -08:00
traceback_utils.py [SPARK-1087] Move python traceback utilities into new traceback_utils.py file. 2014-09-15 19:28:17 -07:00
worker.py [SPARK-8976] [PYSPARK] fix open mode in python3 2015-08-13 17:33:37 -07:00