spark-instrumented-optimizer

Wenchen Fan 962e9bcf94	2016-01-13 22:43:28 -08:00
[SPARK-12756][SQL] use hash expression in Exchange

This PR makes bucketing and exchange share one common hash algorithm, so that we can guarantee the data distribution is same between shuffle and bucketed data source, which enables us to only shuffle one side when join a bucketed table and a normal one.

This PR also fixes the tests that are broken by the new hash behaviour in shuffle.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #10703 from cloud-fan/use-hash-expr-in-shuffle.

ml	[SPARK-11815][ML][PYSPARK] PySpark DecisionTreeClassifier & DecisionTreeRegressor should support setSeed	2016-01-06 10:52:25 -08:00
mllib	[SPARK-12603][MLLIB] PySpark MLlib GaussianMixtureModel should support single instance predict/predictSoft	2016-01-11 14:43:25 -08:00
sql	[SPARK-12756][SQL] use hash expression in Exchange	2016-01-13 22:43:28 -08:00
streaming	[SPARK-12652][PYSPARK] Upgrade Py4J to 0.9.1	2016-01-12 14:27:05 -08:00
__init__.py	[SPARK-12600][SQL] Remove deprecated methods in Spark SQL	2016-01-04 18:02:38 -08:00
accumulators.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
broadcast.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
cloudpickle.py	[SPARK-10542] [PYSPARK] fix serialize namedtuple	2015-09-14 19:46:34 -07:00
conf.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
context.py	[SPARK-12617][PYSPARK] Move Py4jCallbackConnectionCleaner to Streaming	2016-01-06 12:03:01 -08:00
daemon.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
files.py	[SPARK-3309] [PySpark] Put all public API in __all__	2014-09-03 11:49:45 -07:00
heapq3.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
java_gateway.py	[SPARK-9700] Pick default page size more intelligently.	2015-08-06 23:18:29 -07:00
join.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
profiler.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
rdd.py	[SPARK-12091] [PYSPARK] Deprecate the JAVA-specific deserialized storage levels	2015-12-18 20:06:05 -08:00
rddsampler.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
resultiterable.py	[SPARK-3074] [PySpark] support groupByKey() with single huge key	2015-04-09 17:07:23 -07:00
serializers.py	[SPARK-10542] [PYSPARK] fix serialize namedtuple	2015-09-14 19:46:34 -07:00
shell.py	[SPARK-12268][PYSPARK] Make pyspark shell pythonstartup work under python3	2016-01-13 12:21:45 -08:00
shuffle.py	[SPARK-10710] Remove ability to disable spilling in core and SQL	2015-09-19 21:40:21 -07:00
statcounter.py	[SPARK-6919] [PYSPARK] Add asDict method to StatCounter	2015-09-29 13:38:15 -07:00
status.py	[SPARK-4172] [PySpark] Progress API in Python	2015-02-17 13:36:43 -08:00
storagelevel.py	[SPARK-12091] [PYSPARK] Deprecate the JAVA-specific deserialized storage levels	2015-12-18 20:06:05 -08:00
tests.py	[SPARK-7021] Add JUnit output for Python unit tests	2015-10-22 15:27:11 -07:00
traceback_utils.py	[SPARK-1087] Move python traceback utilities into new traceback_utils.py file.	2014-09-15 19:28:17 -07:00
worker.py	[SPARK-8976] [PYSPARK] fix open mode in python3	2015-08-13 17:33:37 -07:00