spark-instrumented-optimizer

History

gatorsmile 9ab296ecdc [SPARK-12520] [PYSPARK] Correct Descriptions and Add Use Cases in Equi-Join After reading the JIRA https://issues.apache.org/jira/browse/SPARK-12520, I double checked the code. For example, users can do the Equi-Join like ```df.join(df2, 'name', 'outer').select('name', 'height').collect()``` - There exists a bug in 1.5 and 1.4. The code just ignores the third parameter (join type) users pass. However, the join type we called is `Inner`, even if the user-specified type is the other type (e.g., `Outer`). - After a PR: https://github.com/apache/spark/pull/8600, the 1.6 does not have such an issue, but the description has not been updated. Plan to submit another PR to fix 1.5 and issue an error message if users specify a non-inner join type when using Equi-Join. Author: gatorsmile <gatorsmile@gmail.com> Closes #10477 from gatorsmile/pyOuterJoin.		2015-12-27 23:18:48 -08:00
..
ml	[PYSPARK] Pyspark typo & Add missing abstractmethod annotation	2015-12-21 08:53:46 -08:00
mllib	[SPARK-12296][PYSPARK][MLLIB] Feature parity for pyspark mllib standard scaler model	2015-12-22 09:14:12 +02:00
sql	[SPARK-12520] [PYSPARK] Correct Descriptions and Add Use Cases in Equi-Join	2015-12-27 23:18:48 -08:00
streaming	[SPARK-12091] [PYSPARK] Deprecate the JAVA-specific deserialized storage levels	2015-12-18 20:06:05 -08:00
__init__.py	[SPARK-10373] [PYSPARK] move @since into pyspark from sql	2015-09-08 20:56:22 -07:00
accumulators.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
broadcast.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
cloudpickle.py	[SPARK-10542] [PYSPARK] fix serialize namedtuple	2015-09-14 19:46:34 -07:00
conf.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
context.py	[SPARK-12132] [PYSPARK] raise KeyboardInterrupt inside SIGINT handler	2015-12-07 11:00:25 -08:00
daemon.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
files.py	[SPARK-3309] [PySpark] Put all public API in __all__	2014-09-03 11:49:45 -07:00
heapq3.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
java_gateway.py	[SPARK-9700] Pick default page size more intelligently.	2015-08-06 23:18:29 -07:00
join.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
profiler.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
rdd.py	[SPARK-12091] [PYSPARK] Deprecate the JAVA-specific deserialized storage levels	2015-12-18 20:06:05 -08:00
rddsampler.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
resultiterable.py	[SPARK-3074] [PySpark] support groupByKey() with single huge key	2015-04-09 17:07:23 -07:00
serializers.py	[SPARK-10542] [PYSPARK] fix serialize namedtuple	2015-09-14 19:46:34 -07:00
shell.py	[SPARK-9270] [PYSPARK] allow --name option in pyspark	2015-07-24 11:56:55 -07:00
shuffle.py	[SPARK-10710] Remove ability to disable spilling in core and SQL	2015-09-19 21:40:21 -07:00
statcounter.py	[SPARK-6919] [PYSPARK] Add asDict method to StatCounter	2015-09-29 13:38:15 -07:00
status.py	[SPARK-4172] [PySpark] Progress API in Python	2015-02-17 13:36:43 -08:00
storagelevel.py	[SPARK-12091] [PYSPARK] Deprecate the JAVA-specific deserialized storage levels	2015-12-18 20:06:05 -08:00
tests.py	[SPARK-7021] Add JUnit output for Python unit tests	2015-10-22 15:27:11 -07:00
traceback_utils.py	[SPARK-1087] Move python traceback utilities into new traceback_utils.py file.	2014-09-15 19:28:17 -07:00
worker.py	[SPARK-8976] [PYSPARK] fix open mode in python3	2015-08-13 17:33:37 -07:00