spark-instrumented-optimizer

0x0FFF 6cd98c1878 [SPARK-10417] [SQL] Iterating through Column results in infinite loop	2015-09-02 13:36:36 -07:00

The `pyspark.sql.column.Column` object has a `__getitem__` method, which makes it iterable in Python. The method exists so that, when a column holds a list or dict, you can access a specific element of it through the DataFrame API. The ability to iterate over the column is only a side effect, and it can confuse people who are new to Spark DataFrames (you can iterate this way over a pandas DataFrame, for instance).

Issue reproduction:

```
df = sqlContext.jsonRDD(sc.parallelize(['{"name": "El Magnifico"}']))
for i in df["name"]:
    print(i)
```

Author: 0x0FFF <programmerag@gmail.com>

Closes #8574 from 0x0FFF/SPARK-10417.
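The side effect described in the commit message can be reproduced without Spark. The sketch below is only an illustration: `FakeColumn` and `SafeColumn` are hypothetical stand-ins, not PySpark classes. One way to block the accidental iteration, assumed here, is to define `__iter__` so that it raises `TypeError`.

```python
import itertools


class FakeColumn:
    """Hypothetical stand-in for a column expression; not PySpark code."""

    def __init__(self, name):
        self.name = name

    def __getitem__(self, key):
        # Always succeeds and never raises IndexError, so Python's legacy
        # sequence-iteration protocol never receives a stop signal.
        return FakeColumn("%s[%r]" % (self.name, key))


class SafeColumn(FakeColumn):
    """Same item access, but iteration is rejected explicitly."""

    def __iter__(self):
        raise TypeError("Column is not iterable")


# Without __iter__, iter() falls back to __getitem__(0), __getitem__(1), ...
# The loop is bounded with islice because it would otherwise never end.
first_three = [c.name for c in itertools.islice(FakeColumn("name"), 3)]
print(first_three)

# With __iter__ defined, the attempt to iterate fails immediately.
try:
    iter(SafeColumn("name"))
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

Item access such as `SafeColumn("name")["key"]` still works in this sketch, because only `__iter__` is overridden.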
ml	[SPARK-9679] [ML] [PYSPARK] Add Python API for Stop Words Remover	2015-09-01 10:48:57 -07:00
mllib	[SPARK-9805] [MLLIB] [PYTHON] [STREAMING] Added _eventually for ml streaming pyspark tests	2015-08-15 18:48:20 -07:00
sql	[SPARK-10417] [SQL] Iterating through Column results in infinite loop	2015-09-02 13:36:36 -07:00
streaming	[SPARK-10168] [STREAMING] Fix the issue that maven publishes wrong artifact jars	2015-08-24 12:38:01 -07:00
__init__.py	[SPARK-4172] [PySpark] Progress API in Python	2015-02-17 13:36:43 -08:00
accumulators.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
broadcast.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
cloudpickle.py	[SPARK-9116] [SQL] [PYSPARK] support Python only UDT in __main__	2015-07-29 22:30:49 -07:00
conf.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
context.py	[MINOR] [SQL] Fix sphinx warnings in PySpark SQL	2015-08-20 10:05:31 -07:00
daemon.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
files.py	[SPARK-3309] [PySpark] Put all public API in __all__	2014-09-03 11:49:45 -07:00
heapq3.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
java_gateway.py	[SPARK-9700] Pick default page size more intelligently.	2015-08-06 23:18:29 -07:00
join.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
profiler.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
rdd.py	[SPARK-9828] [PYSPARK] Mutable values should not be default arguments	2015-08-14 12:46:05 -07:00
rddsampler.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
resultiterable.py	[SPARK-3074] [PySpark] support groupByKey() with single huge key	2015-04-09 17:07:23 -07:00
serializers.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
shell.py	[SPARK-9270] [PYSPARK] allow --name option in pyspark	2015-07-24 11:56:55 -07:00
shuffle.py	[SPARK-9116] [SQL] [PYSPARK] support Python only UDT in __main__	2015-07-29 22:30:49 -07:00
statcounter.py	[SPARK-9828] [PYSPARK] Mutable values should not be default arguments	2015-08-14 12:46:05 -07:00
status.py	[SPARK-4172] [PySpark] Progress API in Python	2015-02-17 13:36:43 -08:00
storagelevel.py	[SPARK-3417] Use new-style classes in PySpark	2014-09-08 15:45:36 -07:00
tests.py	[SPARK-9244] Increase some memory defaults	2015-07-22 15:28:09 -07:00
traceback_utils.py	[SPARK-1087] Move python traceback utilities into new traceback_utils.py file.	2014-09-15 19:28:17 -07:00
worker.py	[SPARK-8976] [PYSPARK] fix open mode in python3	2015-08-13 17:33:37 -07:00