spark-instrumented-optimizer/python/pyspark
Andrew Ash ba5bcaddec SPARK-3211 .take() is OOM-prone with empty partitions
Instead of jumping straight from 1 partition to all partitions, grow the number
of partitions to attempt exponentially on each pass (quadrupling as of 8b2299a).

Fix proposed by Paul Nepywoda

Author: Andrew Ash <andrew@andrewash.com>

Closes #2117 from ash211/SPARK-3211 and squashes the following commits:

8b2299a [Andrew Ash] Quadruple instead of double for a minor speedup
e5f7e4d [Andrew Ash] Update comment to better reflect what we're doing
09a27f7 [Andrew Ash] Update PySpark to be less OOM-prone as well
3a156b8 [Andrew Ash] SPARK-3211 .take() is OOM-prone with empty partitions
2014-09-05 18:52:05 -07:00
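
The growth strategy described in the commit message above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from rdd.py: partitions are modelled as a list of lists, and the names take_with_growth, growth, and batch_sizes are hypothetical. The factor of 4 is assumed from the "Quadruple instead of double" commit.

```python
def take_with_growth(partitions, num, growth=4):
    """Collect up to `num` items, scanning partitions in growing batches.

    `partitions` is a list of lists standing in for RDD partitions
    (an assumption for illustration; no Spark is involved here).
    Returns the collected items and the size of each batch scanned.
    """
    items = []
    batch_sizes = []
    parts_scanned = 0
    total = len(partitions)
    while len(items) < num and parts_scanned < total:
        # The first pass reads one partition; each later pass reads
        # `growth` times the number of partitions scanned so far,
        # rather than jumping straight to all remaining partitions.
        parts_to_try = 1 if parts_scanned == 0 else parts_scanned * growth
        end = min(parts_scanned + parts_to_try, total)
        for part in partitions[parts_scanned:end]:
            # Take only as many items as are still needed.
            items.extend(part[:num - len(items)])
        batch_sizes.append(end - parts_scanned)
        parts_scanned = end
    return items, batch_sizes


# 100 empty partitions followed by one that holds data.
parts = [[] for _ in range(100)] + [[1, 2, 3]]
items, batch_sizes = take_with_growth(parts, 2)
print(items, batch_sizes)
```

With many empty leading partitions, the batches grow gradually, so no single pass has to pull every partition at once unless the earlier passes keep coming back empty.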
..
mllib [SPARK-3309] [PySpark] Put all public API in __all__ 2014-09-03 11:49:45 -07:00
__init__.py [SPARK-3309] [PySpark] Put all public API in __all__ 2014-09-03 11:49:45 -07:00
accumulators.py [SPARK-3309] [PySpark] Put all public API in __all__ 2014-09-03 11:49:45 -07:00
broadcast.py [SPARK-3309] [PySpark] Put all public API in __all__ 2014-09-03 11:49:45 -07:00
cloudpickle.py [SPARK-791] [PySpark] fix pickle itemgetter with cloudpickle 2014-07-29 01:02:18 -07:00
conf.py [SPARK-3309] [PySpark] Put all public API in __all__ 2014-09-03 11:49:45 -07:00
context.py [SPARK-3309] [PySpark] Put all public API in __all__ 2014-09-03 11:49:45 -07:00
daemon.py [SPARK-2898] [PySpark] fix bugs in deamon.py 2014-08-10 13:00:38 -07:00
files.py [SPARK-3309] [PySpark] Put all public API in __all__ 2014-09-03 11:49:45 -07:00
heapq3.py [SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey() 2014-08-26 16:57:40 -07:00
java_gateway.py [SPARK-3167] Handle special driver configs in Windows 2014-08-26 22:52:16 -07:00
join.py [SPARK-2470] PEP8 fixes to PySpark 2014-07-21 22:30:53 -07:00
rdd.py SPARK-3211 .take() is OOM-prone with empty partitions 2014-09-05 18:52:05 -07:00
rddsampler.py [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically 2014-08-06 12:58:24 -07:00
resultiterable.py [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically 2014-08-06 12:58:24 -07:00
serializers.py [SPARK-3309] [PySpark] Put all public API in __all__ 2014-09-03 11:49:45 -07:00
shell.py [SPARK-2435] Add shutdown hook to pyspark 2014-09-03 19:37:37 -07:00
shuffle.py [SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey() 2014-08-26 16:57:40 -07:00
sql.py [SPARK-3378] [DOCS] Replace the word "SparkSQL" with right word "Spark SQL" 2014-09-04 15:06:08 -07:00
statcounter.py StatCounter on NumPy arrays [PYSPARK][SPARK-2012] 2014-08-01 22:33:25 -07:00
storagelevel.py [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically 2014-08-06 12:58:24 -07:00
tests.py [SPARK-3335] [SQL] [PySpark] support broadcast in Python UDF 2014-09-03 19:08:39 -07:00
worker.py [SPARK-3114] [PySpark] Fix Python UDFs in Spark SQL. 2014-08-18 20:42:19 -07:00