spark-instrumented-optimizer/python/pyspark
Matei Zaharia feba7ee540 SPARK-815. Python parallelize() should split lists before batching
One unfortunate consequence of this fix is that we materialize any
collections that are given to us as generators, but this seems necessary
to get reasonable behavior on small collections. We could add a
batchSize parameter later to bypass auto-computation of the batch size if
this becomes a problem (e.g., if users really want to parallelize big
generators nicely).
2013-07-29 02:51:43 -04:00
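
Below is a minimal, hypothetical sketch of the split-then-batch idea described in this commit message, written in Python: the function name parallelize_sketch, the ceiling-division slicing, and the auto batch-size heuristic are illustrative stand-ins, not the actual SparkContext.parallelize() code in context.py.

    import itertools

    def parallelize_sketch(data, num_slices, batch_size=None):
        # Materialize generator input up front so that small collections can be
        # split evenly across slices (the trade-off noted in the commit message).
        items = list(data)

        # Auto-compute a batch size when none is given; an explicit batch_size
        # argument would let callers bypass this heuristic. The formula here is
        # illustrative, not the real one.
        if batch_size is None:
            batch_size = max(1, len(items) // (10 * num_slices))

        # First split the list across slices (partitions)...
        slice_len = max(1, -(-len(items) // num_slices))  # ceiling division
        slices = [items[i:i + slice_len] for i in range(0, len(items), slice_len)]

        # ...then batch the elements within each slice for serialization.
        def batched(seq, size):
            it = iter(seq)
            while True:
                chunk = list(itertools.islice(it, size))
                if not chunk:
                    return
                yield chunk

        return [list(batched(s, batch_size)) for s in slices]

    if __name__ == "__main__":
        # Ten elements over three slices: every slice receives data, and batching
        # happens inside each slice rather than across the whole collection.
        print(parallelize_sketch(range(10), num_slices=3))

Running the example splits range(10) into slices [0..3], [4..7], [8..9] before any batching takes place, which mirrors the behavior the commit title describes.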
__init__.py Don't download files to master's working directory. 2013-01-21 17:34:17 -08:00
accumulators.py Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
broadcast.py Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
cloudpickle.py Rename top-level 'pyspark' directory to 'python' 2013-01-01 15:05:00 -08:00
context.py SPARK-815. Python parallelize() should split lists before batching 2013-07-29 02:51:43 -04:00
daemon.py Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
files.py Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
java_gateway.py Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
join.py Change numSplits to numPartitions in PySpark. 2013-02-24 13:25:09 -08:00
rdd.py Use None instead of empty string as it's slightly smaller/faster 2013-07-29 02:51:43 -04:00
serializers.py Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
shell.py Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
tests.py Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
worker.py Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00