spark-instrumented-optimizer/python/pyspark
Matei Zaharia feba7ee540 SPARK-815. Python parallelize() should split lists before batching
One unfortunate consequence of this fix is that we materialize any
collections that are given to us as generators, but this seems necessary
to get reasonable behavior on small collections. We could add a
batchSize parameter later to bypass auto-computation of the batch size if
this becomes a problem (e.g., if users really want to parallelize big
generators nicely).
2013-07-29 02:51:43 -04:00
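
Below is a minimal, hypothetical sketch of the split-then-batch idea described in this commit message, written in Python: the function name parallelize_sketch, the ceiling-division slicing, and the auto batch-size heuristic are illustrative stand-ins, not the actual SparkContext.parallelize() code in context.py.

    import itertools

    def parallelize_sketch(data, num_slices, batch_size=None):
        # Materialize generator input up front so that small collections can be
        # split evenly across slices (the trade-off noted in the commit message).
        items = list(data)

        # Auto-compute a batch size when none is given; an explicit batch_size
        # argument would let callers bypass this heuristic. The formula here is
        # illustrative, not the real one.
        if batch_size is None:
            batch_size = max(1, len(items) // (10 * num_slices))

        # First split the list across slices (partitions)...
        slice_len = max(1, -(-len(items) // num_slices))  # ceiling division
        slices = [items[i:i + slice_len] for i in range(0, len(items), slice_len)]

        # ...then batch the elements within each slice for serialization.
        def batched(seq, size):
            it = iter(seq)
            while True:
                chunk = list(itertools.islice(it, size))
                if not chunk:
                    return
                yield chunk

        return [list(batched(s, batch_size)) for s in slices]

    if __name__ == "__main__":
        # Ten elements over three slices: every slice receives data, and batching
        # happens inside each slice rather than across the whole collection.
        print(parallelize_sketch(range(10), num_slices=3))

Running the example splits range(10) into slices [0..3], [4..7], [8..9] before any batching takes place, which mirrors the behavior the commit title describes.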
__init__.py Don't download files to master's working directory. 2013-01-21 17:34:17 -08:00
accumulators.py Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
broadcast.py Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
cloudpickle.py Rename top-level 'pyspark' directory to 'python' 2013-01-01 15:05:00 -08:00
context.py SPARK-815. Python parallelize() should split lists before batching 2013-07-29 02:51:43 -04:00
daemon.py Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
files.py Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
java_gateway.py Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
join.py Change numSplits to numPartitions in PySpark. 2013-02-24 13:25:09 -08:00
rdd.py Use None instead of empty string as it's slightly smaller/faster 2013-07-29 02:51:43 -04:00
serializers.py Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
shell.py Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
tests.py Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
worker.py Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00