spark-instrumented-optimizer/python
Ewen Cheslack-Postava 7eaa56de7f Add an add() method to pyspark accumulators.
Add a regular method for adding a term to accumulators in
pyspark. Currently, adding to a non-global accumulator is awkward:
the += operator can't be used on non-global accumulators captured
via closure, because it involves an assignment. The only way to do
it is to call __iadd__ directly.
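
To see the failure mode concretely, here is a minimal sketch of the
closure rule behind it, using throwaway names (outer, bump, total):
augmented assignment rebinds the target name, so Python treats that
name as local to the inner function and fails on the read before
__iadd__ is ever invoked:

def outer():
    total = 0
    def bump(x):
        total += x  # rebinding makes 'total' local to bump()
    bump(1)         # raises UnboundLocalError: local variable 'total'
                    # referenced before assignment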

Adding this method lets you write code like this:

from pyspark import SparkContext

def main():
    sc = SparkContext()
    accum = sc.accumulator(0)

    rdd = sc.parallelize([1, 2, 3])
    def f(x):
        accum.add(x)
    rdd.foreach(f)
    print(accum.value)

where using accum += x instead would have raised UnboundLocalError
exceptions in the workers. Without this method, it would have to be
written as accum.__iadd__(x).
2013-10-19 19:55:39 -07:00
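
For context, a rough sketch of how such an add() method could look on
PySpark's Accumulator class, assuming (as in PySpark's accumulator
design) that the value is kept in self._value and combined through an
AccumulatorParam's addInPlace; the attribute names here are
illustrative rather than taken from the commit itself:

class Accumulator(object):
    def __init__(self, value, accum_param):
        self._value = value              # current local value
        self.accum_param = accum_param   # knows how to combine values

    def add(self, term):
        # Mutates in place; never rebinds the accumulator name, so
        # this is safe to call from a closure in worker code.
        self._value = self.accum_param.addInPlace(self._value, term)

    def __iadd__(self, term):
        # += can delegate to add(), but += itself still only works
        # where reassigning the accumulator name is legal.
        self.add(term)
        return self
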
examples      Add banner to PySpark and make wordcount output nicer  2013-09-01 14:13:16 -07:00
lib           Fix PySpark for assembly run and include it in dist  2013-08-29 21:19:06 -07:00
pyspark       Add an add() method to pyspark accumulators.  2013-10-19 19:55:39 -07:00
test_support  Implementing SPARK-878 for PySpark: adding zip and egg files to context and passing it down to workers which add these to their sys.path  2013-08-16 11:58:20 -07:00
.gitignore    Rename top-level 'pyspark' directory to 'python'  2013-01-01 15:05:00 -08:00
epydoc.conf   Exclude some private modules in epydoc  2013-09-02 12:22:52 -07:00
run-tests     Fix PySpark unit tests on Python 2.6.  2013-08-14 15:12:12 -07:00