spark-instrumented-optimizer/python
Ewen Cheslack-Postava 7eaa56de7f Add an add() method to pyspark accumulators.
Add a regular method for adding a term to accumulators in
pyspark. Currently, adding to a non-global accumulator is awkward:
the += operator can't be used on non-global accumulators captured
via closure, because it involves an assignment. The only way to do
it is to call __iadd__ directly.
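
To see the failure mode concretely, here is a minimal sketch of the
closure rule behind it, using throwaway names (outer, bump, total):
augmented assignment rebinds the target name, so Python treats that
name as local to the inner function and fails on the read before
__iadd__ is ever invoked:

def outer():
    total = 0
    def bump(x):
        total += x  # rebinding makes 'total' local to bump()
    bump(1)         # raises UnboundLocalError: local variable 'total'
                    # referenced before assignment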

Adding this method lets you write code like this:

from pyspark import SparkContext

def main():
    sc = SparkContext()
    accum = sc.accumulator(0)

    rdd = sc.parallelize([1, 2, 3])
    def f(x):
        accum.add(x)
    rdd.foreach(f)
    print(accum.value)

where using accum += x instead would have raised UnboundLocalError
exceptions in the workers. Without this method, it would have to be
written as accum.__iadd__(x).
2013-10-19 19:55:39 -07:00
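
For context, a rough sketch of how such an add() method could look on
PySpark's Accumulator class, assuming (as in PySpark's accumulator
design) that the value is kept in self._value and combined through an
AccumulatorParam's addInPlace; the attribute names here are
illustrative rather than taken from the commit itself:

class Accumulator(object):
    def __init__(self, value, accum_param):
        self._value = value              # current local value
        self.accum_param = accum_param   # knows how to combine values

    def add(self, term):
        # Mutates in place; never rebinds the accumulator name, so
        # this is safe to call from a closure in worker code.
        self._value = self.accum_param.addInPlace(self._value, term)

    def __iadd__(self, term):
        # += can delegate to add(), but += itself still only works
        # where reassigning the accumulator name is legal.
        self.add(term)
        return self
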
examples      Add banner to PySpark and make wordcount output nicer  2013-09-01 14:13:16 -07:00
lib           Fix PySpark for assembly run and include it in dist  2013-08-29 21:19:06 -07:00
pyspark       Add an add() method to pyspark accumulators.  2013-10-19 19:55:39 -07:00
test_support  Implementing SPARK-878 for PySpark: adding zip and egg files to context and passing it down to workers which add these to their sys.path  2013-08-16 11:58:20 -07:00
.gitignore    Rename top-level 'pyspark' directory to 'python'  2013-01-01 15:05:00 -08:00
epydoc.conf   Exclude some private modules in epydoc  2013-09-02 12:22:52 -07:00
run-tests     Fix PySpark unit tests on Python 2.6.  2013-08-14 15:12:12 -07:00