ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Ewen Cheslack-Postava	7eaa56de7f	Add an add() method to pyspark accumulators. Add a regular method for adding a term to accumulators in pyspark. Currently if you have a non-global accumulator, adding to it is awkward. The += operator can't be used for non-global accumulators captured via closure because it's involves an assignment. The only way to do it is using __iadd__ directly. Adding this method lets you write code like this: def main(): sc = SparkContext() accum = sc.accumulator(0) rdd = sc.parallelize([1,2,3]) def f(x): accum.add(x) rdd.foreach(f) print accum.value where using accum += x instead would have caused UnboundLocalError exceptions in workers. Currently it would have to be written as accum.__iadd__(x).	2013-10-19 19:55:39 -07:00
Matei Zaharia	af3c9d5042	Add Apache license headers and LICENSE and NOTICE files	2013-07-16 17:21:33 -07:00
Josh Rosen	e61729113d	Remove unnecessary doctest __main__ methods.	2013-02-03 21:29:40 -08:00
Josh Rosen	b47d054cfc	Remove use of abc.ABCMeta due to cloudpickle issue. cloudpickle runs into issues while pickling subclasses of AccumulatorParam, which may be related to this Python issue: http://bugs.python.org/issue7689 This seems hard to fix and the ABCMeta wasn't necessary, so I removed it.	2013-01-23 11:47:27 -08:00
Josh Rosen	c75ae3622e	Make AccumulatorParam an abstract base class.	2013-01-21 22:32:57 -08:00
Josh Rosen	17035db159	Add __repr__ to Accumulator; fix bug in sc.accumulator	2013-01-20 11:58:57 -08:00
Matei Zaharia	a23ed25f3c	Add a class comment to Accumulator	2013-01-20 02:10:25 -08:00
Matei Zaharia	8e7f098a2c	Added accumulators to PySpark	2013-01-20 01:57:44 -08:00

8 commits