ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Josh Rosen	cbb7f04aef	Add custom serializer support to PySpark. For now, this only adds MarshalSerializer, but it lays the groundwork for supporting other custom serializers. Many of these mechanisms can also be used to support deserialization of different data formats sent by Java, such as data encoded by MsgPack. This also fixes a bug in SparkContext.union().	2013-11-10 16:45:38 -08:00
Ewen Cheslack-Postava	7eaa56de7f	Add an add() method to pyspark accumulators. Add a regular method for adding a term to accumulators in pyspark. Currently if you have a non-global accumulator, adding to it is awkward. The += operator can't be used for non-global accumulators captured via closure because it involves an assignment. The only way to do it is using __iadd__ directly. Adding this method lets you write code like this: def main(): sc = SparkContext() accum = sc.accumulator(0) rdd = sc.parallelize([1,2,3]) def f(x): accum.add(x) rdd.foreach(f) print accum.value where using accum += x instead would have caused UnboundLocalError exceptions in workers. Currently it would have to be written as accum.__iadd__(x).	2013-10-19 19:55:39 -07:00
Matei Zaharia	af3c9d5042	Add Apache license headers and LICENSE and NOTICE files	2013-07-16 17:21:33 -07:00
Josh Rosen	e61729113d	Remove unnecessary doctest __main__ methods.	2013-02-03 21:29:40 -08:00
Josh Rosen	b47d054cfc	Remove use of abc.ABCMeta due to cloudpickle issue. cloudpickle runs into issues while pickling subclasses of AccumulatorParam, which may be related to this Python issue: http://bugs.python.org/issue7689 This seems hard to fix and the ABCMeta wasn't necessary, so I removed it.	2013-01-23 11:47:27 -08:00
Josh Rosen	c75ae3622e	Make AccumulatorParam an abstract base class.	2013-01-21 22:32:57 -08:00
Josh Rosen	17035db159	Add __repr__ to Accumulator; fix bug in sc.accumulator	2013-01-20 11:58:57 -08:00
Matei Zaharia	a23ed25f3c	Add a class comment to Accumulator	2013-01-20 02:10:25 -08:00
Matei Zaharia	8e7f098a2c	Added accumulators to PySpark	2013-01-20 01:57:44 -08:00

9 commits
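
The closure problem described in commit 7eaa56de7f can be reproduced without Spark. The sketch below is only an illustration: `Counter` is a hypothetical stand-in for a PySpark accumulator (it is not the real `Accumulator` class), and `sum_with_iadd` and `sum_with_add` are names made up for this example. It shows why `accum += x` fails inside a nested function while a plain method call such as `accum.add(x)` works.

```python
class Counter:
    """Hypothetical stand-in for a PySpark accumulator (not the real class)."""

    def __init__(self, value):
        self.value = value

    def __iadd__(self, term):
        # In-place addition; Python rebinds the left-hand name to the result.
        self.value += term
        return self

    def add(self, term):
        # Same effect as +=, but an ordinary method call with no assignment.
        self.value += term


def sum_with_iadd(items):
    accum = Counter(0)

    def f(x):
        # The augmented assignment makes `accum` a local name of f, so it is
        # read before it is bound and Python raises UnboundLocalError.
        accum += x

    for item in items:
        f(item)
    return accum.value


def sum_with_add(items):
    accum = Counter(0)

    def f(x):
        # No assignment here, so `accum` is looked up in the enclosing scope.
        accum.add(x)

    for item in items:
        f(item)
    return accum.value
```

Calling `sum_with_add([1, 2, 3])` adds every item to the captured object, whereas `sum_with_iadd([1, 2, 3])` raises `UnboundLocalError` on the first call to `f`. Writing `accum.__iadd__(x)` would also avoid the error, which matches the workaround the commit message mentions.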