Commit graph

9 commits

Author SHA1 Message Date
Josh Rosen cbb7f04aef Add custom serializer support to PySpark.
For now, this only adds MarshalSerializer, but it lays the groundwork
for other supporting custom serializers.  Many of these mechanisms
can also be used to support deserialization of different data formats
sent by Java, such as data encoded by MsgPack.

This also fixes a bug in SparkContext.union().
2013-11-10 16:45:38 -08:00
Josh Rosen 7a9abb9ddc Fix PySpark unit tests on Python 2.6. 2013-08-14 15:12:12 -07:00
Matei Zaharia 96b50e82dc Allow python/run-tests to run from any directory 2013-07-29 02:51:43 -04:00
Matei Zaharia af3c9d5042 Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
Josh Rosen ef711902c1 Don't download files to master's working directory.
This should avoid exceptions caused by existing
files with different contents.

I also removed some unused code.
2013-01-21 17:34:17 -08:00
Josh Rosen 7ed1bf4b48 Add RDD checkpointing to Python API. 2013-01-20 13:19:19 -08:00
Matei Zaharia 61b6382a35 Launch accumulator tests in run-tests 2013-01-20 01:59:07 -08:00
Josh Rosen 1a64432ba5 Indicate success/failure in PySpark test script. 2013-01-09 20:30:36 -08:00
Josh Rosen ce9f1bbe20 Add pyspark script to replace the other scripts.
Expand the PySpark programming guide.
2013-01-01 21:25:49 -08:00