Josh Rosen
f83068497b
Fix for SPARK-1025: PySpark hang on missing files.
2014-01-23 18:24:51 -08:00
Josh Rosen
61569906cc
Fix SPARK-978: ClassCastException in PySpark cartesian.
2014-01-23 15:09:19 -08:00
Josh Rosen
0035dbbc81
Fix SPARK-1034: Py4JException on PySpark Cartesian Result
2014-01-23 13:05:59 -08:00
Tathagata Das
d4dfab503a
Fixed Python API for sc.setCheckpointDir. Also other fixes based on Reynold's comments on PR 289.
2013-12-24 14:01:13 -08:00
Josh Rosen
3787f514d9
Fix UnicodeEncodeError in PySpark saveAsTextFile().
...
Fixes SPARK-970.
2013-11-28 23:44:56 -08:00
Josh Rosen
cbb7f04aef
Add custom serializer support to PySpark.
...
For now, this only adds MarshalSerializer, but it lays the groundwork
for other supporting custom serializers. Many of these mechanisms
can also be used to support deserialization of different data formats
sent by Java, such as data encoded by MsgPack.
This also fixes a bug in SparkContext.union().
2013-11-10 16:45:38 -08:00
Andre Schumacher
c7e348faec
Implementing SPARK-878 for PySpark: adding zip and egg files to context and passing it down to workers which add these to their sys.path
2013-08-16 11:58:20 -07:00
Josh Rosen
7a9abb9ddc
Fix PySpark unit tests on Python 2.6.
2013-08-14 15:12:12 -07:00
Matei Zaharia
af3c9d5042
Add Apache license headers and LICENSE and NOTICE files
2013-07-16 17:21:33 -07:00
Jey Kottalam
62c4781400
Add tests and fixes for Python daemon shutdown
2013-06-21 12:14:16 -04:00
Josh Rosen
9cc6ff9c4e
Do not launch JavaGateways on workers (SPARK-674).
...
The problem was that the gateway was being initialized whenever the
pyspark.context module was loaded. The fix uses lazy initialization
that occurs only when SparkContext instances are actually constructed.
I also made the gateway and jvm variables private.
This change results in ~3-4x performance improvement when running the
PySpark unit tests.
2013-02-01 11:13:10 -08:00
Josh Rosen
57b64d0d19
Fix stdout redirection in PySpark.
2013-02-01 00:25:19 -08:00
Stephen Haberman
7dfb82a992
Replace old 'master' term with 'driver'.
2013-01-25 11:03:00 -06:00
Josh Rosen
ae2ed2947d
Allow PySpark's SparkFiles to be used from driver
...
Fix minor documentation formatting issues.
2013-01-23 10:58:50 -08:00
Josh Rosen
35168d9c89
Fix sys.path bug in PySpark SparkContext.addPyFile
2013-01-22 17:54:11 -08:00
Josh Rosen
00d70cd660
Clean up setup code in PySpark checkpointing tests
2013-01-20 15:38:11 -08:00
Josh Rosen
d0ba80dc72
Add checkpointFile() and more tests to PySpark.
2013-01-20 13:59:45 -08:00
Josh Rosen
7ed1bf4b48
Add RDD checkpointing to Python API.
2013-01-20 13:19:19 -08:00