Josh Rosen
c2b105af34
Add documentation for Python API.
2012-12-28 22:51:28 -08:00
Josh Rosen
7ec3595de2
Fix bug (introduced by batching) in PySpark take()
2012-12-28 22:21:16 -08:00
Josh Rosen
fbadb1cda5
Mark api.python classes as private; echo Java output to stderr.
2012-12-28 09:06:11 -08:00
Josh Rosen
665466dfff
Simplify PySpark installation.
...
- Bundle Py4J binaries, since Py4J is hard to install
- Use Spark's `run` script to launch the Py4J
gateway, inheriting the settings in spark-env.sh
With these changes, (hopefully) nothing more than
running `sbt/sbt package` will be necessary to run
PySpark.
2012-12-27 22:47:37 -08:00
Josh Rosen
ac32447cd3
Use addFile() to ship code to cluster in PySpark.
...
Add options to pyspark.SparkContext constructor.
2012-12-27 19:59:04 -08:00
Josh Rosen
85b8f2c64f
Add epydoc API documentation for PySpark.
2012-12-27 18:04:10 -08:00
Josh Rosen
2d98fff065
Add IPython support to pyspark-shell.
...
Suggested by / based on code from @MLnick
2012-12-27 10:17:36 -08:00
Josh Rosen
1dca0c5180
Remove debug output from PythonPartitioner.
2012-12-26 18:23:06 -08:00
Josh Rosen
e2dad15621
Add support for batched serialization of Python objects in PySpark.
2012-12-26 18:16:09 -08:00
Josh Rosen
4608902fb8
Use filesystem to collect RDDs in PySpark.
...
Passing large volumes of data through Py4J seems
to be slow. It appears to be faster to write the
data to the local filesystem and read it back from
Python.
2012-12-24 17:20:10 -08:00
Josh Rosen
ccd075cf96
Reduce object overhead in PySpark shuffle and collect
2012-12-24 15:01:13 -08:00
Josh Rosen
2ccf3b6652
Fix PySpark hash partitioning bug.
...
A Java array's hashCode is based on its object
identity, not its elements, so this was causing
serialized keys to be hashed incorrectly.
This commit adds a PySpark-specific workaround
and adds more tests.
2012-10-28 22:30:28 -07:00
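The hashing pitfall described in the commit above can be reproduced with a minimal Java sketch. This is not the PySpark workaround itself: the class name `ArrayHashDemo` and the helper `partitionFor` are hypothetical, introduced only for illustration. The sketch only shows that `hashCode()` on an array depends on object identity, while `java.util.Arrays.hashCode` depends on the elements.

```java
import java.util.Arrays;

public class ArrayHashDemo {
    // Hypothetical helper (not from the Spark source): pick a partition
    // from the array's contents rather than from its object identity.
    static int partitionFor(byte[] key, int numPartitions) {
        int contentHash = Arrays.hashCode(key);
        // floorMod keeps the result non-negative for negative hashes.
        return Math.floorMod(contentHash, numPartitions);
    }

    public static void main(String[] args) {
        // Two distinct arrays holding the same serialized key bytes.
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};

        // Identity-based: equal contents do not imply equal hash codes,
        // so these two values will typically differ.
        System.out.println("identity hashes equal: " + (a.hashCode() == b.hashCode()));

        // Content-based: equal contents always give equal hash codes.
        System.out.println("content hashes equal: " + (Arrays.hashCode(a) == Arrays.hashCode(b)));

        // Equal keys therefore land in the same partition.
        System.out.println("same partition: " + (partitionFor(a, 4) == partitionFor(b, 4)));
    }
}
```

With identity-based hashing, two copies of the same serialized key could be routed to different partitions; hashing the contents avoids that.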
Josh Rosen
7859879aaa
Bump required Py4J version and add test for large broadcast variables.
2012-10-28 16:48:25 -07:00
Josh Rosen
d4f2e5b0ef
Remove PYTHONPATH from SparkContext's executorEnvs.
...
It makes more sense to pass it in the dictionary
of environment variables that is used to construct
PythonRDD.
2012-10-22 10:28:59 -07:00
Josh Rosen
c23bf1aff4
Add PySpark README and run scripts.
2012-10-20 00:22:27 +00:00
Josh Rosen
52989c8a2c
Update Python API for v0.6.0 compatibility.
2012-10-19 10:24:49 -07:00
Josh Rosen
e21eb6e00d
Merge tag 'v0.6.0' into python-api
2012-10-19 09:44:32 -07:00
Matei Zaharia
63fe4e9d33
Merge pull request #279 from pwendell/dev
...
Removing credentials line in build.
2012-10-14 19:36:41 -07:00
Patrick Wendell
629dd2691e
Removing credentials line in build.
2012-10-14 19:33:39 -07:00
Matei Zaharia
f8768da418
Comment out PGP stuff for publish-local to work
2012-10-14 17:37:21 -07:00
Matei Zaharia
1f06445b03
tweak
2012-10-14 12:04:58 -07:00
Matei Zaharia
4947bd0958
tweak
2012-10-14 12:02:58 -07:00
Matei Zaharia
6c766a9187
tweak
2012-10-14 12:02:32 -07:00
Matei Zaharia
8192fe0325
Merge branch 'dev' of github.com:mesos/spark into dev
2012-10-14 12:01:38 -07:00
Matei Zaharia
1c73d8974d
Update README
2012-10-14 12:00:25 -07:00
Matei Zaharia
7855bacd26
Merge pull request #278 from pwendell/quickstart-fix
...
Adding dependency repos in quickstart example
2012-10-14 11:52:24 -07:00
Patrick Wendell
7a03a0e35d
Adding dependency repos in quickstart example
2012-10-14 11:48:24 -07:00
Matei Zaharia
64dbf8d372
Made ShuffleDependency automatically find a shuffle ID for itself
2012-10-14 10:00:22 -07:00
Matei Zaharia
64b52166ee
Changed default Hadoop version back to 0.20.205
2012-10-14 09:51:34 -07:00
Matei Zaharia
4be12d97ec
Some doc fixes, including showing version number in nav bar again
2012-10-13 19:05:11 -07:00
Matei Zaharia
19910c00c3
tweaks
2012-10-13 16:22:39 -07:00
Matei Zaharia
4a3e9cf69c
Document how to configure SPARK_MEM & co on a per-job basis
2012-10-13 16:20:25 -07:00
Matei Zaharia
ce6b5a3ee5
Uncomment Maven publishing stuff and set version to 0.6.0
2012-10-13 15:55:39 -07:00
Matei Zaharia
8815aeba0c
Take executor environment vars as an argument to SparkContext
2012-10-13 15:31:11 -07:00
Matei Zaharia
84979499db
Merge pull request #273 from dennybritz/executorVars
...
Let the user specify environment variables to be passed to the Executors
2012-10-13 14:52:14 -07:00
Denny
0700d1920a
Protect from null env variables in mesos.
2012-10-13 13:57:59 -07:00
Denny
21047d923e
Protect from setting null environment variables.
2012-10-13 13:44:24 -07:00
Denny
fa41d50f7d
Don't use system envs for Mesos.
2012-10-13 13:15:50 -07:00
Denny
67c42a41d0
Let the user specify environment variables to be passed to the Executors.
...
Also removed unused variables in the ExecutorRunner.
2012-10-13 13:08:44 -07:00
Matei Zaharia
5b7ee173e1
Update EC2 scripts for Spark 0.6
2012-10-12 19:53:03 -07:00
Matei Zaharia
b4067cbad4
More doc updates, and moved Serializer to a subpackage.
2012-10-12 18:19:21 -07:00
Matei Zaharia
8d7b77bcb5
Some doc and usability improvements:
...
- Added a StorageLevels class for easy access to StorageLevel constants
in Java
- Added doc comments on Function classes in Java
- Updated Accumulator and HadoopWriter docs slightly
2012-10-12 17:53:20 -07:00
Matei Zaharia
682b2d9329
Added a test for when an RDD only partially fits in memory
2012-10-12 14:58:26 -07:00
Matei Zaharia
dca496bb77
Document cartesian() operation
2012-10-12 14:46:41 -07:00
Matei Zaharia
1183b30941
Merge branch 'dev' of github.com:mesos/spark into dev
2012-10-12 14:40:07 -07:00
Matei Zaharia
603b419fdf
Tweak
2012-10-12 14:40:00 -07:00
Matei Zaharia
23015ccac0
Merge pull request #271 from shivaram/block-manager-npe-fix
...
Change block manager to accept an ArrayBuffer
2012-10-12 14:36:28 -07:00
Shivaram Venkataraman
8577523f37
Add test to verify if RDD is computed even if block manager has insufficient memory
2012-10-12 14:14:57 -07:00
Matei Zaharia
bd78bbb2cf
Merge pull request #270 from pwendell/java-javadoc
...
Adding Java documentation
2012-10-11 12:21:47 -07:00
Patrick Wendell
dc8adbd359
Adding Java documentation
2012-10-11 00:49:03 -07:00