1262 commits

Author SHA1 Message Date
Josh Rosen fbadb1cda5 Mark api.python classes as private; echo Java output to stderr. 2012-12-28 09:06:11 -08:00
Josh Rosen 665466dfff Simplify PySpark installation.
- Bundle Py4J binaries, since it's hard to install
- Use Spark's `run` script to launch the Py4J
  gateway, inheriting the settings in spark-env.sh

With these changes, (hopefully) nothing more than
running `sbt/sbt package` will be necessary to run
PySpark.
2012-12-27 22:47:37 -08:00
Josh Rosen ac32447cd3 Use addFile() to ship code to cluster in PySpark.
Add options to pyspark.SparkContext constructor.
2012-12-27 19:59:04 -08:00
Josh Rosen 85b8f2c64f Add epydoc API documentation for PySpark. 2012-12-27 18:04:10 -08:00
Josh Rosen 2d98fff065 Add IPython support to pyspark-shell.
Suggested by / based on code from @MLnick
2012-12-27 10:17:36 -08:00
Josh Rosen 1dca0c5180 Remove debug output from PythonPartitioner. 2012-12-26 18:23:06 -08:00
Josh Rosen e2dad15621 Add support for batched serialization of Python objects in PySpark. 2012-12-26 18:16:09 -08:00
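
The idea behind batched serialization can be sketched as follows. This is a minimal illustration, not the code from this commit: the helper names (`batched`, `dump_batches`, `load_batches`) and the default batch size are hypothetical. It groups objects into lists and pickles each list as one payload, so the per-object serialization overhead is paid once per batch.

```python
import pickle
from itertools import islice


def batched(iterable, batch_size):
    """Yield lists of up to batch_size items taken from iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def dump_batches(items, batch_size=1024):
    """Pickle items in groups, producing one payload per batch.

    The batch size is an illustrative default, not a value from the commit.
    """
    return [pickle.dumps(batch, pickle.HIGHEST_PROTOCOL)
            for batch in batched(items, batch_size)]


def load_batches(payloads):
    """Unpickle each batch and flatten back into a stream of objects."""
    for payload in payloads:
        for item in pickle.loads(payload):
            yield item
```

For example, `dump_batches(range(10), batch_size=4)` yields three payloads, and `list(load_batches(...))` over them returns the original ten items in order.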
Josh Rosen 4608902fb8 Use filesystem to collect RDDs in PySpark.
Passing large volumes of data through Py4J seems
to be slow.  It appears to be faster to write the
data to the local filesystem and read it back from
Python.
2012-12-24 17:20:10 -08:00
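
A rough sketch of the file-based handoff described above, under stated assumptions: in the commit the writer would be the JVM side, whereas here both sides are Python so the example is self-contained. The length-prefixed framing and the function names are hypothetical choices for illustration, not the format PySpark uses.

```python
import os
import pickle
import struct
import tempfile


def write_records(path, records):
    """Write each record as a 4-byte big-endian length followed by its pickle."""
    with open(path, "wb") as f:
        for record in records:
            payload = pickle.dumps(record)
            f.write(struct.pack(">i", len(payload)))
            f.write(payload)


def read_records(path):
    """Read length-prefixed pickled records back until end of file."""
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                return
            (length,) = struct.unpack(">i", header)
            yield pickle.loads(f.read(length))


def collect_via_file(records):
    """Round-trip records through a temporary file instead of a socket."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_records(path, records)
        return list(read_records(path))
    finally:
        os.remove(path)
```

The temporary file is removed after reading, so the only lasting cost is the disk round trip.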
Josh Rosen ccd075cf96 Reduce object overhead in PySpark shuffle and collect 2012-12-24 15:01:13 -08:00
Josh Rosen 2ccf3b6652 Fix PySpark hash partitioning bug.
A Java array's hashCode is based on its object
identity, not its elements, so this was causing
serialized keys to be hashed incorrectly.

This commit adds a PySpark-specific workaround
and adds more tests.
2012-10-28 22:30:28 -07:00
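
To illustrate the bug class: partitioning serialized keys needs a hash computed from the bytes themselves, so that two separate arrays with equal contents land in the same partition. The sketch below mirrors the algorithm of Java's `java.util.Arrays.hashCode(byte[])` in Python. It is one possible content-based hash, not necessarily the workaround this commit uses, and the function names are hypothetical.

```python
def java_arrays_hash_code(data):
    """Content-based hash matching java.util.Arrays.hashCode(byte[])."""
    result = 1
    for b in bytearray(data):
        # Java bytes are signed, so map 128..255 to -128..-1.
        signed = b - 256 if b > 127 else b
        # Keep the running value within 32 bits, as Java int arithmetic does.
        result = (31 * result + signed) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return result - (1 << 32) if result >= (1 << 31) else result


def partition_for(serialized_key, num_partitions):
    """Pick a partition from the key's contents (hypothetical helper).

    Python's % returns a non-negative result for a positive modulus,
    so negative hash codes still map to a valid partition index.
    """
    return java_arrays_hash_code(serialized_key) % num_partitions
```

Two distinct `bytearray` objects holding the same bytes produce the same partition here, which is the property an identity-based hash does not provide.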
Josh Rosen 7859879aaa Bump required Py4J version and add test for large broadcast variables. 2012-10-28 16:48:25 -07:00
Josh Rosen d4f2e5b0ef Remove PYTHONPATH from SparkContext's executorEnvs.
It makes more sense to pass it in the dictionary
of environment variables that is used to construct
PythonRDD.
2012-10-22 10:28:59 -07:00
Josh Rosen c23bf1aff4 Add PySpark README and run scripts. 2012-10-20 00:22:27 +00:00
Josh Rosen 52989c8a2c Update Python API for v0.6.0 compatibility. 2012-10-19 10:24:49 -07:00
Josh Rosen e21eb6e00d Merge tag 'v0.6.0' into python-api 2012-10-19 09:44:32 -07:00
Matei Zaharia 63fe4e9d33 Merge pull request #279 from pwendell/dev
Removing credentials line in build.
2012-10-14 19:36:41 -07:00
Patrick Wendell 629dd2691e Removing credentials line in build. 2012-10-14 19:33:39 -07:00
Matei Zaharia f8768da418 Comment out PGP stuff for publish-local to work 2012-10-14 17:37:21 -07:00
Matei Zaharia 1f06445b03 tweak 2012-10-14 12:04:58 -07:00
Matei Zaharia 4947bd0958 tweak 2012-10-14 12:02:58 -07:00
Matei Zaharia 6c766a9187 tweak 2012-10-14 12:02:32 -07:00
Matei Zaharia 8192fe0325 Merge branch 'dev' of github.com:mesos/spark into dev 2012-10-14 12:01:38 -07:00
Matei Zaharia 1c73d8974d Update README 2012-10-14 12:00:25 -07:00
Matei Zaharia 7855bacd26 Merge pull request #278 from pwendell/quickstart-fix
Adding dependency repos in quickstart example
2012-10-14 11:52:24 -07:00
Patrick Wendell 7a03a0e35d Adding dependency repos in quickstart example 2012-10-14 11:48:24 -07:00
Matei Zaharia 64dbf8d372 Made ShuffleDependency automatically find a shuffle ID for itself 2012-10-14 10:00:22 -07:00
Matei Zaharia 64b52166ee Changed default Hadoop version back to 0.20.205 2012-10-14 09:51:34 -07:00
Matei Zaharia 4be12d97ec Some doc fixes, including showing version number in nav bar again 2012-10-13 19:05:11 -07:00
Matei Zaharia 19910c00c3 tweaks 2012-10-13 16:22:39 -07:00
Matei Zaharia 4a3e9cf69c Document how to configure SPARK_MEM & co on a per-job basis 2012-10-13 16:20:25 -07:00
Matei Zaharia ce6b5a3ee5 Uncomment Maven publishing stuff and set version to 0.6.0 2012-10-13 15:55:39 -07:00
Matei Zaharia 8815aeba0c Take executor environment vars as an argument to SparkContext 2012-10-13 15:31:11 -07:00
Matei Zaharia 84979499db Merge pull request #273 from dennybritz/executorVars
Let the user specify environment variables to be passed to the Executors
2012-10-13 14:52:14 -07:00
Denny 0700d1920a Protect from null env variables in mesos. 2012-10-13 13:57:59 -07:00
Denny 21047d923e Protect from setting null environment variables. 2012-10-13 13:44:24 -07:00
Denny fa41d50f7d Don't use system envs for Mesos. 2012-10-13 13:15:50 -07:00
Denny 67c42a41d0 Let the user specify environment variables to be passed to the Executors.
Also removed unused variables in the ExecutorRunner.
2012-10-13 13:08:44 -07:00
Matei Zaharia 5b7ee173e1 Update EC2 scripts for Spark 0.6 2012-10-12 19:53:03 -07:00
Matei Zaharia b4067cbad4 More doc updates, and moved Serializer to a subpackage. 2012-10-12 18:19:21 -07:00
Matei Zaharia 8d7b77bcb5 Some doc and usability improvements:
- Added a StorageLevels class for easy access to StorageLevel constants
  in Java
- Added doc comments on Function classes in Java
- Updated Accumulator and HadoopWriter docs slightly
2012-10-12 17:53:20 -07:00
Matei Zaharia 682b2d9329 Added a test for when an RDD only partially fits in memory 2012-10-12 14:58:26 -07:00
Matei Zaharia dca496bb77 Document cartesian() operation 2012-10-12 14:46:41 -07:00
Matei Zaharia 1183b30941 Merge branch 'dev' of github.com:mesos/spark into dev 2012-10-12 14:40:07 -07:00
Matei Zaharia 603b419fdf Tweak 2012-10-12 14:40:00 -07:00
Matei Zaharia 23015ccac0 Merge pull request #271 from shivaram/block-manager-npe-fix
Change block manager to accept an ArrayBuffer
2012-10-12 14:36:28 -07:00
Shivaram Venkataraman 8577523f37 Add test to verify if RDD is computed even if block manager has insufficient
memory
2012-10-12 14:14:57 -07:00
Matei Zaharia bd78bbb2cf Merge pull request #270 from pwendell/java-javadoc
Adding Java documentation
2012-10-11 12:21:47 -07:00
Patrick Wendell dc8adbd359 Adding Java documentation 2012-10-11 00:49:03 -07:00
Shivaram Venkataraman 2cf40c5fd5 Change block manager to accept an ArrayBuffer instead of an iterator to ensure
that the computation can proceed even if we run out of memory to cache the
block. Update CacheTracker to use this new interface
2012-10-11 00:42:46 -07:00
Matei Zaharia 4001cbdec1 Merge pull request #268 from pwendell/sonatype
Adding code for publishing to Sonatype.
2012-10-10 18:57:32 -07:00