ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Josh Rosen	c2b105af34	Add documentation for Python API.	2012-12-28 22:51:28 -08:00
Josh Rosen	7ec3595de2	Fix bug (introduced by batching) in PySpark take()	2012-12-28 22:21:16 -08:00
Josh Rosen	fbadb1cda5	Mark api.python classes as private; echo Java output to stderr.	2012-12-28 09:06:11 -08:00
Josh Rosen	665466dfff	Simplify PySpark installation. - Bundle Py4J binaries, since it's hard to install - Uses Spark's `run` script to launch the Py4J gateway, inheriting the settings in spark-env.sh With these changes, (hopefully) nothing more than running `sbt/sbt package` will be necessary to run PySpark.	2012-12-27 22:47:37 -08:00
Josh Rosen	ac32447cd3	Use addFile() to ship code to cluster in PySpark. Add options to pyspark.SparkContext constructor.	2012-12-27 19:59:04 -08:00
Josh Rosen	85b8f2c64f	Add epydoc API documentation for PySpark.	2012-12-27 18:04:10 -08:00
Josh Rosen	2d98fff065	Add IPython support to pyspark-shell. Suggested by / based on code from @MLnick	2012-12-27 10:17:36 -08:00
Josh Rosen	1dca0c5180	Remove debug output from PythonPartitioner.	2012-12-26 18:23:06 -08:00
Josh Rosen	e2dad15621	Add support for batched serialization of Python objects in PySpark.	2012-12-26 18:16:09 -08:00
Josh Rosen	4608902fb8	Use filesystem to collect RDDs in PySpark. Passing large volumes of data through Py4J seems to be slow. It appears to be faster to write the data to the local filesystem and read it back from Python.	2012-12-24 17:20:10 -08:00
Josh Rosen	ccd075cf96	Reduce object overhead in Pyspark shuffle and collect	2012-12-24 15:01:13 -08:00
Josh Rosen	2ccf3b6652	Fix PySpark hash partitioning bug. A Java array's hashCode is based on its object identity, not its elements, so this was causing serialized keys to be hashed incorrectly. This commit adds a PySpark-specific workaround and adds more tests.	2012-10-28 22:30:28 -07:00
Josh Rosen	7859879aaa	Bump required Py4J version and add test for large broadcast variables.	2012-10-28 16:48:25 -07:00
Josh Rosen	d4f2e5b0ef	Remove PYTHONPATH from SparkContext's executorEnvs. It makes more sense to pass it in the dictionary of environment variables that is used to construct PythonRDD.	2012-10-22 10:28:59 -07:00
Josh Rosen	c23bf1aff4	Add PySpark README and run scripts.	2012-10-20 00:22:27 +00:00
Josh Rosen	52989c8a2c	Update Python API for v0.6.0 compatibility.	2012-10-19 10:24:49 -07:00
Josh Rosen	e21eb6e00d	Merge tag 'v0.6.0' into python-api	2012-10-19 09:44:32 -07:00
Matei Zaharia	63fe4e9d33	Merge pull request #279 from pwendell/dev Removing credentials line in build.	2012-10-14 19:36:41 -07:00
Patrick Wendell	629dd2691e	Removing credentials line in build.	2012-10-14 19:33:39 -07:00
Matei Zaharia	f8768da418	Comment out PGP stuff for publish-local to work	2012-10-14 17:37:21 -07:00
Matei Zaharia	1f06445b03	tweak	2012-10-14 12:04:58 -07:00
Matei Zaharia	4947bd0958	tweak	2012-10-14 12:02:58 -07:00
Matei Zaharia	6c766a9187	tweak	2012-10-14 12:02:32 -07:00
Matei Zaharia	8192fe0325	Merge branch 'dev' of github.com:mesos/spark into dev	2012-10-14 12:01:38 -07:00
Matei Zaharia	1c73d8974d	Update README	2012-10-14 12:00:25 -07:00
Matei Zaharia	7855bacd26	Merge pull request #278 from pwendell/quickstart-fix Adding dependency repos in quickstart example	2012-10-14 11:52:24 -07:00
Patrick Wendell	7a03a0e35d	Adding dependency repos in quickstart example	2012-10-14 11:48:24 -07:00
Matei Zaharia	64dbf8d372	Made ShuffleDependency automatically find a shuffle ID for itself	2012-10-14 10:00:22 -07:00
Matei Zaharia	64b52166ee	Changed default Hadoop version back to 0.20.205	2012-10-14 09:51:34 -07:00
Matei Zaharia	4be12d97ec	Some doc fixes, including showing version number in nav bar again	2012-10-13 19:05:11 -07:00
Matei Zaharia	19910c00c3	tweaks	2012-10-13 16:22:39 -07:00
Matei Zaharia	4a3e9cf69c	Document how to configure SPARK_MEM & co on a per-job basis	2012-10-13 16:20:25 -07:00
Matei Zaharia	ce6b5a3ee5	Uncomment Maven publishing stuff and set version to 0.6.0	2012-10-13 15:55:39 -07:00
Matei Zaharia	8815aeba0c	Take executor environment vars as an argument to SparkContext	2012-10-13 15:31:11 -07:00
Matei Zaharia	84979499db	Merge pull request #273 from dennybritz/executorVars Let the user specify environment variables to be passed to the Executors	2012-10-13 14:52:14 -07:00
Denny	0700d1920a	Protect from null env variables in mesos.	2012-10-13 13:57:59 -07:00
Denny	21047d923e	Protect from setting null environment variables.	2012-10-13 13:44:24 -07:00
Denny	fa41d50f7d	Don't use system envs for Mesos.	2012-10-13 13:15:50 -07:00
Denny	67c42a41d0	Let the user specify environment variables to be passed to the Executors. Also removed unused variables in the ExecutorRunner.	2012-10-13 13:08:44 -07:00
Matei Zaharia	5b7ee173e1	Update EC2 scripts for Spark 0.6	2012-10-12 19:53:03 -07:00
Matei Zaharia	b4067cbad4	More doc updates, and moved Serializer to a subpackage.	2012-10-12 18:19:21 -07:00
Matei Zaharia	8d7b77bcb5	Some doc and usability improvements: - Added a StorageLevels class for easy access to StorageLevel constants in Java - Added doc comments on Function classes in Java - Updated Accumulator and HadoopWriter docs slightly	2012-10-12 17:53:20 -07:00
Matei Zaharia	682b2d9329	Added a test for when an RDD only partially fits in memory	2012-10-12 14:58:26 -07:00
Matei Zaharia	dca496bb77	Document cartesian() operation	2012-10-12 14:46:41 -07:00
Matei Zaharia	1183b30941	Merge branch 'dev' of github.com:mesos/spark into dev	2012-10-12 14:40:07 -07:00
Matei Zaharia	603b419fdf	Tweak	2012-10-12 14:40:00 -07:00
Matei Zaharia	23015ccac0	Merge pull request #271 from shivaram/block-manager-npe-fix Change block manager to accept a ArrayBuffer	2012-10-12 14:36:28 -07:00
Shivaram Venkataraman	8577523f37	Add test to verify if RDD is computed even if block manager has insufficient memory	2012-10-12 14:14:57 -07:00
Matei Zaharia	bd78bbb2cf	Merge pull request #270 from pwendell/java-javadoc Adding Java documentation	2012-10-11 12:21:47 -07:00
Patrick Wendell	dc8adbd359	Adding Java documentation	2012-10-11 00:49:03 -07:00

1264 commits