Commit graph

1517 commits

Author SHA1 Message Date
Josh Rosen b58340dbd9 Rename top-level 'pyspark' directory to 'python' 2013-01-01 15:05:00 -08:00
Josh Rosen 170e451fbd Minor documentation and style fixes for PySpark. 2013-01-01 13:52:14 -08:00
Josh Rosen 6f6a6b79c4 Launch with scala by default in run-pyspark 2012-12-31 14:57:18 -08:00
Josh Rosen 099898b439 Port LR example to PySpark using numpy.
This version of the example crashes after the first iteration with
"OverflowError: math range error" because Python's math.exp()
behaves differently than Scala's; see SPARK-646.
2012-12-29 18:00:28 -08:00
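The overflow above is easy to reproduce: CPython's math.exp() raises OverflowError once the result exceeds a double, whereas Scala's math.exp (and numpy.exp) return infinity. A minimal sketch of the difference behind SPARK-646; the value of x is only an illustrative large margin after one gradient step.

```python
import math
import numpy as np

x = 1000.0  # illustrative large margin from the LR example

# CPython raises rather than returning inf:
try:
    math.exp(x)
except OverflowError as err:
    print("math.exp overflowed:", err)      # "math range error"

# numpy (like Scala's math.exp) saturates to infinity instead:
with np.errstate(over="ignore"):
    print(np.exp(x))                        # inf
    print(1.0 / (1.0 + np.exp(-x)))         # logistic value is still well-defined: 1.0
```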
Josh Rosen 39dd953fd8 Add test for pyspark.RDD.saveAsTextFile(). 2012-12-29 17:06:50 -08:00
Josh Rosen 59195c68ec Update PySpark for compatibility with TaskContext. 2012-12-29 16:01:03 -08:00
Josh Rosen c5cee53f20 Merge remote-tracking branch 'origin/master' into python-api
Conflicts:
	docs/quick-start.md
2012-12-29 16:00:51 -08:00
Josh Rosen 26186e2d25 Use batching in pyspark parallelize(); fix cartesian() 2012-12-29 15:34:57 -08:00
Matei Zaharia 3f74f729a1 Merge pull request #345 from JoshRosen/fix/add-file
Fix deletion of files in current working directory by clearFiles()
2012-12-29 15:01:33 -08:00
Josh Rosen 6ee1ff2663 Fix bug in pyspark.serializers.batch; add .gitignore. 2012-12-29 22:25:34 +00:00
Josh Rosen c2b105af34 Add documentation for Python API. 2012-12-28 22:51:28 -08:00
Josh Rosen 7ec3595de2 Fix bug (introduced by batching) in PySpark take() 2012-12-28 22:21:16 -08:00
Josh Rosen 397e67103c Change Utils.fetchFile() warning to SparkException. 2012-12-28 17:37:13 -08:00
Josh Rosen d64fa72d2e Add addFile() and addJar() to JavaSparkContext. 2012-12-28 17:00:57 -08:00
Josh Rosen bd237d4a9d Add synchronization to LocalScheduler.updateDependencies(). 2012-12-28 17:00:57 -08:00
Josh Rosen f1bf4f0385 Skip deletion of files in clearFiles().
This fixes an issue where Spark could delete
original files in the current working directory
that were added to the job using addFile().

There was also the potential for addFile() to
overwrite local files, which is addressed by
changing Utils.fetchFile() to log a warning
instead of overwriting a file with new contents.

This is a short-term fix; a better long-term
solution would be to remove the dependence on
storing files in the current working directory,
since we can't change the cwd from Java.
2012-12-28 17:00:57 -08:00
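The failure mode above comes from treating a file that already lives in the working directory as if it were a fetched copy. A minimal Python sketch of the guard logic described in the message; the real fix is on the Scala side (Utils.fetchFile/clearFiles), and all names here are illustrative.

```python
import logging
import os
import shutil

log = logging.getLogger("fetch_file_sketch")


def fetch_file(source_path, target_dir):
    """Copy source_path into target_dir without clobbering existing files."""
    target = os.path.join(target_dir, os.path.basename(source_path))
    if os.path.exists(target):
        if os.path.samefile(source_path, target):
            return  # already the local original; nothing to fetch or delete later
        # Mirror the short-term fix: warn instead of overwriting local content.
        log.warning("%s already exists and differs from %s; not overwriting",
                    target, source_path)
        return
    shutil.copy(source_path, target)
```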
Josh Rosen fbadb1cda5 Mark api.python classes as private; echo Java output to stderr. 2012-12-28 09:06:11 -08:00
Josh Rosen 665466dfff Simplify PySpark installation.
- Bundle Py4J binaries, since Py4J is hard to install
- Use Spark's `run` script to launch the Py4J
  gateway, inheriting the settings in spark-env.sh

With these changes, (hopefully) nothing more than
running `sbt/sbt package` will be necessary to run
PySpark.
2012-12-27 22:47:37 -08:00
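In other words, PySpark starts the JVM side through Spark's `run` script and then attaches a Py4J JavaGateway from Python. The sketch below is hypothetical (the main class, the lack of port negotiation, and the error handling are all simplified) and is not the actual PySpark launcher module.

```python
import os
import subprocess

from py4j.java_gateway import JavaGateway


def launch_gateway():
    # Start the gateway JVM via Spark's `run` script so that it inherits the
    # classpath and the spark-env.sh settings (main class is illustrative).
    spark_home = os.environ["SPARK_HOME"]
    proc = subprocess.Popen(
        [os.path.join(spark_home, "run"), "py4j.GatewayServer"])
    # Attach the Python side; assumes the server listens on Py4J's default port.
    return JavaGateway(), proc
```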
Josh Rosen ac32447cd3 Use addFile() to ship code to cluster in PySpark.
Add options to pyspark.SparkContext constructor.
2012-12-27 19:59:04 -08:00
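A hedged usage sketch of what the new constructor options enable: files listed at construction time are shipped to the cluster via addFile(). The keyword names beyond (master, job name) are assumptions about the options this commit introduces.

```python
from pyspark import SparkContext

# sparkHome/pyFiles are assumed option names; pyFiles are shipped via addFile().
sc = SparkContext("local[2]", "ExampleJob",
                  sparkHome="/path/to/spark",
                  pyFiles=["mylib.py"])
print(sc.parallelize(range(10)).count())
```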
Josh Rosen 85b8f2c64f Add epydoc API documentation for PySpark. 2012-12-27 18:04:10 -08:00
Josh Rosen 2d98fff065 Add IPython support to pyspark-shell.
Suggested by / based on code from @MLnick
2012-12-27 10:17:36 -08:00
Josh Rosen 1dca0c5180 Remove debug output from PythonPartitioner. 2012-12-26 18:23:06 -08:00
Josh Rosen e2dad15621 Add support for batched serialization of Python objects in PySpark. 2012-12-26 18:16:09 -08:00
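The batching above amortizes per-object pickling and transfer overhead by serializing fixed-size lists of elements instead of individual objects. A minimal sketch of the idea, not the actual pyspark.serializers code:

```python
import pickle
from itertools import islice


def batched(iterator, batch_size=1024):
    """Group an iterator into lists of at most batch_size elements."""
    iterator = iter(iterator)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def dump_stream(iterator, stream, batch_size=1024):
    # One pickle call per batch instead of one per element.
    for batch in batched(iterator, batch_size):
        pickle.dump(batch, stream, protocol=pickle.HIGHEST_PROTOCOL)
```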
Josh Rosen 4608902fb8 Use filesystem to collect RDDs in PySpark.
Passing large volumes of data through Py4J seems
to be slow.  It appears to be faster to write the
data to the local filesystem and read it back from
Python.
2012-12-24 17:20:10 -08:00
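Concretely, the approach above has the JVM write the collected, pickled elements to a local temporary file and lets Python read them back, instead of pushing every object through Py4J calls. A minimal sketch of the Python-side reader; the on-disk format (one pickle per batch) is an assumption.

```python
import pickle


def read_collected(path):
    """Read back pickled batches that the JVM side wrote to a local file."""
    items = []
    with open(path, "rb") as f:
        while True:
            try:
                batch = pickle.load(f)   # one pickle per batch, see batching above
            except EOFError:
                return items
            items.extend(batch if isinstance(batch, list) else [batch])
```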
Matei Zaharia 84587a9bf3 Merge pull request #343 from markhamstra/spark-601
lookup() needn't fail when there is no partitioner
2012-12-24 15:28:05 -08:00
Josh Rosen ccd075cf96 Reduce object overhead in PySpark shuffle and collect 2012-12-24 15:01:13 -08:00
Mark Hamstra 903f3518df fall back to filter-map-collect when calling lookup() on an RDD without a partitioner 2012-12-24 13:18:45 -08:00
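The fallback named in the message is literal: with no partitioner there is no way to route the lookup to a single partition, so the whole RDD is filtered by key and the matching values are collected. Expressed in PySpark terms (the actual change is on the Scala side):

```python
def lookup_fallback(pair_rdd, key):
    # Filter-map-collect: scan every partition, since no partitioner tells us
    # which partition would hold `key`.
    return (pair_rdd
            .filter(lambda kv: kv[0] == key)
            .map(lambda kv: kv[1])
            .collect())
```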
Matei Zaharia b575cbe069 Merge pull request #342 from markhamstra/spark-645
Allow distinct() to be called without parentheses
2012-12-24 08:04:50 -08:00
Mark Hamstra 61be8566e2 Allow distinct() to be called without parentheses when using the default number of splits. 2012-12-24 02:36:47 -08:00
Reynold Xin a6bb41c6d3 Updated Kryo version for Maven pom file. 2012-12-21 16:25:50 -08:00
Reynold Xin c68a076037 Updated Kryo documentation for Kryo version update. 2012-12-21 16:03:17 -08:00
Reynold Xin 60f7338092 Remove the call to close input stream in Kryo serializer. 2012-12-21 15:49:33 -08:00
Matei Zaharia 3334b7c6b5 Merge pull request #341 from rxin/4a3fb06ac2d11125feb08acbbd4df76d1e91b677
Kryo2 update against Spark master
2012-12-21 15:31:23 -08:00
Matei Zaharia 5e51b889fe Merge pull request #327 from rxin/spark-633
Added the ability to remove blocks in the block manager.
2012-12-20 11:33:38 -08:00
Reynold Xin 9397c5014e Let the slave notify the master of block removal. 2012-12-20 01:37:09 -08:00
Matei Zaharia e7051767f7 Merge pull request #337 from pwendell/worker-liveness-ui
SPARK-616: Logging dead workers in Web UI.
2012-12-19 15:31:32 -08:00
Reynold Xin 68c52d80ec Moved BlockManager's IdGenerator into BlockManager object. Removed some
excessive debug messages.
2012-12-19 15:27:23 -08:00
Matei Zaharia 30b47794da Merge pull request #340 from tomdz/deb-packaging-tweaks
Tweaked Debian packaging to be a bit more in line with Debian standards
2012-12-19 12:07:03 -08:00
Thomas Dudziak 5488ac67c3 Tweaked Debian packaging to be a bit more in line with Debian standards 2012-12-19 10:20:43 -08:00
Matei Zaharia 1e6e154d6d Merge pull request #338 from tomdz/repl-pom-fix
Fixed repl maven build
2012-12-18 14:03:29 -08:00
Thomas Dudziak 4af6cad37a Fixed the repl Maven build to produce artifacts with the appropriate Hadoop classifier, and extracted the repl fat-jar and Debian packaging into a separate project to make Maven happy 2012-12-18 12:08:19 -08:00
Patrick Wendell bfac06e1f6 SPARK-616: Logging dead workers in Web UI.
This patch keeps track of which workers have died and marks them
as such in the master web UI. It also handles workers that die and
re-register using different actor IDs.
2012-12-17 23:09:05 -08:00
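A tiny sketch of the bookkeeping described above: dead workers are kept and marked rather than dropped, and a re-registration under a new actor ID replaces the stale entry. This only illustrates the idea; it is not the master's actual Scala code.

```python
from dataclasses import dataclass


@dataclass
class WorkerInfo:
    worker_id: str
    actor_id: str
    state: str = "ALIVE"


class WorkerRegistry:
    def __init__(self):
        self.workers = {}  # worker_id -> WorkerInfo

    def register(self, worker_id, actor_id):
        # A worker that died and re-registered with a new actor ID replaces
        # its stale entry instead of showing up twice in the UI.
        self.workers[worker_id] = WorkerInfo(worker_id, actor_id)

    def mark_dead(self, worker_id):
        # Keep the entry so the web UI can still list the worker as DEAD.
        if worker_id in self.workers:
            self.workers[worker_id].state = "DEAD"
```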
Matei Zaharia b82a6dd2c7 Merge pull request #332 from JoshRosen/spark-607
Add try-finally to handle MapOutputTracker timeouts
2012-12-14 11:41:16 -08:00
Reynold Xin 06f855c24d Merge branch 'spark-633' of github.com:rxin/spark into spark-633 2012-12-14 00:27:24 -08:00
Reynold Xin 8c01295b85 Fixed conflicts from merging Charles' and TD's block manager changes. 2012-12-14 00:26:36 -08:00
Matei Zaharia 1072f970cc Merge pull request #331 from woggling/deploy-exit-status
Have standalone cluster report exit codes to clients
2012-12-13 22:43:48 -08:00
Charles Reiss c528932a41 Code review cleanup. 2012-12-13 22:37:16 -08:00
Charles Reiss 0aad42b5e7 Have standalone cluster report exit codes to clients. Addresses SPARK-639. 2012-12-13 22:37:16 -08:00
Reynold Xin 0235667f73 Merge branch 'master' of github.com:mesos/spark into spark-633 2012-12-13 22:33:41 -08:00
Reynold Xin 97434f49b8 Merged TD's block manager refactoring. 2012-12-13 22:32:19 -08:00