ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
eklavya	6a65feebc7	Added foreachPartition method to JavaRDD.	2014-01-13 17:56:47 +05:30
eklavya	dbadc6b994	Added mapPartitions method to JavaRDD.	2014-01-13 17:56:10 +05:30
eklavya	aae8a01425	Added setter method setGenerator to JavaRDD.	2014-01-13 17:53:35 +05:30
Andrew Or	a1f0992fae	Report bytes spilled for both memory and disk on Web UI	2014-01-12 23:42:57 -08:00
Andrew Or	69c9aebed0	Enable external sorting by default	2014-01-12 22:43:01 -08:00
Reynold Xin	e6ed13f255	Merge pull request #397 from pwendell/host-port Remove now un-needed hostPort option I noticed this was logging some scary error messages in various places. After I looked into it, this is no longer really used. I removed the option and re-wrote the one remaining use case (it was unnecessary there anyways).	2014-01-12 22:35:14 -08:00
Andrew Or	8d40e7222f	Get rid of spill map in SparkEnv	2014-01-12 22:34:33 -08:00
Tathagata Das	ffa1d38ef1	Fixed import formatting.	2014-01-12 22:27:07 -08:00
Joseph E. Gonzalez	66c9d0092a	Tested and corrected all examples up to mask in the graphx-programming-guide.	2014-01-12 22:11:13 -08:00
Ankur Dave	1efe78a101	Use GraphLoader for algorithms examples in doc	2014-01-12 22:03:03 -08:00
Tathagata Das	777c181d2f	Merge remote-tracking branch 'apache/master' into dstream-move Conflicts: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala	2014-01-12 21:59:51 -08:00
Ankur Dave	d691e9f47e	Move algorithms to GraphOps	2014-01-12 21:47:16 -08:00
Ankur Dave	20c509b805	Add TriangleCount example	2014-01-12 21:41:32 -08:00
Patrick Wendell	0b96d85c20	Merge pull request #399 from pwendell/consolidate-off Disable shuffle file consolidation by default After running various performance tests for the 0.9 release, this still seems to have performance issues even on XFS. So let's keep this off-by-default for 0.9 and users can experiment with it depending on their disk configurations.	2014-01-12 21:31:43 -08:00
Patrick Wendell	0ab505a29e	Merge pull request #395 from hsaputra/remove_simpleredundantreturn_scala Remove simple redundant return statements for Scala methods/functions Remove simple redundant return statements for Scala methods/functions: -) Only change simple return statements at the end of method -) Ignore the complex if-else check -) Ignore the ones inside synchronized -) Add small changes to making var to val if possible and remove () for simple get This hopefully makes the review simpler =) Pass compile and tests.	2014-01-12 21:31:04 -08:00
Joseph E. Gonzalez	2216319f48	adding Pregel as an operator in GraphOps and cleaning up documentation of GraphOps	2014-01-12 21:26:37 -08:00
Joseph E. Gonzalez	c787ff5640	Documenting Pregel API	2014-01-12 20:49:52 -08:00
Patrick Wendell	405bfe86ef	Merge pull request #394 from tdas/error-handling Better error handling in Spark Streaming and more API cleanup Earlier, errors in jobs generated by Spark Streaming (or in the generation of jobs) could not be caught from the main driver thread (i.e. the thread that called StreamingContext.start()) as they would be thrown in different threads. With this change, after `ssc.start`, one can call `ssc.awaitTermination()` which will block until the ssc is closed, or there is an exception. This makes it easier to debug. This change also adds ssc.stop(<stop-spark-context>) where you can stop StreamingContext without stopping the SparkContext. Also fixes the bug that came up with PRs #393 and #381. MetadataCleaner default value has been changed from 3500 to -1 for normal SparkContext and 3600 when creating a StreamingContext. Also, updated StreamingListenerBus with changes similar to SparkListenerBus in #392. And changed a lot of protected[streaming] to private[streaming].	2014-01-12 20:04:21 -08:00
Patrick Wendell	28a6b0cdbc	Merge pull request #398 from pwendell/streaming-api Rename DStream.foreach to DStream.foreachRDD `foreachRDD` makes it clear that the granularity of this operator is per-RDD. As it stands, `foreach` is inconsistent with `map`, `filter`, and the other DStream operators which get pushed down to individual records within each RDD.	2014-01-12 19:49:36 -08:00
Patrick Wendell	2802cc80bc	Disable shuffle file consolidation by default	2014-01-12 19:16:43 -08:00
Henry Saputra	5a8abfb70e	Address code review concerns and comments.	2014-01-12 19:15:09 -08:00
Tathagata Das	034f89aaab	Fixed persistence logic of WindowedDStream, and fixed default persistence level of input streams.	2014-01-12 19:02:27 -08:00
Patrick Wendell	e6e20ceee0	Adding deprecated versions of old code	2014-01-12 18:54:03 -08:00
Tathagata Das	74d0126257	Merge remote-tracking branch 'apache/master' into dstream-move	2014-01-12 18:02:05 -08:00
Matei Zaharia	54d3486ee9	Fix Scala version in docs (it was printed as 2.1)	2014-01-12 17:49:59 -08:00
Tathagata Das	aa2c993858	Merge remote-tracking branch 'apache/master' into error-handling	2014-01-12 17:37:46 -08:00
Tathagata Das	d1820fef57	Merge branch 'error-handling' into dstream-move	2014-01-12 17:36:49 -08:00
Tathagata Das	c7fabb745b	Changed StreamingContext.stopForWait to awaitTermination.	2014-01-12 17:21:13 -08:00
Patrick Wendell	f4d77f8cb8	Rename DStream.foreach to DStream.foreachRDD `foreachRDD` makes it clear that the granularity of this operator is per-RDD. As it stands, `foreach` is inconsistent with `map`, `filter`, and the other DStream operators which get pushed down to individual records within each RDD.	2014-01-12 17:21:00 -08:00
Patrick Wendell	074f50232f	Merge pull request #396 from pwendell/executor-env Setting load defaults to true in executor This preserves the behavior in earlier releases. If properties are set for the executors via `spark-env.sh` on the slaves, then they should take precedence over spark defaults. This is useful if system administrators are setting properties for a standalone cluster, such as shuffle locations. /cc @andrewor14 who initially reported this issue.	2014-01-12 17:01:13 -08:00
Ankur Dave	7a4bb863c7	Add connected components example to doc	2014-01-12 16:58:18 -08:00
Reynold Xin	82e2b92c6d	Merge pull request #392 from rxin/listenerbus Stop SparkListenerBus daemon thread when DAGScheduler is stopped. Otherwise this leads to hundreds of SparkListenerBus daemon threads in our unit tests (and is also problematic if a user application launches multiple SparkContexts).	2014-01-12 16:55:11 -08:00
Patrick Wendell	0bb33076e2	Removing mentions in tests	2014-01-12 16:53:58 -08:00
Patrick Wendell	0d4886c000	Remove now un-needed hostPort option	2014-01-12 16:47:52 -08:00
Tathagata Das	7883b8f579	Fixed bugs to ensure better cleanup of JobScheduler, JobGenerator and NetworkInputTracker upon close.	2014-01-12 16:44:07 -08:00
Patrick Wendell	cfb1e6c13c	Setting load defaults to true in executor	2014-01-12 15:35:08 -08:00
Ankur Dave	5e35d39e0f	Add PageRank example and data	2014-01-12 13:10:53 -08:00
Tathagata Das	448aef6790	Moved DStream, DStreamCheckpointData and PairDStream from org.apache.spark.streaming to org.apache.spark.streaming.dstream.	2014-01-12 11:31:54 -08:00
Ankur Dave	f096f4eaf1	Link methods in programming guide; document VertexID	2014-01-12 10:55:29 -08:00
Henry Saputra	f1c5eca494	Fix accidental comment modification.	2014-01-12 10:40:21 -08:00
Henry Saputra	91a563608e	Merge branch 'master' into remove_simpleredundantreturn_scala	2014-01-12 10:34:13 -08:00
Henry Saputra	93a65e5fde	Remove simple redundant return statement for Scala methods/functions: -) Only change simple return statements at the end of method -) Ignore the complex if-else check -) Ignore the ones inside synchronized	2014-01-12 10:30:04 -08:00
Tathagata Das	c5921e5c61	Fixed bugs.	2014-01-12 01:12:08 -08:00
Matei Zaharia	224f1a754a	Update Python required version to 2.7, and mention MLlib support	2014-01-12 00:15:34 -08:00
Matei Zaharia	5741078c46	Log Python exceptions to stderr as well This helps in case the exception happened while serializing a record to be sent to Java, leaving the stream to Java in an inconsistent state where PythonRDD won't be able to read the error.	2014-01-12 00:10:41 -08:00
Tathagata Das	18f4889d96	Merge remote-tracking branch 'apache/master' into error-handling	2014-01-11 23:40:57 -08:00
Tathagata Das	4d9b0ab420	Added waitForStop and stop to JavaStreamingContext.	2014-01-11 23:35:51 -08:00
Tathagata Das	f5108ffc24	Converted JobScheduler to use actors for event handling. Changed protected[streaming] to private[streaming] in StreamingContext and DStream. Added waitForStop to StreamingContext, and StreamingContextSuite.	2014-01-11 23:15:09 -08:00
Matei Zaharia	f00e949f84	Added Java unit test, data, and main method for Naive Bayes Also fixes mains of a few other algorithms to print the final model	2014-01-11 22:30:48 -08:00
Matei Zaharia	4c28a2bad8	Update some Python MLlib parameters to use camelCase, and tweak docs We've used camel case in other Spark methods so it felt reasonable to keep using it here and make the code match Scala/Java as much as possible. Note that parameter names matter in Python because it allows passing optional parameters by name.	2014-01-11 22:30:48 -08:00

6387 commits