Commit graph

6058 commits

Author SHA1 Message Date
Joseph E. Gonzalez ee8931d2c6 Finished documenting vertexrdd. 2014-01-13 19:30:35 -08:00
Reynold Xin 0fbc0b0561 Merge branch 'graphx' of github.com:ankurdave/incubator-spark into graphx 2014-01-13 18:51:22 -08:00
Reynold Xin 0b18bfba1a Updated doc for PageRank. 2014-01-13 18:51:04 -08:00
Reynold Xin 9317286b72 More cleanup. 2014-01-13 18:45:35 -08:00
Reynold Xin 8e5c732430 Moved SVDPlusPlusConf into SVDPlusPlus object itself. 2014-01-13 18:45:20 -08:00
Joseph E. Gonzalez 552de5d42e Finished second pass on pregel docs. 2014-01-13 18:40:43 -08:00
Joseph E. Gonzalez 622b7f7d39 Minor changes in graphx programming guide. 2014-01-13 18:40:43 -08:00
Reynold Xin 1dce9ce446 Moved PartitionStrategy's into an object. 2014-01-13 18:32:04 -08:00
Reynold Xin ae06d2c22f Updated GraphGenerator. 2014-01-13 18:31:49 -08:00
Reynold Xin 87f335db78 Made more things private. 2014-01-13 18:30:26 -08:00
Reynold Xin a4e12af7aa Merge branch 'graphx' of github.com:ankurdave/incubator-spark into graphx
Conflicts:
	graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala
2014-01-13 17:42:59 -08:00
Reynold Xin 02a8f54bfa Miscel doc update. 2014-01-13 17:40:36 -08:00
Joseph E. Gonzalez cfe4a29dcb Improvements in example code for the programming guide as well as adding serialization support for GraphImpl to address issues with failed closure capture. 2014-01-13 17:18:31 -08:00
Ankur Dave ae4b75d94a Add EdgeDirection.Either and use it to fix CC bug
The bug was due to a misunderstanding of the activeSetOpt parameter to
Graph.mapReduceTriplets. Passing EdgeDirection.Both causes
mapReduceTriplets to run only on edges with *both* vertices in the
active set. This commit adds EdgeDirection.Either, which causes
mapReduceTriplets to run on edges with *either* vertex in the active
set. This is what connected components needed.
2014-01-13 17:03:03 -08:00
Ankur Dave 1bd5cefcae Remove aggregateNeighbors 2014-01-13 17:03:03 -08:00
Reynold Xin dc041cd3b6 Merge branch 'scaladoc1' of github.com:rxin/incubator-spark into graphx 2014-01-13 16:25:21 -08:00
Reynold Xin e2d25d2dfe Merge branch 'master' into graphx 2014-01-13 16:21:26 -08:00
Reynold Xin 30328c347b Updated JavaStreamingContext to make scaladoc compile.
`sbt/sbt doc` used to fail. This fixed it.
2014-01-13 15:58:39 -08:00
Ankur Dave 8038da2328 Merge pull request #2 from jegonzal/GraphXCCIssue
Improving documentation and identifying potential bug in CC calculation.
2014-01-13 14:59:30 -08:00
Ankur Dave 97cd27e31b Add graph loader links to doc 2014-01-13 14:54:48 -08:00
Ankur Dave 15ca89b11e Fix mapReduceTriplets links in doc 2014-01-13 14:54:33 -08:00
Joseph E. Gonzalez 80e4d98dc6 Improving documentation and identifying potential bug in CC calculation. 2014-01-13 13:40:16 -08:00
Ankur Dave 9fe88627b5 Improve EdgeRDD scaladoc 2014-01-13 13:16:41 -08:00
Ankur Dave ea69cff711 Further improve VertexRDD scaladocs 2014-01-13 12:52:52 -08:00
Patrick Wendell b93f9d42f2 Merge pull request #400 from tdas/dstream-move
Moved DStream and PairDSream to org.apache.spark.streaming.dstream

Similar to the package location of `org.apache.spark.rdd.RDD`, `DStream` has been moved from `org.apache.spark.streaming.DStream` to `org.apache.spark.streaming.dstream.DStream`. I know that the package name is a little long, but I think its better to keep it consistent with Spark's structure.

Also fixed persistence of windowed DStream. The RDDs generated generated by windowed DStream are essentially unions of underlying RDDs, and persistent these union RDDs would store numerous copies of the underlying data. Instead setting the persistence level on the windowed DStream is made to set the persistence level of the underlying DStream.
2014-01-13 12:18:05 -08:00
Ankur Dave 8ca9773974 Add LiveJournalPageRank example 2014-01-13 12:17:58 -08:00
Reynold Xin e6ed13f255 Merge pull request #397 from pwendell/host-port
Remove now un-needed hostPort option

I noticed this was logging some scary error messages in various places. After I looked into it, this is no longer really used. I removed the option and re-wrote the one remaining use case (it was unnecessary there anyways).
2014-01-12 22:35:14 -08:00
Tathagata Das ffa1d38ef1 Fixed import formatting. 2014-01-12 22:27:07 -08:00
Joseph E. Gonzalez 66c9d0092a Tested and corrected all examples up to mask in the graphx-programming-guide. 2014-01-12 22:11:13 -08:00
Ankur Dave 1efe78a101 Use GraphLoader for algorithms examples in doc 2014-01-12 22:03:03 -08:00
Tathagata Das 777c181d2f Merge remote-tracking branch 'apache/master' into dstream-move
Conflicts:
	streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala
2014-01-12 21:59:51 -08:00
Ankur Dave d691e9f47e Move algorithms to GraphOps 2014-01-12 21:47:16 -08:00
Ankur Dave 20c509b805 Add TriangleCount example 2014-01-12 21:41:32 -08:00
Patrick Wendell 0b96d85c20 Merge pull request #399 from pwendell/consolidate-off
Disable shuffle file consolidation by default

After running various performance tests for the 0.9 release, this still seems to have performance issues even on XFS. So let's keep this off-by-default for 0.9 and users can experiment with it depending on their disk configurations.
2014-01-12 21:31:43 -08:00
Patrick Wendell 0ab505a29e Merge pull request #395 from hsaputra/remove_simpleredundantreturn_scala
Remove simple redundant return statements for Scala methods/functions

Remove simple redundant return statements for Scala methods/functions:

-) Only change simple return statements at the end of method
-) Ignore the complex if-else check
-) Ignore the ones inside synchronized
-) Add small changes to making var to val if possible and remove () for simple get

This hopefully makes the review simpler =)

Pass compile and tests.
2014-01-12 21:31:04 -08:00
Joseph E. Gonzalez 2216319f48 adding Pregel as an operator in GraphOps and cleaning up documentation of GraphOps 2014-01-12 21:26:37 -08:00
Joseph E. Gonzalez c787ff5640 Documenting Pregel API 2014-01-12 20:49:52 -08:00
Patrick Wendell 405bfe86ef Merge pull request #394 from tdas/error-handling
Better error handling in Spark Streaming and more API cleanup

Earlier errors in jobs generated by Spark Streaming (or in the generation of jobs) could not be caught from the main driver thread (i.e. the thread that called StreamingContext.start()) as it would be thrown in different threads. With this change, after `ssc.start`, one can call `ssc.awaitTermination()` which will be block until the ssc is closed, or there is an exception. This makes it easier to debug.

This change also adds ssc.stop(<stop-spark-context>) where you can stop StreamingContext without stopping the SparkContext.

Also fixes the bug that came up with PRs #393 and #381. MetadataCleaner default value has been changed from 3500 to -1 for normal SparkContext and 3600 when creating a StreamingContext. Also, updated StreamingListenerBus with changes similar to SparkListenerBus in #392.

And changed a lot of protected[streaming] to private[streaming].
2014-01-12 20:04:21 -08:00
Patrick Wendell 28a6b0cdbc Merge pull request #398 from pwendell/streaming-api
Rename DStream.foreach to DStream.foreachRDD

`foreachRDD` makes it clear that the granularity of this operator is per-RDD.
As it stands, `foreach` is inconsistent with with `map`, `filter`, and the other
DStream operators which get pushed down to individual records within each RDD.
2014-01-12 19:49:36 -08:00
Patrick Wendell 2802cc80bc Disable shuffle file consolidation by default 2014-01-12 19:16:43 -08:00
Henry Saputra 5a8abfb70e Address code review concerns and comments. 2014-01-12 19:15:09 -08:00
Tathagata Das 034f89aaab Fixed persistence logic of WindowedDStream, and fixed default persistence level of input streams. 2014-01-12 19:02:27 -08:00
Patrick Wendell e6e20ceee0 Adding deprecated versions of old code 2014-01-12 18:54:03 -08:00
Tathagata Das 74d0126257 Merge remote-tracking branch 'apache/master' into dstream-move 2014-01-12 18:02:05 -08:00
Tathagata Das aa2c993858 Merge remote-tracking branch 'apache/master' into error-handling 2014-01-12 17:37:46 -08:00
Tathagata Das d1820fef57 Merge branch 'error-handling' into dstream-move 2014-01-12 17:36:49 -08:00
Tathagata Das c7fabb745b Changed StreamingContext.stopForWait to awaitTermination. 2014-01-12 17:21:13 -08:00
Patrick Wendell f4d77f8cb8 Rename DStream.foreach to DStream.foreachRDD
`foreachRDD` makes it clear that the granularity of this operator is per-RDD.
As it stands, `foreach` is inconsistent with with `map`, `filter`, and the other
DStream operators which get pushed down to individual records within each RDD.
2014-01-12 17:21:00 -08:00
Patrick Wendell 074f50232f Merge pull request #396 from pwendell/executor-env
Setting load defaults to true in executor

This preserves the behavior in earlier releases. If properties are set for the executors via `spark-env.sh` on the slaves, then they should take precedence over spark defaults. This is useful for if system administrators are setting properties for a standalone cluster, such as shuffle locations.

/cc @andrewor14 who initially reported this issue.
2014-01-12 17:01:13 -08:00
Ankur Dave 7a4bb863c7 Add connected components example to doc 2014-01-12 16:58:18 -08:00