Commit graph

6326 commits

Author SHA1 Message Date
Joseph E. Gonzalez ee8931d2c6 Finished documenting vertexrdd. 2014-01-13 19:30:35 -08:00
Patrick Wendell d4cd5debf4 Fix for Kryo Serializer 2014-01-13 19:03:59 -08:00
Reynold Xin 0fbc0b0561 Merge branch 'graphx' of github.com:ankurdave/incubator-spark into graphx 2014-01-13 18:51:22 -08:00
Reynold Xin 0b18bfba1a Updated doc for PageRank. 2014-01-13 18:51:04 -08:00
Reynold Xin 9317286b72 More cleanup. 2014-01-13 18:45:35 -08:00
Reynold Xin 8e5c732430 Moved SVDPlusPlusConf into SVDPlusPlus object itself. 2014-01-13 18:45:20 -08:00
Raymond Liu 4c22c55ad6 Address comments to fix code formats 2014-01-14 10:41:42 +08:00
Joseph E. Gonzalez 552de5d42e Finished second pass on pregel docs. 2014-01-13 18:40:43 -08:00
Joseph E. Gonzalez 622b7f7d39 Minor changes in graphx programming guide. 2014-01-13 18:40:43 -08:00
Raymond Liu 161ab93989 Yarn workerRunnable refactor 2014-01-14 10:36:00 +08:00
Raymond Liu 79a5ba3497 Yarn Client refactor 2014-01-14 10:33:48 +08:00
Reynold Xin 1dce9ce446 Moved PartitionStrategy's into an object. 2014-01-13 18:32:04 -08:00
Reynold Xin ae06d2c22f Updated GraphGenerator. 2014-01-13 18:31:49 -08:00
Reynold Xin 87f335db78 Made more things private. 2014-01-13 18:30:26 -08:00
Reynold Xin a4e12af7aa Merge branch 'graphx' of github.com:ankurdave/incubator-spark into graphx
Conflicts:
	graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala
2014-01-13 17:42:59 -08:00
Reynold Xin 02a8f54bfa Miscellaneous doc update. 2014-01-13 17:40:36 -08:00
Tathagata Das 1233b3de01 Merge remote-tracking branch 'apache/master' into filestream-fix 2014-01-13 17:29:19 -08:00
Joseph E. Gonzalez cfe4a29dcb Improvements in example code for the programming guide as well as adding serialization support for GraphImpl to address issues with failed closure capture. 2014-01-13 17:18:31 -08:00
Ankur Dave ae4b75d94a Add EdgeDirection.Either and use it to fix CC bug
The bug was due to a misunderstanding of the activeSetOpt parameter to
Graph.mapReduceTriplets. Passing EdgeDirection.Both causes
mapReduceTriplets to run only on edges with *both* vertices in the
active set. This commit adds EdgeDirection.Either, which causes
mapReduceTriplets to run on edges with *either* vertex in the active
set. This is what connected components needed.
2014-01-13 17:03:03 -08:00
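
The difference between the two active-set modes described in ae4b75d94a can be sketched in a few lines. This is only an illustration in plain Python, not GraphX code: the `active_edges` helper, the enum, and the tiny example graph are all hypothetical, and only the "both" versus "either" semantics come from the commit message.

```python
from enum import Enum


class EdgeDirection(Enum):
    # Hypothetical stand-ins for the two modes named in the commit message.
    BOTH = "both"      # run only on edges with both endpoints active
    EITHER = "either"  # run on edges with at least one endpoint active


def active_edges(edges, active, direction):
    """Return the edges a map step would visit for the given active set."""
    if direction is EdgeDirection.BOTH:
        return [(s, d) for s, d in edges if s in active and d in active]
    return [(s, d) for s, d in edges if s in active or d in active]


# A path graph 1 - 2 - 3 - 4 in which only vertex 2 changed last round.
edges = [(1, 2), (2, 3), (3, 4)]
active = {2}

both_edges = active_edges(edges, active, EdgeDirection.BOTH)
either_edges = active_edges(edges, active, EdgeDirection.EITHER)

print(both_edges)    # []
print(either_edges)  # [(1, 2), (2, 3)]
```

With "both", no edge qualifies when a single vertex is active, so a label update at vertex 2 would never reach its neighbours. That is consistent with the connected components bug the commit describes.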
Ankur Dave 1bd5cefcae Remove aggregateNeighbors 2014-01-13 17:03:03 -08:00
Tathagata Das c0bb38e8aa Improved file input stream further. 2014-01-13 16:54:52 -08:00
Reynold Xin dc041cd3b6 Merge branch 'scaladoc1' of github.com:rxin/incubator-spark into graphx 2014-01-13 16:25:21 -08:00
Reynold Xin 01c0d72b32 Merge pull request #410 from rxin/scaladoc1
Updated JavaStreamingContext to make scaladoc compile.

`sbt/sbt doc` used to fail. This fixed it.
2014-01-13 16:24:30 -08:00
Reynold Xin e2d25d2dfe Merge branch 'master' into graphx 2014-01-13 16:21:26 -08:00
Reynold Xin 30328c347b Updated JavaStreamingContext to make scaladoc compile.
`sbt/sbt doc` used to fail. This fixed it.
2014-01-13 15:58:39 -08:00
Ankur Dave 8038da2328 Merge pull request #2 from jegonzal/GraphXCCIssue
Improving documentation and identifying potential bug in CC calculation.
2014-01-13 14:59:30 -08:00
Tathagata Das 27311b1332 Added unpersisting and modified testsuite to better test out metadata cleaning. 2014-01-13 14:57:07 -08:00
Ankur Dave 97cd27e31b Add graph loader links to doc 2014-01-13 14:54:48 -08:00
Ankur Dave 15ca89b11e Fix mapReduceTriplets links in doc 2014-01-13 14:54:33 -08:00
Joseph E. Gonzalez 80e4d98dc6 Improving documentation and identifying potential bug in CC calculation. 2014-01-13 13:40:16 -08:00
Patrick Wendell c3816de504 Changing option wording per discussion with Andrew 2014-01-13 13:25:06 -08:00
Ankur Dave 9fe88627b5 Improve EdgeRDD scaladoc 2014-01-13 13:16:41 -08:00
Ankur Dave ea69cff711 Further improve VertexRDD scaladocs 2014-01-13 12:52:52 -08:00
Patrick Wendell 5d61e051c2 Improvements to external sorting
1. Adds the option of compressing outputs.
2. Adds batching to the serialization to prevent OOM on the read side.
3. Slight renaming of config options.
4. Use Spark's buffer size for reads in addition to writes.
2014-01-13 12:21:39 -08:00
Patrick Wendell b93f9d42f2 Merge pull request #400 from tdas/dstream-move
Moved DStream and PairDStream to org.apache.spark.streaming.dstream

Similar to the package location of `org.apache.spark.rdd.RDD`, `DStream` has been moved from `org.apache.spark.streaming.DStream` to `org.apache.spark.streaming.dstream.DStream`. I know that the package name is a little long, but I think it's better to keep it consistent with Spark's structure.

Also fixed persistence of windowed DStream. The RDDs generated by a windowed DStream are essentially unions of underlying RDDs, and persisting these union RDDs would store numerous copies of the underlying data. Instead, setting the persistence level on the windowed DStream now sets the persistence level of the underlying DStream.
2014-01-13 12:18:05 -08:00
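
The persistence change described in b93f9d42f2 can be pictured with a small model. The sketch below is plain Python, not the Spark Streaming API: `Stream`, `WindowedStream`, `windows_containing`, and the storage level string are hypothetical names chosen for illustration. It assumes a sliding window that advances one batch at a time.

```python
class Stream:
    """Hypothetical stand-in for a stream that produces one batch per interval."""

    def __init__(self):
        self.storage_level = None

    def persist(self, level):
        self.storage_level = level
        return self


class WindowedStream(Stream):
    """Each window is a union of the parent's most recent batches."""

    def __init__(self, parent, window):
        super().__init__()
        self.parent = parent
        self.window = window  # window length, in batches

    def persist(self, level):
        # Delegate to the parent so each batch is stored once,
        # instead of once per window union that contains it.
        self.parent.persist(level)
        return self


def windows_containing(batch, num_batches, window):
    """Count the windows (one ending at each batch index) that include `batch`."""
    return sum(1 for end in range(num_batches) if end - window < batch <= end)


parent = Stream()
windowed = WindowedStream(parent, window=3).persist("MEMORY_ONLY_SER")

# Copies of batch 2 that would be stored if every window union were persisted.
copies = windows_containing(batch=2, num_batches=6, window=3)
print(parent.storage_level, windowed.storage_level, copies)
```

In this model, persisting the unions would store batch 2 once for every window that covers it, while delegating to the parent stores it once.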
Ankur Dave 8ca9773974 Add LiveJournalPageRank example 2014-01-13 12:17:58 -08:00
Saurabh Rawat e922973373 Modifications as suggested in PR feedback-
- mapPartitions, foreachPartition moved to JavaRDDLike
- call scala rdd's setGenerator instead of setting directly in JavaRDD
2014-01-13 23:40:04 +05:30
eklavya fa42951e3b Remove default param from mapPartitions 2014-01-13 18:13:22 +05:30
eklavya 8fe562c0fa Remove classtag from mapPartitions. 2014-01-13 18:09:58 +05:30
eklavya 6a65feebc7 Added foreachPartition method to JavaRDD. 2014-01-13 17:56:47 +05:30
eklavya dbadc6b994 Added mapPartitions method to JavaRDD. 2014-01-13 17:56:10 +05:30
eklavya aae8a01425 Added setter method setGenerator to JavaRDD. 2014-01-13 17:53:35 +05:30
Andrew Or a1f0992fae Report bytes spilled for both memory and disk on Web UI 2014-01-12 23:42:57 -08:00
Andrew Or 69c9aebed0 Enable external sorting by default 2014-01-12 22:43:01 -08:00
Reynold Xin e6ed13f255 Merge pull request #397 from pwendell/host-port
Remove now un-needed hostPort option

I noticed this was logging some scary error messages in various places. After I looked into it, this is no longer really used. I removed the option and re-wrote the one remaining use case (it was unnecessary there anyways).
2014-01-12 22:35:14 -08:00
Andrew Or 8d40e7222f Get rid of spill map in SparkEnv 2014-01-12 22:34:33 -08:00
Tathagata Das ffa1d38ef1 Fixed import formatting. 2014-01-12 22:27:07 -08:00
Joseph E. Gonzalez 66c9d0092a Tested and corrected all examples up to mask in the graphx-programming-guide. 2014-01-12 22:11:13 -08:00
Ankur Dave 1efe78a101 Use GraphLoader for algorithms examples in doc 2014-01-12 22:03:03 -08:00
Tathagata Das 777c181d2f Merge remote-tracking branch 'apache/master' into dstream-move
Conflicts:
	streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala
2014-01-12 21:59:51 -08:00