Commit graph

286 commits

Author SHA1 Message Date
Joseph E. Gonzalez 00e73833cc Fixing a bug in reverse edge direction. 2013-10-26 15:10:30 -07:00
Kyle Ellrott 8236d5dcc4 More changes to the graph/pom.xml to make it match the other subprojects 2013-10-25 15:52:44 -07:00
Kyle Ellrott d39ac2eb40 Merge https://github.com/amplab/graphx 2013-10-25 13:16:05 -07:00
Kyle Ellrott 59ec6b85d0 Merge branch 'master' of https://github.com/amplab/graphx 2013-10-24 10:29:24 -07:00
Joseph E. Gonzalez c30624dcbb Adding dynamic pregel, fixing bugs in PageRank, and adding basic analytics unit tests. 2013-10-23 00:25:45 -07:00
Joseph E. Gonzalez 0bd92ed8d0 Fixing a bug in pregel where the initial vertex-program results are lost. 2013-10-22 19:10:51 -07:00
Joseph E. Gonzalez be8269af07 Merge branch 'VertexSetRDD_Tests' into AnalyticsCleanup 2013-10-22 15:03:49 -07:00
Joseph E. Gonzalez e3eb03d5b5 Starting analytics test suite. 2013-10-22 15:03:16 -07:00
Joseph E. Gonzalez ba5c75692a Updating analytics to reflect changes in the pregel interface and moving degree information into the edge attribute. 2013-10-22 15:03:00 -07:00
Joseph E. Gonzalez 46b195253e Adding some additional graph generators to support unit testing of the analytics package. 2013-10-22 15:01:49 -07:00
Joseph E. Gonzalez 14a3329a11 Changing the Pregel interface slightly to better support type inference. 2013-10-22 15:01:20 -07:00
Kyle Ellrott 73bf8587e2 Fixing graph/pom.xml 2013-10-21 15:13:31 -07:00
Joseph E. Gonzalez ebdbedc3e9 Documenting VertexSetRDD and added some testing code for VertexSetRDD 2013-10-19 01:26:08 -07:00
Joseph E. Gonzalez dbc8c9868a Fixing bug in VertexSetRDD that breaks Graph tests. 2013-10-18 23:44:06 -07:00
Reynold Xin 9cf43cfeb7 Merge pull request #28 from jegonzal/VertexSetRDD
Refactoring IndexedRDD to VertexSetRDD.
2013-10-18 22:07:21 -07:00
Ankur Dave 2d3603930e Add a unit test for GraphOps.joinVertices 2013-10-18 19:46:13 -07:00
Ankur Dave d15db10831 Add a unit test for Graph.mapEdges 2013-10-18 19:46:13 -07:00
Ankur Dave d429f015c0 Update GraphSuite aggregateNeighbors test 2013-10-18 19:46:13 -07:00
Joseph E. Gonzalez 5d01ebca3c Specializing IndexedRDD as VertexSetRDD.
1) This allows the index map to be optimized for Vids
2) This makes the code more readable
3) The Graph API can now return VertexSetRDDs from operations that produce results for vertices
2013-10-18 19:03:59 -07:00
Joseph E. Gonzalez bb58aa5330 Added some stub code to address the case where a vertex could occur multiple times in the vertex table or where a vertex in the edge list may not appear in the vertex table.
Moving IndexedRDD into the graphx source tree and removing dependencies in /core.
2013-10-18 18:15:32 -07:00
Ankur Dave 36a902e52d Revert accidental removal of code in 3a40a5e 2013-10-18 16:19:40 -07:00
Dan Crankshaw 3a40a5eb30 Added some documentation. 2013-10-18 15:11:21 -07:00
Joseph E. Gonzalez 3f3d28c73f Switching from Seq to IndexedSeq 2013-10-17 19:55:36 -07:00
Joseph E. Gonzalez 9a03c5fe28 This commit accomplishes three goals:
1) Further simplification of the IndexedRDD operations (eliminating some)
2) Aggressive reuse of HashMaps
3) Pipelining join operations within IndexedRDD
2013-10-17 19:01:48 -07:00
Ankur Dave bf19aac2b7 Use ArrayBuilder instead of ArrayBuffer
ArrayBuilder is specialized for holding primitive VD types.
2013-10-17 13:19:00 -07:00
Ankur Dave 2282d27cf1 Cache msgsByPartition 2013-10-16 23:56:15 -07:00
Ankur Dave bc234bf0e1 Split vTableReplicated into two RDDs
Previously, (vTableReplicated: IndexedRDD[Pid, VertexHashMap[VD]])
stored one hashmap per partition, taking Vid directly to VD.

To take advantage of rxin's new hashmaps (see
rxin/incubator-spark@32a79d6d13), this
commit splits that data structure into two RDDs:

(vTableReplicationMap: IndexedRDD[Pid, VertexIdToIndexMap]) stores a map
per partition from vertex ID to the index where that vertex's attribute
is stored. This index refers to an array in the same partition in
vTableReplicatedValues.

(vTableReplicatedValues: IndexedRDD[Pid, Array[VD]]) stores the vertex
data and is arranged as described above.
2013-10-16 19:22:23 -07:00
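The split described in the message above can be illustrated with a minimal sketch. It is not the GraphX code: `ReplicatedVertexTableSketch`, `SplitTable`, and `split` are hypothetical names, and a plain immutable `Map` stands in for the specialized hash maps the commit refers to. The sketch only shows the shape of the idea, in which one structure maps a vertex ID to an index and a separate array holds the attribute at that index.

```scala
import scala.reflect.ClassTag

// Hypothetical illustration of the two-structure layout; not the GraphX implementation.
object ReplicatedVertexTableSketch {
  type Vid = Long

  // idToIndex plays the role of one partition of vTableReplicationMap,
  // values plays the role of the matching partition of vTableReplicatedValues.
  final case class SplitTable[VD](idToIndex: Map[Vid, Int], values: Array[VD]) {
    // A lookup goes through the index map first, then into the values array.
    def lookup(vid: Vid): VD = values(idToIndex(vid))
  }

  // Converts the earlier single-map layout (Vid directly to VD) into the split layout.
  def split[VD: ClassTag](table: Map[Vid, VD]): SplitTable[VD] = {
    val entries = table.toVector
    val idToIndex = entries.map(_._1).zipWithIndex.toMap
    val values = entries.map(_._2).toArray
    SplitTable(idToIndex, values)
  }
}
```

Keeping the values in an array of their own means the index map can be shared or reused while only the array changes, which is one plausible motivation for the split; the commit message itself gives only the new hash maps as the reason.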
Ankur Dave af8e461841 Set serialization properties in GraphSuite 2013-10-16 19:21:24 -07:00
Joseph E. Gonzalez 57ac9073ae Introducing unique indexedrdd and adding numerous specialized joins 2013-10-16 04:08:22 -07:00
Joseph E. Gonzalez 59700c0c2a switched to a more efficient implementation of reduce by key 2013-10-16 00:18:37 -07:00
Joseph E. Gonzalez 9058f261fe Addressing issue where statistics are not computed correctly 2013-10-15 17:39:09 -07:00
Joseph E. Gonzalez 194bb03d16 Resolved closure capture issues by addressing capture through implicit variables. 2013-10-15 15:10:41 -07:00
Joseph E. Gonzalez 7241cf1632 Updating unit tests. 2013-10-15 14:18:03 -07:00
Joseph E. Gonzalez 345e1e94cc Still trying to resolve issues with capture. 2013-10-15 14:01:38 -07:00
Joseph E. Gonzalez b64337ec40 Trying to resolve issues with closure capture. 2013-10-15 13:02:17 -07:00
Joseph E. Gonzalez e7d0320000 More refactoring and documenting, including renaming data to attr for vertex and edge data and eliminating the vertex type. 2013-10-15 02:20:06 -07:00
Joseph E. Gonzalez 67bb39c54b Removing extraneous code 2013-10-14 18:49:05 -07:00
Joseph E. Gonzalez bff223454a trying to address issues with GraphImpl being caught in closures. 2013-10-13 22:27:10 -07:00
Joseph E. Gonzalez 637b67da56 merging changes from upstream benchmarking branch 2013-10-13 19:54:09 -07:00
Joseph E. Gonzalez 494472a6cc Integrated IndexedRDD into graph design. 2013-10-13 19:42:32 -07:00
Dan Crankshaw 1a961dd1f2 Fixed connected components CL params 2013-10-12 01:47:38 +00:00
Dan Crankshaw 1e5535cfcf Added connected components back 2013-10-11 16:38:52 -07:00
Dan Crankshaw 543a54dffa Tried to fix some indenting 2013-10-11 16:07:49 -07:00
Dan Crankshaw c4a23f95c3 Updated code so benchmarks actually run. 2013-10-11 22:57:43 +00:00
Joseph E. Gonzalez fa2f87ca63 added replication and balance reporting 2013-10-10 14:48:40 -07:00
Joseph E. Gonzalez 5f756fb63f added support for random vertex cuts 2013-10-10 14:10:47 -07:00
Joseph E. Gonzalez 8dfac4ea8f added support for random vertex cuts 2013-10-10 14:09:01 -07:00
Dan Crankshaw 9929e7b9a5 Merge branch 'benchmarks' of github.com:amplab/graphx 2013-10-10 13:36:51 -07:00
Dan Crankshaw 4b46d519db Merge pull request #17 from amplab/product2
product 2 change
2013-10-10 13:35:36 -07:00
Reynold Xin 5218e46178 Updated Kryo registration. 2013-10-07 11:48:50 -07:00
Reynold Xin 4f916f5302 Created a MessageToPartition class to send messages without saving the partition id. 2013-10-07 11:31:00 -07:00
Dan Crankshaw 2a8f3db94d Fixed groupEdgeTriplets - it now passes a basic unit test.
The problem was with the way the EdgeTripletRDD iterator worked. Calling
toList on it returned the last value repeatedly. Fixed by overriding
toList in the iterator.
2013-10-06 19:52:40 -07:00
Dan Crankshaw 0d3ea36fd8 Added a groupEdges and a groupEdgeTriplets method. For some reason the groupEdgeTriplets method isn't properly iterating through the set of edges and thus is returning the wrong result. groupEdges seems to be working. 2013-10-06 18:34:23 -07:00
Dan Crankshaw 6cb21ce889 groupEdges() now compiles. Still need some unit tests 2013-10-06 15:33:35 -07:00
Dan Crankshaw 730a3156d3 Added initial groupEdges code. Still a prototype, I haven't figured out quite how it should all work yet. 2013-10-05 19:44:28 -07:00
Dan Crankshaw bfedbee13a Edge partitioner now partitions by canonical edge so all edges between two vertices (in either direction) will be sent to same machine. 2013-10-05 16:04:57 -07:00
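A minimal sketch of what partitioning by canonical edge can look like, assuming the canonical form simply orders the two endpoints. `CanonicalEdgeSketch`, `canonical`, and `partitionFor` are hypothetical names, and the hash used here is the tuple's `hashCode`, not necessarily what the commit uses.

```scala
// Hypothetical illustration of canonical-edge partitioning; not the GraphX partitioner.
object CanonicalEdgeSketch {
  type Vid = Long

  // Order the endpoints so that (a, b) and (b, a) produce the same key.
  def canonical(src: Vid, dst: Vid): (Vid, Vid) =
    if (src <= dst) (src, dst) else (dst, src)

  // Hash the canonical key into a partition in the range [0, numParts).
  def partitionFor(src: Vid, dst: Vid, numParts: Int): Int = {
    val h = canonical(src, dst).hashCode
    ((h % numParts) + numParts) % numParts // keep the result non-negative
  }
}
```

Because both directions of an edge reduce to the same key, `partitionFor(3L, 7L, n)` and `partitionFor(7L, 3L, n)` always agree, which is the property the message describes.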
Dan Crankshaw e096cbe90e Added 2D canonical edge partitioner 2013-10-05 15:20:15 -07:00
Dan Crankshaw da3e123afb Removed some comments 2013-10-03 18:11:35 -07:00
Dan Crankshaw 1ee60d3b34 Fixed bug in sampleLogNormal 2013-10-03 17:46:37 -07:00
Dan Crankshaw 27b442dc06 Fixed annotation import 2013-10-03 10:29:00 -07:00
Dan Crankshaw 8edd499eff Added rmat graph generator 2013-10-03 10:21:34 -07:00
Dan Crankshaw 3c3cc1508b Added initial implementation of lognormal graph generator. Haven't tested it yet. 2013-09-28 16:00:44 -07:00
Ankur Dave bf05dc7e78 Add a unit test for aggregateNeighbors 2013-09-19 23:45:15 -07:00
Ankur Dave 7cadeffdf4 Merge branch 'master' into aggregateNeighbors-returns-graph 2013-09-19 23:14:26 -07:00
Ankur Dave f08e520f4c Initialize sc in GraphSuite to avoid NullPointerException 2013-09-19 23:12:24 -07:00
Ankur Dave f02d5c8c53 Fix typo in aggregateNeighbors docs 2013-09-19 23:06:37 -07:00
Ankur Dave d3cbde0085 Import appropriate Spark core classes 2013-09-19 19:29:58 -07:00
Ankur Dave c278907bf0 Move BytecodeUtils to the right package 2013-09-19 19:28:22 -07:00
Ankur Dave 4c694bd705 Move IndexedRDD and GraphSuite to org.apache.spark 2013-09-19 19:13:07 -07:00
Ankur Dave 4e967af6af Return Graph from default aggregateNeighbors also 2013-09-18 16:18:33 -07:00
Ankur Dave b04f1a4019 Implement aggregateNeighbors returning Graph 2013-09-18 16:18:33 -07:00
Ankur Dave 9ff783599b Return Graph from aggregateNeighbors; update callers
This commit only affects the Graph API, not GraphImpl.
2013-09-18 16:18:33 -07:00
Joseph E. Gonzalez 55696e2584 GraphX now builds with all merged changes. 2013-09-17 22:42:12 -07:00
Joseph E. Gonzalez 5ccb60d467 Working on graph test suite 2013-08-11 14:49:22 -07:00
Joseph E. Gonzalez ddf126edad added subgraph 2013-08-06 17:48:04 -07:00
Joseph E. Gonzalez b454314e07 Added 2d partitioning 2013-08-06 15:14:13 -07:00
Joseph E. Gonzalez 7ae83f6ef4 Switching to Long vids instead of integers. This required a surprising number of changes since the fastutil library function names include the type (e.g., getLong() instead of just get()) 2013-08-06 14:05:54 -07:00
Joseph E. Gonzalez 413b0c1526 merged with upstream 2013-08-06 12:09:03 -07:00
Joseph E. Gonzalez 42942fc1a9 In the process of bringing the GraphLab api back and fixing the analytics toolkit 2013-08-06 11:59:31 -07:00
Reynold Xin 2f2c7e6a29 Added a correctEdges function. 2013-07-01 16:23:23 -07:00
Reynold Xin 2943edf8ee Fixed another bug .. 2013-07-01 00:24:30 -07:00
Reynold Xin 0791581346 More bug fixes 2013-06-30 23:07:40 -07:00
Joseph E. Gonzalez c90967a6a2 Merge branch 'graph' of https://github.com/rxin/spark into graph 2013-06-29 21:53:44 -07:00
Joseph E. Gonzalez f776301241 Resurrecting the GraphLab gather-apply-scatter api 2013-06-29 21:53:38 -07:00
Joseph E. Gonzalez f269e5975b Adding additional assertions and documenting the edge triplet class 2013-06-29 21:53:13 -07:00
Reynold Xin 438a213695 Merge branch 'graph' of github.com:rxin/spark into graph
Conflicts:
	graph/src/test/scala/spark/graph/GraphSuite.scala
2013-06-29 21:29:17 -07:00
Reynold Xin 6acc2a7b3d Various minor changes. 2013-06-29 21:28:31 -07:00
Joseph E. Gonzalez f24548da88 Adding graph construction code to graph singleton object. 2013-06-29 21:04:53 -07:00
Joseph E. Gonzalez 2964df7a4e Commenting out unused test code. 2013-06-29 21:04:35 -07:00
Reynold Xin 79b5eaa4e2 Added a 64bit string hash function. 2013-06-29 15:58:59 -07:00
Reynold Xin 758ceff778 Updated BytecodeUtils to ASM4. 2013-06-29 15:39:48 -07:00
Reynold Xin ae0eca5ec8 Merge branch 'graph' of github.com:rxin/spark into graph 2013-06-29 15:22:33 -07:00
Reynold Xin ae12d163dc Added the BytecodeUtils class for analyzing bytecode. 2013-06-29 15:22:15 -07:00
Joseph E. Gonzalez a80b28a579 Renamed several functions and classes and improved documentation 2013-06-28 18:18:51 -07:00
Joseph E. Gonzalez 0c24305b8d added documentation to graph and did some minor renaming 2013-05-09 20:14:27 -07:00
Reynold Xin 0711fab137 Merge branch 'graph' of github.com:rxin/spark into graph
Conflicts:
	graph/src/main/scala/spark/graph/Analytics.scala
2013-05-05 17:46:54 -07:00
Reynold Xin 70ba4d1740 Refactored the Graph API for discussion. 2013-05-05 17:12:30 -07:00
Joseph E. Gonzalez a8dad98c55 merged with trunk 2013-04-16 11:39:02 -07:00
Joseph E. Gonzalez 2635416cee switching from floats to doubles in pagerank and sssp 2013-04-16 11:27:56 -07:00
Reynold Xin 583a5858e7 Added more logging to Analytics. 2013-04-07 15:45:28 +08:00
Reynold Xin 3728e1bc40 Code to run bagel vs graph experiments. 2013-04-07 15:05:46 +08:00
Reynold Xin 9eec317835 Minor cleanup. 2013-04-05 23:22:55 +08:00
Reynold Xin 822d9c5b70 Rename rawGraph to graph in Pregel. 2013-04-05 16:16:04 +08:00
Reynold Xin d40c1d5122 Added unit test and fix a partitioner problem. 2013-04-05 15:33:50 +08:00
Joseph E. Gonzalez 14205548ce Merged with master
Merge branch 'graph' of https://github.com/rxin/spark into graph
2013-04-04 23:51:06 -07:00
Joseph E. Gonzalez 092708e57e better parsing of graph text files 2013-04-04 23:50:53 -07:00
Reynold Xin e5dd61e720 Rename rawGraph to graph. 2013-04-05 14:47:39 +08:00
Reynold Xin c973e564b9 Changed EdgeDirection from Enumeration to case classes. 2013-04-05 14:46:37 +08:00
Reynold Xin 34ef3e52dc Merge branch 'graph' of github.com:rxin/spark into graph 2013-04-05 14:21:44 +08:00
Reynold Xin 9764e579b8 Minor cleanup. 2013-04-05 14:21:24 +08:00
Joseph E. Gonzalez 3d7a4b1fef More realistic version of Pregel. 2013-04-04 23:17:27 -07:00
Reynold Xin 7134856351 numVertexPartitions and numEdgePartitions are now part of the
constructor and are immutable.

Also did some cleanups.
2013-04-05 12:54:59 +08:00
Joseph E. Gonzalez d5b0f4dfa6 added a function to collect the neighborhood of a vertex as well as the skeleton of the program for classic pregel 2013-04-04 19:54:14 -07:00
Joseph E. Gonzalez ad908f7545 added pregel pagerank 2013-04-04 19:31:14 -07:00
Joseph E. Gonzalez d510045d8c fixing the silly bug in the dynamic pagerank gather function 2013-04-04 19:24:06 -07:00
Joseph E. Gonzalez b537613570 added dynamic graphlab 2013-04-04 19:18:15 -07:00
Joseph E. Gonzalez 4a2b8aa557 added dynamic graphlab but I have not yet tested it. 2013-04-04 18:59:56 -07:00
Joseph E. Gonzalez 0667986c8e Minor tweak to pregel semantics to give reverse edge illusion on send message. 2013-04-04 17:59:08 -07:00
Joseph E. Gonzalez cb99fc193c Added graphlab style implementation of Pregel and a more sophisticated version of mapReduceNeighborhoodFilter 2013-04-04 17:56:16 -07:00
Joseph E. Gonzalez db45cf3a49 Fixing several bugs in mapReduceNeighborhood. First, a map was used instead of a foreach, which for mysterious reasons meant that the map never seemed to be executed. Switching to a foreach then caused a null pointer exception, since the body of the foreach did not properly initialize the temporary EdgeWithVertex data structure. 2013-04-04 15:37:02 -07:00
Joseph E. Gonzalez 93eca18a62 fixing a silly bug whereby the pagerank equation was implemented incorrectly (divided by degree of dst instead of source). 2013-04-04 15:35:07 -07:00
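To make the fix above concrete, here is a minimal sketch of one PageRank step in which each edge contributes the source's rank divided by the source's out-degree. It is not the repository's code: `PageRankSketch`, `step`, and the `resetProb` default of 0.15 are assumptions chosen for illustration, and the update uses the unnormalized form `resetProb + (1 - resetProb) * sum`.

```scala
// Hypothetical illustration of one PageRank update; not the GraphX analytics code.
object PageRankSketch {
  def step(
      edges: Seq[(Long, Long)],   // (src, dst) pairs
      ranks: Map[Long, Double],   // current rank of every vertex
      resetProb: Double = 0.15    // assumed value, not taken from the commit
  ): Map[Long, Double] = {
    // Out-degree of each source vertex.
    val outDegree = edges.groupBy(_._1).map { case (src, es) => src -> es.size }

    // Each edge sends rank(src) / outDegree(src) to its destination.
    // Dividing by the destination's degree instead is the bug described above.
    val incoming = edges
      .map { case (src, dst) => dst -> ranks(src) / outDegree(src) }
      .groupBy(_._1)
      .map { case (dst, contribs) => dst -> contribs.map(_._2).sum }

    ranks.map { case (v, _) =>
      v -> (resetProb + (1.0 - resetProb) * incoming.getOrElse(v, 0.0))
    }
  }
}
```

With edges 1→2, 1→3, 2→3 and all ranks starting at 1.0, vertex 1 splits its rank across two out-edges, so vertex 2 receives 0.5 and vertex 3 receives 0.5 + 1.0.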
Reynold Xin 1671abf47c Implemented mapReduceNeighborhood in Graph and used that to implement
gather in GraphLab.
2013-04-05 01:47:59 +08:00
Joseph E. Gonzalez 28d0557fd8 added graph generator to run additional experiments 2013-04-03 23:23:07 -07:00
Joseph E. Gonzalez 91e1227edb completed port of several analytics packages as well as analytics main 2013-04-03 23:04:29 -07:00
Joseph E. Gonzalez 39cac0ae65 Fixed iterateGAS to return a graph, added some minor features to graph and some additional todo items, and added the Analytics code from the internal SparkGraph prototype 2013-04-03 16:22:37 -07:00
Joseph E. Gonzalez ad73e5bdbb Merging with downstream changes.
Merge branch 'graph' of https://github.com/rxin/spark into graph
2013-04-03 09:10:20 -07:00
Reynold Xin 4291da1481 Allow returning different vertex data type in updateVertices.
2013-04-04 00:07:54 +08:00
Joseph E. Gonzalez c649073b5f merged with trunk 2013-04-03 08:47:49 -07:00
Joseph E. Gonzalez 0123c9d6a1 Changed GraphLab class to object and added graph loading from text file. 2013-04-03 08:43:08 -07:00
Reynold Xin fe42ad41bb Commit a working version. 2013-04-03 23:40:09 +08:00
Reynold Xin cb0efe92d1 Oh wow it finally compiles! 2013-04-03 23:12:01 +08:00
Reynold Xin d63c895945 Partial checkin of graphlab module. 2013-04-03 00:42:33 +08:00
Reynold Xin 25c71b185d Added a vertices method to Graph. 2013-04-02 01:26:20 +08:00
Reynold Xin d7011b0f78 Added a Graph class that supports joining vertices with edges. 2013-04-02 01:17:44 +08:00
Reynold Xin 28ebe04496 Added a Graph class that supports joining vertices with edges. 2013-04-02 01:16:08 +08:00
Reynold Xin 81c4d19c61 Maven and sbt build changes for SparkGraph. 2013-02-19 12:43:13 -08:00