Joseph E. Gonzalez
5d01ebca3c
Specializing IndexedRDD as VertexSetRDD.
1) This allows the index map to be optimized for Vids
2) This makes the code more readable
3) The Graph API can now return VertexSetRDDs from operations that produce results for vertices
2013-10-18 19:03:59 -07:00
Joseph E. Gonzalez
bb58aa5330
Added some stub code to address the case where a vertex could occur multiple times in the vertex table or where a vertex in the edge list may not appear in the vertex table.
Moving IndexedRDD into the graphx source tree and removing dependencies in /core.
2013-10-18 18:15:32 -07:00
Ankur Dave
36a902e52d
Revert accidental removal of code in 3a40a5e
2013-10-18 16:19:40 -07:00
Dan Crankshaw
3a40a5eb30
Added some documentation.
2013-10-18 15:11:21 -07:00
Joseph E. Gonzalez
3f3d28c73f
Switching from Seq to IndexedSeq
2013-10-17 19:55:36 -07:00
Joseph E. Gonzalez
9a03c5fe28
This commit accomplishes three goals:
1) Further simplification of the IndexedRDD operations (eliminating some)
2) Aggressive reuse of HashMaps
3) Pipelining join operations within indexedrdd
2013-10-17 19:01:48 -07:00
Ankur Dave
bf19aac2b7
Use ArrayBuilder instead of ArrayBuffer
ArrayBuilder is specialized for holding primitive VD types.
2013-10-17 13:19:00 -07:00
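The ArrayBuilder change can be illustrated with a small sketch. It assumes the Scala 2.13 standard library and does not reproduce the original GraphX code. `ValueCollector` and `collect` are hypothetical names. `ArrayBuilder.make` reads the `ClassTag` and returns a primitive-backed builder such as `ArrayBuilder.ofDouble`, so values are not boxed while they accumulate. An `ArrayBuffer[VD]`, by contrast, stores its elements as boxed references.

```scala
import scala.collection.mutable.ArrayBuilder
import scala.reflect.ClassTag

object ValueCollector {
  // Drain an iterator of vertex values into an Array[VD].
  // For a primitive VD such as Double, ArrayBuilder.make returns a
  // primitive-backed builder, so no boxing happens while appending.
  def collect[VD: ClassTag](values: Iterator[VD]): Array[VD] = {
    val builder = ArrayBuilder.make[VD]
    values.foreach(v => builder += v)
    builder.result()
  }
}
```

For example, `ValueCollector.collect(Iterator(1.0, 2.5, 4.0))` produces a primitive `Array[Double]` (a JVM `double[]`).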
Ankur Dave
2282d27cf1
Cache msgsByPartition
2013-10-16 23:56:15 -07:00
Ankur Dave
bc234bf0e1
Split vTableReplicated into two RDDs
Previously, (vTableReplicated: IndexedRDD[Pid, VertexHashMap[VD]])
stored one hashmap per partition, taking Vid directly to VD.
To take advantage of rxin's new hashmaps (see
rxin/incubator-spark@32a79d6d13 ), this
commit splits that data structure into two RDDs:
(vTableReplicationMap: IndexedRDD[Pid, VertexIdToIndexMap]) stores a map
per partition from vertex ID to the index where that vertex's attribute
is stored. This index refers to an array in the same partition in
vTableReplicatedValues.
(vTableReplicatedValues: IndexedRDD[Pid, Array[VD]]) stores the vertex
data and is arranged as described above.
2013-10-16 19:22:23 -07:00
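The split into an index map plus a values array can be sketched for a single partition as follows. `ReplicatedPartition` is a hypothetical name, and a plain immutable `Map[Long, Int]` stands in for the specialized `VertexIdToIndexMap` the commit refers to. The sketch also assumes vertex ids are distinct within a partition. Its purpose is to show why the split helps: producing new vertex attributes only builds a new array, while the index map is reused unchanged.

```scala
import scala.reflect.ClassTag

// One partition of the replicated vertex table: vertex id -> slot, plus the
// attribute array that the slots index into.
final case class ReplicatedPartition[VD](index: Map[Long, Int], values: Array[VD]) {
  // Look up an attribute through the shared index.
  def get(vid: Long): Option[VD] = index.get(vid).map(i => values(i))

  // Compute new attributes; only the values array is rebuilt.
  def mapValues[VD2: ClassTag](f: (Long, VD) => VD2): ReplicatedPartition[VD2] = {
    val out = new Array[VD2](values.length)
    index.foreach { case (vid, i) => out(i) = f(vid, values(i)) }
    ReplicatedPartition(index, out)
  }
}

object ReplicatedPartition {
  // Build a partition from (vertex id, attribute) pairs, assigning slots in order.
  def build[VD: ClassTag](pairs: Seq[(Long, VD)]): ReplicatedPartition[VD] = {
    val index = pairs.iterator.map(_._1).zipWithIndex.toMap
    ReplicatedPartition(index, pairs.map(_._2).toArray)
  }
}
```

Because `mapValues` hands back the same `index` object, repeated attribute updates avoid rebuilding the hash map.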
Ankur Dave
af8e461841
Set serialization properties in GraphSuite
2013-10-16 19:21:24 -07:00
Joseph E. Gonzalez
57ac9073ae
Introducing unique indexedrdd and adding numerous specialized joins
2013-10-16 04:08:22 -07:00
Joseph E. Gonzalez
59700c0c2a
switched to more efficient implementation of reduce by key
2013-10-16 00:18:37 -07:00
Joseph E. Gonzalez
9058f261fe
Addressing issue where statistics are not computed correctly
2013-10-15 17:39:09 -07:00
Joseph E. Gonzalez
194bb03d16
Resolved closure capture issues by addressing capture through implicit variables.
2013-10-15 15:10:41 -07:00
Joseph E. Gonzalez
7241cf1632
Updating unit tests.
2013-10-15 14:18:03 -07:00
Joseph E. Gonzalez
345e1e94cc
Still trying to resolve issues with capture.
2013-10-15 14:01:38 -07:00
Joseph E. Gonzalez
b64337ec40
Trying to resolve issues with closure capture.
2013-10-15 13:02:17 -07:00
Joseph E. Gonzalez
e7d0320000
More refactoring and documenting, including renaming data to attr for vertex and edge data and eliminating the vertex type.
2013-10-15 02:20:06 -07:00
Joseph E. Gonzalez
67bb39c54b
Removing extraneous code
2013-10-14 18:49:05 -07:00
Joseph E. Gonzalez
bff223454a
trying to address issues with GraphImpl being caught in closures.
2013-10-13 22:27:10 -07:00
Joseph E. Gonzalez
637b67da56
merging changes from upstream benchmarking branch
2013-10-13 19:54:09 -07:00
Joseph E. Gonzalez
494472a6cc
Integrated IndexedRDD into graph design.
2013-10-13 19:42:32 -07:00
Dan Crankshaw
1a961dd1f2
Fixed connected components CL params
2013-10-12 01:47:38 +00:00
Dan Crankshaw
1e5535cfcf
Added connected components back
2013-10-11 16:38:52 -07:00
Dan Crankshaw
543a54dffa
Tried to fix some indenting
2013-10-11 16:07:49 -07:00
Dan Crankshaw
c4a23f95c3
Updated code so benchmarks actually run.
2013-10-11 22:57:43 +00:00
Joseph E. Gonzalez
fa2f87ca63
added replication and balance reporting
2013-10-10 14:48:40 -07:00
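The commit does not say how replication and balance were defined. The sketch below uses two common definitions, stated here as assumptions:

- Replication factor is the average number of partitions in which each vertex appears.
- Balance is the largest partition size divided by the mean partition size.

`PartitionStats` is a hypothetical helper, and each partition is modeled as a local list of `(src, dst)` edges.

```scala
object PartitionStats {
  type EdgeList = Seq[(Long, Long)]

  // Average number of partitions each distinct vertex appears in.
  def replicationFactor(parts: Seq[EdgeList]): Double = {
    // Vertices touched by each partition (as source or destination).
    val vertexSets = parts.map(_.flatMap { case (s, d) => Seq(s, d) }.toSet)
    val allVertices = vertexSets.foldLeft(Set.empty[Long])(_ ++ _)
    if (allVertices.isEmpty) 0.0
    else vertexSets.map(_.size).sum.toDouble / allVertices.size
  }

  // Largest partition size over mean partition size; 1.0 is perfectly balanced.
  def balance(parts: Seq[EdgeList]): Double = {
    require(parts.nonEmpty, "need at least one partition")
    val sizes = parts.map(_.size)
    val mean = sizes.sum.toDouble / sizes.size
    if (mean == 0.0) 1.0 else sizes.max / mean
  }
}
```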
Joseph E. Gonzalez
5f756fb63f
added support for random vertex cuts
2013-10-10 14:10:47 -07:00
Joseph E. Gonzalez
8dfac4ea8f
added support for random vertex cuts
2013-10-10 14:09:01 -07:00
Dan Crankshaw
9929e7b9a5
Merge branch 'benchmarks' of github.com:amplab/graphx
2013-10-10 13:36:51 -07:00
Dan Crankshaw
4b46d519db
Merge pull request #17 from amplab/product2
product 2 change
2013-10-10 13:35:36 -07:00
Reynold Xin
5218e46178
Updated Kryo registration.
2013-10-07 11:48:50 -07:00
Reynold Xin
4f916f5302
Created a MessageToPartition class to send messages without saving the partition id.
2013-10-07 11:31:00 -07:00
Dan Crankshaw
2a8f3db94d
Fixed groupEdgeTriplets - it now passes a basic unit test.
The problem was with the way the EdgeTripletRDD iterator worked. Calling
toList on it returned the last value repeatedly. Fixed by overriding
toList in the iterator.
2013-10-06 19:52:40 -07:00
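The groupEdgeTriplets bug matches a common pitfall: an iterator that reuses one mutable object for every element. The sketch below reproduces the symptom with a hypothetical `ReusingTripletIterator`. The commit fixed the bug by overriding `toList` in the iterator. This sketch uses a different technique: it copies each element with `snapshot` before the iterator advances, which exposes the same underlying cause.

```scala
// Hypothetical mutable edge triplet, refilled in place to avoid allocations.
final class MutableTriplet(var src: Long = 0L, var dst: Long = 0L, var attr: Int = 0) {
  // Copy the current contents into an immutable tuple.
  def snapshot: (Long, Long, Int) = (src, dst, attr)
}

// Iterator that returns the SAME MutableTriplet instance from every next() call.
final class ReusingTripletIterator(edges: Array[(Long, Long, Int)]) extends Iterator[MutableTriplet] {
  private val shared = new MutableTriplet()
  private var pos = 0
  override def hasNext: Boolean = pos < edges.length
  override def next(): MutableTriplet = {
    val (s, d, a) = edges(pos)
    shared.src = s
    shared.dst = d
    shared.attr = a
    pos += 1
    shared
  }
}

val edges = Array((1L, 2L, 10), (2L, 3L, 20), (3L, 1L, 30))

// Buggy: toList holds three references to one object, so once the iterator
// is exhausted every element reflects the last edge.
val buggy = new ReusingTripletIterator(edges).toList.map(_.snapshot)

// Fixed: copy each element before the iterator advances.
val fixed = new ReusingTripletIterator(edges).map(_.snapshot).toList
```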
Dan Crankshaw
0d3ea36fd8
Added a groupEdges and a groupEdgeTriplets method. For some reason the groupEdgeTriplets method isn't properly iterating through the set of edges and thus is returning the wrong result. groupEdges seems to be working.
2013-10-06 18:34:23 -07:00
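The behavior described for groupEdges, merging parallel edges, can be sketched on a local edge list. `GroupEdgesSketch.groupEdges` is a hypothetical helper that works on a `Seq` rather than an RDD. It assumes the merge function is associative, because the order of reduction is not specified.

```scala
object GroupEdgesSketch {
  // Collapse edges that share the same ordered (src, dst) pair into one edge,
  // combining their attributes with `merge`. Output is sorted for determinism.
  def groupEdges[ED](edges: Seq[(Long, Long, ED)])(merge: (ED, ED) => ED): Seq[(Long, Long, ED)] =
    edges
      .groupBy { case (src, dst, _) => (src, dst) }
      .toSeq
      .map { case ((src, dst), es) => (src, dst, es.map(_._3).reduce(merge)) }
      .sortBy { case (src, dst, _) => (src, dst) }
}
```

Grouping here is by direction, so `(1, 2)` and `(2, 1)` remain separate edges. For example, `groupEdges(Seq((1L, 2L, 3), (1L, 2L, 4), (2L, 1L, 5)))(_ + _)` yields one `(1, 2)` edge with attribute 7 and one `(2, 1)` edge with attribute 5.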
Dan Crankshaw
6cb21ce889
groupEdges() now compiles. Still need some unit tests
2013-10-06 15:33:35 -07:00
Dan Crankshaw
730a3156d3
Added initial groupEdges code. Still a prototype, I haven't figured out quite how it should all work yet.
2013-10-05 19:44:28 -07:00
Dan Crankshaw
bfedbee13a
Edge partitioner now partitions by canonical edge so all edges between two vertices (in either direction) will be sent to same machine.
2013-10-05 16:04:57 -07:00
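Canonical edge partitioning can be sketched in a few lines, assuming edges are plain `(src, dst)` pairs of `Long` ids. `CanonicalEdgePartitioner.partitionOf` and its mixing constant are hypothetical and do not come from the GraphX partitioner. Ordering the two endpoints before hashing is what sends `(a, b)` and `(b, a)` to the same partition.

```scala
object CanonicalEdgePartitioner {
  // Assign an edge to a partition from its unordered endpoint pair.
  def partitionOf(src: Long, dst: Long, numParts: Int): Int = {
    require(numParts > 0, "numParts must be positive")
    // Canonical form: smaller id first, so direction does not matter.
    val lo = math.min(src, dst)
    val hi = math.max(src, dst)
    // Mix the ordered pair into one Long; the odd constant is arbitrary,
    // and overflow simply wraps around.
    val h = lo * 1125899906842597L + hi
    // Non-negative modulo, safe even when h is negative.
    (((h % numParts) + numParts) % numParts).toInt
  }
}
```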
Dan Crankshaw
e096cbe90e
Added 2D canonical edge partitioner
2013-10-05 15:20:15 -07:00
Dan Crankshaw
da3e123afb
Removed some comments
2013-10-03 18:11:35 -07:00
Dan Crankshaw
1ee60d3b34
Fixed bug in sampleLogNormal
2013-10-03 17:46:37 -07:00
Dan Crankshaw
27b442dc06
Fixed annotation import
2013-10-03 10:29:00 -07:00
Dan Crankshaw
8edd499eff
Added rmat graph generator
2013-10-03 10:21:34 -07:00
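R-MAT generates edges by recursively choosing one quadrant of the adjacency matrix. The sketch below shows that idea on local data. `RmatSketch` is hypothetical, and the default quadrant probabilities (0.45, 0.15, 0.15, with the remainder for the fourth quadrant) are assumed values, not necessarily the ones this generator used.

```scala
import scala.util.Random

object RmatSketch {
  // Sample one edge in a 2^scale x 2^scale adjacency matrix. At each level a
  // quadrant is chosen with probability a (top-left), b (top-right),
  // c (bottom-left) or 1 - a - b - c (bottom-right).
  def sampleEdge(scale: Int, a: Double, b: Double, c: Double, rand: Random): (Long, Long) = {
    require(scale >= 1 && scale <= 62, "scale must be in 1..62")
    require(a >= 0 && b >= 0 && c >= 0 && a + b + c <= 1.0, "invalid quadrant probabilities")
    var src = 0L
    var dst = 0L
    for (_ <- 0 until scale) {
      val r = rand.nextDouble()
      // Each level appends one bit to the row (src) and one to the column (dst).
      val (rowBit, colBit) =
        if (r < a) (0L, 0L)
        else if (r < a + b) (0L, 1L)
        else if (r < a + b + c) (1L, 0L)
        else (1L, 1L)
      src = (src << 1) | rowBit
      dst = (dst << 1) | colBit
    }
    (src, dst)
  }

  // Generate numEdges edges from a seeded RNG, so runs are reproducible.
  def generate(numEdges: Int, scale: Int, seed: Long,
               a: Double = 0.45, b: Double = 0.15, c: Double = 0.15): Seq[(Long, Long)] = {
    val rand = new Random(seed)
    Seq.fill(numEdges)(sampleEdge(scale, a, b, c, rand))
  }
}
```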
Dan Crankshaw
3c3cc1508b
Added initial implementation of lognormal graph generator. Haven't tested it yet.
2013-09-28 16:00:44 -07:00
Ankur Dave
bf05dc7e78
Add a unit test for aggregateNeighbors
2013-09-19 23:45:15 -07:00
Ankur Dave
7cadeffdf4
Merge branch 'master' into aggregateNeighbors-returns-graph
2013-09-19 23:14:26 -07:00
Ankur Dave
f08e520f4c
Initialize sc in GraphSuite to avoid NullPointerException
2013-09-19 23:12:24 -07:00
Ankur Dave
f02d5c8c53
Fix typo in aggregateNeighbors docs
2013-09-19 23:06:37 -07:00
Ankur Dave
d3cbde0085
Import appropriate Spark core classes
2013-09-19 19:29:58 -07:00
Ankur Dave
c278907bf0
Move BytecodeUtils to the right package
2013-09-19 19:28:22 -07:00
Ankur Dave
4c694bd705
Move IndexedRDD and GraphSuite to org.apache.spark
2013-09-19 19:13:07 -07:00
Ankur Dave
4e967af6af
Return Graph from default aggregateNeighbors also
2013-09-18 16:18:33 -07:00
Ankur Dave
b04f1a4019
Implement aggregateNeighbors returning Graph
2013-09-18 16:18:33 -07:00
Ankur Dave
9ff783599b
Return Graph from aggregateNeighbors; update callers
This commit only affects the Graph API, not GraphImpl.
2013-09-18 16:18:33 -07:00
Joseph E. Gonzalez
55696e2584
GraphX now builds with all merged changes.
2013-09-17 22:42:12 -07:00
Joseph E. Gonzalez
5ccb60d467
Working on graph test suite
2013-08-11 14:49:22 -07:00
Joseph E. Gonzalez
ddf126edad
added subgraph
2013-08-06 17:48:04 -07:00
Joseph E. Gonzalez
b454314e07
Added 2d partitioning
2013-08-06 15:14:13 -07:00
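2D partitioning can be sketched by arranging partitions in a square grid. `EdgePartition2DSketch` and its mixing constant are illustrative, and the commit does not describe its exact scheme. The row comes from the source id and the column from the destination id. When `numParts` is a perfect square, each vertex therefore appears in at most 2 * sqrt(numParts) - 1 partitions. For other values the final modulo folds grid cells together, and the bound only holds approximately.

```scala
object EdgePartition2DSketch {
  // Place an edge in a side x side grid of partitions, side = ceil(sqrt(numParts)).
  def partitionOf(src: Long, dst: Long, numParts: Int): Int = {
    require(numParts > 0, "numParts must be positive")
    val side = math.ceil(math.sqrt(numParts.toDouble)).toInt
    val mixing = 1125899906842597L // arbitrary large odd constant
    // Row depends only on the source, column only on the destination.
    val row = Math.floorMod(src * mixing, side.toLong)
    val col = Math.floorMod(dst * mixing, side.toLong)
    ((row * side + col) % numParts).toInt
  }
}
```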
Joseph E. Gonzalez
7ae83f6ef4
Switching to Long vids instead of integers. This required a surprising number of changes since the fastutil library function names include the type (e.g., getLong() instead of just get())
2013-08-06 14:05:54 -07:00
Joseph E. Gonzalez
413b0c1526
merged with upstream
2013-08-06 12:09:03 -07:00
Joseph E. Gonzalez
42942fc1a9
In the process of bringing the GraphLab api back and fixing the analytics toolkit
2013-08-06 11:59:31 -07:00
Reynold Xin
2f2c7e6a29
Added a correctEdges function.
2013-07-01 16:23:23 -07:00
Reynold Xin
2943edf8ee
Fixed another bug ..
2013-07-01 00:24:30 -07:00
Reynold Xin
0791581346
More bug fixes
2013-06-30 23:07:40 -07:00
Joseph E. Gonzalez
c90967a6a2
Merge branch 'graph' of https://github.com/rxin/spark into graph
2013-06-29 21:53:44 -07:00
Joseph E. Gonzalez
f776301241
Resurrecting the GraphLab gather-apply-scatter api
2013-06-29 21:53:38 -07:00
Joseph E. Gonzalez
f269e5975b
Adding additional assertions and documenting the edge triplet class
2013-06-29 21:53:13 -07:00
Reynold Xin
438a213695
Merge branch 'graph' of github.com:rxin/spark into graph
Conflicts:
graph/src/test/scala/spark/graph/GraphSuite.scala
2013-06-29 21:29:17 -07:00
Reynold Xin
6acc2a7b3d
Various minor changes.
2013-06-29 21:28:31 -07:00
Joseph E. Gonzalez
f24548da88
Adding graph construction code to graph singleton object.
2013-06-29 21:04:53 -07:00
Joseph E. Gonzalez
2964df7a4e
Commenting out unused test code.
2013-06-29 21:04:35 -07:00
Reynold Xin
79b5eaa4e2
Added a 64bit string hash function.
2013-06-29 15:58:59 -07:00
Reynold Xin
758ceff778
Updated BytecodeUtils to ASM4.
2013-06-29 15:39:48 -07:00
Reynold Xin
ae0eca5ec8
Merge branch 'graph' of github.com:rxin/spark into graph
2013-06-29 15:22:33 -07:00
Reynold Xin
ae12d163dc
Added the BytecodeUtils class for analyzing bytecode.
2013-06-29 15:22:15 -07:00
Joseph E. Gonzalez
a80b28a579
Renamed several functions and classes and improved documentation
2013-06-28 18:18:51 -07:00
Joseph E. Gonzalez
0c24305b8d
added documentation to graph and did some minor renaming
2013-05-09 20:14:27 -07:00
Reynold Xin
0711fab137
Merge branch 'graph' of github.com:rxin/spark into graph
Conflicts:
graph/src/main/scala/spark/graph/Analytics.scala
2013-05-05 17:46:54 -07:00
Reynold Xin
70ba4d1740
Refactored the Graph API for discussion.
2013-05-05 17:12:30 -07:00
Joseph E. Gonzalez
a8dad98c55
merged with trunk
2013-04-16 11:39:02 -07:00
Joseph E. Gonzalez
2635416cee
switching from floats to doubles in pagerank and sssp
2013-04-16 11:27:56 -07:00
Reynold Xin
583a5858e7
Added more logging to Analytics.
2013-04-07 15:45:28 +08:00
Reynold Xin
3728e1bc40
Code to run bagel vs graph experiments.
2013-04-07 15:05:46 +08:00
Reynold Xin
9eec317835
Minor cleanup.
2013-04-05 23:22:55 +08:00
Reynold Xin
822d9c5b70
Rename rawGraph to graph in Pregel.
2013-04-05 16:16:04 +08:00
Reynold Xin
d40c1d5122
Added unit test and fix a partitioner problem.
2013-04-05 15:33:50 +08:00
Joseph E. Gonzalez
14205548ce
Merged with master
Merge branch 'graph' of https://github.com/rxin/spark into graph
2013-04-04 23:51:06 -07:00
Joseph E. Gonzalez
092708e57e
better parsing of graph text files
2013-04-04 23:50:53 -07:00
Reynold Xin
e5dd61e720
Rename rawGraph to graph.
2013-04-05 14:47:39 +08:00
Reynold Xin
c973e564b9
Changed EdgeDirection from Enumeration to case classes.
2013-04-05 14:46:37 +08:00
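The following sketch shows one way `EdgeDirection` can be encoded as case objects. The `In`/`Out`/`Both` members, `reverse`, and the `neighbors` helper in `EdgeDirectionDemo` are assumptions for illustration, not the code from this commit. A sealed hierarchy lets the compiler check pattern matches for exhaustiveness, which it does not do for `scala.Enumeration` values.

```scala
sealed abstract class EdgeDirection {
  // The opposite direction; Both is its own reverse.
  def reverse: EdgeDirection = this match {
    case EdgeDirection.In   => EdgeDirection.Out
    case EdgeDirection.Out  => EdgeDirection.In
    case EdgeDirection.Both => EdgeDirection.Both
  }
}

object EdgeDirection {
  case object In extends EdgeDirection
  case object Out extends EdgeDirection
  case object Both extends EdgeDirection
}

object EdgeDirectionDemo {
  // Neighbors of `vid` along the requested direction in a (src, dst) edge list.
  def neighbors(edges: Seq[(Long, Long)], vid: Long, dir: EdgeDirection): Set[Long] = dir match {
    case EdgeDirection.In   => edges.collect { case (s, d) if d == vid => s }.toSet
    case EdgeDirection.Out  => edges.collect { case (s, d) if s == vid => d }.toSet
    case EdgeDirection.Both => neighbors(edges, vid, EdgeDirection.In) ++ neighbors(edges, vid, EdgeDirection.Out)
  }
}
```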
Reynold Xin
34ef3e52dc
Merge branch 'graph' of github.com:rxin/spark into graph
2013-04-05 14:21:44 +08:00
Reynold Xin
9764e579b8
Minor cleanup.
2013-04-05 14:21:24 +08:00
Joseph E. Gonzalez
3d7a4b1fef
More realistic version of Pregel.
2013-04-04 23:17:27 -07:00
Reynold Xin
7134856351
numVertexPartitions and numEdgePartitions are now part of the constructor and are immutable.
Also done some cleanups.
2013-04-05 12:54:59 +08:00
Joseph E. Gonzalez
d5b0f4dfa6
added a function to collect the neighborhood of a vertex as well as the skeleton of the program for classic pregel
2013-04-04 19:54:14 -07:00
Joseph E. Gonzalez
ad908f7545
added pregel pagerank
2013-04-04 19:31:14 -07:00
Joseph E. Gonzalez
d510045d8c
fixing the silly bug in the dynamic pagerank gather function
2013-04-04 19:24:06 -07:00
Joseph E. Gonzalez
b537613570
added dynamic graphlab
2013-04-04 19:18:15 -07:00
Joseph E. Gonzalez
4a2b8aa557
added dynamic graphlab but I have not yet tested it.
2013-04-04 18:59:56 -07:00
Joseph E. Gonzalez
0667986c8e
Minor tweak to pregel semantics to give reverse edge illusion on send message.
2013-04-04 17:59:08 -07:00
Joseph E. Gonzalez
cb99fc193c
Added graphlab style implementation of Pregel and a more sophisticated version of mapReduceNeighborhoodFilter
2013-04-04 17:56:16 -07:00
Joseph E. Gonzalez
db45cf3a49
Fixing several bugs in mapReduceNeighborhood. First, a map was used instead of a foreach, which for mysterious reasons meant that the map never seemed to be executed. Switching to a foreach caused a null pointer exception because the body of the foreach did not properly initialize the temporary EdgeWithVertex data structure.
2013-04-04 15:37:02 -07:00
Joseph E. Gonzalez
93eca18a62
fixing a silly bug whereby the pagerank equation was implemented incorrectly (divided by degree of dst instead of source).
2013-04-04 15:35:07 -07:00
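The PageRank fix concerns the divisor in each edge's contribution: an edge `(src, dst)` should send `rank(src) / outDegree(src)`, using the source's out-degree rather than the destination's. The sketch below shows one synchronous update on a local edge list. `PageRankSketch` is hypothetical, and the reset probability of 0.15 is an assumed parameter. On a graph without dangling vertices, the ranks under this form of the update keep summing to the number of vertices, which gives a quick check that the divisor is correct.

```scala
object PageRankSketch {
  // One synchronous step: rank(v) = resetProb + (1 - resetProb) * sum of
  // rank(src) / outDegree(src) over the in-edges (src, v).
  def step(edges: Seq[(Int, Int)], ranks: Map[Int, Double], resetProb: Double = 0.15): Map[Int, Double] = {
    val outDeg: Map[Int, Int] = edges.groupBy(_._1).map { case (v, es) => v -> es.size }
    val contribs: Map[Int, Double] = edges
      .groupBy(_._2)
      .map { case (dst, es) => dst -> es.map { case (src, _) => ranks(src) / outDeg(src) }.sum }
    ranks.map { case (v, _) => v -> (resetProb + (1.0 - resetProb) * contribs.getOrElse(v, 0.0)) }
  }
}
```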
Reynold Xin
1671abf47c
Implemented mapReduceNeighborhood in Graph and used that to implement gather in GraphLab.
2013-04-05 01:47:59 +08:00
Joseph E. Gonzalez
28d0557fd8
added graph generator to run additional experiments
2013-04-03 23:23:07 -07:00
Joseph E. Gonzalez
91e1227edb
completed port of several analytics packages as well as analytics main
2013-04-03 23:04:29 -07:00
Joseph E. Gonzalez
39cac0ae65
Fixed iterateGAS to return a graph, added some minor features to graph and some additional todo items, and added the Analytics code from the internal SparkGraph prototype
2013-04-03 16:22:37 -07:00
Joseph E. Gonzalez
ad73e5bdbb
Merging with downstream changes.
Merge branch 'graph' of https://github.com/rxin/spark into graph
2013-04-03 09:10:20 -07:00
Reynold Xin
4291da1481
Allow returning different vertex data type in updateVertices.
2013-04-04 00:07:54 +08:00
Joseph E. Gonzalez
c649073b5f
merged with trunk
2013-04-03 08:47:49 -07:00
Joseph E. Gonzalez
0123c9d6a1
Changed GraphLab class to object and added graph loading from text file.
2013-04-03 08:43:08 -07:00
Reynold Xin
fe42ad41bb
Commit a working version.
2013-04-03 23:40:09 +08:00
Reynold Xin
cb0efe92d1
Oh wow it finally compiles!
2013-04-03 23:12:01 +08:00
Reynold Xin
d63c895945
Partial checkin of graphlab module.
2013-04-03 00:42:33 +08:00
Reynold Xin
25c71b185d
Added a vertices method to Graph.
2013-04-02 01:26:20 +08:00
Reynold Xin
d7011b0f78
Added a Graph class that supports joining vertices with edges.
2013-04-02 01:17:44 +08:00
Reynold Xin
28ebe04496
Added a Graph class that supports joining vertices with edges.
2013-04-02 01:16:08 +08:00
Reynold Xin
81c4d19c61
Maven and sbt build changes for SparkGraph.
2013-02-19 12:43:13 -08:00