Commit graph

325 commits

Author SHA1 Message Date
Reynold Xin 2d19d0381b Merge branch 'simplify' into clean 2013-11-26 13:55:26 -08:00
Reynold Xin d58bfa8573 Code cleaning to improve readability. 2013-11-26 13:54:46 -08:00
Dan Crankshaw 4b6b15dadd Actually use partitioner command line args in Analytics. 2013-11-24 16:38:38 -08:00
Ankur Dave fad6e70add Simplify GraphImpl internals 2013-11-23 02:59:56 -08:00
Reynold Xin 18ce7e940b Merge pull request #73 from jegonzal/TriangleCount
Triangle count
2013-11-22 17:02:40 -08:00
Joseph E. Gonzalez de3d6ee5a7 Fixing build after merging upstream changes. 2013-11-19 22:03:49 -08:00
Joseph E. Gonzalez 12cb19b1c1 Adding comments and addressing comments. 2013-11-19 21:37:29 -08:00
Joseph E. Gonzalez ae4ffc319a Setting the initial vertex set size to be small. 2013-11-19 21:36:15 -08:00
Joseph E. Gonzalez 18700b6e74 Switching mapReduceTriplets mapFunction to return iterator instead of array to allow optimizations of the returned object. 2013-11-19 21:36:15 -08:00
Joseph E. Gonzalez 983810ad69 Now with style. Addressing most of Reynold's comments. 2013-11-19 21:35:03 -08:00
Joseph E. Gonzalez 2093a17ff3 Adding triangle count code 2013-11-19 21:35:03 -08:00
Joseph E. Gonzalez 8719ba83c8 Modifying graph loaders to create initial vertex sets more efficiently and load undirected graphs. 2013-11-19 21:35:02 -08:00
Joseph E. Gonzalez 288ae310e7 adding test for collectNeighborIds 2013-11-19 21:03:00 -08:00
Joseph E. Gonzalez 2fc6f5bd47 Switching collectNeighborIds to use mapReduceTriplets directly 2013-11-19 21:03:00 -08:00
Dan Crankshaw 96fafdbd4b Removed sleep from pagerank in Analytics. 2013-11-19 20:39:34 -08:00
Dan Crankshaw 37a524d91c Addressed code review comments. 2013-11-19 16:39:39 -08:00
Dan Crankshaw 5f3ee53751 Added accessVertexAttr func which somehow got lost in a merge. 2013-11-18 19:34:02 -08:00
Dan Crankshaw 8a460e1811 Added partitioner to GraphImpl constructor args. 2013-11-18 19:32:03 -08:00
Dan Crankshaw 1022e9bf17 Fixed code review changes. 2013-11-18 18:08:32 -08:00
Dan Crankshaw 2aaa095687 Merge branch 'master' of github.com:amplab/graphx 2013-11-17 19:35:43 -08:00
Ankur Dave 62a2a71c37 Merge pull request #65 from amplab/varenc
Use variable encoding for ints, longs, and doubles in the specialized serializers.
2013-11-15 13:12:07 -08:00
Ankur Dave 3558e8bda1 During graph creation, create eTable earlier 2013-11-13 17:07:23 -08:00
Joseph E. Gonzalez 5a9b07ead2 Fixing documentation 2013-11-13 10:45:25 -08:00
Joseph E. Gonzalez 266eb01ce8 Addressing issue in Graph creation where a graph created with a vertex set that does not span all of the vertices in the edges will crash on triplet construction. 2013-11-13 10:45:25 -08:00
Reynold Xin 882d069189 Fixed the bug in variable encoding for longs. 2013-11-12 18:50:03 -08:00
Reynold Xin 1e5c17812d Use variable encoding for ints, longs, and doubles in the specialized serializers. 2013-11-12 15:30:27 -08:00
Dan Crankshaw a13460bb64 Updated documentation 2013-11-11 23:42:02 -08:00
Dan Crankshaw 7c573a8b43 Added PartitionStrategy option 2013-11-11 23:42:01 -08:00
Dan Crankshaw 8d8056da14 Fixed issue with canonical edge partitioner. 2013-11-11 23:40:23 -08:00
Dan Crankshaw 4a670ef0ba Merge branch 'master' of github.com:amplab/graphx 2013-11-11 21:42:08 -08:00
Joseph E. Gonzalez 577092080c Cleaning up documentation of VertexSetRDD.scala 2013-11-11 17:29:22 -08:00
Reynold Xin b8e294a21b Merge pull request #61 from ankurdave/pid2vid
Shuffle replicated vertex attributes efficiently in columnar format
2013-11-11 16:25:42 -08:00
Ankur Dave bee1015620 Handle ClassNotFoundException from ByteCodeUtils
ByteCodeUtils.invokedMethod(), which we use in mapReduceTriplets, throws
a ClassNotFoundException when called with a closure defined in the
console. This commit catches the exception and conservatively assumes
the closure references all edge attributes.
2013-11-10 23:00:37 -08:00
Dan Crankshaw 60db25bded Fixed merge conflicts. 2013-11-10 15:45:55 -08:00
Ankur Dave d1ff1b7222 Build pid2vid structures only once, in Vid2Pid 2013-11-10 14:47:39 -08:00
Ankur Dave 502c511711 Use pid2vid for creating VTableReplicatedValues 2013-11-10 14:36:14 -08:00
Ankur Dave 53d24a973e Fix typo 2013-11-10 14:24:38 -08:00
Ankur Dave aa24b0bbe8 Add test for mapReduceTriplets in GraphSuite 2013-11-10 14:24:38 -08:00
Ankur Dave bf4e45e685 Factor out VTableReplicatedValues 2013-11-10 14:24:38 -08:00
Ankur Dave cdbd19bbee Create all versions of vid2pid ahead of time 2013-11-10 14:10:23 -08:00
Ankur Dave 27e4355d61 Test no vertex attribute replication 2013-11-10 14:04:12 -08:00
Ankur Dave 80abc28078 Optimize mrTriplets for source-attr-only mapF using bytecode inspection 2013-11-10 14:04:12 -08:00
Reynold Xin 0e813cd483 Fix the hanging bug. 2013-11-09 23:29:37 -08:00
Joseph E. Gonzalez 6083e4350f Adding unit tests to reproduce error. 2013-11-08 15:39:30 -08:00
Joseph E. Gonzalez 161784d0e6 Fixing tests 2013-11-07 20:40:21 -08:00
Joseph E. Gonzalez e523f0d2fb merged and debugged 2013-11-07 20:19:49 -08:00
Joseph E. Gonzalez 908e606473 Additional optimizations 2013-11-07 19:47:30 -08:00
Reynold Xin bac7be30cd Made more specialized messages. 2013-11-07 19:39:48 -08:00
Reynold Xin 64ad3b18d9 Merge branch 'master' into rxin
Conflicts:
	graph/src/main/scala/org/apache/spark/graph/impl/GraphImpl.scala
2013-11-07 19:23:42 -08:00
Reynold Xin 2406bf33e4 Use custom serializer for aggregation messages when the data type is int/double. 2013-11-07 19:18:58 -08:00
Joseph E. Gonzalez e9308e0e75 Changing Pregel API to operate directly on edge triplets in SendMessage rather than (Vid, EdgeTriplet) pairs. 2013-11-07 18:04:06 -08:00
Reynold Xin 6fadff2b92 Converted for loops to while loops in EdgePartition. 2013-11-07 16:54:33 -08:00
Dan Crankshaw 384befb208 Merge branch 'master' of github.com:amplab/graphx 2013-11-06 19:50:55 -08:00
Joseph E. Gonzalez 3e504938c2 merging upstream changes 2013-11-05 01:36:48 -08:00
Joseph E. Gonzalez 2dc9ec2387 Reverting to Array based (materialized) output of all VertexSetRDD operations. 2013-11-05 01:15:12 -08:00
Reynold Xin 551a43fd3d Merge branch 'master' of github.com:apache/incubator-spark into mergemerge
Conflicts:
	README.md
	core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala
	core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala
	core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala
2013-11-04 21:02:36 -08:00
Dan Crankshaw d87d112b2c Merge branch 'master' of github.com:amplab/graphx 2013-11-01 12:04:09 -07:00
Joseph E. Gonzalez e7d37472b8 After some testing I realized that the IndexedSeq is still instantiating the array (not maintaining a view) so I have replaced all IndexedSeq[V] with (Int => V) 2013-10-31 21:09:39 -07:00
Joseph E. Gonzalez 63311d9c72 renamed update to setMerge 2013-10-31 20:12:30 -07:00
Dan Crankshaw e218e30b52 Merge branch 'master' of github.com:amplab/graphx 2013-10-31 19:54:17 -07:00
Dan Crankshaw 0a61cafba8 Added logging to Graph, GraphLab, and Pregel. 2013-10-31 19:54:06 -07:00
Joseph E. Gonzalez 8381aeffb3 This commit introduces the OpenHashSet and OpenHashMap as indexing primitives.
Large parts of the VertexSetRDD were restructured to take advantage of:

  1) the OpenHashSet as an index map
  2) view based lazy mapValues and mapValuesWithVertices
  3) the cogroup code is currently disabled (since it is not used in any of the tests)

The GraphImpl was updated to also use the OpenHashSet and PrimitiveOpenHashMap
wherever possible:

  1) the LocalVidMaps (used to track replicated vertices) are now implemented
     using the OpenHashSet
  2) an OpenHashMap is temporarily constructed to combine the local OpenHashSet
     with the local (replicated) vertex attribute arrays
  3) because the OpenHashSet constructor grabs a class manifest all operations
     that construct OpenHashSets have been moved to the GraphImpl Singleton to prevent
     implicit variable capture within closures.
2013-10-31 18:13:02 -07:00
Dan Crankshaw b3bcfc09c7 Merge branch 'master' of github.com:amplab/graphx 2013-10-31 18:03:00 -07:00
Joseph E. Gonzalez aeb773fa47 Merging with upstream master. 2013-10-31 10:12:12 -07:00
Reynold Xin 3f3c727bc5 Merge pull request #41 from jegonzal/LineageTracking
Optimizing Graph Lineage
2013-10-31 09:52:25 -07:00
Joseph E. Gonzalez d6b5122532 Switching to the @rxin BitSet implementation for VertexSet Value tables. 2013-10-31 01:44:24 -07:00
Dan Crankshaw c430d2e21d Added bitset to kryo register 2013-10-31 01:01:59 -07:00
Dan Crankshaw 37b4afbbf9 Merge branch 'cleanup' 2013-10-30 23:17:50 -07:00
Joseph E. Gonzalez a3ce484a2c Adding additional type constraints to VertexSetRDD to help diagnose issues with recent benchmarks. 2013-10-30 21:02:21 -07:00
Joseph E. Gonzalez 09ea661bbb removing completely unnecessary map operation. 2013-10-30 20:07:26 -07:00
Joseph E. Gonzalez 003f8a505d Removing potential additional shuffle dependency where an already partitioned RDD[(Vid, VD)] is repartitioned. 2013-10-30 20:06:54 -07:00
Joseph E. Gonzalez d513addb77 added lineage tracking code 2013-10-30 20:05:29 -07:00
Joseph E. Gonzalez a4b8ddf417 removing unused commented code 2013-10-30 16:07:05 -07:00
Dan Crankshaw a0c86c3689 Merge pull request #38 from jegonzal/Documentation
Improving Documentation
2013-10-30 15:34:39 -07:00
Dan Crankshaw e1099f4d89 Fixed issue with canonical edge partitioner. 2013-10-30 15:03:21 -07:00
Joey 06adf636c5 Merge pull request #33 from kellrott/master
Fixing graph/pom.xml
2013-10-29 16:43:46 -07:00
Joseph E. Gonzalez 38ec0baf5c fixing a typo in the VertexSetRDD docs 2013-10-29 16:27:55 -07:00
Joseph E. Gonzalez d8c8256e52 merging upstream changes 2013-10-29 16:23:26 -07:00
Joseph E. Gonzalez 08c7b040d6 Documented the VertexSetRDD 2013-10-29 15:03:13 -07:00
Joseph E. Gonzalez ede329336d Fixing a scaladoc bug in graph generators. 2013-10-29 14:50:12 -07:00
Joseph E. Gonzalez 15958ca65a Reindenting documentation. 2013-10-29 14:01:24 -07:00
Joseph E. Gonzalez d316cad9b1 Documented Graph.apply functions. 2013-10-29 13:58:04 -07:00
Joseph E. Gonzalez 19da8820fc Minor modifications to documentation. 2013-10-29 11:06:06 -07:00
Joseph E. Gonzalez 77626d1507 Adding collect neighbors and documenting GraphOps. 2013-10-29 11:05:42 -07:00
Joseph E. Gonzalez 942de98433 Making suggested changes. 2013-10-29 10:19:49 -07:00
Joseph E. Gonzalez d6a902f309 Finished updating connected components to use a Pregel-like abstraction and created a series of tests in the AnalyticsSuite. 2013-10-28 11:52:26 -07:00
Joseph E. Gonzalez a2287ae138 Implementing connected components on top of pregel like abstraction. 2013-10-27 10:42:11 -07:00
Joseph E. Gonzalez 6a0fbc0374 Updating the GraphLab API to match the changes made to the Pregel API. 2013-10-26 15:44:19 -07:00
Joseph E. Gonzalez 08024c938c Adding more documentation to the Pregel API as well as additional functionality including the ability to specify the edge direction along which messages are computed. 2013-10-26 15:42:51 -07:00
Joseph E. Gonzalez 00e73833cc Fixing a bug in reverse edge direction. 2013-10-26 15:10:30 -07:00
Kyle Ellrott 8236d5dcc4 More changes to the graph/pom.xml to make it match the other subprojects 2013-10-25 15:52:44 -07:00
Kyle Ellrott d39ac2eb40 Merge https://github.com/amplab/graphx 2013-10-25 13:16:05 -07:00
Kyle Ellrott 59ec6b85d0 Merge branch 'master' of https://github.com/amplab/graphx 2013-10-24 10:29:24 -07:00
Joseph E. Gonzalez c30624dcbb Adding dynamic pregel, fixing bugs in PageRank, and adding basic analytics unit tests. 2013-10-23 00:25:45 -07:00
Joseph E. Gonzalez 0bd92ed8d0 Fixing a bug in pregel where the initial vertex-program results are lost. 2013-10-22 19:10:51 -07:00
Joseph E. Gonzalez be8269af07 Merge branch 'VertexSetRDD_Tests' into AnalyticsCleanup 2013-10-22 15:03:49 -07:00
Joseph E. Gonzalez e3eb03d5b5 Starting analytics test suite. 2013-10-22 15:03:16 -07:00
Joseph E. Gonzalez ba5c75692a Updating analytics to reflect changes in the pregel interface and moving degree information into the edge attribute. 2013-10-22 15:03:00 -07:00
Joseph E. Gonzalez 46b195253e Adding some additional graph generators to support unit testing of the analytics package. 2013-10-22 15:01:49 -07:00
Joseph E. Gonzalez 14a3329a11 Changing the Pregel interface slightly to better support type inference. 2013-10-22 15:01:20 -07:00
Kyle Ellrott 73bf8587e2 Fixing graph/pom.xml 2013-10-21 15:13:31 -07:00
Joseph E. Gonzalez ebdbedc3e9 Documenting VertexSetRDD and added some testing code for VertexSetRDD 2013-10-19 01:26:08 -07:00
Joseph E. Gonzalez dbc8c9868a Fixing bug in VertexSetRDD that breaks Graph tests. 2013-10-18 23:44:06 -07:00
Reynold Xin 9cf43cfeb7 Merge pull request #28 from jegonzal/VertexSetRDD
Refactoring IndexedRDD to VertexSetRDD.
2013-10-18 22:07:21 -07:00
Ankur Dave 2d3603930e Add a unit test for GraphOps.joinVertices 2013-10-18 19:46:13 -07:00
Ankur Dave d15db10831 Add a unit test for Graph.mapEdges 2013-10-18 19:46:13 -07:00
Ankur Dave d429f015c0 Update GraphSuite aggregateNeighbors test 2013-10-18 19:46:13 -07:00
Joseph E. Gonzalez 5d01ebca3c Specializing IndexedRDD as VertexSetRDD.
1) This allows the index map to be optimized for Vids
2) This makes the code more readable
3) The Graph API can now return VertexSetRDDs from operations that produce results for vertices
2013-10-18 19:03:59 -07:00
Joseph E. Gonzalez bb58aa5330 Added some stub code to address the case where a vertex could occur multiple times in the vertex table or where a vertex in the edge list may not appear in the vertex table.
Moving IndexedRDD into the graphx source tree and removing dependencies in /core.
2013-10-18 18:15:32 -07:00
Ankur Dave 36a902e52d Revert accidental removal of code in 3a40a5e 2013-10-18 16:19:40 -07:00
Dan Crankshaw 3a40a5eb30 Added some documentation. 2013-10-18 15:11:21 -07:00
Joseph E. Gonzalez 3f3d28c73f Switching from Seq to IndexedSeq 2013-10-17 19:55:36 -07:00
Joseph E. Gonzalez 9a03c5fe28 This commit accomplishes three goals:
1) Further simplification of the IndexedRDD operations (eliminating some)
2) Aggressive reuse of HashMaps
3) Pipelining join operations within indexedrdd
2013-10-17 19:01:48 -07:00
Ankur Dave bf19aac2b7 Use ArrayBuilder instead of ArrayBuffer
ArrayBuilder is specialized for holding primitive VD types.
2013-10-17 13:19:00 -07:00
Ankur Dave 2282d27cf1 Cache msgsByPartition 2013-10-16 23:56:15 -07:00
Ankur Dave bc234bf0e1 Split vTableReplicated into two RDDs
Previously, (vTableReplicated: IndexedRDD[Pid, VertexHashMap[VD]])
stored one hashmap per partition, taking Vid directly to VD.

To take advantage of rxin's new hashmaps (see
rxin/incubator-spark@32a79d6d13), this
commit splits that data structure into two RDDs:

(vTableReplicationMap: IndexedRDD[Pid, VertexIdToIndexMap]) stores a map
per partition from vertex ID to the index where that vertex's attribute
is stored. This index refers to an array in the same partition in
vTableReplicatedValues.

(vTableReplicatedValues: IndexedRDD[Pid, Array[VD]]) stores the vertex
data and is arranged as described above.
2013-10-16 19:22:23 -07:00
Ankur Dave af8e461841 Set serialization properties in GraphSuite 2013-10-16 19:21:24 -07:00
Joseph E. Gonzalez 57ac9073ae Introducing unique indexedrdd and adding numerous specialized joins 2013-10-16 04:08:22 -07:00
Joseph E. Gonzalez 59700c0c2a switched to more efficient implementation of reduce by key 2013-10-16 00:18:37 -07:00
Joseph E. Gonzalez 9058f261fe Addressing issue where statistics are not computed correctly 2013-10-15 17:39:09 -07:00
Joseph E. Gonzalez 194bb03d16 Resolved closure capture issues by addressing capture through implicit variables. 2013-10-15 15:10:41 -07:00
Joseph E. Gonzalez 7241cf1632 Updating unit tests. 2013-10-15 14:18:03 -07:00
Joseph E. Gonzalez 345e1e94cc Still trying to resolve issues with capture. 2013-10-15 14:01:38 -07:00
Joseph E. Gonzalez b64337ec40 Trying to resolve issues with closure capture. 2013-10-15 13:02:17 -07:00
Joseph E. Gonzalez e7d0320000 More refactoring and documentation, including renaming data to attr for vertex and edge data and eliminating the vertex type. 2013-10-15 02:20:06 -07:00
Joseph E. Gonzalez 67bb39c54b Removing extraneous code 2013-10-14 18:49:05 -07:00
Joseph E. Gonzalez bff223454a trying to address issues with GraphImpl being caught in closures. 2013-10-13 22:27:10 -07:00
Joseph E. Gonzalez 637b67da56 merging changes from upstream benchmarking branch 2013-10-13 19:54:09 -07:00
Joseph E. Gonzalez 494472a6cc Integrated IndexedRDD into graph design. 2013-10-13 19:42:32 -07:00
Dan Crankshaw 1a961dd1f2 Fixed connected components CL params 2013-10-12 01:47:38 +00:00
Dan Crankshaw 1e5535cfcf Added connected components back 2013-10-11 16:38:52 -07:00
Dan Crankshaw 543a54dffa Tried to fix some indenting 2013-10-11 16:07:49 -07:00
Dan Crankshaw c4a23f95c3 Updated code so benchmarks actually run. 2013-10-11 22:57:43 +00:00
Joseph E. Gonzalez fa2f87ca63 added replication and balance reporting 2013-10-10 14:48:40 -07:00
Joseph E. Gonzalez 5f756fb63f added support for random vertex cuts 2013-10-10 14:10:47 -07:00
Joseph E. Gonzalez 8dfac4ea8f added support for random vertex cuts 2013-10-10 14:09:01 -07:00
Dan Crankshaw 9929e7b9a5 Merge branch 'benchmarks' of github.com:amplab/graphx 2013-10-10 13:36:51 -07:00
Dan Crankshaw 4b46d519db Merge pull request #17 from amplab/product2
product 2 change
2013-10-10 13:35:36 -07:00
Reynold Xin 5218e46178 Updated Kryo registration. 2013-10-07 11:48:50 -07:00
Reynold Xin 4f916f5302 Created a MessageToPartition class to send messages without saving the partition id. 2013-10-07 11:31:00 -07:00
Dan Crankshaw 2a8f3db94d Fixed groupEdgeTriplets - it now passes a basic unit test.
The problem was with the way the EdgeTripletRDD iterator worked. Calling
toList on it returned the last value repeatedly. Fixed by overriding
toList in the iterator.
2013-10-06 19:52:40 -07:00
Dan Crankshaw 0d3ea36fd8 Added a groupEdges and a groupEdgeTriplets method. For some reason the groupEdgeTriplets method isn't properly iterating through the set of edges and thus is returning the wrong result. groupEdges seems to be working. 2013-10-06 18:34:23 -07:00
Dan Crankshaw 6cb21ce889 groupEdges() now compiles. Still need some unit tests 2013-10-06 15:33:35 -07:00
Dan Crankshaw 730a3156d3 Added initial groupEdges code. Still a prototype, I haven't figured out quite how it should all work yet. 2013-10-05 19:44:28 -07:00
Dan Crankshaw bfedbee13a Edge partitioner now partitions by canonical edge so all edges between two vertices (in either direction) will be sent to same machine. 2013-10-05 16:04:57 -07:00
Dan Crankshaw e096cbe90e Added 2D canonical edge partitioner 2013-10-05 15:20:15 -07:00
Dan Crankshaw da3e123afb Removed some comments 2013-10-03 18:11:35 -07:00
Dan Crankshaw 1ee60d3b34 Fixed bug in sampleLogNormal 2013-10-03 17:46:37 -07:00
Dan Crankshaw 27b442dc06 Fixed annotation import 2013-10-03 10:29:00 -07:00
Dan Crankshaw 8edd499eff Added rmat graph generator 2013-10-03 10:21:34 -07:00