Commit graph

4760 commits

Author SHA1 Message Date
Dan Crankshaw 958d7213a5 Merge branch 'master' of https://github.com/amplab/graphx 2013-11-13 23:31:14 +00:00
Reynold Xin a81fcb749d Merge pull request #68 from jegonzal/BitSetSetUntilBug
Addressing bug in BitSet.setUntil(ind)
2013-11-13 10:41:01 -08:00
Joseph E. Gonzalez f0ef75c7a4 Addressing bug in BitSet.setUntil(ind) where invoking it with a multiple of 64 could lead to an index out of bounds error. 2013-11-13 10:35:23 -08:00
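
For readers unfamiliar with word-backed bit sets, the following is a minimal sketch of how a call with a multiple of 64 can run past the backing array, and of one way to guard against it. It is written in Python rather than Scala and is not the GraphX code: the class, its method names, and the guard are assumptions made for illustration, and the commit above does not state how the bug was actually fixed.

```python
class BitSet:
    """Fixed-size bit set backed by 64-bit words (hypothetical sketch)."""

    WORD_MASK = (1 << 64) - 1  # keeps each Python int within 64 bits

    def __init__(self, num_bits):
        self.num_bits = num_bits
        self.words = [0] * ((num_bits + 63) // 64)

    def set_until(self, bit_index):
        """Set every bit whose index is strictly below bit_index."""
        word_index = bit_index >> 6  # same as bit_index // 64
        for i in range(word_index):  # fill the whole words first
            self.words[i] = self.WORD_MASK
        # If bit_index equals the capacity (a multiple of 64), word_index is
        # one past the last word, so the partial-word update must be skipped.
        if word_index < len(self.words):
            self.words[word_index] |= (1 << (bit_index & 0x3F)) - 1

    def get(self, index):
        """Return True if the bit at index is set."""
        return (self.words[index >> 6] >> (index & 0x3F)) & 1 == 1


bits = BitSet(128)
bits.set_until(128)  # capacity is a multiple of 64; the guard avoids an IndexError
```

Without the `word_index < len(self.words)` check, the final update would index one word past the end of the array whenever `bit_index` equals the capacity.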
Dan Crankshaw d19f2e8f3e Removed slaves from git 2013-11-12 05:21:34 +00:00
Joey 143c01dbd6 Update README.md
Changing image references to master branch.
2013-11-11 19:37:16 -08:00
Reynold Xin 2e8d45032d Merge pull request #63 from jegonzal/VertexSetCleanup
Cleanup of VertexSetRDD
2013-11-11 17:34:09 -08:00
Joseph E. Gonzalez 577092080c Cleaning up documentation of VertexSetRDD.scala 2013-11-11 17:29:22 -08:00
Reynold Xin b8e294a21b Merge pull request #61 from ankurdave/pid2vid
Shuffle replicated vertex attributes efficiently in columnar format
2013-11-11 16:25:42 -08:00
Reynold Xin 3d7277ccbe Merge pull request #55 from ankurdave/aggregateNeighbors-variants
Specialize mapReduceTriplets for accessing subsets of vertex attributes
2013-11-11 15:49:28 -08:00
Ankur Dave bee1015620 Handle ClassNotFoundException from ByteCodeUtils
ByteCodeUtils.invokedMethod(), which we use in mapReduceTriplets, throws
a ClassNotFoundException when called with a closure defined in the
console. This commit catches the exception and conservatively assumes
the closure references all edge attributes.
2013-11-10 23:00:37 -08:00
Dan Crankshaw 60db25bded Fixed merge conflicts. 2013-11-10 15:45:55 -08:00
Ankur Dave d1ff1b7222 Build pid2vid structures only once, in Vid2Pid 2013-11-10 14:47:39 -08:00
Ankur Dave 502c511711 Use pid2vid for creating VTableReplicatedValues 2013-11-10 14:36:14 -08:00
Ankur Dave 53d24a973e Fix typo 2013-11-10 14:24:38 -08:00
Ankur Dave aa24b0bbe8 Add test for mapReduceTriplets in GraphSuite 2013-11-10 14:24:38 -08:00
Ankur Dave bf4e45e685 Factor out VTableReplicatedValues 2013-11-10 14:24:38 -08:00
Ankur Dave cdbd19bbee Create all versions of vid2pid ahead of time 2013-11-10 14:10:23 -08:00
Ankur Dave 27e4355d61 Test no vertex attribute replication 2013-11-10 14:04:12 -08:00
Ankur Dave 80abc28078 Optimize mrTriplets for source-attr-only mapF using bytecode inspection 2013-11-10 14:04:12 -08:00
Joey 1a06f707e3 Merge pull request #60 from amplab/rxin
Looks good to me.
2013-11-10 10:54:44 -08:00
Reynold Xin 0e813cd483 Fix the hanging bug. 2013-11-09 23:29:37 -08:00
Reynold Xin f6c946206a Merge pull request #58 from jegonzal/KryoMessages
Kryo messages
2013-11-09 16:14:45 -08:00
Joseph E. Gonzalez 6083e4350f Adding unit tests to reproduce error. 2013-11-08 15:39:30 -08:00
Joseph E. Gonzalez 161784d0e6 Fixing tests 2013-11-07 20:40:21 -08:00
Joseph E. Gonzalez e523f0d2fb merged and debugged 2013-11-07 20:19:49 -08:00
Joseph E. Gonzalez 908e606473 Additional optimizations 2013-11-07 19:47:30 -08:00
Reynold Xin bac7be30cd Made more specialized messages. 2013-11-07 19:39:48 -08:00
Reynold Xin 64ad3b18d9 Merge branch 'master' into rxin
Conflicts:
	graph/src/main/scala/org/apache/spark/graph/impl/GraphImpl.scala
2013-11-07 19:23:42 -08:00
Reynold Xin 2406bf33e4 Use custom serializer for aggregation messages when the data type is int/double. 2013-11-07 19:18:58 -08:00
Ankur Dave 6ee05be1c8 Merge pull request #49 from jegonzal/graphxshell
GraphX Console with Logo Text
2013-11-07 19:12:41 -08:00
Ankur Dave a9f96b54e4 Merge pull request #56 from jegonzal/PregelAPIChanges
Changing Pregel API to use mapReduceTriplets instead of aggregateNeighbors
2013-11-07 18:56:56 -08:00
Joseph E. Gonzalez e9308e0e75 Changing Pregel API to operate directly on edge triplets in SendMessage rather than (Vid, EdgeTriplet) pairs. 2013-11-07 18:04:06 -08:00
Reynold Xin 5907137d11 Merge pull request #54 from amplab/rxin
Converted for loops to while loops in EdgePartition.
2013-11-07 16:58:31 -08:00
Reynold Xin 6fadff2b92 Converted for loops to while loops in EdgePartition. 2013-11-07 16:54:33 -08:00
Reynold Xin edf41647f4 Merge pull request #53 from amplab/rxin
Added GraphX to classpath.
2013-11-07 16:22:43 -08:00
Reynold Xin 95f1f5315e Added GraphX to classpath. 2013-11-07 16:22:05 -08:00
Reynold Xin c379e10455 Merge pull request #51 from jegonzal/VertexSetRDD
Reverting to Array based (materialized) output in VertexSetRDD
2013-11-07 16:01:47 -08:00
Dan Crankshaw 384befb208 Merge branch 'master' of github.com:amplab/graphx 2013-11-06 19:50:55 -08:00
Joseph E. Gonzalez 8ac15e8e43 Merge branch 'master' of https://github.com/amplab/graphx into graphxshell 2013-11-05 01:37:12 -08:00
Joseph E. Gonzalez 3e504938c2 merging upstream changes 2013-11-05 01:36:48 -08:00
Joey ca44b5134a Merge pull request #50 from amplab/mergemerge
Merge Spark master into graphx
2013-11-05 01:32:55 -08:00
Joseph E. Gonzalez 2dc9ec2387 Reverting to Array based (materialized) output of all VertexSetRDD operations. 2013-11-05 01:15:12 -08:00
Reynold Xin 551a43fd3d Merge branch 'master' of github.com:apache/incubator-spark into mergemerge
Conflicts:
	README.md
	core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala
	core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala
	core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala
2013-11-04 21:02:36 -08:00
Joseph E. Gonzalez 3c37928fab This commit adds a new graphx-shell which is essentially the same as
the spark shell but with GraphX packages automatically imported
and with Kryo serialization enabled for GraphX types.

In addition the graphx-shell has a nifty new logo.

To make these changes minimally invasive in the SparkILoop.scala
I added some additional environment variables:

   SPARK_BANNER_TEXT: If set this string is displayed instead
   of the spark logo

   SPARK_SHELL_INIT_BLOCK: if set this expression is evaluated in the
   spark shell after the spark context is created.
2013-11-04 20:10:15 -08:00
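
The behaviour of the two environment variables described in commit 3c37928fab can be sketched as follows. This is an illustrative Python sketch, not the SparkILoop.scala change: only the variable names SPARK_BANNER_TEXT and SPARK_SHELL_INIT_BLOCK come from the commit message, while `DEFAULT_BANNER`, `start_shell`, and the `evaluate`/`show` callbacks are hypothetical.

```python
import os

DEFAULT_BANNER = "Welcome to the shell"  # placeholder, not the real Spark logo


def banner_text(env=os.environ):
    """Return SPARK_BANNER_TEXT if it is set, otherwise the default banner."""
    return env.get("SPARK_BANNER_TEXT", DEFAULT_BANNER)


def init_block(env=os.environ):
    """Return the expression to evaluate at startup, or None if unset."""
    return env.get("SPARK_SHELL_INIT_BLOCK")


def start_shell(env, evaluate, show=print):
    """Show the banner, then evaluate the init block if one was provided."""
    show(banner_text(env))
    # ... the Spark context would be created at this point ...
    block = init_block(env)
    if block is not None:
        evaluate(block)
```

Reading both values from the environment keeps the shell code unchanged when neither variable is set, which matches the stated goal of a minimally invasive change.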
Reynold Xin 7a26104ab7 Merge pull request #130 from aarondav/shuffle
Memory-optimized shuffle file consolidation

Reduces overhead of each shuffle block for consolidation from >300 bytes to 8 bytes (1 primitive Long). Verified via profiler testing with 1 mil shuffle blocks, net overhead was ~8,400,000 bytes.

Despite the memory-optimized implementation incurring extra CPU overhead, the runtime of the shuffle phase in this test was only around 2% slower, while the reduce phase was 40% faster, when compared to not using any shuffle file consolidation.

This is accomplished by replacing the map from ShuffleBlockId to FileSegment (i.e., block id to where it's located), which had high overhead due to being a gigantic, timestamped, concurrent map, with a more space-efficient structure. Namely, the following are introduced (I have omitted the word "Shuffle" from some names for clarity):
**ShuffleFile** - there is one ShuffleFile per consolidated shuffle file on disk. We store an array of offsets into the physical shuffle file for each ShuffleMapTask that wrote into the file. This is sufficient to reconstruct FileSegments for mappers that are in the file.
**FileGroup** - contains a set of ShuffleFiles, one per reducer, that a MapTask can use to write its output. There is one FileGroup created per _concurrent_ MapTask. The FileGroup contains an array of the mapIds that have been written to all files in the group. The positions of elements in this array map directly onto the positions in each ShuffleFile's offsets array.

In order to locate the FileSegment associated with a BlockId, we have another structure which maps each reducer to the set of ShuffleFiles that were created for it. (There will be as many ShuffleFiles per reducer as there are FileGroups.) To look up a given ShuffleBlockId (shuffleId, reducerId, mapId), we thus search through all ShuffleFiles associated with that reducer.

As a time optimization, we ensure that FileGroups are only reused for MapTasks with monotonically increasing mapIds. This allows us to perform a binary search to locate a mapId inside a group, and also enables potential future optimization (based on the usual monotonic access order).
2013-11-04 17:54:06 -08:00
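
The lookup scheme described in pull request #130 can be sketched roughly as below. This is a simplified Python illustration under stated assumptions, not the Spark implementation: it covers a single shuffle (so shuffleId is omitted), tracks byte sizes in memory instead of writing files, and every method and helper name (`record_map_output`, `index_of`, `segment_for`, `find_segment`) is hypothetical. Only the ShuffleFile and FileGroup concepts, the shared mapId array, and the binary search over increasing mapIds come from the description.

```python
from bisect import bisect_left


class FileGroup:
    """One ShuffleFile per reducer, plus the sorted mapIds written to them."""

    def __init__(self, num_reducers):
        self.map_ids = []  # strictly increasing, so binary search applies
        self.files = [ShuffleFile(self) for _ in range(num_reducers)]

    def record_map_output(self, map_id, sizes):
        """Record that map_id appended sizes[r] bytes to reducer r's file."""
        assert not self.map_ids or map_id > self.map_ids[-1]
        self.map_ids.append(map_id)
        for shuffle_file, size in zip(self.files, sizes):
            shuffle_file.offsets.append(shuffle_file.length)
            shuffle_file.length += size

    def index_of(self, map_id):
        """Binary-search the position of map_id, or return -1 if absent."""
        i = bisect_left(self.map_ids, map_id)
        return i if i < len(self.map_ids) and self.map_ids[i] == map_id else -1


class ShuffleFile:
    """Start offsets for each map task, aligned with the group's map_ids."""

    def __init__(self, group):
        self.group = group
        self.offsets = []
        self.length = 0

    def segment_for(self, map_id):
        """Return (offset, length) for map_id, or None if it is not here."""
        i = self.group.index_of(map_id)
        if i < 0:
            return None
        end = self.offsets[i + 1] if i + 1 < len(self.offsets) else self.length
        return (self.offsets[i], end - self.offsets[i])


def find_segment(files_by_reducer, reducer_id, map_id):
    """Search every ShuffleFile created for the reducer for map_id's segment."""
    for shuffle_file in files_by_reducer[reducer_id]:
        segment = shuffle_file.segment_for(map_id)
        if segment is not None:
            return shuffle_file, segment
    return None


group_a, group_b = FileGroup(2), FileGroup(2)
files_by_reducer = {r: [group_a.files[r], group_b.files[r]] for r in range(2)}
group_a.record_map_output(0, [10, 20])
group_b.record_map_output(1, [5, 7])
group_a.record_map_output(2, [30, 40])
print(find_segment(files_by_reducer, 1, 2)[1])  # segment of map 2 for reducer 1
```

Each block costs one stored offset per file, and the mapId array is shared by all files in a group, which is where the per-block saving described above comes from.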
Aaron Davidson 1ba11b1c6a Minor cleanup in ShuffleBlockManager 2013-11-04 17:16:41 -08:00
Aaron Davidson 6201e5e249 Refactor ShuffleBlockManager to reduce public interface
- ShuffleBlocks has been removed and replaced by ShuffleWriterGroup.
- ShuffleWriterGroup no longer contains a reference to a ShuffleFileGroup.
- ShuffleFile has been removed and its contents are now within ShuffleFileGroup.
- ShuffleBlockManager.forShuffle has been replaced by a more stateful forMapTask.
2013-11-04 09:41:04 -08:00
Aaron Davidson b0cf19fe3c Add javadoc and remove unused code 2013-11-03 22:16:58 -08:00
Aaron Davidson 39d93ed4b9 Clean up test files properly
For some reason, even calling
java.nio.Files.createTempDirectory().getFile.deleteOnExit()
does not delete the directory on exit. Guava's analogous function
seems to work, however.
2013-11-03 21:52:59 -08:00
Aaron Davidson a0bb569a81 use OpenHashMap, remove monotonicity requirement, fix failure bug 2013-11-03 21:34:56 -08:00