
4680 commits

Author SHA1 Message Date
Dan Crankshaw d87d112b2c Merge branch 'master' of github.com:amplab/graphx 2013-11-01 12:04:09 -07:00
Reynold Xin 99bfcc91e0 Merge pull request #46 from jegonzal/VertexSetWithHashSet
Switched VertexSetRDD and GraphImpl to use OpenHashSet
2013-10-31 21:38:10 -07:00
Joseph E. Gonzalez db89ac4bc8 Changing var to val for keySet in OpenHashMaps 2013-10-31 21:19:26 -07:00
Joseph E. Gonzalez e7d37472b8 After some testing I realized that the IndexedSeq is still instantiating the array (not maintaining a view) so I have replaced all IndexedSeq[V] with (Int => V) 2013-10-31 21:09:39 -07:00
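The difference this commit describes can be sketched as follows. This is only an illustration: `strictMap` and `lazyMap` are hypothetical names, not functions from the GraphX code base. Mapping an `IndexedSeq[V]` builds a new collection immediately, whereas composing an `Int => V` lookup with another function defers the work until an index is read.

```scala
// Hypothetical sketch; names are illustrative and not taken from GraphX.
object LazyValuesSketch {
  // Strict: map on an IndexedSeq materializes every element right away.
  def strictMap[V, U](values: IndexedSeq[V], f: V => U): IndexedSeq[U] =
    values.map(f)

  // Lazy: f is applied only when a position is actually looked up.
  def lazyMap[V, U](values: Int => V, f: V => U): Int => U =
    (i: Int) => f(values(i))

  def main(args: Array[String]): Unit = {
    val data = Array(1, 2, 3)
    var calls = 0
    val doubled = lazyMap[Int, Int]((i: Int) => data(i), v => { calls += 1; v * 2 })
    assert(calls == 0)      // nothing has been computed yet
    assert(doubled(1) == 4) // computed on access
    assert(calls == 1)
  }
}
```

The trade-off is that a function view recomputes `f` on every access, so it suits values that are read at most once per position.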
Joseph E. Gonzalez 63311d9c72 renamed update to setMerge 2013-10-31 20:12:30 -07:00
Dan Crankshaw e218e30b52 Merge branch 'master' of github.com:amplab/graphx 2013-10-31 19:54:17 -07:00
Dan Crankshaw 0a61cafba8 Added logging to Graph, GraphLab, and Pregel. 2013-10-31 19:54:06 -07:00
Joseph E. Gonzalez 7f58440334 Merge branch 'master' of https://github.com/amplab/graphx into VertexSetWithHashSet 2013-10-31 18:30:50 -07:00
Reynold Xin fcaaf86803 Merge pull request #44 from jegonzal/rxinBitSet
Switching to VertexSetRDD to use @rxin BitSet and OpenHash
2013-10-31 18:27:30 -07:00
Joseph E. Gonzalez 8381aeffb3 This commit introduces the OpenHashSet and OpenHashMap as indexing primitives.
Large parts of the VertexSetRDD were restructured to take advantage of:

  1) the OpenHashSet as an index map
  2) view based lazy mapValues and mapValuesWithVertices
  3) the cogroup code is currently disabled (since it is not used in any of the tests)

The GraphImpl was updated to also use the OpenHashSet and PrimitiveOpenHashMap
wherever possible:

  1) the LocalVidMaps (used to track replicated vertices) are now implemented
     using the OpenHashSet
  2) an OpenHashMap is temporarily constructed to combine the local OpenHashSet
     with the local (replicated) vertex attribute arrays
  3) because the OpenHashSet constructor grabs a class manifest, all operations
     that construct OpenHashSets have been moved to the GraphImpl Singleton to prevent
     implicit variable capture within closures.
2013-10-31 18:13:02 -07:00
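The idea of using a hash set as an index map can be sketched as below. `IndexSetSketch` is a hypothetical, heavily simplified stand-in (fixed capacity, `Long` keys, linear probing), not the OpenHashSet from the commit. The set assigns each key a slot position, and any number of value arrays can be laid out in parallel against those positions.

```scala
// Hypothetical, simplified open-addressing set over Long keys.
// Fixed capacity and no resizing; not the Spark/GraphX implementation.
final class IndexSetSketch(val capacity: Int) {
  require(capacity > 1 && (capacity & (capacity - 1)) == 0,
    "capacity must be a power of two")
  private val keys = new Array[Long](capacity)
  private val used = new Array[Boolean](capacity)
  private val mask = capacity - 1
  private var count = 0

  private def hash(k: Long): Int = (k ^ (k >>> 32)).toInt & mask

  /** Inserts k if absent and returns its slot position. */
  def add(k: Long): Int = {
    var pos = hash(k)
    while (used(pos) && keys(pos) != k) pos = (pos + 1) & mask
    if (!used(pos)) {
      // Always keep one slot free so that probing terminates.
      require(count < capacity - 1, "set is full")
      keys(pos) = k
      used(pos) = true
      count += 1
    }
    pos
  }

  /** Returns the slot position of k, or -1 if k is absent. */
  def getPos(k: Long): Int = {
    var pos = hash(k)
    while (used(pos)) {
      if (keys(pos) == k) return pos
      pos = (pos + 1) & mask
    }
    -1
  }
}

object IndexSetDemo {
  def main(args: Array[String]): Unit = {
    val index = new IndexSetSketch(8)
    // Two value tables share one index, so vertex ids are stored only once.
    val ranks = new Array[Double](index.capacity)
    val degrees = new Array[Int](index.capacity)
    for (vid <- Seq(10L, 42L, 7L)) ranks(index.add(vid)) = 1.0
    degrees(index.getPos(42L)) += 3
    assert(degrees(index.getPos(42L)) == 3)
    assert(index.getPos(99L) == -1)
  }
}
```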
Joseph E. Gonzalez 4ad58e2b9a This commit makes three changes to the (PrimitiveKey)OpenHashMap
  1) _keySet --renamed--> keySet
  2) keySet and _values are made externally accessible
  3) added an update function which merges duplicate values
2013-10-31 18:09:42 -07:00
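An update that merges duplicate values might look like the sketch below. The signature is an assumption made for illustration, and the helper is backed by the standard library's mutable map rather than the (PrimitiveKey)OpenHashMap itself.

```scala
import scala.collection.mutable

// Hypothetical helper; the signature is assumed, not copied from the commit.
object SetMergeSketch {
  /** Stores v under k, combining it with any existing value via mergeF. */
  def setMerge[K, V](m: mutable.Map[K, V], k: K, v: V)(mergeF: (V, V) => V): Unit =
    m.get(k) match {
      case Some(old) => m(k) = mergeF(old, v)
      case None      => m(k) = v
    }

  def main(args: Array[String]): Unit = {
    val messages = mutable.HashMap.empty[Long, Double]
    // Two messages arrive for vertex 1 and are summed instead of overwritten.
    setMerge(messages, 1L, 0.5)(_ + _)
    setMerge(messages, 1L, 0.25)(_ + _)
    setMerge(messages, 2L, 1.0)(_ + _)
    assert(messages(1L) == 0.75)
    assert(messages(2L) == 1.0)
  }
}
```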
Dan Crankshaw b3bcfc09c7 Merge branch 'master' of github.com:amplab/graphx 2013-10-31 18:03:00 -07:00
Joseph E. Gonzalez d74ad4ebc9 Adding ability to access local BitSet and to safely get a value at a given position 2013-10-31 18:01:34 -07:00
Joseph E. Gonzalez aeb773fa47 Merging with upstream master. 2013-10-31 10:12:12 -07:00
Reynold Xin 3f3c727bc5 Merge pull request #41 from jegonzal/LineageTracking
Optimizing Graph Lineage
2013-10-31 09:52:25 -07:00
Reynold Xin 944f6b8048 Merge pull request #43 from amplab/FixBitSetCastException
Fix BitSet cast exception
2013-10-31 09:40:35 -07:00
Joseph E. Gonzalez d6b5122532 Switching to the @rxin BitSet implementation for VertexSet Value tables. 2013-10-31 01:44:24 -07:00
Joseph E. Gonzalez 51aff8ddcf Adding logical AND/OR, setUntil, and iterators to the BitSet. 2013-10-31 01:43:50 -07:00
Dan Crankshaw c430d2e21d Added bitset to kryo register 2013-10-31 01:01:59 -07:00
Joseph E. Gonzalez a6267df25f Merge branch 'hash1' of https://github.com/rxin/incubator-spark into rxinBitSet 2013-10-30 23:24:33 -07:00
Dan Crankshaw 37b4afbbf9 Merge branch 'cleanup' 2013-10-30 23:17:50 -07:00
Joseph E. Gonzalez a3ce484a2c Adding additional type constraints to VertexSetRDD to help diagnose issues with recent benchmarks. 2013-10-30 21:02:21 -07:00
Joseph E. Gonzalez 09ea661bbb removing completely unnecessary map operation. 2013-10-30 20:07:26 -07:00
Joseph E. Gonzalez 003f8a505d Removing potential additional shuffle dependency where an already partitioned RDD[(Vid, VD)] is repartitioned. 2013-10-30 20:06:54 -07:00
Joseph E. Gonzalez d513addb77 added lineage tracking code 2013-10-30 20:05:29 -07:00
Joseph E. Gonzalez a4b8ddf417 removing unused commented code 2013-10-30 16:07:05 -07:00
Ankur Dave 5064f9b2d2 Merge remote-tracking branch 'spark-upstream/master'
Conflicts:
	project/SparkBuild.scala
2013-10-30 15:59:09 -07:00
Dan Crankshaw a0c86c3689 Merge pull request #38 from jegonzal/Documentation
Improving Documentation
2013-10-30 15:34:39 -07:00
Dan Crankshaw e1099f4d89 Fixed issue with canonical edge partitioner. 2013-10-30 15:03:21 -07:00
Matei Zaharia 618c1f6cf3 Merge pull request #125 from velvia/2013-10/local-jar-uri
Add support for local:// URI scheme for addJars()

This PR adds support for a new URI scheme for SparkContext.addJars():  `local://file/path`.
The *local* scheme indicates that the `/file/path` exists on every worker node. The reason for its existence is for big library JARs, which would be really expensive to serve using the standard HTTP fileserver distribution method, especially for big clusters. Today the only inexpensive method (assuming such a file is on every host, via say NFS, rsync, etc.) of doing this is to add the JAR to the SPARK_CLASSPATH, but we want a method where the user does not need to modify the Spark configuration.

I would add something to the docs, but it's not obvious where to add it.

Oh, and it would be great if this could be merged in time for 0.8.1.
2013-10-30 12:03:44 -07:00
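The dispatch described in this PR can be sketched roughly as follows. The types and the `classify` function are hypothetical and are not Spark's implementation; the sketch only mirrors the mapping stated above, where `local://file/path` refers to `/file/path` on each worker.

```scala
// Hypothetical sketch of scheme-based dispatch; not Spark's actual code.
object LocalJarSketch {
  sealed trait JarSource
  /** The jar already exists at this path on every worker node. */
  final case class OnEveryWorker(path: String) extends JarSource
  /** The jar must be distributed to workers, e.g. over the HTTP file server. */
  final case class Distributed(path: String) extends JarSource

  private val LocalPrefix = "local://"

  def classify(jar: String): JarSource =
    if (jar.startsWith(LocalPrefix)) OnEveryWorker("/" + jar.stripPrefix(LocalPrefix))
    else Distributed(jar)

  def main(args: Array[String]): Unit = {
    assert(classify("local://file/path") == OnEveryWorker("/file/path"))
    assert(classify("/tmp/app.jar") == Distributed("/tmp/app.jar"))
  }
}
```

The prefix is stripped as a plain string on purpose: `java.net.URI` would parse `file` in `local://file/path` as the authority and return only `/path` as the path component.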
Evan Chan de0285556a Add support for local:// URI scheme for addJars()
This indicates that a jar is available locally on each worker node.
2013-10-30 09:41:35 -07:00
Matei Zaharia 745dc42908 Merge pull request #118 from JoshRosen/blockinfo-memory-usage
Reduce the memory footprint of BlockInfo objects

This pull request reduces the memory footprint of all BlockInfo objects and makes additional optimizations for shuffle blocks.  For all BlockInfo objects, these changes remove two boolean fields and one Object field.  For shuffle blocks, we additionally remove an Object field and a boolean field.

When storing tens of thousands of these objects, this may add up to significant memory savings.  A ShuffleBlockInfo now only needs to wrap a single long.

This was motivated by a [report of high blockInfo memory usage during shuffles](https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201310.mbox/%3C20131026134353.202b2b9b%40sh9%3E).

I haven't run benchmarks to measure the exact memory savings.

/cc @aarondav
2013-10-29 23:47:10 -07:00
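One way to make a block-info object wrap a single long is to fold the status flags into reserved values of the size field. The sketch below assumes that encoding; the class name and the sentinel values are hypothetical, not taken from the pull request.

```scala
// Hypothetical sketch: one Long carries both the status and the size.
// The sentinel encoding is an assumption made for illustration.
final class PackedBlockInfoSketch {
  import PackedBlockInfoSketch._

  // Negative values are status markers; non-negative values are a byte size.
  @volatile private var state: Long = Pending

  def markReady(sizeInBytes: Long): Unit = {
    require(sizeInBytes >= 0, "size must be non-negative")
    state = sizeInBytes
  }

  def markFailed(): Unit = { state = Failed }

  def isPending: Boolean = state == Pending
  def isFailed: Boolean = state == Failed
  def size: Option[Long] = if (state >= 0) Some(state) else None
}

object PackedBlockInfoSketch {
  private val Pending = -1L
  private val Failed = -2L

  def main(args: Array[String]): Unit = {
    val info = new PackedBlockInfoSketch
    assert(info.isPending && info.size.isEmpty)
    info.markReady(4096L)
    assert(!info.isPending && info.size == Some(4096L))
  }
}
```

Compared with separate boolean fields, this removes per-object fields at the cost of reserving part of the value range for status markers.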
Joey 4f63b5e17f Adding code example 2013-10-29 21:31:12 -07:00
Joey 1a20ba9b70 Updating images so they render correctly. 2013-10-29 21:06:29 -07:00
Joseph E. Gonzalez 41b3122120 Starting to improve README. 2013-10-29 20:57:55 -07:00
Josh Rosen cb9c8a922f Extract BlockInfo classes from BlockManager.
This saves space, since the inner classes needed
to keep a reference to the enclosing BlockManager.
2013-10-29 18:06:51 -07:00
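A minimal sketch of the distinction, using hypothetical class names: an inner class instance is tied to one instance of its enclosing class and reaches that instance's members through a reference to it, while a class extracted to the top level carries only its own fields.

```scala
// Hypothetical sketch; class names are illustrative only.
object OuterReferenceSketch {
  class Manager(val name: String) {
    // Inner class: every Info belongs to one particular Manager.
    class Info(val size: Long) {
      def owner: String = name // resolved through the enclosing Manager
    }
  }

  // Extracted class: no link to any Manager.
  class StandaloneInfo(val size: Long)

  def main(args: Array[String]): Unit = {
    val manager = new Manager("bm-1")
    val inner = new manager.Info(64L) // the type is path-dependent: manager.Info
    val standalone = new StandaloneInfo(64L)
    assert(inner.owner == "bm-1")
    assert(standalone.size == 64L)
  }
}
```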
Joey 06adf636c5 Merge pull request #33 from kellrott/master
Fixing graph/pom.xml
2013-10-29 16:43:46 -07:00
Joseph E. Gonzalez 38ec0baf5c fixing a typo in the VertexSetRDD docs 2013-10-29 16:27:55 -07:00
Joseph E. Gonzalez d8c8256e52 merging upstream changes 2013-10-29 16:23:26 -07:00
Josh Rosen 846b1cf5ab Store fewer BlockInfo fields for shuffle blocks. 2013-10-29 15:14:29 -07:00
Ankur Dave 098768e0b9 Merge pull request #37 from jegonzal/AnalyticsCleanup
Updated Connected Components and Pregel Docs
2013-10-29 15:08:36 -07:00
Joseph E. Gonzalez 08c7b040d6 Documented the VertexSetRDD 2013-10-29 15:03:13 -07:00
Joseph E. Gonzalez ede329336d Fixing a scaladoc bug in graph generators. 2013-10-29 14:50:12 -07:00
Joseph E. Gonzalez 15958ca65a Reindenting documentation. 2013-10-29 14:01:24 -07:00
Joseph E. Gonzalez d316cad9b1 Documented Graph.apply functions. 2013-10-29 13:58:04 -07:00
Joseph E. Gonzalez 19da8820fc Minor modifications to documentation. 2013-10-29 11:06:06 -07:00
Joseph E. Gonzalez 77626d1507 Adding collect neighbors and documenting GraphOps. 2013-10-29 11:05:42 -07:00
Joseph E. Gonzalez 942de98433 Making suggested changes. 2013-10-29 10:19:49 -07:00
Reynold Xin f0e23a023c Merge pull request #119 from soulmachine/master
A little revise for the document
2013-10-29 01:41:44 -04:00
Joseph E. Gonzalez d6a902f309 Finished updating connected components to use a Pregel-like abstraction and created a series of tests in the AnalyticsSuite. 2013-10-28 11:52:26 -07:00