ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Evan Chan	e54a37fe15	Document all the URIs for addJar/addFile	2013-11-01 10:58:11 -07:00
Reynold Xin	99bfcc91e0	Merge pull request #46 from jegonzal/VertexSetWithHashSet Switched VertexSetRDD and GraphImpl to use OpenHashSet	2013-10-31 21:38:10 -07:00
Joseph E. Gonzalez	db89ac4bc8	Changing var to val for keySet in OpenHashMaps	2013-10-31 21:19:26 -07:00
Joseph E. Gonzalez	e7d37472b8	After some testing I realized that the IndexedSeq is still instantiating the array (not maintaining a view) so I have replaced all IndexedSeq[V] with (Int => V)	2013-10-31 21:09:39 -07:00
Joseph E. Gonzalez	63311d9c72	renamed update to setMerge	2013-10-31 20:12:30 -07:00
Dan Crankshaw	e218e30b52	Merge branch 'master' of github.com:amplab/graphx	2013-10-31 19:54:17 -07:00
Dan Crankshaw	0a61cafba8	Added logging to Graph, GraphLab, and Pregel.	2013-10-31 19:54:06 -07:00
Joseph E. Gonzalez	7f58440334	Merge branch 'master' of https://github.com/amplab/graphx into VertexSetWithHashSet	2013-10-31 18:30:50 -07:00
Reynold Xin	fcaaf86803	Merge pull request #44 from jegonzal/rxinBitSet Switching to VertexSetRDD to use @rxin BitSet and OpenHash	2013-10-31 18:27:30 -07:00
Joseph E. Gonzalez	8381aeffb3	This commit introduces the OpenHashSet and OpenHashMap as indexing primitives. Large parts of the VertexSetRDD were restructured to take advantage of: 1) the OpenHashSet as an index map 2) view based lazy mapValues and mapValuesWithVertices 3) the cogroup code is currently disabled (since it is not used in any of the tests) The GraphImpl was updated to also use the OpenHashSet and PrimitiveOpenHashMap wherever possible: 1) the LocalVidMaps (used to track replicated vertices) are now implemented using the OpenHashSet 2) an OpenHashMap is temporarily constructed to combine the local OpenHashSet with the local (replicated) vertex attribute arrays 3) because the OpenHashSet constructor grabs a class manifest all operations that construct OpenHashSets have been moved to the GraphImpl Singleton to prevent implicit variable capture within closures.	2013-10-31 18:13:02 -07:00
Joseph E. Gonzalez	4ad58e2b9a	This commit makes three changes to the (PrimitiveKey)OpenHashMap 1) _keySet --renamed--> keySet 2) keySet and _values are made externally accessible 3) added an update function which merges duplicate values	2013-10-31 18:09:42 -07:00
Dan Crankshaw	b3bcfc09c7	Merge branch 'master' of github.com:amplab/graphx	2013-10-31 18:03:00 -07:00
Joseph E. Gonzalez	d74ad4ebc9	Adding ability to access local BitSet and to safely get a value at a given position	2013-10-31 18:01:34 -07:00
Joseph E. Gonzalez	aeb773fa47	Merging with upstream master.	2013-10-31 10:12:12 -07:00
Reynold Xin	3f3c727bc5	Merge pull request #41 from jegonzal/LineageTracking Optimizing Graph Lineage	2013-10-31 09:52:25 -07:00
Reynold Xin	944f6b8048	Merge pull request #43 from amplab/FixBitSetCastException Fix BitSet cast exception	2013-10-31 09:40:35 -07:00
Joseph E. Gonzalez	d6b5122532	Switching to the @rxin BitSet implementation for VertexSet Value tables.	2013-10-31 01:44:24 -07:00
Joseph E. Gonzalez	51aff8ddcf	Adding logical AND/OR, setUntil, and iterators to the BitSet.	2013-10-31 01:43:50 -07:00
Dan Crankshaw	c430d2e21d	Added bitset to kryo register	2013-10-31 01:01:59 -07:00
Joseph E. Gonzalez	a6267df25f	Merge branch 'hash1' of https://github.com/rxin/incubator-spark into rxinBitSet	2013-10-30 23:24:33 -07:00
Dan Crankshaw	37b4afbbf9	Merge branch 'cleanup'	2013-10-30 23:17:50 -07:00
Joseph E. Gonzalez	a3ce484a2c	Adding additional type constraints to VertexSetRDD to help diagnose issues with recent benchmarks.	2013-10-30 21:02:21 -07:00
Matei Zaharia	8f1098a3f0	Merge pull request #117 from stephenh/avoid_concurrent_modification_exception Handle ConcurrentModificationExceptions in SparkContext init. System.getProperties.toMap will fail-fast when concurrently modified, and it seems like some other thread started by SparkContext does a System.setProperty during it's initialization. Handle this by just looping on ConcurrentModificationException, which seems the safest, since the non-fail-fast methods (Hastable.entrySet) have undefined behavior under concurrent modification.	2013-10-30 20:11:48 -07:00
Joseph E. Gonzalez	09ea661bbb	removing completely unnecessary map operation.	2013-10-30 20:07:26 -07:00
Joseph E. Gonzalez	003f8a505d	Removing potential additional shuffle dependency where an already partitioned RDD[(Vid, VD)] is repartitioned.	2013-10-30 20:06:54 -07:00
Joseph E. Gonzalez	d513addb77	added lineage tracking code	2013-10-30 20:05:29 -07:00
Matei Zaharia	dc9ce16f6b	Merge pull request #126 from kayousterhout/local_fix Fixed incorrect log message in local scheduler This change is especially relevant at the moment, because some users are seeing this failure, and the log message is misleading/incorrect (because for the tests, the max failures is set to 0, not 4)	2013-10-30 17:01:56 -07:00
Matei Zaharia	33de11c51d	Merge pull request #124 from tgravescs/sparkHadoopUtilFix Pull SparkHadoopUtil out of SparkEnv (jira SPARK-886) Having the logic to initialize the correct SparkHadoopUtil in SparkEnv prevents it from being used until after the SparkContext is initialized. This causes issues like https://spark-project.atlassian.net/browse/SPARK-886. It also makes it hard to use in singleton objects. For instance I want to use it in the security code.	2013-10-30 16:58:27 -07:00
Joseph E. Gonzalez	a4b8ddf417	removing unused commented code	2013-10-30 16:07:05 -07:00
Ankur Dave	5064f9b2d2	Merge remote-tracking branch 'spark-upstream/master' Conflicts: project/SparkBuild.scala	2013-10-30 15:59:09 -07:00
Dan Crankshaw	a0c86c3689	Merge pull request #38 from jegonzal/Documentation Improving Documentation	2013-10-30 15:34:39 -07:00
Kay Ousterhout	ff038eb4e0	Fixed incorrect log message in local scheduler	2013-10-30 15:27:23 -07:00
Dan Crankshaw	e1099f4d89	Fixed issue with canonical edge partitioner.	2013-10-30 15:03:21 -07:00
Matei Zaharia	618c1f6cf3	Merge pull request #125 from velvia/2013-10/local-jar-uri Add support for local:// URI scheme for addJars() This PR adds support for a new URI scheme for SparkContext.addJars(): `local://file/path`. The local scheme indicates that the `/file/path` exists on every worker node. The reason for its existence is for big library JARs, which would be really expensive to serve using the standard HTTP fileserver distribution method, especially for big clusters. Today the only inexpensive method (assuming such a file is on every host, via say NFS, rsync, etc.) of doing this is to add the JAR to the SPARK_CLASSPATH, but we want a method where the user does not need to modify the Spark configuration. I would add something to the docs, but it's not obvious where to add it. Oh, and it would be great if this could be merged in time for 0.8.1.	2013-10-30 12:03:44 -07:00
Stephen Haberman	09f3b677cb	Avoid match errors when filtering for spark.hadoop settings.	2013-10-30 12:29:39 -05:00
tgravescs	f231aaa24c	move the hadoopJobMetadata back into SparkEnv	2013-10-30 11:46:12 -05:00
Evan Chan	de0285556a	Add support for local:// URI scheme for addJars() This indicates that a jar is available locally on each worker node.	2013-10-30 09:41:35 -07:00
tgravescs	54d9c6f253	Merge remote-tracking branch 'upstream/master' into sparkHadoopUtilFix	2013-10-30 10:41:21 -05:00
Matei Zaharia	745dc42908	Merge pull request #118 from JoshRosen/blockinfo-memory-usage Reduce the memory footprint of BlockInfo objects This pull request reduces the memory footprint of all BlockInfo objects and makes additional optimizations for shuffle blocks. For all BlockInfo objects, these changes remove two boolean fields and one Object field. For shuffle blocks, we additionally remove an Object field and a boolean field. When storing tens of thousands of these objects, this may add up to significant memory savings. A ShuffleBlockInfo now only needs to wrap a single long. This was motivated by a [report of high blockInfo memory usage during shuffles](https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201310.mbox/%3C20131026134353.202b2b9b%40sh9%3E). I haven't run benchmarks to measure the exact memory savings. /cc @aarondav	2013-10-29 23:47:10 -07:00
Joey	4f63b5e17f	Adding code example	2013-10-29 21:31:12 -07:00
Joey	1a20ba9b70	Updating images so they render correctly.	2013-10-29 21:06:29 -07:00
Joseph E. Gonzalez	41b3122120	Strating to improve README.	2013-10-29 20:57:55 -07:00
tgravescs	e5e0ebdb11	fix sparkhdfs lr test	2013-10-29 20:12:45 -05:00
Josh Rosen	cb9c8a922f	Extract BlockInfo classes from BlockManager. This saves space, since the inner classes needed to keep a reference to the enclosing BlockManager.	2013-10-29 18:06:51 -07:00
Stephen Haberman	3a388c320c	Use Properties.clone() instead.	2013-10-29 19:20:40 -05:00
Joey	06adf636c5	Merge pull request #33 from kellrott/master Fixing graph/pom.xml	2013-10-29 16:43:46 -07:00
Joseph E. Gonzalez	38ec0baf5c	fixing a typo in the VertexSetRDD docs	2013-10-29 16:27:55 -07:00
Joseph E. Gonzalez	d8c8256e52	merging upstream changes	2013-10-29 16:23:26 -07:00
Josh Rosen	846b1cf5ab	Store fewer BlockInfo fields for shuffle blocks.	2013-10-29 15:14:29 -07:00
tgravescs	eeb5f64c67	Remove SparkHadoopUtil stuff from SparkEnv	2013-10-29 17:12:16 -05:00

... 6 7 8 9 10 ...

5041 commits