ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Joseph E. Gonzalez	dbc8c9868a	Fixing bug in VertexSetRDD that breaks Graph tests.	2013-10-18 23:44:06 -07:00
Reynold Xin	9cf43cfeb7	Merge pull request #28 from jegonzal/VertexSetRDD Refactoring IndexedRDD to VertexSetRDD.	2013-10-18 22:07:21 -07:00
Reynold Xin	f888a5b051	Merge pull request #29 from ankurdave/unit-tests Unit tests for Graph and GraphOps	2013-10-18 22:06:58 -07:00
Ankur Dave	2d3603930e	Add a unit test for GraphOps.joinVertices	2013-10-18 19:46:13 -07:00
Ankur Dave	d15db10831	Add a unit test for Graph.mapEdges	2013-10-18 19:46:13 -07:00
Ankur Dave	d429f015c0	Update GraphSuite aggregateNeighbors test	2013-10-18 19:46:13 -07:00
Joseph E. Gonzalez	5d01ebca3c	Specializing IndexedRDD as VertexSetRDD. 1) This allows the index map to be optimized for Vids 2) This makes the code more readable 3) The Graph API can now return VertexSetRDDs from operations that produce results for vertices	2013-10-18 19:03:59 -07:00
Ankur Dave	0794bd7bc5	Merge pull request #27 from jegonzal/removed_indexedrdd_from_core Removing IndexedRDD changes for spark/core	2013-10-18 18:59:58 -07:00
Joseph E. Gonzalez	bb58aa5330	Added some stub code to address the case where a vertex could occur multiple times in the vertex table or where a vertex in the edge list may not appear in the vertex table. Moving IndexedRDD into the graphx source tree and removing dependencies in /core.	2013-10-18 18:15:32 -07:00
Joseph E. Gonzalez	fc5af50a2f	Merge branch 'master' of https://github.com/amplab/graphx	2013-10-18 18:15:17 -07:00
Ankur Dave	36a902e52d	Revert accidental removal of code in `3a40a5e`	2013-10-18 16:19:40 -07:00
Ankur Dave	971f824014	Revert unnecessary changes to core While benchmarking, we accidentally committed some unnecessary changes to core such as adding logging. These changes make it more difficult to merge from Spark upstream, so this commit reverts them.	2013-10-18 16:07:38 -07:00
Dan Crankshaw	8bd5f89662	Merge branch 'indexedrdd_graphx' of github.com:amplab/graphx into indexedrdd_graphx	2013-10-18 15:11:28 -07:00
Dan Crankshaw	3a40a5eb30	Added some documentation.	2013-10-18 15:11:21 -07:00
Joseph E. Gonzalez	1856b37e9d	Merge branch 'master' of https://github.com/apache/incubator-spark into indexedrdd_graphx	2013-10-18 12:21:19 -07:00
Joseph E. Gonzalez	e028079b0f	Merging with spark upstream changes.	2013-10-18 12:02:14 -07:00
Joseph E. Gonzalez	3f3d28c73f	Switching from Seq to IndexedSeq	2013-10-17 19:55:36 -07:00
Joseph E. Gonzalez	9a03c5fe28	This commit accomplishes three goals: 1) Further simplification of the IndexedRDD operations (eliminating some) 2) Aggressive reuse of HashMaps 3) Pipelining join operations within indexedrdd	2013-10-17 19:01:48 -07:00
Joey	099977fd1b	Merge pull request #26 from ankurdave/split-vTableReplicated Great work!	2013-10-17 14:17:08 -07:00
Matei Zaharia	fc26e5b832	Merge pull request #69 from KarthikTunga/master Fix for issue SPARK-627. Implementing --config argument in the scripts. This code fix is for issue SPARK-627. I added code to consider --config arguments in the scripts. In case the <conf-dir> is not a directory the scripts exit. I removed the --hosts argument. It can be achieved by giving a different config directory. Let me know if an explicit --hosts argument is required.	2013-10-17 13:21:07 -07:00
Ankur Dave	bf19aac2b7	Use ArrayBuilder instead of ArrayBuffer ArrayBuilder is specialized for holding primitive VD types.	2013-10-17 13:19:00 -07:00
Matei Zaharia	cf64f63f8a	Merge pull request #67 from kayousterhout/remove_tsl Removed TaskSchedulerListener interface. The interface was used only by the DAG scheduler (so it wasn't necessary to define the additional interface), and the naming makes it very confusing when reading the code (because "listener" was used to describe the DAG scheduler, rather than SparkListeners, which implement a nearly-identical interface but serve a different function). @mateiz - is there a reason for this interface that I'm missing?	2013-10-17 11:12:28 -07:00
Ankur Dave	2282d27cf1	Cache msgsByPartition	2013-10-16 23:56:15 -07:00
Kay Ousterhout	809f547633	Fixed unit tests	2013-10-16 23:16:12 -07:00
KarthikTunga	8537f19268	SPARK-627 , Implementing --config arguments in the scripts	2013-10-16 23:00:33 -07:00
KarthikTunga	ff4fb1f7ee	SPARK-627 , Implementing --config arguments in the scripts	2013-10-16 22:55:15 -07:00
KarthikTunga	a32aa6b351	Implementing --config argument in the scripts	2013-10-16 22:51:09 -07:00
Ankur Dave	bc234bf0e1	Split vTableReplicated into two RDDs Previously, (vTableReplicated: IndexedRDD[Pid, VertexHashMap[VD]]) stored one hashmap per partition, taking Vid directly to VD. To take advantage of rxin's new hashmaps (see rxin/incubator-spark@32a79d6d13), this commit splits that data structure into two RDDs: (vTableReplicationMap: IndexedRDD[Pid, VertexIdToIndexMap]) stores a map per partition from vertex ID to the index where that vertex's attribute is stored. This index refers to an array in the same partition in vTableReplicatedValues. (vTableReplicatedValues: IndexedRDD[Pid, Array[VD]]) stores the vertex data and is arranged as described above.	2013-10-16 19:22:23 -07:00
Ankur Dave	af8e461841	Set serialization properties in GraphSuite	2013-10-16 19:21:24 -07:00
Kay Ousterhout	ec512583ab	Removed TaskSchedulerListener interface. The interface was used only by the DAG scheduler (so it wasn't necessary to define the additional interface), and the naming makes it very confusing when reading the code (because "listener" was used to describe the DAG scheduler, rather than SparkListeners, which implement a nearly-identical interface but serve a different function).	2013-10-16 16:57:42 -07:00
Matei Zaharia	f9973cae3a	Merge pull request #65 from tgravescs/fixYarn Fix yarn build Fix the yarn build after renaming StandAloneX to CoarseGrainedX from pull request 34.	2013-10-16 15:58:41 -07:00
tgravescs	cc7df2b3cc	Fix yarn build	2013-10-16 10:09:16 -05:00
Joseph E. Gonzalez	57ac9073ae	Introducing unique indexedrdd and adding numerous specialized joins	2013-10-16 04:08:22 -07:00
Joseph E. Gonzalez	59700c0c2a	switched to a more efficient implementation of reduce by key	2013-10-16 00:18:37 -07:00
Joseph E. Gonzalez	80e4ec3278	IndexedRDD now only supports unique keys	2013-10-16 00:16:44 -07:00
Matei Zaharia	28e9c2abc0	Merge pull request #63 from pwendell/master Fixing spark streaming example and a bug in examples build. - Examples assembly included a log4j.properties which clobbered Spark's - Example had an error where some classes weren't serializable - Did some other clean-up in this example	2013-10-15 23:59:56 -07:00
Matei Zaharia	4e46fde818	Merge pull request #62 from harveyfeng/master Make TaskContext's stageId publicly accessible.	2013-10-15 23:14:27 -07:00
Patrick Wendell	35befe07bb	Fixing spark streaming example and a bug in examples build. - Examples assembly included a log4j.properties which clobbered Spark's - Example had an error where some classes weren't serializable - Did some other clean-up in this example	2013-10-15 22:55:43 -07:00
Harvey Feng	65b46236e7	Proper formatting for SparkHadoopWriter class extensions.	2013-10-15 21:51:52 -07:00
Matei Zaharia	b5346064d6	Merge pull request #8 from vchekan/checkpoint-ttl-restore Serialize and restore spark.cleaner.ttl to savepoint In accordance with the conversation on the spark-dev mailing list, preserve the spark.cleaner.ttl parameter when serializing a checkpoint.	2013-10-15 21:25:03 -07:00
Matei Zaharia	6dbd2208ff	Merge pull request #34 from kayousterhout/rename Renamed StandaloneX to CoarseGrainedX. (as suggested by @rxin here https://github.com/apache/incubator-spark/pull/14) The previous names were confusing because the components weren't just used in Standalone mode. The scheduler used for Standalone mode is called SparkDeploySchedulerBackend, so referring to the base class as StandaloneSchedulerBackend was misleading.	2013-10-15 19:02:57 -07:00
Matei Zaharia	983b83f24d	Merge pull request #61 from kayousterhout/daemon_thread Unified daemon thread pools As requested by @mateiz in an earlier pull request, this refactors various daemon thread pools to use a set of methods in utils.scala, and also changes the thread-pool-creation methods in utils.scala to use named thread pools for improved debugging.	2013-10-15 19:02:46 -07:00
Joseph E. Gonzalez	3cb6dffce0	adding indexed reduce by key	2013-10-15 18:55:06 -07:00
Harvey Feng	c4c76e37a7	Fix line length > 100 chars in SparkHadoopWriter	2013-10-15 18:35:59 -07:00
Harvey Feng	5b8083fee5	Make TaskContext's stageId publicly accessible.	2013-10-15 18:06:37 -07:00
Joseph E. Gonzalez	9058f261fe	Addressing issue where statistics are not computed correctly	2013-10-15 17:39:09 -07:00
Joseph E. Gonzalez	1b22eef744	Merge branch 'master' of https://github.com/apache/incubator-spark into indexedrdd_graphx	2013-10-15 16:15:19 -07:00
Joseph E. Gonzalez	194bb03d16	Resolved closure capture issues by addressing capture through implicit variables.	2013-10-15 15:10:41 -07:00
Kay Ousterhout	f95a2be045	Fixed build error after merging in master	2013-10-15 14:51:37 -07:00
Kay Ousterhout	acc7638f7c	Merge remote branch 'upstream/master' into rename	2013-10-15 14:43:56 -07:00


4456 commits