ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Matei Zaharia	e5316d0685	Merge pull request #68 from mosharaf/master Faster and stable/reliable broadcast HttpBroadcast is noticeably slow, but the alternatives (TreeBroadcast or BitTorrentBroadcast) are notoriously unreliable. The main problem with them is they try to manage the memory for the pieces of a broadcast themselves. Right now, the BroadcastManager does not know on which machines the tasks reading from a broadcast variable are running and when they have finished. Consequently, we try to guess and often guess wrong, which blows up the memory usage and kills/hangs jobs. This very simple implementation solves the problem by not trying to manage the intermediate pieces; instead, it offloads that duty to the BlockManager which is quite good at juggling blocks. Otherwise, it is very similar to the BitTorrentBroadcast implementation (without fancy optimizations). And it runs much faster than the HttpBroadcast we have right now. I've been using this for another project for the last couple of weeks, and just today did some benchmarking against the Http one. The following shows the improvements for increasing broadcast size for cold runs. Each line represents the number of receivers. ![fix-bc-first](https://f.cloud.github.com/assets/232966/1349342/ffa149e4-36e7-11e3-9fa6-c74555829356.png) After the first broadcast is over, i.e., after the JVM is warmed up and for HttpBroadcast the server is already running (I think), the following are the improvements for warm runs. ![fix-bc-succ](https://f.cloud.github.com/assets/232966/1349352/5a948bae-36e8-11e3-98ce-34f19ebd33e0.jpg) The curves are not as nice as the cold runs, but the improvements are obvious, especially for larger broadcasts and more receivers. Depending on how it goes, we should deprecate and/or remove old TreeBroadcast and BitTorrentBroadcast implementations, and hopefully, SPARK-889 will not be necessary any more.	2013-10-18 20:30:56 -07:00
Matei Zaharia	8d528af829	Merge pull request #71 from aarondav/scdefaults Spark shell exits if it cannot create SparkContext Mainly, this occurs if you provide a messed up MASTER url (one that doesn't match one of our regexes). Previously, we would default to Mesos, fail, and then start the shell anyway, except that any Spark command would fail. Simply exiting seems clearer.	2013-10-18 20:24:10 -07:00
Ankur Dave	2d3603930e	Add a unit test for GraphOps.joinVertices	2013-10-18 19:46:13 -07:00
Ankur Dave	d15db10831	Add a unit test for Graph.mapEdges	2013-10-18 19:46:13 -07:00
Ankur Dave	d429f015c0	Update GraphSuite aggregateNeighbors test	2013-10-18 19:46:13 -07:00
Joseph E. Gonzalez	5d01ebca3c	Specializing IndexedRDD as VertexSetRDD. 1) This allows the index map to be optimized for Vids 2) This makes the code more readable 3) The Graph API can now return VertexSetRDDs from operations that produce results for vertices	2013-10-18 19:03:59 -07:00
Ankur Dave	0794bd7bc5	Merge pull request #27 from jegonzal/removed_indexedrdd_from_core Removing IndexedRDD changes for spark/core	2013-10-18 18:59:58 -07:00
Joseph E. Gonzalez	bb58aa5330	Added some stub code to address the case where a vertex could occur multiple times in the vertex table or where a vertex in the edge list may not appear in the vertex table. Moving IndexedRDD into the graphx source tree and removing dependencies in /core.	2013-10-18 18:15:32 -07:00
Joseph E. Gonzalez	fc5af50a2f	Merge branch 'master' of https://github.com/amplab/graphx	2013-10-18 18:15:17 -07:00
Ankur Dave	36a902e52d	Revert accidental removal of code in `3a40a5e`	2013-10-18 16:19:40 -07:00
Ankur Dave	971f824014	Revert unnecessary changes to core While benchmarking, we accidentally committed some unnecessary changes to core such as adding logging. These changes make it more difficult to merge from Spark upstream, so this commit reverts them.	2013-10-18 16:07:38 -07:00
Dan Crankshaw	8bd5f89662	Merge branch 'indexedrdd_graphx' of github.com:amplab/graphx into indexedrdd_graphx	2013-10-18 15:11:28 -07:00
Dan Crankshaw	3a40a5eb30	Added some documentation.	2013-10-18 15:11:21 -07:00
Joseph E. Gonzalez	1856b37e9d	Merge branch 'master' of https://github.com/apache/incubator-spark into indexedrdd_graphx	2013-10-18 12:21:19 -07:00
Joseph E. Gonzalez	e028079b0f	Merging with spark upstream changes.	2013-10-18 12:02:14 -07:00
Prabeesh K	6ec39829e9	Update MQTTWordCount.scala	2013-10-18 17:00:28 +05:30
Mosharaf Chowdhury	08391dbcb8	Should compile now.	2013-10-17 23:06:17 -07:00
Mosharaf Chowdhury	8612641362	Added an after block to reset spark.broadcast.factory	2013-10-17 22:44:04 -07:00
Prabeesh K	d223d38933	Update MQTTInputDStream.scala	2013-10-18 09:09:49 +05:30
Joseph E. Gonzalez	3f3d28c73f	Switching from Seq to IndexedSeq	2013-10-17 19:55:36 -07:00
Joseph E. Gonzalez	9a03c5fe28	This commit accomplishes three goals: 1) Further simplification of the IndexedRDD operations (eliminating some) 2) Aggressive reuse of HashMaps 3) Pipelining join operations within indexedrdd	2013-10-17 19:01:48 -07:00
Aaron Davidson	74737264c4	Spark shell exits if it cannot create SparkContext Mainly, this occurs if you provide a messed up MASTER url (one that doesn't match one of our regexes). Previously, we would default to Mesos, fail, and then start the shell anyway, except that any Spark command would fail.	2013-10-17 18:51:19 -07:00
Mosharaf Chowdhury	90ab55fd37	Merge remote-tracking branch 'upstream/master'	2013-10-17 18:12:28 -07:00
Mosharaf Chowdhury	e178ae4e9b	BroadcastSuite updated to test both HttpBroadcast and TorrentBroadcast in local, local[N], local-cluster settings.	2013-10-17 16:38:43 -07:00
Joey	099977fd1b	Merge pull request #26 from ankurdave/split-vTableReplicated Great work!	2013-10-17 14:17:08 -07:00
Matei Zaharia	fc26e5b832	Merge pull request #69 from KarthikTunga/master Fix for issue SPARK-627. Implementing --config argument in the scripts. This code fix is for issue SPARK-627. I added code to consider --config arguments in the scripts. In case the <conf-dir> is not a directory the scripts exit. I removed the --hosts argument. It can be achieved by giving a different config directory. Let me know if an explicit --hosts argument is required.	2013-10-17 13:21:07 -07:00
Ankur Dave	bf19aac2b7	Use ArrayBuilder instead of ArrayBuffer ArrayBuilder is specialized for holding primitive VD types.	2013-10-17 13:19:00 -07:00
Mosharaf Chowdhury	6a84e40efe	Merge remote-tracking branch 'upstream/master'	2013-10-17 13:14:33 -07:00
Mosharaf Chowdhury	35b2415fb3	Code styling. Updated doc.	2013-10-17 13:14:12 -07:00
Matei Zaharia	cf64f63f8a	Merge pull request #67 from kayousterhout/remove_tsl Removed TaskSchedulerListener interface. The interface was used only by the DAG scheduler (so it wasn't necessary to define the additional interface), and the naming makes it very confusing when reading the code (because "listener" was used to describe the DAG scheduler, rather than SparkListeners, which implement a nearly-identical interface but serve a different function). @mateiz - is there a reason for this interface that I'm missing?	2013-10-17 11:12:28 -07:00
Mosharaf Chowdhury	e663750488	Removed unused code. Changes to match Spark coding style.	2013-10-17 00:19:50 -07:00
Ankur Dave	2282d27cf1	Cache msgsByPartition	2013-10-16 23:56:15 -07:00
Kay Ousterhout	809f547633	Fixed unit tests	2013-10-16 23:16:12 -07:00
KarthikTunga	8537f19268	SPARK-627 , Implementing --config arguments in the scripts	2013-10-16 23:00:33 -07:00
Reynold Xin	3e7df8f6c6	Added a number of very fast, memory-efficient data structures: BitSet, OpenHashSet, OpenHashMap, PrimitiveKeyOpenHashMap.	2013-10-16 22:58:52 -07:00
KarthikTunga	ff4fb1f7ee	SPARK-627 , Implementing --config arguments in the scripts	2013-10-16 22:55:15 -07:00
KarthikTunga	a32aa6b351	Implementing --config argument in the scripts	2013-10-16 22:51:09 -07:00
Mosharaf Chowdhury	e96bd0068f	BroadcastTest2 --> BroadcastTest	2013-10-16 21:33:33 -07:00
Mosharaf Chowdhury	a8d0981832	Fixes for the new BlockId naming convention.	2013-10-16 21:33:33 -07:00
Mosharaf Chowdhury	feb45d391f	Default blockSize is 4MB. BroadcastTest2 example added for testing broadcasts.	2013-10-16 21:33:33 -07:00
Mosharaf Chowdhury	6e5a60fab4	Removed unnecessary code, and added comment of memory-latency tradeoff.	2013-10-16 21:33:33 -07:00
Mosharaf Chowdhury	4602e2bf6e	Torrent-ish broadcast based on BlockManager.	2013-10-16 21:33:33 -07:00
prabeesh	890f8fe439	modify code, use Spark Logging Class	2013-10-17 10:00:40 +05:30
prabeesh	ee4178f144	remove unused dependency	2013-10-17 09:57:48 +05:30
prabeesh	29245605bf	remove unused dependency	2013-10-17 09:57:30 +05:30
Ankur Dave	bc234bf0e1	Split vTableReplicated into two RDDs Previously, (vTableReplicated: IndexedRDD[Pid, VertexHashMap[VD]]) stored one hashmap per partition, taking Vid directly to VD. To take advantage of rxin's new hashmaps (see rxin/incubator-spark@32a79d6d13), this commit splits that data structure into two RDDs: (vTableReplicationMap: IndexedRDD[Pid, VertexIdToIndexMap]) stores a map per partition from vertex ID to the index where that vertex's attribute is stored. This index refers to an array in the same partition in vTableReplicatedValues. (vTableReplicatedValues: IndexedRDD[Pid, Array[VD]]) stores the vertex data and is arranged as described above.	2013-10-16 19:22:23 -07:00
Ankur Dave	af8e461841	Set serialization properties in GraphSuite	2013-10-16 19:21:24 -07:00
Shivaram Venkataraman	0a4b76fcc2	Rename SBT target to assemble-deps.	2013-10-16 17:05:46 -07:00
Kay Ousterhout	ec512583ab	Removed TaskSchedulerListener interface. The interface was used only by the DAG scheduler (so it wasn't necessary to define the additional interface), and the naming makes it very confusing when reading the code (because "listener" was used to describe the DAG scheduler, rather than SparkListeners, which implement a nearly-identical interface but serve a different function).	2013-10-16 16:57:42 -07:00
Matei Zaharia	f9973cae3a	Merge pull request #65 from tgravescs/fixYarn Fix yarn build Fix the yarn build after renaming StandAloneX to CoarseGrainedX from pull request 34.	2013-10-16 15:58:41 -07:00

4742 commits