ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Joseph E. Gonzalez	e7d0320000	More refactoring and documentation, including renaming data to attr for vertex and edge data and eliminating the vertex type.	2013-10-15 02:20:06 -07:00
KarthikTunga	6c6b146fc2	Merge branch 'master' of https://github.com/apache/incubator-spark Updating local branch	2013-10-15 00:46:35 -07:00
KarthikTunga	d2c86e7188	SPARK-627 - reading --config argument	2013-10-15 00:35:44 -07:00
Reynold Xin	f41feb7b33	Bump up logging level to warning for failed tasks.	2013-10-14 23:35:32 -07:00
Joseph E. Gonzalez	6a13d02319	Merging changes for IndexedRDD branch	2013-10-14 23:30:36 -07:00
Joseph E. Gonzalez	6700ccd7d5	Introducing indexedrdd The rest of indexed rdd	2013-10-14 23:30:35 -07:00
Joseph E. Gonzalez	4755f42d78	moving indexedrdd to the correct location	2013-10-14 23:13:27 -07:00
Henry Saputra	3fed3e2283	Update pom.xml to use version 13 of the ASF parent pom and add mailingLists element.	2013-10-14 23:10:54 -07:00
Joseph E. Gonzalez	ef7c369092	merged with upstream changes	2013-10-14 22:56:42 -07:00
Patrick Wendell	e33b1839e2	Merge pull request #29 from rxin/kill Job killing Moving https://github.com/mesos/spark/pull/935 here The high level idea is to have an "interrupted" field in TaskContext, and a task should check that flag to determine if its execution should continue. For convenience, I provide an InterruptibleIterator which wraps around a normal iterator but checks for the interrupted flag. I also provide an InterruptibleRDD that wraps around an existing RDD. As part of this pull request, I added an AsyncRDDActions class that provides a number of RDD actions that return a FutureJob (extending scala.concurrent.Future). The FutureJob can be used to kill the job execution, or to wait until the job finishes. This is NOT ready for merging yet. Remaining TODOs: 1. Add unit tests 2. Add job killing functionality for local scheduler (current job killing functionality only works in cluster scheduler) ------------- Update on Oct 10, 2013: This is ready! Related future work: - Figure out how to handle the job triggered by RangePartitioner (this one is tough; might become future work) - Java API - Python API	2013-10-14 22:25:47 -07:00
Reynold Xin	9cd8786e4a	Merge branch 'master' of github.com:apache/incubator-spark into kill Conflicts: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala	2013-10-14 21:51:30 -07:00
Joseph E. Gonzalez	bf059691f0	Adding a few extra comments.	2013-10-14 19:59:11 -07:00
Joseph E. Gonzalez	11a44d0ec9	Introducing indexedrdd The rest of indexed rdd	2013-10-14 19:46:42 -07:00
Joseph E. Gonzalez	67bb39c54b	Removing extraneous code	2013-10-14 18:49:05 -07:00
Reynold Xin	3b11f43e36	Merge pull request #57 from aarondav/bid Refactor BlockId into an actual type Converts all of our BlockId strings into actual BlockId types. Here are some advantages of doing this now: + Type safety + Code clarity - it's now obvious what the key of a shuffle or rdd block is, for instance. Additionally, appearing in tuple/map type signatures is a big readability bonus. A Seq[(String, BlockStatus)] is not very clear. Further, we can now use more Scala features, like matching on BlockId types. + Explicit usage - we can now formally tell where various BlockIds are being used (without doing string searches); this makes updating current BlockIds a much clearer process, and compiler-supported. (I'm looking at you, shuffle file consolidation.) + It will only get harder to make this change as time goes on. Downside is, of course, that this is a very invasive change touching a lot of different files, which will inevitably lead to merge conflicts for many.	2013-10-14 14:20:01 -07:00
Aaron Davidson	4a45019fb0	Address Matei's comments	2013-10-14 00:24:17 -07:00
Joseph E. Gonzalez	bff223454a	trying to address issues with GraphImpl being caught in closures.	2013-10-13 22:27:10 -07:00
Joseph E. Gonzalez	f89e6e5cbf	removing benchmark code	2013-10-13 20:45:01 -07:00
Joseph E. Gonzalez	141c22e28c	merging in master changes	2013-10-13 20:43:23 -07:00
Joseph E. Gonzalez	637b67da56	merging changes from upstream benchmarking branch	2013-10-13 19:54:09 -07:00
Joseph E. Gonzalez	494472a6cc	Integrated IndexedRDD into graph design.	2013-10-13 19:42:32 -07:00
Aaron Davidson	da896115ec	Change BlockId filename to name + rest of Patrick's comments	2013-10-13 11:15:02 -07:00
Aaron Davidson	d60352283c	Add unit test and address rest of Reynold's comments	2013-10-12 22:45:15 -07:00
Aaron Davidson	a395911138	Refactor BlockId into an actual type This is an unfortunately invasive change which converts all of our BlockId strings into actual BlockId types. Here are some advantages of doing this now: + Type safety + Code clarity - it's now obvious what the key of a shuffle or rdd block is, for instance. Additionally, appearing in tuple/map type signatures is a big readability bonus. A Seq[(String, BlockStatus)] is not very clear. Further, we can now use more Scala features, like matching on BlockId types. + Explicit usage - we can now formally tell where various BlockIds are being used (without doing string searches); this makes updating current BlockIds a much clearer process, and compiler-supported. (I'm looking at you, shuffle file consolidation.) + It will only get harder to make this change as time goes on. Since this touches a lot of files, it'd be best to either get this patch in quickly or throw it on the ground to avoid too many secondary merge conflicts.	2013-10-12 22:44:57 -07:00
Reynold Xin	99796904ae	Merge pull request #52 from harveyfeng/hadoop-closure Add an optional closure parameter to HadoopRDD instantiation to use when creating local JobConfs. Having HadoopRDD accept this optional closure eliminates the need for the HadoopFileRDD added earlier. It makes the HadoopRDD more general, in that the caller can specify any JobConf initialization flow.	2013-10-12 21:23:26 -07:00
Harvey Feng	6c32aab87d	Remove the new HadoopRDD constructor from SparkContext API, plus some minor style changes.	2013-10-12 21:02:08 -07:00
Reynold Xin	88866ea9c9	Fixed PairRDDFunctionsSuite after removing InterruptibleRDD.	2013-10-12 20:05:23 -07:00
Reynold Xin	6b288b75d4	Job cancellation: address Matei's code review feedback.	2013-10-12 15:53:31 -07:00
Dan Crankshaw	1a961dd1f2	Fixed connected components CL params	2013-10-12 01:47:38 +00:00
Reynold Xin	ab0940f0c2	Job cancellation: addressed code review feedback round 2 from Kay.	2013-10-11 18:15:04 -07:00
Dan Crankshaw	1e5535cfcf	Added connected components back	2013-10-11 16:38:52 -07:00
Reynold Xin	97ffebbe87	Fixed dagscheduler suite because of a logging message change.	2013-10-11 16:18:22 -07:00
Reynold Xin	dca80094d3	Merge pull request #54 from aoiwelle/remove_unused_imports Remove unnecessary mutable imports It appears that the imports aren't necessary here.	2013-10-11 16:08:15 -07:00
Dan Crankshaw	543a54dffa	Tried to fix some indenting	2013-10-11 16:07:49 -07:00
Reynold Xin	a61cf40ab9	Job cancellation: addressed code review feedback from Kay.	2013-10-11 15:58:14 -07:00
Dan Crankshaw	c4a23f95c3	Updated code so benchmarks actually run.	2013-10-11 22:57:43 +00:00
Matei Zaharia	fb25f32300	Merge pull request #53 from witgo/master Add a zookeeper compile dependency to fix build in maven	2013-10-11 15:44:43 -07:00
Matei Zaharia	d6ead47809	Merge pull request #32 from mridulm/master Address review comments, move to incubator spark Also includes a small fix to speculative execution. <edit> Continued from https://github.com/mesos/spark/pull/914 </edit>	2013-10-11 15:43:01 -07:00
Reynold Xin	e2047d3927	Making takeAsync and collectAsync deterministic.	2013-10-11 13:04:45 -07:00
Reynold Xin	09f7609254	Properly handle interrupted exception in FutureAction.	2013-10-11 11:20:15 -07:00
Neal Wiggins	67d4a31f87	Remove unnecessary mutable imports	2013-10-11 09:47:27 -07:00
LiGuoqiang	fc60c412ab	Add a zookeeper compile dependency to fix build in maven	2013-10-11 16:31:47 +08:00
Reynold Xin	42fb1df694	Merge branch 'master' of github.com:apache/incubator-spark into kill Conflicts: core/src/main/scala/org/apache/spark/rdd/CoGroupedRDD.scala	2013-10-10 23:48:05 -07:00
Reynold Xin	d9e724e756	Fixed the broken local scheduler test.	2013-10-10 23:08:13 -07:00
Reynold Xin	37397b73ba	Added comprehensive tests for job cancellation in a variety of environments (local vs cluster, fifo vs fair).	2013-10-10 22:57:43 -07:00
Reynold Xin	80cdbf4f49	Switched to use daemon thread in executor and fixed a bug in job cancellation for fair scheduler.	2013-10-10 22:40:48 -07:00
Matei Zaharia	8f11c36fe1	Merge remote-tracking branch 'tgravescs/sparkYarnDistCache' Closes #11 Conflicts: docs/running-on-yarn.md yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala	2013-10-10 19:34:33 -07:00
Reynold Xin	058508b625	Changed the name of the local cluster executor from local to localhost.	2013-10-10 19:24:00 -07:00
Reynold Xin	ec2e2ed1e1	Use the same Executor in LocalScheduler as in ClusterScheduler.	2013-10-10 18:55:25 -07:00
Matei Zaharia	c71499b779	Merge pull request #19 from aarondav/master-zk Standalone Scheduler fault tolerance using ZooKeeper This patch implements full distributed fault tolerance for standalone scheduler Masters. There is only one master Leader at a time, which is actively serving scheduling requests. If this Leader crashes, another master will eventually be elected, reconstruct the state from the first Master, and continue serving scheduling requests. Leader election is performed using the ZooKeeper leader election pattern. We try to minimize the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of retries and session monitoring on top of the ZooKeeper client. Master failover follows directly from the single-node Master recovery via the file system (patch `d5a96fe`), save that the Master state is stored in ZooKeeper instead. Configuration: By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE). By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled. By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory to an appropriate directory accessible by the Master, we will keep the behavior from `d5a96fe`. Additionally, places where a Master could be specified by a spark:// url can now take comma-delimited lists to specify backup masters. Note that this is only used for registration of NEW Workers and application Clients. Once a Worker or Client has registered with the Master Leader, it is "in the system" and will never need to register again.	2013-10-10 17:16:42 -07:00
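
The configuration described in commit c71499b779 can be sketched roughly as follows. This is a minimal illustration, not code from the repository: the object `RecoveryConfigSketch` and its methods are hypothetical, the host names and directory are placeholders, and passing the settings as JVM `-D` options is an assumption about how they reach the Master. Only the property names and the ZOOKEEPER / FILESYSTEM values come from the commit message; the exact layout of the comma-delimited master URL is likewise assumed.

```scala
// Hypothetical helper (not part of Spark): assembles the recovery settings
// named in the commit message into JVM -D options and a master URL.
object RecoveryConfigSketch {

  // ZooKeeper recovery: property names and the ZOOKEEPER value are from the commit message.
  def zookeeperOpts(zkUrl: String): Seq[String] = Seq(
    "-Dspark.deploy.recoveryMode=ZOOKEEPER",
    s"-Dspark.deploy.zookeeper.url=$zkUrl"
  )

  // File-system recovery: keeps the single-node behavior the message attributes to d5a96fe.
  def filesystemOpts(recoveryDir: String): Seq[String] = Seq(
    "-Dspark.deploy.recoveryMode=FILESYSTEM",
    s"-Dspark.deploy.recoveryDirectory=$recoveryDir"
  )

  // Comma-delimited list of masters behind one spark:// prefix (assumed layout).
  def masterUrl(masters: Seq[String]): String =
    masters.mkString("spark://", ",", "")

  def main(args: Array[String]): Unit = {
    // Placeholder hosts and paths, chosen only for illustration.
    println(zookeeperOpts("zk1:2181,zk2:2181").mkString(" "))
    println(filesystemOpts("/var/spark/recovery").mkString(" "))
    println(masterUrl(Seq("master1:7077", "master2:7077")))
  }
}
```

Per the commit message, the master list matters only when new Workers and application Clients register; once registered with the Leader they do not register again.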

4450 commits