ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Kay Ousterhout	ec512583ab	Removed TaskSchedulerListener interface. The interface was used only by the DAG scheduler (so it wasn't necessary to define the additional interface), and the naming makes it very confusing when reading the code (because "listener" was used to describe the DAG scheduler, rather than SparkListeners, which implement a nearly-identical interface but serve a different function).	2013-10-16 16:57:42 -07:00
Matei Zaharia	f9973cae3a	Merge pull request #65 from tgravescs/fixYarn Fix yarn build Fix the yarn build after renaming StandAloneX to CoarseGrainedX from pull request 34.	2013-10-16 15:58:41 -07:00
tgravescs	cc7df2b3cc	Fix yarn build	2013-10-16 10:09:16 -05:00
Joseph E. Gonzalez	57ac9073ae	Introducing unique indexedrdd and adding numerous specialized joins	2013-10-16 04:08:22 -07:00
Joseph E. Gonzalez	59700c0c2a	switched to more efficient implementation of reduce by key	2013-10-16 00:18:37 -07:00
Joseph E. Gonzalez	80e4ec3278	IndexedRDD now only supports unique keys	2013-10-16 00:16:44 -07:00
Matei Zaharia	28e9c2abc0	Merge pull request #63 from pwendell/master Fixing spark streaming example and a bug in examples build. - Examples assembly included a log4j.properties which clobbered Spark's - Example had an error where some classes weren't serializable - Did some other clean-up in this example	2013-10-15 23:59:56 -07:00
Matei Zaharia	4e46fde818	Merge pull request #62 from harveyfeng/master Make TaskContext's stageId publicly accessible.	2013-10-15 23:14:27 -07:00
Patrick Wendell	35befe07bb	Fixing spark streaming example and a bug in examples build. - Examples assembly included a log4j.properties which clobbered Spark's - Example had an error where some classes weren't serializable - Did some other clean-up in this example	2013-10-15 22:55:43 -07:00
Harvey Feng	65b46236e7	Proper formatting for SparkHadoopWriter class extensions.	2013-10-15 21:51:52 -07:00
Matei Zaharia	b5346064d6	Merge pull request #8 from vchekan/checkpoint-ttl-restore Serialize and restore spark.cleaner.ttl to savepoint In accordance to conversation in spark-dev maillist, preserve spark.cleaner.ttl parameter when serializing checkpoint.	2013-10-15 21:25:03 -07:00
Matei Zaharia	6dbd2208ff	Merge pull request #34 from kayousterhout/rename Renamed StandaloneX to CoarseGrainedX. (as suggested by @rxin here https://github.com/apache/incubator-spark/pull/14) The previous names were confusing because the components weren't just used in Standalone mode. The scheduler used for Standalone mode is called SparkDeploySchedulerBackend, so referring to the base class as StandaloneSchedulerBackend was misleading.	2013-10-15 19:02:57 -07:00
Matei Zaharia	983b83f24d	Merge pull request #61 from kayousterhout/daemon_thread Unified daemon thread pools As requested by @mateiz in an earlier pull request, this refactors various daemon thread pools to use a set of methods in utils.scala, and also changes the thread-pool-creation methods in utils.scala to use named thread pools for improved debugging.	2013-10-15 19:02:46 -07:00
Joseph E. Gonzalez	3cb6dffce0	adding indexed reduce by key	2013-10-15 18:55:06 -07:00
Harvey Feng	c4c76e37a7	Fix line length > 100 chars in SparkHadoopWriter	2013-10-15 18:35:59 -07:00
Harvey Feng	5b8083fee5	Make TaskContext's stageId publicly accessible.	2013-10-15 18:06:37 -07:00
Joseph E. Gonzalez	9058f261fe	Addressing issue where statistics are not computed correctly	2013-10-15 17:39:09 -07:00
Joseph E. Gonzalez	1b22eef744	Merge branch 'master' of https://github.com/apache/incubator-spark into indexedrdd_graphx	2013-10-15 16:15:19 -07:00
Joseph E. Gonzalez	194bb03d16	Resolved closure capture issues by addressing capture through implicit variables.	2013-10-15 15:10:41 -07:00
Kay Ousterhout	f95a2be045	Fixed build error after merging in master	2013-10-15 14:51:37 -07:00
Kay Ousterhout	acc7638f7c	Merge remote branch 'upstream/master' into rename	2013-10-15 14:43:56 -07:00
Kay Ousterhout	707ad8cc4f	Unified daemon thread pools	2013-10-15 14:23:43 -07:00
Joseph E. Gonzalez	7241cf1632	Updating unit tests.	2013-10-15 14:18:03 -07:00
Matei Zaharia	3249e0e90d	Merge pull request #59 from rxin/warning Bump up logging level to warning for failed tasks.	2013-10-15 14:12:33 -07:00
Joseph E. Gonzalez	345e1e94cc	Still trying to resolve issues with capture.	2013-10-15 14:01:38 -07:00
Joseph E. Gonzalez	b64337ec40	Trying to resolve issues with closure capture.	2013-10-15 13:02:17 -07:00
Reynold Xin	678dec6680	Merge pull request #58 from hsaputra/update-pom-asf Update pom.xml to use version 13 of the ASF parent pom Update pom.xml to use version 13 of the ASF parent pom. Add mailingList element to pom.xml.	2013-10-15 10:51:46 -07:00
Joseph E. Gonzalez	e7d0320000	More refactoring and documentation, including renaming data to attr for vertex and edge data and eliminating the vertex type.	2013-10-15 02:20:06 -07:00
KarthikTunga	6c6b146fc2	Merge branch 'master' of https://github.com/apache/incubator-spark Updating local branch	2013-10-15 00:46:35 -07:00
KarthikTunga	d2c86e7188	SPARK-627 - reading --config argument	2013-10-15 00:35:44 -07:00
Reynold Xin	f41feb7b33	Bump up logging level to warning for failed tasks.	2013-10-14 23:35:32 -07:00
Joseph E. Gonzalez	6a13d02319	Merging changes for IndexedRDD branch	2013-10-14 23:30:36 -07:00
Joseph E. Gonzalez	6700ccd7d5	Introducing indexedrdd The rest of indexed rdd	2013-10-14 23:30:35 -07:00
Joseph E. Gonzalez	4755f42d78	moving indexedrdd to the correct location	2013-10-14 23:13:27 -07:00
Henry Saputra	3fed3e2283	Update pom.xml to use version 13 of the ASF parent pom and add mailingLists element.	2013-10-14 23:10:54 -07:00
Joseph E. Gonzalez	ef7c369092	merged with upstream changes	2013-10-14 22:56:42 -07:00
Patrick Wendell	e33b1839e2	Merge pull request #29 from rxin/kill Job killing Moving https://github.com/mesos/spark/pull/935 here The high level idea is to have an "interrupted" field in TaskContext, and a task should check that flag to determine if its execution should continue. For convenience, I provide an InterruptibleIterator which wraps around a normal iterator but checks for the interrupted flag. I also provide an InterruptibleRDD that wraps around an existing RDD. As part of this pull request, I added an AsyncRDDActions class that provides a number of RDD actions that return a FutureJob (extending scala.concurrent.Future). The FutureJob can be used to kill the job execution, or waits until the job finishes. This is NOT ready for merging yet. Remaining TODOs: 1. Add unit tests 2. Add job killing functionality for local scheduler (current job killing functionality only works in cluster scheduler) ------------- Update on Oct 10, 2013: This is ready! Related future work: - Figure out how to handle the job triggered by RangePartitioner (this one is tough; might become future work) - Java API - Python API	2013-10-14 22:25:47 -07:00
Reynold Xin	9cd8786e4a	Merge branch 'master' of github.com:apache/incubator-spark into kill Conflicts: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala	2013-10-14 21:51:30 -07:00
Joseph E. Gonzalez	bf059691f0	Adding a few extra comments.	2013-10-14 19:59:11 -07:00
Joseph E. Gonzalez	11a44d0ec9	Introducing indexedrdd The rest of indexed rdd	2013-10-14 19:46:42 -07:00
Joseph E. Gonzalez	67bb39c54b	Removing extraneous code	2013-10-14 18:49:05 -07:00
Reynold Xin	3b11f43e36	Merge pull request #57 from aarondav/bid Refactor BlockId into an actual type Converts all of our BlockId strings into actual BlockId types. Here are some advantages of doing this now: + Type safety + Code clarity - it's now obvious what the key of a shuffle or rdd block is, for instance. Additionally, appearing in tuple/map type signatures is a big readability bonus. A Seq[(String, BlockStatus)] is not very clear. Further, we can now use more Scala features, like matching on BlockId types. + Explicit usage - we can now formally tell where various BlockIds are being used (without doing string searches); this makes updating current BlockIds a much clearer process, and compiler-supported. (I'm looking at you, shuffle file consolidation.) + It will only get harder to make this change as time goes on. Downside is, of course, that this is a very invasive change touching a lot of different files, which will inevitably lead to merge conflicts for many.	2013-10-14 14:20:01 -07:00
Aaron Davidson	4a45019fb0	Address Matei's comments	2013-10-14 00:24:17 -07:00
Joseph E. Gonzalez	bff223454a	trying to address issues with GraphImpl being caught in closures.	2013-10-13 22:27:10 -07:00
Joseph E. Gonzalez	f89e6e5cbf	removing benchmark code	2013-10-13 20:45:01 -07:00
Joseph E. Gonzalez	141c22e28c	merging in master changes	2013-10-13 20:43:23 -07:00
Joseph E. Gonzalez	637b67da56	merging changes from upstream benchmarking branch	2013-10-13 19:54:09 -07:00
Joseph E. Gonzalez	494472a6cc	Integrated IndexedRDD into graph design.	2013-10-13 19:42:32 -07:00
Aaron Davidson	da896115ec	Change BlockId filename to name + rest of Patrick's comments	2013-10-13 11:15:02 -07:00
Aaron Davidson	d60352283c	Add unit test and address rest of Reynold's comments	2013-10-12 22:45:15 -07:00

4477 commits