ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Prashant Sharma	2573add94c	spark-544, introducing SparkConf and related configuration overhaul.	2013-12-25 00:09:36 +05:30
Patrick Wendell	0bc57c5767	Merge pull request #280 from aarondav/minor Minor cleanup for standalone scheduler See commit messages	2013-12-20 11:56:54 -08:00
Patrick Wendell	eca68d4425	Merge pull request #272 from tmyklebu/master Track and report task result serialisation time. - DirectTaskResult now has a ByteBuffer valueBytes instead of a T value. - DirectTaskResult now has a member function T value() that deserialises valueBytes. - Executor serialises value into a ByteBuffer and passes it to DTR's ctor. - Executor tracks the time taken to do so and puts it in a new field in TaskMetrics. - StagePage now reports serialisation time from TaskMetrics along with the other things it reported.	2013-12-19 18:12:22 -08:00
Aaron Davidson	6613ab663d	Fix compiler warning in SparkZooKeeperSession	2013-12-19 17:56:13 -08:00
Aaron Davidson	4d74b899b7	Remove firstApp from the standalone scheduler Master As a lonely child with no one to care for it... we had to put it down.	2013-12-19 17:53:41 -08:00
Aaron Davidson	1ab031eaff	Extraordinarily minor code/comment cleanup	2013-12-19 17:51:29 -08:00
Reynold Xin	7990c56375	Merge pull request #276 from shivaram/collectPartition Add collectPartition to JavaRDD interface. This interface is useful for implementing `take` from other language frontends where the data is serialized. Also remove `takePartition` from PythonRDD and use `collectPartition` in rdd.py. Thanks @concretevitamin for the original change and tests.	2013-12-19 13:35:09 -08:00
Shivaram Venkataraman	9cc3a6d3c0	Add comment explaining collectPartitions's use	2013-12-19 11:49:17 -08:00
Shivaram Venkataraman	d3234f9726	Make collectPartitions take an array of partitions Change the implementation to use runJob instead of PartitionPruningRDD. Also update the unit tests and the python take implementation to use the new interface.	2013-12-19 11:40:34 -08:00
Nick Pentreath	a76f53416c	Add toString to Java RDD, and __repr__ to Python RDD	2013-12-19 14:38:20 +02:00
Aaron Davidson	293a0af5a1	In experimental clusters we've observed that a 10 second timeout was insufficient, despite having a low number of nodes and relatively small workload (16 nodes, <1.5 TB data). This would cause an entire job to fail at the beginning of the reduce phase. There is no particular reason for this value to be small as a timeout should only occur in an exceptional situation. Also centralized the reading of spark.akka.askTimeout to AkkaUtils (surely this can later be cleaned up to use Typesafe). Finally, deleted some lurking implicits. If anyone can think of a reason they should still be there, please let me know.	2013-12-18 21:42:29 -08:00
Shivaram Venkataraman	af0cd6bd27	Add collectPartition to JavaRDD interface. Also remove takePartition from PythonRDD and use collectPartition in rdd.py.	2013-12-18 11:40:07 -08:00
Tor Myklebust	d3b1af4b6c	Add a serialisation time column to the StagePage.	2013-12-18 14:25:56 -05:00
Tor Myklebust	717c7fddb2	objectSer -> valueSer in a test.	2013-12-17 23:02:21 -05:00
Reynold Xin	9a6864d016	Fixed a performance problem in RDD.top and BoundedPriorityQueue (size in BoundedPriority was actually traversing the entire queue to calculate the size, resulting in bad performance in insertion).	2013-12-17 18:44:39 -08:00
Patrick Wendell	c1fec89895	Cleanup	2013-12-16 21:56:21 -08:00
Patrick Wendell	c6f95e603e	Attempt with extra repositories	2013-12-16 21:53:51 -08:00
Tor Myklebust	b2f0329511	Missed a spot; had an objectSer here too.	2013-12-17 00:18:46 -05:00
Tor Myklebust	25fa976580	Merge branch 'master' of git://github.com/apache/incubator-spark	2013-12-16 23:48:37 -05:00
Tor Myklebust	963d6f065a	Incorporate pwendell's code review suggestions.	2013-12-16 23:14:52 -05:00
Reynold Xin	883e034aeb	Merge pull request #245 from gregakespret/task-maxfailures-fix Fix for spark.task.maxFailures not enforced correctly. Docs at http://spark.incubator.apache.org/docs/latest/configuration.html say: ``` spark.task.maxFailures Number of individual task failures before giving up on the job. Should be greater than or equal to 1. Number of allowed retries = this value - 1. ``` Previous implementation worked incorrectly. When for example `spark.task.maxFailures` was set to 1, the job was aborted only after the second task failure, not after the first one.	2013-12-16 14:16:02 -08:00
Tor Myklebust	882d544856	UI to display serialisation time of a stage.	2013-12-16 13:27:03 -05:00
Tor Myklebust	8a397a959b	Track task value serialisation time in TaskMetrics.	2013-12-16 12:07:39 -05:00
Mark Hamstra	09ed7ddfa0	Use scala.binary.version in POMs	2013-12-15 12:39:58 -08:00
Josh Rosen	2fd781d347	Merge pull request #249 from ngbinh/partitionInJavaSortByKey Expose numPartitions parameter in JavaPairRDD.sortByKey() This change makes Java and Scala API on sortByKey() the same.	2013-12-14 12:59:37 -08:00
Prashant Sharma	1ae3c0fc5e	Added a comment about ActorRef and ActorSelection difference.	2013-12-14 10:44:24 +05:30
Prashant Sharma	a854cc536d	Review comments on the PR for scala 2.10 migration.	2013-12-13 15:19:51 +05:30
Prashant Sharma	603af51bb5	Merge branch 'master' into akka-bug-fix Conflicts: core/pom.xml core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala pom.xml project/SparkBuild.scala streaming/pom.xml yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala	2013-12-11 10:21:53 +05:30
Binh Nguyen	0b494f7db4	Hook directly to Scala API	2013-12-10 11:17:52 -08:00
Binh Nguyen	e85af50767	Leave default value of numPartitions to Scala code.	2013-12-10 11:04:14 -08:00
Grega Kespret	558af87334	Fix tests.	2013-12-10 11:43:42 +01:00
Binh Nguyen	c82d4f079b	Use braces to shorten the line.	2013-12-10 01:04:52 -08:00
Binh Nguyen	5013fb64b2	Expose numPartitions parameter in JavaPairRDD.sortByKey() This change makes Java and Scala API on sortByKey() the same.	2013-12-10 00:38:16 -08:00
Prashant Sharma	17db6a9041	Style fixes and addressed review comments at #221	2013-12-10 11:47:16 +05:30
Patrick Wendell	5b74609d97	License headers	2013-12-09 16:41:01 -08:00
Grega Kespret	14a1df6572	Fix for spark.task.maxFailures not enforced correctly.	2013-12-09 10:39:02 +01:00
Prashant Sharma	7ad6921ae0	Incorporated Patrick's feedback comment on #211 and made maven build/dep-resolution at least a bit faster.	2013-12-07 12:45:57 +05:30
Matei Zaharia	e0392343a0	Merge pull request #190 from markhamstra/Stages4Jobs stageId <--> jobId mapping in DAGScheduler Okay, I think this one is ready to go -- or at least it's ready for review and discussion. It's a carry-over of https://github.com/mesos/spark/pull/842 with updates for the newer job cancellation functionality. The prior discussion still applies. I've actually changed the job cancellation flow a bit: Instead of ``cancelTasks`` going to the TaskScheduler and then ``taskSetFailed`` coming back to the DAGScheduler (resulting in ``abortStage`` there), the DAGScheduler now takes care of figuring out which stages should be cancelled, tells the TaskScheduler to cancel tasks for those stages, then does the cleanup within the DAGScheduler directly without the need for any further prompting by the TaskScheduler. I know of three outstanding issues, each of which can and should, I believe, be handled in follow-up pull requests: 1) https://spark-project.atlassian.net/browse/SPARK-960 2) JobLogger should be re-factored to eliminate duplication 3) Related to 2), the WebUI should also become a consumer of the DAGScheduler's new understanding of the relationship between jobs and stages so that it can display progress indication and the like grouped by job. Right now, some of this information is just being sent out as part of ``SparkListenerJobStart`` messages, but more or different job <--> stage information may need to be exported from the DAGScheduler to meet listeners needs. Except for the eventQueue -> Actor commit, the rest can be cherry-picked almost cleanly into branch-0.8. A little merging is needed in MapOutputTracker and the DAGScheduler. Merged versions of those files are in `aba2b40ce0` Note that between the recent Actor change in the DAGScheduler and the cleaning up of DAGScheduler data structures on job completion in this PR, some races have been introduced into the DAGSchedulerSuite. Those tests usually pass, and I don't think that better-behaved code that doesn't directly inspect DAGScheduler data structures should be seeing any problems, but I'll work on fixing DAGSchedulerSuite as either an addition to this PR or as a separate request. UPDATE: Fixed the race that I introduced. Created a JIRA issue (SPARK-965) for the one that was introduced with the switch to eventProcessorActor in the DAGScheduler.	2013-12-06 11:49:59 -08:00
Matei Zaharia	bfa68609d9	Merge pull request #233 from hsaputra/changecontexttobackend Change the name of input argument in ClusterScheduler#initialize from context to backend. The SchedulerBackend used to be called ClusterSchedulerContext so just want to make small change of the input param in the ClusterScheduler#initialize to reflect this.	2013-12-06 11:04:03 -08:00
Matei Zaharia	3fb302c08d	Merge pull request #205 from kayousterhout/logging Added logging of scheduler delays to UI This commit adds two metrics to the UI: 1) The time to get task results, if they're fetched remotely 2) The scheduler delay. When the scheduler starts getting overwhelmed (because it can't keep up with the rate at which tasks are being submitted), the result is that tasks get delayed on the tail-end: the message from the worker saying that the task has completed ends up in a long queue and takes a while to be processed by the scheduler. This commit records that delay in the UI so that users can tell when the scheduler is becoming the bottleneck.	2013-12-06 11:03:32 -08:00
Matei Zaharia	87676a6af2	Merge pull request #220 from rxin/zippart Memoize preferred locations in ZippedPartitionsBaseRDD so preferred location computation doesn't lead to exponential explosion. This was a problem in GraphX where we have a whole chain of RDDs that are ZippedPartitionsRDD's, and the preferred locations were taking eternity to compute. (cherry picked from commit `e36fe55a03`) Signed-off-by: Reynold Xin <rxin@apache.org>	2013-12-06 11:01:42 -08:00
Aaron Davidson	94b5881ee9	Fix long lines	2013-12-06 00:22:00 -08:00
Aaron Davidson	5a864e3fce	Rename SparkActorSystem to IndestructibleActorSystem	2013-12-06 00:21:43 -08:00
Prashant Sharma	c9cd2af71e	Merge branch 'wip-scala-2.10' into akka-bug-fix	2013-12-06 13:32:15 +05:30
Mark Hamstra	ee888f6b25	FutureAction result tests	2013-12-05 23:01:18 -08:00
Prashant Sharma	4e70480038	A left over akka -> akka.tcp changes	2013-12-06 12:29:53 +05:30
Henry Saputra	1cb259cb57	Change the name of input argument in ClusterScheduler#initialize from context to backend. The SchedulerBackend used to be called ClusterSchedulerContext so just want to make small change of the input param in the ClusterScheduler#initialize to reflect this.	2013-12-05 18:50:26 -08:00
Mark Hamstra	aebb123fd3	jobWaiter.synchronized before jobWaiter.wait	2013-12-05 17:16:44 -08:00
Patrick Wendell	5d460253d6	Merge pull request #228 from pwendell/master Document missing configs and set shuffle consolidation to false.	2013-12-05 12:31:24 -08:00
Patrick Wendell	75d161b357	Forcing shuffle consolidation in DiskBlockManagerSuite	2013-12-05 11:36:41 -08:00

2601 commits