Commit graph

2666 commits

Author SHA1 Message Date
Shivaram Venkataraman 9cc3a6d3c0 Add comment explaining collectPartitions's use 2013-12-19 11:49:17 -08:00
Shivaram Venkataraman d3234f9726 Make collectPartitions take an array of partitions
Change the implementation to use runJob instead of PartitionPruningRDD.
Also update the unit tests and the Python take implementation
to use the new interface.
2013-12-19 11:40:34 -08:00
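A minimal sketch of the new implementation described above, assuming the present-day runJob signature rather than the exact 2013 one: runJob evaluates only the requested partitions, so no PartitionPruningRDD wrapper is needed just to collect a subset of them.

```
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Hedged sketch: run a job over only the requested partition ids and
// gather each partition's contents as an array.
def collectPartitions[T: ClassTag](rdd: RDD[T], partitionIds: Array[Int]): Array[Array[T]] =
  rdd.context.runJob(rdd, (iter: Iterator[T]) => iter.toArray, partitionIds.toSeq)
```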
Nick Pentreath a76f53416c Add toString to Java RDD, and __repr__ to Python RDD 2013-12-19 14:38:20 +02:00
Tathagata Das ec71b445ad Minor changes. 2013-12-18 23:39:28 -08:00
Aaron Davidson 293a0af5a1 In experimental clusters we've observed that a 10-second timeout was insufficient,
despite having a low number of nodes and relatively small workload (16 nodes, <1.5 TB data).
This would cause an entire job to fail at the beginning of the reduce phase.
There is no particular reason for this value to be small as a timeout should only occur
in an exceptional situation.

Also centralized the reading of spark.akka.askTimeout to AkkaUtils (surely this can later
be cleaned up to use Typesafe).

Finally, deleted some lurking implicits. If anyone can think of a reason they should still
be there, please let me know.
2013-12-18 21:42:29 -08:00
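A hedged sketch of that centralization (the property name is from the commit message; the 30-second default is an assumption):

```
import scala.concurrent.duration._

object AkkaUtils {
  // One place reads spark.akka.askTimeout instead of scattered call sites.
  def askTimeout: FiniteDuration =
    Duration(System.getProperty("spark.akka.askTimeout", "30").toLong, "seconds")
}
```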
Tathagata Das e93b391d75 Merge branch 'apache-master' into scheduler-update
Conflicts:
	streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
	streaming/src/main/scala/org/apache/spark/streaming/dstream/ForEachDStream.scala
2013-12-18 17:51:14 -08:00
Tathagata Das b80ec05635 Added StatsReportListener to generate processing time statistics across multiple batches. 2013-12-18 15:35:24 -08:00
Shivaram Venkataraman af0cd6bd27 Add collectPartition to JavaRDD interface.
Also remove takePartition from PythonRDD and use collectPartition in rdd.py.
2013-12-18 11:40:07 -08:00
Tor Myklebust d3b1af4b6c Add a serialisation time column to the StagePage. 2013-12-18 14:25:56 -05:00
Tor Myklebust 717c7fddb2 objectSer -> valueSer in a test. 2013-12-17 23:02:21 -05:00
Reynold Xin 9a6864d016 Fixed a performance problem in RDD.top and BoundedPriorityQueue (size in BoundedPriority was actually traversing the entire queue to calculate the size, resulting in bad performance in insertion). 2013-12-17 18:44:39 -08:00
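A hedged sketch of that fix (class shape assumed): the inherited Iterable.size walks the whole iterator, turning every insertion in RDD.top into an O(n) scan; delegating to the backing queue makes it O(1).

```
import scala.collection.JavaConverters._

class BoundedPriorityQueue[A](maxSize: Int)(implicit ord: Ordering[A])
    extends Iterable[A] {
  private val underlying = new java.util.PriorityQueue[A](maxSize, ord)

  // The fix: delegate to the queue's counter instead of Iterable's
  // iterator-walking default.
  override def size: Int = underlying.size
  override def iterator: Iterator[A] = underlying.iterator.asScala

  def +=(elem: A): this.type = {
    if (size < maxSize) underlying.offer(elem)
    else if (ord.gt(elem, underlying.peek())) {
      underlying.poll()        // evict the current minimum
      underlying.offer(elem)
    }
    this
  }
}
```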
wangda.tan 59e53fa21c SPARK-968, changes to avoid an NPE 2013-12-17 17:57:27 +08:00
wangda.tan 36060f4f50 spark-898, changes according to review comments 2013-12-17 17:55:38 +08:00
Patrick Wendell c1fec89895 Cleanup 2013-12-16 21:56:21 -08:00
Patrick Wendell c6f95e603e Attempt with extra repositories 2013-12-16 21:53:51 -08:00
Tor Myklebust b2f0329511 Missed a spot; had an objectSer here too. 2013-12-17 00:18:46 -05:00
Tor Myklebust 25fa976580 Merge branch 'master' of git://github.com/apache/incubator-spark 2013-12-16 23:48:37 -05:00
Tor Myklebust 963d6f065a Incorporate pwendell's code review suggestions. 2013-12-16 23:14:52 -05:00
Reynold Xin 883e034aeb Merge pull request #245 from gregakespret/task-maxfailures-fix
Fix for spark.task.maxFailures not enforced correctly.

Docs at http://spark.incubator.apache.org/docs/latest/configuration.html say:

```
spark.task.maxFailures

Number of individual task failures before giving up on the job. Should be greater than or equal to 1. Number of allowed retries = this value - 1.
```

The previous implementation was incorrect: when `spark.task.maxFailures` was set to 1, for example, the job was aborted only after the second task failure, not after the first one.
2013-12-16 14:16:02 -08:00
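A tiny, self-contained illustration of the off-by-one that was fixed (names hypothetical):

```
object MaxFailuresDemo extends App {
  val maxFailures = 1
  def shouldAbortOld(failures: Int) = failures > maxFailures   // buggy: tolerates one extra failure
  def shouldAbortNew(failures: Int) = failures >= maxFailures  // fixed: retries = maxFailures - 1
  println(shouldAbortOld(1))  // false -- job wrongly survives its first failure
  println(shouldAbortNew(1))  // true  -- aborts after the first failure, as documented
}
```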
Tor Myklebust 882d544856 UI to display serialisation time of a stage. 2013-12-16 13:27:03 -05:00
Tor Myklebust 8a397a959b Track task value serialisation time in TaskMetrics. 2013-12-16 12:07:39 -05:00
wangda.tan 8ab8c6a526 Merge branch 'master' of git://github.com/apache/incubator-spark 2013-12-16 21:45:43 +08:00
Mark Hamstra 09ed7ddfa0 Use scala.binary.version in POMs 2013-12-15 12:39:58 -08:00
Josh Rosen 2fd781d347 Merge pull request #249 from ngbinh/partitionInJavaSortByKey
Expose numPartitions parameter in JavaPairRDD.sortByKey()

This change makes the Java and Scala APIs for sortByKey() the same.
2013-12-14 12:59:37 -08:00
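A hedged sketch of the added overload (excerpted shape; the real version routes through a comparator): the Java wrapper forwards numPartitions to the Scala RDD so the two APIs line up.

```
// Inside the JavaPairRDD wrapper (body assumed):
def sortByKey(ascending: Boolean, numPartitions: Int): JavaPairRDD[K, V] =
  fromRDD(rdd.sortByKey(ascending, numPartitions))
```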
Prashant Sharma 1ae3c0fc5e Added a comment about ActorRef and ActorSelection difference. 2013-12-14 10:44:24 +05:30
Prashant Sharma a854cc536d Review comments on the PR for scala 2.10 migration. 2013-12-13 15:19:51 +05:30
Tathagata Das 097e120c0c Refactored streaming scheduler and added listener interface.
- Refactored Scheduler + JobManager to JobGenerator + JobScheduler and
  added JobSet for cleaner code. Moved scheduler related code to
  streaming.scheduler package.
- Added StreamingListener trait (similar to SparkListener) to enable
  gathering of streaming stats like processing times and delays.
  StreamingContext.addListener() was added to register listeners (a usage
  sketch follows this entry).
- Deduped some code in streaming tests by modifying TestSuiteBase, and
  added StreamingListenerSuite.
2013-12-12 20:48:02 -08:00
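A hedged usage sketch of the listener interface described above (event and field names are assumptions based on later releases):

```
import org.apache.spark.streaming.scheduler._

class BatchStatsListener extends StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted) {
    // processingDelay is an Option[Long] of milliseconds
    batchCompleted.batchInfo.processingDelay.foreach { ms =>
      println("Batch processed in " + ms + " ms")
    }
  }
}

// ssc: an existing StreamingContext, registered via the new hook
ssc.addListener(new BatchStatsListener)
```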
Prashant Sharma 603af51bb5 Merge branch 'master' into akka-bug-fix
Conflicts:
	core/pom.xml
	core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
	pom.xml
	project/SparkBuild.scala
	streaming/pom.xml
	yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala
2013-12-11 10:21:53 +05:30
Binh Nguyen 0b494f7db4 Hook directly to Scala API 2013-12-10 11:17:52 -08:00
Binh Nguyen e85af50767 Leave default value of numPartitions to Scala code. 2013-12-10 11:04:14 -08:00
Grega Kespret 558af87334 Fix tests. 2013-12-10 11:43:42 +01:00
Binh Nguyen c82d4f079b Use braces to shorten the line. 2013-12-10 01:04:52 -08:00
Binh Nguyen 5013fb64b2 Expose numPartitions parameter in JavaPairRDD.sortByKey()
This change makes the Java and Scala APIs for sortByKey() the same.
2013-12-10 00:38:16 -08:00
Prashant Sharma 17db6a9041 Style fixes and addressed review comments at #221 2013-12-10 11:47:16 +05:30
Patrick Wendell 5b74609d97 License headers 2013-12-09 16:41:01 -08:00
Grega Kespret 14a1df6572 Fix for spark.task.maxFailures not enforced correctly. 2013-12-09 10:39:02 +01:00
wangda.tan ee68a85cff SPARK-968, added sc finalize code to avoid akka rebinding to the same port 2013-12-09 09:38:58 +08:00
Aaron Davidson 40f63eb034 Merge master into 127 2013-12-08 11:16:52 -08:00
wangda.tan 850c4b709a Merge branch 'master' of https://github.com/leftnoteasy/incubator-spark-1 2013-12-09 00:12:46 +08:00
wangda.tan 48e4f2ad14 SPARK-968, In stage UI, add an overview section that shows task stats grouped by executor id 2013-12-09 00:02:59 +08:00
Prashant Sharma 7ad6921ae0 Incorporated Patrick's feedback comment on #211 and made maven build/dep-resolution at least a bit faster. 2013-12-07 12:45:57 +05:30
Matei Zaharia e0392343a0 Merge pull request #190 from markhamstra/Stages4Jobs
stageId <--> jobId mapping in DAGScheduler

Okay, I think this one is ready to go -- or at least it's ready for review and discussion.  It's a carry-over of https://github.com/mesos/spark/pull/842 with updates for the newer job cancellation functionality.  The prior discussion still applies.  I've actually changed the job cancellation flow a bit: Instead of ``cancelTasks`` going to the TaskScheduler and then ``taskSetFailed`` coming back to the DAGScheduler (resulting in ``abortStage`` there), the DAGScheduler now takes care of figuring out which stages should be cancelled, tells the TaskScheduler to cancel tasks for those stages, then does the cleanup within the DAGScheduler directly without the need for any further prompting by the TaskScheduler.

I know of three outstanding issues, each of which can and should, I believe, be handled in follow-up pull requests:

1) https://spark-project.atlassian.net/browse/SPARK-960
2) JobLogger should be re-factored to eliminate duplication
3) Related to 2), the WebUI should also become a consumer of the DAGScheduler's new understanding of the relationship between jobs and stages so that it can display progress indication and the like grouped by job.  Right now, some of this information is just being sent out as part of ``SparkListenerJobStart`` messages, but more or different job <--> stage information may need to be exported from the DAGScheduler to meet listeners' needs.

Except for the eventQueue -> Actor commit, the rest can be cherry-picked almost cleanly into branch-0.8.  A little merging is needed in MapOutputTracker and the DAGScheduler.  Merged versions of those files are in aba2b40ce0

Note that between the recent Actor change in the DAGScheduler and the cleaning up of DAGScheduler data structures on job completion in this PR, some races have been introduced into the DAGSchedulerSuite.  Those tests usually pass, and I don't think that better-behaved code that doesn't directly inspect DAGScheduler data structures should be seeing any problems, but I'll work on fixing DAGSchedulerSuite as either an addition to this PR or as a separate request.

UPDATE: Fixed the race that I introduced.  Created a JIRA issue (SPARK-965) for the one that was introduced with the switch to eventProcessorActor in the DAGScheduler.
2013-12-06 11:49:59 -08:00
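A hedged sketch of the jobId <--> stageId bookkeeping (the map names appear in the commit titles further down; helper shapes are assumptions):

```
import scala.collection.mutable.{HashMap, HashSet}

object JobStageBookkeeping {
  val jobIdToStageIds = new HashMap[Int, HashSet[Int]]
  val stageIdToJobIds = new HashMap[Int, HashSet[Int]]

  // Keep the two maps tightly coupled so job completion can clean both
  // directions up symmetrically.
  def registerJobStage(jobId: Int, stageId: Int) {
    stageIdToJobIds.getOrElseUpdate(stageId, new HashSet[Int]) += jobId
    jobIdToStageIds.getOrElseUpdate(jobId, new HashSet[Int]) += stageId
  }

  def cleanupJob(jobId: Int) {
    for (stageId <- jobIdToStageIds.getOrElse(jobId, HashSet.empty[Int])) {
      stageIdToJobIds.get(stageId).foreach { jobs =>
        jobs -= jobId
        if (jobs.isEmpty) stageIdToJobIds -= stageId  // stage no longer needed by any job
      }
    }
    jobIdToStageIds -= jobId
  }
}
```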
Matei Zaharia bfa68609d9 Merge pull request #233 from hsaputra/changecontexttobackend
Change the name of input argument in ClusterScheduler#initialize from context to backend.

The SchedulerBackend used to be called ClusterSchedulerContext, so this just makes a small
change to the input param in ClusterScheduler#initialize to reflect that.
2013-12-06 11:04:03 -08:00
Matei Zaharia 3fb302c08d Merge pull request #205 from kayousterhout/logging
Added logging of scheduler delays to UI

This commit adds two metrics to the UI:

1) The time to get task results, if they're fetched remotely

2) The scheduler delay.  When the scheduler starts getting overwhelmed (because it can't keep up with the rate at which tasks are being submitted), the result is that tasks get delayed on the tail-end: the message from the worker saying that the task has completed ends up in a long queue and takes a while to be processed by the scheduler.  This commit records that delay in the UI so that users can tell when the scheduler is becoming the bottleneck.
2013-12-06 11:03:32 -08:00
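A hedged sketch of the delay computation (field names assumed): whatever part of a task's wall-clock time is neither execution nor result fetching gets attributed to the scheduler.

```
// All inputs in milliseconds; gettingResultTime is 0 for locally
// returned results.
def schedulerDelay(launchTime: Long, finishTime: Long,
                   executorRunTime: Long, gettingResultTime: Long): Long =
  (finishTime - launchTime) - executorRunTime - gettingResultTime
```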
Matei Zaharia 87676a6af2 Merge pull request #220 from rxin/zippart
Memoize preferred locations in ZippedPartitionsBaseRDD

so preferred location computation doesn't lead to exponential explosion.

This was a problem in GraphX where we have a whole chain of RDDs that are ZippedPartitionsRDD's, and the preferred locations were taking an eternity to compute.

(cherry picked from commit e36fe55a03)
Signed-off-by: Reynold Xin <rxin@apache.org>
2013-12-06 11:01:42 -08:00
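A hedged sketch of the memoization (class shape assumed): preferred locations are computed once when the partition object is built, not recursively on every call, which is what exploded on long chains of zipped RDDs.

```
import org.apache.spark.Partition
import org.apache.spark.rdd.RDD

class ZippedPartitionsPartition(idx: Int, @transient rdds: Seq[RDD[_]])
    extends Partition {
  override val index: Int = idx
  // Computed eagerly, exactly once, when the partition object is built --
  // not recursively on every preferredLocations() call.
  val preferredLocations: Seq[String] =
    rdds.flatMap(rdd => rdd.preferredLocations(rdd.partitions(idx))).distinct
}
```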
Aaron Davidson 94b5881ee9 Fix long lines 2013-12-06 00:22:00 -08:00
Aaron Davidson 5a864e3fce Rename SparkActorSystem to IndestructibleActorSystem 2013-12-06 00:21:43 -08:00
Prashant Sharma c9cd2af71e Merge branch 'wip-scala-2.10' into akka-bug-fix 2013-12-06 13:32:15 +05:30
Mark Hamstra ee888f6b25 FutureAction result tests 2013-12-05 23:01:18 -08:00
Prashant Sharma 4e70480038 Leftover akka -> akka.tcp changes 2013-12-06 12:29:53 +05:30
Henry Saputra 1cb259cb57 Change the name of input argument in ClusterScheduler#initialize from context to backend.
The SchedulerBackend used to be called ClusterSchedulerContext, so this just makes a small
change to the input param in ClusterScheduler#initialize to reflect that.
2013-12-05 18:50:26 -08:00
Mark Hamstra aebb123fd3 jobWaiter.synchronized before jobWaiter.wait 2013-12-05 17:16:44 -08:00
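The fix in miniature: Object.wait() throws IllegalMonitorStateException unless the calling thread holds the monitor it waits on.

```
jobWaiter.synchronized {
  jobWaiter.wait()
}
```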
Patrick Wendell 5d460253d6 Merge pull request #228 from pwendell/master
Document missing configs and set shuffle consolidation to false.
2013-12-05 12:31:24 -08:00
Patrick Wendell 75d161b357 Forcing shuffle consolidation in DiskBlockManagerSuite 2013-12-05 11:36:41 -08:00
Matei Zaharia 72b696156c Merge pull request #199 from harveyfeng/yarn-2.2
Hadoop 2.2 migration

Includes support for the YARN API stabilized in the Hadoop 2.2 release, and a few style patches.

Short description for each set of commits:

a98f5a0 - "Misc style changes in the 'yarn' package"
a67ebf4 - "A few more style fixes in the 'yarn' package"
Both of these are some minor style changes, such as fixing lines over 100 chars, to the existing YARN code.

ab8652f - "Add a 'new-yarn' directory ... "
Copies everything from `SPARK_HOME/yarn` to `SPARK_HOME/new-yarn`. No actual code changes here.

4f1c3fa - "Hadoop 2.2 YARN API migration ..."
API patches to code in the `SPARK_HOME/new-yarn` directory. There are a few more small style changes mixed in, too.
Based on @colorant's Hadoop 2.2 support for the scala-2.10 branch in #141.

a1a1c62 - "Add optional Hadoop 2.2 settings in sbt build ... "
If Spark should be built against Hadoop 2.2, then:
a) the `org.apache.spark.deploy.yarn` package will be compiled from the `new-yarn` directory.
b) Protobuf v2.5 will be used as a Spark dependency, since Hadoop 2.2 depends on it. Also, Spark will be built against a version of Akka v2.0.5 that's built against Protobuf 2.5, named `akka-2.0.5-protobuf-2.5`. The patched Akka is here: https://github.com/harveyfeng/akka/tree/2.0.5-protobuf-2.5, and was published to local Ivy during testing.

There's also a new boolean environment variable, `SPARK_IS_NEW_HADOOP`, that users can manually set if their `SPARK_HADOOP_VERSION` specification does not start with `2.2`, which is how the build file tries to detect a 2.2 version. Not sure if this is necessary or done in the best way, though...
2013-12-04 23:33:04 -08:00
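A hedged, SparkBuild.scala-style sketch of the detection logic described above (the fallback version string is an assumption):

```
val hadoopVersion = sys.env.getOrElse("SPARK_HADOOP_VERSION", "1.0.4")
val isNewHadoop = sys.env.get("SPARK_IS_NEW_HADOOP")
  .map(_.toBoolean)
  .getOrElse(hadoopVersion.startsWith("2.2"))
// isNewHadoop switches compilation to the new-yarn sources and pulls
// the Protobuf 2.5 dependencies.
```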
Patrick Wendell b1c6fa1584 Document missing configs and set shuffle consolidation to false. 2013-12-04 18:39:34 -08:00
Patrick Wendell 182f9baeed Merge pull request #227 from pwendell/master
Fix small bug in web UI and minor clean-up.

There was a bug where sorting order didn't work correctly for write time metrics.

I also cleaned up some earlier code that fixed the same issue for read and
write bytes.
2013-12-04 15:52:07 -08:00
Patrick Wendell 380b90b9b3 Fix small bug in web UI and minor clean-up.
There was a bug where sorting order didn't work correctly for write time metrics.

I also cleaned up some earlier code that fixed the same issue for read and
write bytes.
2013-12-04 14:41:48 -08:00
Andrew Ash 217611680d Add missing space after "Serialized" in StorageLevel
Current code creates outputs like:

scala> res0.getStorageLevel.description
res2: String = Serialized1x Replicated
2013-12-04 11:29:20 -08:00
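A hedged reconstruction of the description builder with the missing trailing space restored:

```
def description(useDisk: Boolean, useMemory: Boolean,
                deserialized: Boolean, replication: Int): String = {
  var result = ""
  result += (if (useDisk) "Disk " else "")
  result += (if (useMemory) "Memory " else "")
  result += (if (deserialized) "Deserialized " else "Serialized ")  // was "Serialized"
  result += "%sx Replicated".format(replication)
  result
}
```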
Matei Zaharia d6e5473872 Merge pull request #223 from rxin/transient
Mark partitioner, name, and generator field in RDD as @transient.

As part of the effort to reduce serialized task size.
2013-12-04 10:28:50 -08:00
Reynold Xin 974a69d79c Marked doCheckpointCalled as transient. 2013-12-03 11:34:38 -08:00
Mark Hamstra 403234dd0d SparkListenerJobStart posted from local jobs 2013-12-03 09:57:32 -08:00
Mark Hamstra f55d0b935d Synchronous, inline cleanup after runLocally 2013-12-03 09:57:32 -08:00
Mark Hamstra c9fcd909d0 Local jobs post SparkListenerJobEnd, and DAGScheduler data structure
cleanup always occurs before any posting of SparkListenerJobEnd.
2013-12-03 09:57:32 -08:00
Mark Hamstra 9ae2d094a9 Tightly couple stageIdToJobIds and jobIdToStageIds 2013-12-03 09:57:32 -08:00
Mark Hamstra 27c45e5236 Cleaned up job cancellation handling 2013-12-03 09:57:32 -08:00
Mark Hamstra 686a420ddc Refactoring to make job removal, stage removal, task cancellation clearer 2013-12-03 09:57:32 -08:00
Mark Hamstra 205566e56e Improved comment 2013-12-03 09:57:32 -08:00
Mark Hamstra 94087c463b Removed redundant residual re: reverted refactoring. 2013-12-03 09:57:31 -08:00
Mark Hamstra 982797dcba Fixed intended side-effects 2013-12-03 09:57:31 -08:00
Mark Hamstra 6f8359b5ad Actor instead of eventQueue for LocalJobCompleted 2013-12-03 09:57:31 -08:00
Mark Hamstra 51458ab4a1 Added stageId <--> jobId mapping in DAGScheduler
...and make sure that DAGScheduler data structures are cleaned up on job completion.
  Initial effort and discussion at https://github.com/mesos/spark/pull/842
2013-12-03 09:57:31 -08:00
Raymond Liu 4738818dd6 Fix pom.xml for maven build 2013-12-03 16:36:05 +08:00
Reynold Xin 58d9bbcfec Merge pull request #217 from aarondav/mesos-urls
Re-enable zk:// urls for Mesos SparkContexts

This was broken in PR #71 when we explicitly disallow anything that didn't fit a mesos:// url.
Although it is not really clear that a zk:// url should match Mesos, it is what the docs say and it is necessary for backwards compatibility.

Additionally added a unit test for the creation of all types of TaskSchedulers. Since YARN and Mesos are not necessarily available in the system, they are allowed to pass as long as the YARN/Mesos code paths are exercised.
2013-12-02 21:58:53 -08:00
Prashant Sharma 09e8be9a62 Made running SparkActorSystem specific to executors only. 2013-12-03 11:27:45 +05:30
Aaron Davidson 0f24576c08 Cleanup and documentation of SparkActorSystem 2013-12-03 11:05:12 +05:30
Reynold Xin e34b4693d3 Mark partitioner, name, and generator field in RDD as @transient. 2013-12-02 21:24:44 -08:00
Kay Ousterhout 58b3aff9a8 Fixed problem with scheduler delay 2013-12-02 20:30:03 -08:00
Aaron Davidson f6c8c1c7b6 Cleanup and documentation of SparkActorSystem 2013-12-02 11:42:53 -08:00
Prashant Sharma 5b11028a04 Made akka capable of tolerating fatal exceptions and moving on. 2013-12-02 10:47:39 +05:30
Reynold Xin 740922f25d Merge pull request #219 from sundeepn/schedulerexception
Scheduler quits when newStage fails

The current scheduler thread does not handle exceptions from newStage while launching new jobs. The thread fails on any exception triggered at that level, leaving the cluster hanging with no scheduler.
2013-12-01 12:46:58 -08:00
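A hedged sketch of the guard (surrounding names assumed): a failure while building stages now fails only that job and leaves the scheduler thread running.

```
var finalStage: Stage = null
try {
  finalStage = newStage(finalRDD, partitions.size, None, jobId, Some(callSite))
} catch {
  case e: Exception =>
    // Log it and fail the job; the scheduler's event loop keeps running.
    logWarning("Creating new stage failed due to exception - job: " + jobId, e)
    listener.jobFailed(e)
    return
}
```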
Sundeep Narravula be3ea2394f Log exception in scheduler in addition to passing it to the caller.
Code Styling changes.
2013-12-01 00:50:34 -08:00
Reynold Xin 9cf7f31e4d Memoize preferred locations in ZippedPartitionsBaseRDD so preferred location computation doesn't lead to exponential explosion.
(cherry picked from commit e36fe55a03)
Signed-off-by: Reynold Xin <rxin@apache.org>
2013-11-30 18:10:52 -08:00
Sundeep Narravula 4d53830eb7 Scheduler quits when createStage fails.
The current scheduler thread does not handle exceptions from createStage while launching new jobs. The thread fails on any exception triggered at that level, leaving the cluster hanging with no scheduler.
2013-11-30 16:18:12 -08:00
Aaron Davidson 96df26be47 Add spaces between tests 2013-11-29 13:20:43 -08:00
Prashant Sharma 5618af6803 Merge branch 'master' into wip-scala-2.10 2013-11-29 13:41:21 +05:30
Prashant Sharma 1bc83ca791 Changed defaults for akka to almost disable failure detector. 2013-11-29 13:41:05 +05:30
Lian, Cheng 4a1d966e26 More comments 2013-11-29 16:02:58 +08:00
Lian, Cheng 1e25086009 Updated some inline comments in DAGScheduler 2013-11-29 15:56:47 +08:00
Aaron Davidson 081a0b6861 Add unit test for SparkContext scheduler creation
Since YARN and Mesos are not necessarily available in the system,
they are allowed to pass as long as the YARN/Mesos code paths are
exercised.
2013-11-28 20:40:57 -08:00
Aaron Davidson 37f161cf6b Re-enable zk:// urls for Mesos SparkContexts
This was broken in PR #71 when we explicitly disallow anything that
didn't fit a mesos:// url.

Although it is not really clear that a zk:// url should match Mesos,
it is what the docs say and it is necessary for backwards compatibility.
2013-11-28 20:37:56 -08:00
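A hedged sketch of the relaxed master-URL match implied above:

```
val MESOS_REGEX = """(mesos|zk)://.*""".r

def isMesosMaster(master: String): Boolean = master match {
  case MESOS_REGEX(_) => true   // zk:// routes to Mesos again
  case _              => false
}
```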
Lian, Cheng 18def5d6f2 Bugfix: SPARK-965 & SPARK-966
SPARK-965: https://spark-project.atlassian.net/browse/SPARK-965
SPARK-966: https://spark-project.atlassian.net/browse/SPARK-966

* Add back DAGScheduler.start(), eventProcessActor is created and started here.

  Notice that function is only called by SparkContext.

* Cancel the scheduled stage resubmission task when stopping eventProcessActor

* Add a new DAGSchedulerEvent ResubmitFailedStages

  This event message is sent by the scheduled stage resubmission task to eventProcessActor.  In this way, DAGScheduler.resubmitFailedStages is guaranteed to be executed from the same thread that runs DAGScheduler.processEvent.

  Please refer to discussion in SPARK-966 for details.
2013-11-28 17:46:06 +08:00
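A hedged sketch of the resulting event flow (the event name is from the commit message; the scheduling call is abbreviated): the timer sends a message instead of calling resubmitFailedStages() directly, so resubmission always runs on eventProcessActor's own thread.

```
case object ResubmitFailedStages extends DAGSchedulerEvent

// On startup: schedule periodic resubmission checks as messages.
val resubmissionTask = env.actorSystem.scheduler.schedule(
  RESUBMIT_TIMEOUT, RESUBMIT_TIMEOUT) {
  eventProcessActor ! ResubmitFailedStages
}

// In processEvent, handled on the same thread as every other event:
// case ResubmitFailedStages => resubmitFailedStages()

// When the actor stops, the timer is cancelled:
resubmissionTask.cancel()
```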
Prashant Sharma 3ec5d74766 Fixed the broken build. 2013-11-28 13:02:28 +05:30
Matei Zaharia 743a31a7ca Merge pull request #210 from haitaoyao/http-timeout
add http timeout for httpbroadcast

While pulling task bytecode from the HttpBroadcast server, there's no timeout value set. This may cause the Spark executor code to hang, with other tasks in the same executor process waiting for the lock. I have encountered the issue in my cluster. Here's the stack trace I captured: https://gist.github.com/haitaoyao/7655830

So add a timeout value to ensure the task fails fast.
2013-11-27 18:24:39 -08:00
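A hedged sketch of the fix (the timeout value is an assumption): without explicit timeouts, URLConnection can block forever on a hung server.

```
import java.io.InputStream
import java.net.URL

def openWithTimeout(url: String, timeoutMs: Int = 60 * 1000): InputStream = {
  val connection = new URL(url).openConnection()
  connection.setConnectTimeout(timeoutMs)  // fail fast on connect
  connection.setReadTimeout(timeoutMs)     // fail fast on a stalled read
  connection.getInputStream()
}
```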
Prashant Sharma 17987778da Merge branch 'master' into wip-scala-2.10
Conflicts:
	core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
	core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala
	core/src/main/scala/org/apache/spark/rdd/MapPartitionsWithContextRDD.scala
	core/src/main/scala/org/apache/spark/rdd/RDD.scala
	python/pyspark/rdd.py
2013-11-27 14:44:12 +05:30
Prashant Sharma 54862af5ee Improvements from the review comments and followed Boy Scout Rule. 2013-11-27 14:26:28 +05:30
Matei Zaharia fb6875dd5c Merge pull request #146 from JoshRosen/pyspark-custom-serializers
Custom Serializers for PySpark

This pull request adds support for custom serializers to PySpark.  For now, all Python-transformed (or parallelize()d) RDDs are serialized with the same serializer that's specified when creating SparkContext.

For now, PySpark includes `PickleSerDe` and `MarshalSerDe` classes for using Python's `pickle` and `marshal` serializers.  It's pretty easy to add support for other serializers, although I still need to add instructions on this.

A few notable changes:

- The Scala `PythonRDD` class no longer manipulates Pickled objects; data from `textFile` is written to Python as MUTF-8 strings.  The Python code performs the appropriate bookkeeping to track which deserializer should be used when reading an underlying JavaRDD.  This mechanism could also be used to support other data exchange formats, such as MsgPack.
- Several magic numbers were refactored into constants.
- Batching is implemented by wrapping / decorating an unbatched SerDe.
2013-11-26 20:55:40 -08:00
Matei Zaharia 330ada1766 Merge pull request #207 from henrydavidge/master
Log a warning if a task's serialized size is very big

As per Reynold's instructions, we now create a warning-level log entry if a task's serialized size is too big. "Too big" is currently defined as 100 KB. This warning message is generated at most once for each stage.
2013-11-26 19:08:33 -08:00
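A hedged sketch of the once-per-stage warning (the 100 KB threshold is from the description above; surrounding names are assumptions):

```
val TASK_SIZE_TO_WARN_KB = 100

// serializedTask: the task's serialized ByteBuffer;
// emittedTaskSizeWarning: a per-stage flag.
if (serializedTask.limit > TASK_SIZE_TO_WARN_KB * 1024 && !emittedTaskSizeWarning) {
  emittedTaskSizeWarning = true
  logWarning("Stage %d contains a task of very large size (%d KB). ".format(
    task.stageId, serializedTask.limit / 1024) +
    "The maximum recommended task size is %d KB.".format(TASK_SIZE_TO_WARN_KB))
}
```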
Harvey Feng afe4fe7f5e Merge remote-tracking branch 'origin/master' into yarn-2.2
Conflicts:
	yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
2013-11-26 15:03:03 -08:00
hhd 57579934f0 Emit warning when task size > 100KB 2013-11-26 16:58:39 -05:00