Commit graph

4987 commits

Author SHA1 Message Date
Patrick Wendell 75d161b357 Forcing shuffle consolidation in DiskBlockManagerSuite 2013-12-05 11:36:41 -08:00
Matei Zaharia 72b696156c Merge pull request #199 from harveyfeng/yarn-2.2
Hadoop 2.2 migration

Includes support for the YARN API stabilized in the Hadoop 2.2 release, and a few style patches.

Short description for each set of commits:

a98f5a0 - "Misc style changes in the 'yarn' package"
a67ebf4 - "A few more style fixes in the 'yarn' package"
Both of these are some minor style changes, such as fixing lines over 100 chars, to the existing YARN code.

ab8652f - "Add a 'new-yarn' directory ... "
Copies everything from `SPARK_HOME/yarn` to `SPARK_HOME/new-yarn`. No actual code changes here.

4f1c3fa - "Hadoop 2.2 YARN API migration ..."
API patches to code in the `SPARK_HOME/new-yarn` directory. There are a few more small style changes mixed in, too.
Based on @colorant's Hadoop 2.2 support for the scala-2.10 branch in #141.

a1a1c62 - "Add optional Hadoop 2.2 settings in sbt build ... "
If Spark should be built against Hadoop 2.2, then:
a) the `org.apache.spark.deploy.yarn` package will be compiled from the `new-yarn` directory.
b) Protobuf v2.5 will be used as a Spark dependency, since Hadoop 2.2 depends on it. Also, Spark will be built against a version of Akka v2.0.5 that's built against Protobuf 2.5, named `akka-2.0.5-protobuf-2.5`. The patched Akka is here: https://github.com/harveyfeng/akka/tree/2.0.5-protobuf-2.5, and was published to local Ivy during testing.

There's also a new boolean environment variable, `SPARK_IS_NEW_HADOOP`, that users can manually set if their `SPARK_HADOOP_VERSION` specification does not start with `2.2`, which is how the build file tries to detect a 2.2 version. Not sure if this is necessary or done in the best way, though...
2013-12-04 23:33:04 -08:00
Patrick Wendell 1450b8ef87 Small changes from Matei review 2013-12-04 18:49:32 -08:00
Patrick Wendell b1c6fa1584 Document missing configs and set shuffle consolidation to false. 2013-12-04 18:39:34 -08:00
Patrick Wendell 182f9baeed Merge pull request #227 from pwendell/master
Fix small bug in web UI and minor clean-up.

There was a bug where sorting order didn't work correctly for write time metrics.

I also cleaned up some earlier code that fixed the same issue for read and
write bytes.
2013-12-04 15:52:07 -08:00
Reynold Xin b9e7609f2c Merge pull request #225 from ash211/patch-3
Add missing space after "Serialized" in StorageLevel

Current code creates outputs like:

scala> res0.getStorageLevel.description
res2: String = Serialized1x Replicated
2013-12-04 14:42:09 -08:00
Patrick Wendell 380b90b9b3 Fix small bug in web UI and minor clean-up.
There was a bug where sorting order didn't work correctly for write time metrics.

I also cleaned up some earlier code that fixed the same issue for read and
write bytes.
2013-12-04 14:41:48 -08:00
Reynold Xin 055462c1d3 Merge pull request #226 from ash211/patch-4
Typo: applicaton
2013-12-04 14:02:11 -08:00
Andrew Ash 0c5af38b86 Typo: applicaton 2013-12-04 12:30:25 -08:00
Andrew Ash 217611680d Add missing space after "Serialized" in StorageLevel
Current code creates outputs like:

scala> res0.getStorageLevel.description
res2: String = Serialized1x Replicated
2013-12-04 11:29:20 -08:00
Matei Zaharia d6e5473872 Merge pull request #223 from rxin/transient
Mark partitioner, name, and generator field in RDD as @transient.

As part of the effort to reduce serialized task size.
2013-12-04 10:28:50 -08:00
Reynold Xin 8a3475aed6 Merge pull request #218 from JoshRosen/spark-970-pyspark-unicode-error
Fix UnicodeEncodeError in PySpark saveAsTextFile() (SPARK-970)

This fixes [SPARK-970](https://spark-project.atlassian.net/browse/SPARK-970), an issue where PySpark's saveAsTextFile() could throw UnicodeEncodeError when called on an RDD of Unicode strings.

Please merge this into master and branch-0.8.
2013-12-03 14:21:40 -08:00
Reynold Xin 974a69d79c Marked doCheckpointCalled as transient. 2013-12-03 11:34:38 -08:00
Mark Hamstra 403234dd0d SparkListenerJobStart posted from local jobs 2013-12-03 09:57:32 -08:00
Mark Hamstra f55d0b935d Synchronous, inline cleanup after runLocally 2013-12-03 09:57:32 -08:00
Mark Hamstra c9fcd909d0 Local jobs post SparkListenerJobEnd, and DAGScheduler data structure
cleanup always occurs before any posting of SparkListenerJobEnd.
2013-12-03 09:57:32 -08:00
Mark Hamstra 9ae2d094a9 Tightly couple stageIdToJobIds and jobIdToStageIds 2013-12-03 09:57:32 -08:00
Mark Hamstra 27c45e5236 Cleaned up job cancellation handling 2013-12-03 09:57:32 -08:00
Mark Hamstra 686a420ddc Refactoring to make job removal, stage removal, task cancellation clearer 2013-12-03 09:57:32 -08:00
Mark Hamstra 205566e56e Improved comment 2013-12-03 09:57:32 -08:00
Mark Hamstra 94087c463b Removed redundant residual re: reverted refactoring. 2013-12-03 09:57:31 -08:00
Mark Hamstra 982797dcba Fixed intended side-effects 2013-12-03 09:57:31 -08:00
Mark Hamstra 6f8359b5ad Actor instead of eventQueue for LocalJobCompleted 2013-12-03 09:57:31 -08:00
Mark Hamstra 51458ab4a1 Added stageId <--> jobId mapping in DAGScheduler
...and make sure that DAGScheduler data structures are cleaned up on job completion.
  Initial effort and discussion at https://github.com/mesos/spark/pull/842
2013-12-03 09:57:31 -08:00
Harvey Feng 46b87b8a25 Merge pull request #2 from colorant/yarn-client-2.2
Fix pom.xml for maven build
2013-12-03 00:41:11 -08:00
Raymond Liu 4738818dd6 Fix pom.xml for maven build 2013-12-03 16:36:05 +08:00
Harvey Feng 1b6e450771 Use published "org.spark-project.akka-*" in sbt build for Hadoop-2.2 dependencies.
This also includes:
-Change `isNewYarn` to `isNewHadoop`, since the protobuf-2.5 dependency is from Hadoop-2.2 itself.
-Regexp bugix

Credits to @alig for this patch.
2013-12-03 00:28:33 -08:00
Reynold Xin 58d9bbcfec Merge pull request #217 from aarondav/mesos-urls
Re-enable zk:// urls for Mesos SparkContexts

This was broken in PR #71 when we explicitly disallow anything that didn't fit a mesos:// url.
Although it is not really clear that a zk:// url should match Mesos, it is what the docs say and it is necessary for backwards compatibility.

Additionally added a unit test for the creation of all types of TaskSchedulers. Since YARN and Mesos are not necessarily available in the system, they are allowed to pass as long as the YARN/Mesos code paths are exercised.
2013-12-02 21:58:53 -08:00
Prashant Sharma 09e8be9a62 Made running SparkActorSystem specific to executors only. 2013-12-03 11:27:45 +05:30
Aaron Davidson 0f24576c08 Cleanup and documentation of SparkActorSystem 2013-12-03 11:05:12 +05:30
Reynold Xin e34b4693d3 Mark partitioner, name, and generator field in RDD as @transient. 2013-12-02 21:24:44 -08:00
Kay Ousterhout 58b3aff9a8 Fixed problem with scheduler delay 2013-12-02 20:30:03 -08:00
Aaron Davidson f6c8c1c7b6 Cleanup and documentation of SparkActorSystem 2013-12-02 11:42:53 -08:00
Prashant Sharma 5b11028a04 Made akka capable of tolerating fatal exceptions and moving on. 2013-12-02 10:47:39 +05:30
Reynold Xin 740922f25d Merge pull request #219 from sundeepn/schedulerexception
Scheduler quits when newStage fails

The current scheduler thread does not handle exceptions from newStage stage while launching new jobs. The thread fails on any exception that gets triggered at that level, leaving the cluster hanging with no schduler.
2013-12-01 12:46:58 -08:00
Sundeep Narravula be3ea2394f Log exception in scheduler in addition to passing it to the caller.
Code Styling changes.
2013-12-01 00:50:34 -08:00
Reynold Xin 60e23a58b2 Merge pull request #216 from liancheng/fix-spark-966
Bugfix: SPARK-965 & SPARK-966

SPARK-965: https://spark-project.atlassian.net/browse/SPARK-965
SPARK-966: https://spark-project.atlassian.net/browse/SPARK-966

* Add back `DAGScheduler.start()`, `eventProcessActor` is created and started here.

  Notice that function is only called by `SparkContext`.

* Cancel the scheduled stage resubmission task when stopping `eventProcessActor`

* Add a new `DAGSchedulerEvent` `ResubmitFailedStages`

  This event message is sent by the scheduled stage resubmission task to `eventProcessActor`.  In this way, `DAGScheduler.resubmitFailedStages()` is guaranteed to be executed from the same thread that runs `DAGScheduler.processEvent()`.

  Please refer to discussion in [SPARK-966](https://spark-project.atlassian.net/browse/SPARK-966) for details.
2013-11-30 23:38:49 -08:00
Reynold Xin 9cf7f31e4d Memoize preferred locations in ZippedPartitionsBaseRDD so preferred location computation doesn't lead to exponential explosion.
(cherry picked from commit e36fe55a03)
Signed-off-by: Reynold Xin <rxin@apache.org>
2013-11-30 18:10:52 -08:00
Sundeep Narravula 4d53830eb7 Scheduler quits when createStage fails.
The current scheduler thread does not handle exceptions from createStage stage while launching new jobs. The thread fails on any exception that gets triggered at that level, leaving the cluster hanging with no schduler.
2013-11-30 16:18:12 -08:00
Aaron Davidson 96df26be47 Add spaces between tests 2013-11-29 13:20:43 -08:00
Prashant Sharma 5618af6803 Merge branch 'master' into wip-scala-2.10 2013-11-29 13:41:21 +05:30
Prashant Sharma 1bc83ca791 Changed defaults for akka to almost disable failure detector. 2013-11-29 13:41:05 +05:30
Lian, Cheng 4a1d966e26 More comments 2013-11-29 16:02:58 +08:00
Lian, Cheng 1e25086009 Updated some inline comments in DAGScheduler 2013-11-29 15:56:47 +08:00
Josh Rosen 3787f514d9 Fix UnicodeEncodeError in PySpark saveAsTextFile().
Fixes SPARK-970.
2013-11-28 23:44:56 -08:00
Aaron Davidson 081a0b6861 Add unit test for SparkContext scheduler creation
Since YARN and Mesos are not necessarily available in the system,
they are allowed to pass as long as the YARN/Mesos code paths are
exercised.
2013-11-28 20:40:57 -08:00
Aaron Davidson 37f161cf6b Re-enable zk:// urls for Mesos SparkContexts
This was broken in PR #71 when we explicitly disallow anything that
didn't fit a mesos:// url.

Although it is not really clear that a zk:// url should match Mesos,
it is what the docs say and it is necessary for backwards compatibility.
2013-11-28 20:37:56 -08:00
Lian, Cheng 18def5d6f2 Bugfix: SPARK-965 & SPARK-966
SPARK-965: https://spark-project.atlassian.net/browse/SPARK-965
SPARK-966: https://spark-project.atlassian.net/browse/SPARK-966

* Add back DAGScheduler.start(), eventProcessActor is created and started here.

  Notice that function is only called by SparkContext.

* Cancel the scheduled stage resubmission task when stopping eventProcessActor

* Add a new DAGSchedulerEvent ResubmitFailedStages

  This event message is sent by the scheduled stage resubmission task to eventProcessActor.  In this way, DAGScheduler.resubmitFailedStages is guaranteed to be executed from the same thread that runs DAGScheduler.processEvent.

  Please refer to discussion in SPARK-966 for details.
2013-11-28 17:46:06 +08:00
Prashant Sharma 3ec5d74766 Fixed the broken build. 2013-11-28 13:02:28 +05:30
Matei Zaharia 743a31a7ca Merge pull request #210 from haitaoyao/http-timeout
add http timeout for httpbroadcast

While pulling task bytecode from HttpBroadcast server, there's no timeout value set. This may cause spark executor code hang and other task in the same executor process wait for the lock. I have encountered the issue in my cluster. Here's the stacktrace I captured  : https://gist.github.com/haitaoyao/7655830

So add a time out value to ensure the task fail fast.
2013-11-27 18:24:39 -08:00