Commit graph

2539 commits

Author SHA1 Message Date
Patrick Wendell b313e15616 Fix UI bug introduced in #244.
The 'duration' field was incorrectly renamed to 'task time' in the table that
lists stages.
2014-01-11 10:52:57 -08:00
Thomas Graves 7cef8435d7 Merge pull request #371 from tgravescs/yarn_client_addjar_misc_fixes
Yarn client addjar and misc fixes

Fix the addJar functionality in yarn-client mode, add support for the other options supported in yarn-standalone mode, set the application type on yarn in hadoop 2.X, add documentation, change heartbeat interval to be same code as the yarn-standalone so it doesn't take so long to get containers and exit.
2014-01-10 15:34:15 -06:00
Patrick Wendell 7b58f116e5 Merge pull request #384 from pwendell/debug-logs
Make DEBUG-level logs consummable.

Removes two things that caused issues with the debug logs:

(a) Internal polling in the DAGScheduler was polluting the logs.
(b) The Scala REPL logs were really noisy.
2014-01-10 12:47:46 -08:00
Patrick Wendell e9ed2d9e82 Make DEBUG-level logs consummable.
Removes two things that caused issues with the debug logs:

(a) Internal polling in the DAGScheduler was polluting the logs.
(b) The Scala REPL logs were really noisy.
2014-01-10 10:33:24 -08:00
Matei Zaharia 0ebc97305a Merge pull request #375 from mateiz/option-fix
Fix bug added when we changed AppDescription.maxCores to an Option

The Scala compiler warned about this -- we were comparing an Option against an integer now.
2014-01-09 23:58:49 -08:00
Patrick Wendell 460f655cc6 Enable shuffle consolidation by default.
Bump this to being enabled for 0.9.0.
2014-01-09 22:42:50 -08:00
Reynold Xin 4b074fac05 Merge pull request #374 from mateiz/completeness
Add some missing Java API methods

These are primarily for setting job groups, canceling jobs, and setting names on RDDs. Seemed like useful stuff to expose in Java.
2014-01-09 19:03:55 -08:00
Reynold Xin a9d533333d Merge pull request #294 from RongGu/master
Bug fixes for updating the RDD block's memory and disk usage information

Bug fixes for updating the RDD block's memory and disk usage information.
From the code context, we can find that the memSize and diskSize here are both always equal to the size of the block. Actually, they never be zero. Thus, the logic here is wrong for recording the block usage in BlockStatus, especially for the blocks which are dropped from memory to ensure space for the new input rdd blocks. I have tested it that this would cause the storage metrics shown in the Storage webpage wrong and misleading. With this patch, the metrics will be okay.
 Finally, Merry Christmas, guys:)
2014-01-09 18:46:46 -08:00
Patrick Wendell d86a85e9ca Merge pull request #293 from pwendell/standalone-driver
SPARK-998: Support Launching Driver Inside of Standalone Mode

[NOTE: I need to bring the tests up to date with new changes, so for now they will fail]

This patch provides support for launching driver programs inside of a standalone cluster manager. It also supports monitoring and re-launching of driver programs which is useful for long running, recoverable applications such as Spark Streaming jobs. For those jobs, this patch allows a deployment mode which is resilient to the failure of any worker node, failure of a master node (provided a multi-master setup), and even failures of the applicaiton itself, provided they are recoverable on a restart. Driver information, such as the status and logs from a driver, is displayed in the UI

There are a few small TODO's here, but the code is generally feature-complete. They are:
- Bring tests up to date and add test coverage
- Restarting on failure should be optional and maybe off by default.
- See if we can re-use akka connections to facilitate clients behind a firewall

A sensible place to start for review would be to look at the `DriverClient` class which presents users the ability to launch their driver program. I've also added an example program (`DriverSubmissionTest`) that allows you to test this locally and play around with killing workers, etc. Most of the code is devoted to persisting driver state in the cluster manger, exposing it in the UI, and dealing correctly with various types of failures.

Instructions to test locally:
- `sbt/sbt assembly/assembly examples/assembly`
- start a local version of the standalone cluster manager

```
./spark-class org.apache.spark.deploy.client.DriverClient \
  -j -Dspark.test.property=something \
  -e SPARK_TEST_KEY=SOMEVALUE \
  launch spark://10.99.1.14:7077 \
  ../path-to-examples-assembly-jar \
  org.apache.spark.examples.DriverSubmissionTest 1000 some extra options --some-option-here -X 13
```
- Go in the UI and make sure it started correctly, look at the output etc
- Kill workers, the driver program, masters, etc.
2014-01-09 18:37:52 -08:00
Matei Zaharia c43eb00644 Fix bug added when we changed AppDescription.maxCores to an Option
The Scala compiler warned about this -- we were comparing an Option
against an integer now.
2014-01-09 18:14:20 -08:00
Matei Zaharia 142921c6c0 Add some missing Java API methods 2014-01-09 18:11:12 -08:00
Patrick Wendell 26cdb5f68a Merge pull request #372 from pwendell/log4j-fix-1
Send logs to stderr by default (instead of stdout).
2014-01-09 17:16:34 -08:00
Patrick Wendell 2af98198ad Send logs to stderr by default (instead of stdout). 2014-01-09 15:57:44 -08:00
Matei Zaharia 12f414ed43 Merge pull request #362 from mateiz/conf-getters
Use typed getters for configuration settings

This improves some of the code style after SPARK-544.
2014-01-09 15:31:30 -08:00
Patrick Wendell 67b9a33628 Some usability improvements 2014-01-09 12:42:37 -08:00
Thomas Graves c617083e47 yarn-client addJar fix and misc other 2014-01-09 10:24:35 -06:00
Reynold Xin 365cac9465 Merge pull request #361 from rxin/clean
Minor style cleanup. Mostly on indenting & line width changes.

Focused on the few important files since they are the files that new contributors usually read first.
2014-01-09 00:56:16 -08:00
Reynold Xin 295d82583a Minor update on SparkContext.broadcast's JavaDoc. 2014-01-09 00:30:22 -08:00
Matei Zaharia a01f3401e3 Use typed getters for configuration settings 2014-01-09 00:07:29 -08:00
Patrick Wendell 112c0a1776 Fixing config option "retained_stages" => "retainedStages".
This is a very esoteric option and it's out of sync with the style we use.
So it seems fitting to fix it for 0.9.0.
2014-01-08 21:16:16 -08:00
Patrick Wendell 0f9d2ace6b Adding polling to driver submission client. 2014-01-08 16:56:26 -08:00
Reynold Xin 46f6a3b6aa Minor style cleanup. Mostly on indenting & line width changes. 2014-01-08 14:55:04 -08:00
Reynold Xin 56ebfeaa52 Merge pull request #357 from hsaputra/set_boolean_paramname
Set boolean param name for call to SparkHadoopMapReduceUtil.newTaskAttemptID

Set boolean param name for call to SparkHadoopMapReduceUtil.newTaskAttemptID to make it clear which param being set.
2014-01-08 11:50:06 -08:00
Reynold Xin 5cae05f59e Merge pull request #356 from hsaputra/remove_deprecated_cleanup_method
Remove calls to deprecated mapred's OutputCommitter.cleanupJob

Since Hadoop 1.0.4 the mapred OutputCommitter.commitJob should do cleanup job via call to OutputCommitter.cleanupJob,

Remove SparkHadoopWriter.cleanup since it is used only by PairRDDFunctions.

In fact the implementation of mapred OutputCommitter.commitJob looks like this:

  public void commitJob(JobContext jobContext) throws IOException {
    cleanupJob(jobContext);
  }
2014-01-08 11:47:28 -08:00
walker d942f95d7e Merge remote branch 'upstream/master' 2014-01-09 01:22:26 +08:00
Patrick Wendell bc81ce040d Merge remote-tracking branch 'apache-github/master' into standalone-driver
Conflicts:
	core/src/test/scala/org/apache/spark/deploy/JsonProtocolSuite.scala
	pom.xml
2014-01-08 00:38:31 -08:00
Henry Saputra aa56585d21 Resolve PR review over 100 chars 2014-01-08 00:38:29 -08:00
Patrick Wendell 3ec21f2eee Show more helpful information in UI 2014-01-08 00:30:10 -08:00
Patrick Wendell c78b381e91 Fixes 2014-01-08 00:09:12 -08:00
Patrick Wendell d0533f7046 Rename to Client 2014-01-07 23:38:51 -08:00
Patrick Wendell 3d939e5fe8 Adding --verbose option to DriverClient 2014-01-07 23:27:18 -08:00
Henry Saputra f6b6f88367 Set boolean param name for two files call to SparkHadoopMapReduceUtil.newTaskAttemptID to make
it clear which param being set.
2014-01-07 23:23:17 -08:00
Henry Saputra 4517326ec6 Remove calls to deprecated mapred's OutputCommitter.cleanupJob because since Hadoop 1.0.4
the mapred OutputCommitter.commitJob should do cleanup job.

In fact the implementation of mapred OutputCommitter.commitJob looks like this:

  public void commitJob(JobContext jobContext) throws IOException {
    cleanupJob(jobContext);
  }

(The jobContext input argument is type of org.apache.hadoop.mapred.JobContext)
2014-01-07 22:55:56 -08:00
Patrick Wendell f5f12dc282 Merge pull request #336 from liancheng/akka-remote-lookup
Get rid of `Either[ActorRef, ActorSelection]'

In this pull request, instead of returning an `Either[ActorRef, ActorSelection]`, `registerOrLookup` identifies the remote actor blockingly to obtain an `ActorRef`, or throws an exception if the remote actor doesn't exist or the lookup times out (configured by `spark.akka.lookupTimeout`).  This function is only called when an `SparkEnv` is constructed (instantiating driver or executor), so the blocking call is considered acceptable.  Executor side `ActorSelection`s/`ActorRef`s to driver side `MapOutputTrackerMasterActor` and `BlockManagerMasterActor` are affected by this pull request.

`ActorSelection` is dangerous and should be used with care.  It's only absolutely safe to send messages via an `ActorSelection` when the remote actor is stateless, so that actor incarnation is irrelevant.  But as pointed by @ScrapCodes in the comments below, executor exits immediately once the connection to the driver lost, `ActorSelection`s are not harmful in this scenario.  So this pull request is mostly a code style patch.
2014-01-07 21:56:35 -08:00
Matei Zaharia d75dc428da Merge pull request #350 from mateiz/standalone-limit
Add way to limit default # of cores used by apps in standalone mode

Also documents the spark.deploy.spreadOut option, and fixes a config option that had a dash in its name.
2014-01-08 00:30:03 -05:00
Matei Zaharia 2c421749ea Address review comments 2014-01-07 19:30:23 -05:00
Patrick Wendell e21a707a13 Adding unit tests and some refactoring to promote testability. 2014-01-07 15:39:47 -08:00
Patrick Wendell e688e11206 Add log4j exclusion rule to maven.
To make this work I had to rename the defaults file. Otherwise
maven's pattern matching rules included it when trying to match
other log4j.properties files.

I also fixed a bug in the existing maven build where two
<transformers> tags were present in assembly/pom.xml
such that one overwrote the other.
2014-01-07 12:56:24 -08:00
Matei Zaharia d8bcc8e9a0 Add way to limit default # of cores used by applications on standalone mode
Also documents the spark.deploy.spreadOut option.
2014-01-07 14:35:52 -05:00
Reynold Xin 15d9534501 Merge pull request #318 from srowen/master
Suggested small changes to Java code for slightly more standard style, encapsulation and in some cases performance

Sorry if this is too abrupt or not a welcome set of changes, but thought I'd see if I could contribute a little. I'm a Java developer and just getting seriously into Spark. So I thought I'd suggest a number of small changes to the couple Java parts of the code to make it a little tighter, more standard and even a bit faster.

Feel free to take all, some or none of this. Happy to explain any of it.
2014-01-07 08:10:02 -08:00
Prashant Sharma c729fa7c8e formatting related fixes suggested by Patrick. 2014-01-07 13:08:16 +05:30
Prashant Sharma b84dc780d3 Allow configuration to be printed in logs for diagnosis. 2014-01-07 13:01:43 +05:30
Prashant Sharma b3018811e1 Allow users to set arbitrary akka configurations via spark conf. 2014-01-07 13:01:43 +05:30
Patrick Wendell 6a3daead2d Fixes after merge 2014-01-06 20:12:45 -08:00
Patrick Wendell c0498f9265 Merge remote-tracking branch 'apache-github/master' into standalone-driver
Conflicts:
	core/src/main/scala/org/apache/spark/deploy/client/AppClient.scala
	core/src/main/scala/org/apache/spark/deploy/client/TestClient.scala
	core/src/main/scala/org/apache/spark/deploy/master/Master.scala
	core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
	core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
2014-01-06 17:29:21 -08:00
Patrick Wendell f236ddd1a2 Changes based on review feedback. 2014-01-06 17:15:52 -08:00
Patrick Wendell 357083c29f Merge pull request #330 from tgravescs/fix_addjars_null_handling
Fix handling of empty SPARK_EXAMPLES_JAR

Currently if SPARK_EXAMPLES_JAR is left unset you get a null pointer exception when running the examples (atleast on spark on yarn).  The null now gets turned into a string of "null" when its put into the SparkConf so addJar no longer properly ignores it. This fixes that so that it can be left unset.
2014-01-06 10:29:04 -08:00
walker 2ad315e80f add inline comments 2014-01-07 01:27:57 +08:00
walker 6ab1db8071 add inline comments 2014-01-07 01:21:25 +08:00
walker a0c6d96e27 Merge remote branch 'upstream/master' 2014-01-07 01:05:18 +08:00