Commit graph

93 commits

Author SHA1 Message Date
Mridul Muralidharan 02dffd2eb0 Ensure all ask/await calls block for spark.akka.askTimeout, so that the timeout is controllable instead of arbitrary timeouts being spread across the codebase. In our tests we use 30 seconds, though the default of 10 is maintained 2013-04-17 05:52:57 +05:30
Mridul Muralidharan 5540ab8243 Use hostname instead of hostport for executor, fix creation of workdir 2013-04-16 02:57:43 +05:30
Mridul Muralidharan d90d2af103 Checkpoint commit - compiles and passes a lot of tests - not all though, looking into FileSuite issues 2013-04-15 18:12:11 +05:30
Christoph Grothaus 445f387ef4 Bugfix: WorkerWebUI must respect workDirPath from Worker 2013-03-22 11:08:40 +01:00
Stephen Haberman fb34967815 Remove try/catch block that can't be hit. 2013-03-18 01:55:50 -05:00
Charles Reiss b0983c5762 Notify standalone deploy client of application death.
Usually, this isn't necessary since the application will be removed
as a result of the deploy client disconnecting, but occasionally, the
standalone deploy master removes an application otherwise.

Also mark applications as FAILED instead of FINISHED when they are
killed as a result of their executors failing too many times.
2013-03-09 11:29:45 -08:00
Mosharaf Chowdhury 4ab387bcdb Fixed master data structure updates after removing an application; and a typo. 2013-02-27 13:52:44 -08:00
Matei Zaharia 568bdaf8ae Set spark.deploy.spreadOut to true by default in 0.7 (improves locality) 2013-02-25 14:34:55 -08:00
Matei Zaharia d4d7993bf5 Several fixes to the work to log when no resources can be used by a job.
Fixed some of the messages as well as code style.
2013-02-22 15:51:37 -08:00
Matei Zaharia f33662c133 Merge remote-tracking branch 'pwendell/starvation-check'
Also fixed a bug where master was offering executors on dead workers

Conflicts:
	core/src/main/scala/spark/deploy/master/Master.scala
2013-02-22 15:27:41 -08:00
Matei Zaharia 7151e1e4c8 Rename "jobs" to "applications" in the standalone cluster 2013-02-17 23:23:08 -08:00
Imran Rashid 8f18e7e863 include jobid in Executor commandline args 2013-02-13 13:05:13 -08:00
Matei Zaharia da8afbc77e Some bug and formatting fixes to FT
Conflicts:
	core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
2013-02-10 22:43:38 -08:00
root 1b47fa2752 Detect hard crashes of workers using a heartbeat mechanism.
Also fixes some issues in the rest of the code with detecting workers this way.

Conflicts:
	core/src/main/scala/spark/deploy/master/Master.scala
	core/src/main/scala/spark/deploy/worker/Worker.scala
	core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
2013-02-10 22:28:28 -08:00
Matei Zaharia 8c66c49962 Tweak web UI so that people don't get confused about master URL format
Conflicts:
	core/src/main/twirl/spark/deploy/master/index.scala.html
	core/src/main/twirl/spark/deploy/worker/index.scala.html
2013-02-10 21:58:34 -08:00
Stephen Haberman 870b2aaf5d Merge branch 'master' into fixdeathpactexception
Conflicts:
	core/src/main/scala/spark/deploy/worker/Worker.scala
2013-02-05 20:27:09 -06:00
Stephen Haberman 0e19093fd8 Handle Terminated to avoid endless DeathPactExceptions.
Credit to Roland Kuhn, Akka's tech lead, for pointing out this very
obvious fix, but StandaloneExecutorBackend.preStart's
catch block would never (ever) get hit, because all of the
operations in preStart are async.

So, the System.exit in the catch block was skipped, and instead
Akka was sending Terminated messages which, since we didn't
handle them, were turned into DeathPactExceptions, which started
a postRestart/preStart infinite loop.
2013-02-05 18:58:00 -06:00
Patrick Wendell b14322956c Starvation check in Standalone scheduler 2013-02-03 12:45:10 -08:00
Matei Zaharia 3bfaf3ab1d Merge pull request #379 from stephenh/sparkmem
Add spark.executor.memory to differentiate executor memory from spark-shell
2013-02-02 23:58:23 -08:00
Stephen Haberman 103c375ba0 Merge branch 'master' into sparkmem 2013-02-02 01:57:18 -06:00
Stephen Haberman 28e0cb9f31 Fix createActorSystem not actually using the systemName parameter.
This meant all system names were "spark", which worked, but didn't
lead to the most intuitive log output.

This fixes createActorSystem to use the passed system name, and
refactors Master/Worker to encapsulate their system/actor names
instead of having the clients guess at them.

Note that the driver system name, "spark", is left as is, and is
still repeated a few times, but that seems like a separate issue.
2013-02-02 01:11:37 -06:00
Stephen Haberman 418e36caa8 Add more private declarations. 2013-01-31 17:18:33 -06:00
Stephen Haberman 871476d506 Include message and exitStatus if available. 2013-01-30 16:56:46 -06:00
Matei Zaharia d54b10b6ad Merge remote-tracking branch 'stephenh/removefailedjob'
Conflicts:
	core/src/main/scala/spark/deploy/master/Master.scala
2013-01-29 18:12:29 -08:00
Stephen Haberman 13368818af Merge branch 'master' into driver
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/SparkEnv.scala
	core/src/main/scala/spark/deploy/LocalSparkCluster.scala
	core/src/main/scala/spark/executor/StandaloneExecutorBackend.scala
	core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
	core/src/main/scala/spark/storage/BlockManagerMaster.scala
	core/src/main/scala/spark/storage/ThreadingTest.scala
	core/src/test/scala/spark/MapOutputTrackerSuite.scala
2013-01-28 23:30:24 -06:00
Matei Zaharia f03d9760fd Clean up BlockManagerUI a little (make it not be an object, merge with
Directives, and bind to a random port)
2013-01-27 23:56:14 -08:00
Matei Zaharia 909850729e Rename more things from slave to executor 2013-01-27 23:17:20 -08:00
Matei Zaharia 44b4a0f88f Track workers by executor ID instead of hostname to allow multiple
executors per machine and remove the need for multiple IP addresses in
unit tests.
2013-01-27 19:23:49 -08:00
Stephen Haberman 7dfb82a992 Replace old 'master' term with 'driver'. 2013-01-25 11:03:00 -06:00
Stephen Haberman 98d0b7747d Fix Worker logInfo about unknown executor. 2013-01-22 18:11:51 -06:00
Stephen Haberman 8c51322cd0 Don't bother creating an exception. 2013-01-22 18:09:10 -06:00
Stephen Haberman fdec42385a Fix SPARK_MEM in ExecutorRunner. 2013-01-22 18:01:12 -06:00
Stephen Haberman 250fe89679 Handle Master telling the Worker to kill an already-dead executor. 2013-01-22 16:29:05 -06:00
Stephen Haberman 6f2194f757 Call removeJob instead of killing the cluster. 2013-01-22 15:38:58 -06:00
Josh Rosen ef711902c1 Don't download files to master's working directory.
This should avoid exceptions caused by existing
files with different contents.

I also removed some unused code.
2013-01-21 17:34:17 -08:00
Imran Rashid a3f571b539 more File -> String changes 2013-01-21 11:21:52 -08:00
Imran Rashid fe26acc482 remove unused imports 2013-01-21 11:21:46 -08:00
Imran Rashid c73107500e send sparkHome as String instead of File over network 2013-01-21 11:21:39 -08:00
Imran Rashid f116d6b5c6 executor can use a different sparkHome from Worker 2013-01-21 11:21:22 -08:00
Stephen Haberman 74d3b23929 Add spark.executor.memory to differentiate executor memory from spark-shell memory. 2013-01-15 14:03:28 -06:00
Tyson 1731f1fed4 Added an optional format parameter for individual job queries and optimized the jobId query 2013-01-11 15:01:43 -05:00
Tyson c063e8777e Added implicit json writers for JobDescription and ExecutorRunner 2013-01-11 14:57:38 -05:00
Matei Zaharia 2e914d9983 Formatting 2013-01-10 19:13:08 -08:00
Matei Zaharia 6d1c230281 Merge pull request #357 from tysonjh/master
JSON support added to WebUI
2013-01-10 19:06:07 -08:00
Tyson 549ee388a1 Removed io.spray spray-json dependency as it is not needed. 2013-01-09 15:12:23 -05:00
Tyson bf9d9946f9 Query parameter reformatted to be more extensible and routing more robust 2013-01-09 11:29:58 -05:00
Tyson 0da2ff102e Added url query parameter json and handler 2013-01-09 10:40:48 -05:00
Tyson 269fe018c7 JSON object definitions 2013-01-09 10:40:43 -05:00
Shivaram Venkataraman f8d579a0c0 Remove dependencies on sun jvm classes. Instead use reflection to infer
HotSpot options and total physical memory size
2013-01-07 15:57:18 -08:00
Patrick Wendell bfac06e1f6 SPARK-616: Logging dead workers in Web UI.
This patch keeps track of which workers have died and marks them
as such in the master web UI. It also handles workers which die and
re-register using different actor IDs.
2012-12-17 23:09:05 -08:00