Mridul Muralidharan
02dffd2eb0
Ensure all ask/await calls block for spark.akka.askTimeout, so that the timeout is controllable instead of having arbitrary timeouts spread across the codebase. In our tests we use 30 seconds, though the default of 10 is maintained
2013-04-17 05:52:57 +05:30
Mridul Muralidharan
5540ab8243
Use hostname instead of hostport for executor, fix creation of workdir
2013-04-16 02:57:43 +05:30
Mridul Muralidharan
d90d2af103
Checkpoint commit - compiles and passes a lot of tests - not all though, looking into FileSuite issues
2013-04-15 18:12:11 +05:30
Christoph Grothaus
445f387ef4
Bugfix: WorkerWebUI must respect workDirPath from Worker
2013-03-22 11:08:40 +01:00
Stephen Haberman
fb34967815
Remove try/catch block that can't be hit.
2013-03-18 01:55:50 -05:00
Charles Reiss
b0983c5762
Notify standalone deploy client of application death.
Usually, this isn't necessary since the application will be removed
as a result of the deploy client disconnecting, but occasionally, the
standalone deploy master removes an application otherwise.
Also mark applications as FAILED instead of FINISHED when they are
killed as a result of their executors failing too many times.
2013-03-09 11:29:45 -08:00
Mosharaf Chowdhury
4ab387bcdb
Fixed master data structure updates after removing an application; and a typo.
2013-02-27 13:52:44 -08:00
Matei Zaharia
568bdaf8ae
Set spark.deploy.spreadOut to true by default in 0.7 (improves locality)
2013-02-25 14:34:55 -08:00
Matei Zaharia
d4d7993bf5
Several fixes to the work to log when no resources can be used by a job.
Fixed some of the messages as well as code style.
2013-02-22 15:51:37 -08:00
Matei Zaharia
f33662c133
Merge remote-tracking branch 'pwendell/starvation-check'
Also fixed a bug where master was offering executors on dead workers
Conflicts:
core/src/main/scala/spark/deploy/master/Master.scala
2013-02-22 15:27:41 -08:00
Matei Zaharia
7151e1e4c8
Rename "jobs" to "applications" in the standalone cluster
2013-02-17 23:23:08 -08:00
Imran Rashid
8f18e7e863
include jobid in Executor commandline args
2013-02-13 13:05:13 -08:00
Matei Zaharia
da8afbc77e
Some bug and formatting fixes to FT
Conflicts:
core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
2013-02-10 22:43:38 -08:00
root
1b47fa2752
Detect hard crashes of workers using a heartbeat mechanism.
Also fixes some issues in the rest of the code with detecting workers this way.
Conflicts:
core/src/main/scala/spark/deploy/master/Master.scala
core/src/main/scala/spark/deploy/worker/Worker.scala
core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
2013-02-10 22:28:28 -08:00
Matei Zaharia
8c66c49962
Tweak web UI so that people don't get confused about master URL format
Conflicts:
core/src/main/twirl/spark/deploy/master/index.scala.html
core/src/main/twirl/spark/deploy/worker/index.scala.html
2013-02-10 21:58:34 -08:00
Stephen Haberman
870b2aaf5d
Merge branch 'master' into fixdeathpactexception
Conflicts:
core/src/main/scala/spark/deploy/worker/Worker.scala
2013-02-05 20:27:09 -06:00
Stephen Haberman
0e19093fd8
Handle Terminated to avoid endless DeathPactExceptions.
Credit to Roland Kuhn, Akka's tech lead, for pointing out this
very obvious fix, but StandaloneExecutorBackend.preStart's
catch block would never (ever) get hit, because all of the
operations in preStart are async.
So, the System.exit in the catch block was skipped, and instead
Akka was sending Terminated messages which, since we didn't
handle them, turned into DeathPactExceptions, which started
a postRestart/preStart infinite loop.
2013-02-05 18:58:00 -06:00
Patrick Wendell
b14322956c
Starvation check in Standalone scheduler
2013-02-03 12:45:10 -08:00
Matei Zaharia
3bfaf3ab1d
Merge pull request #379 from stephenh/sparkmem
Add spark.executor.memory to differentiate executor memory from spark-shell
2013-02-02 23:58:23 -08:00
Stephen Haberman
103c375ba0
Merge branch 'master' into sparkmem
2013-02-02 01:57:18 -06:00
Stephen Haberman
28e0cb9f31
Fix createActorSystem not actually using the systemName parameter.
This meant all system names were "spark", which worked, but didn't
lead to the most intuitive log output.
This fixes createActorSystem to use the passed system name, and
refactors Master/Worker to encapsulate their system/actor names
instead of having the clients guess at them.
Note that the driver system name, "spark", is left as is, and is
still repeated a few times, but that seems like a separate issue.
2013-02-02 01:11:37 -06:00
Stephen Haberman
418e36caa8
Add more private declarations.
2013-01-31 17:18:33 -06:00
Stephen Haberman
871476d506
Include message and exitStatus if available.
2013-01-30 16:56:46 -06:00
Matei Zaharia
d54b10b6ad
Merge remote-tracking branch 'stephenh/removefailedjob'
Conflicts:
core/src/main/scala/spark/deploy/master/Master.scala
2013-01-29 18:12:29 -08:00
Stephen Haberman
13368818af
Merge branch 'master' into driver
Conflicts:
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/SparkEnv.scala
core/src/main/scala/spark/deploy/LocalSparkCluster.scala
core/src/main/scala/spark/executor/StandaloneExecutorBackend.scala
core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
core/src/main/scala/spark/storage/BlockManagerMaster.scala
core/src/main/scala/spark/storage/ThreadingTest.scala
core/src/test/scala/spark/MapOutputTrackerSuite.scala
2013-01-28 23:30:24 -06:00
Matei Zaharia
f03d9760fd
Clean up BlockManagerUI a little (make it not be an object, merge with Directives, and bind to a random port)
2013-01-27 23:56:14 -08:00
Matei Zaharia
909850729e
Rename more things from slave to executor
2013-01-27 23:17:20 -08:00
Matei Zaharia
44b4a0f88f
Track workers by executor ID instead of hostname to allow multiple executors per machine and remove the need for multiple IP addresses in unit tests.
2013-01-27 19:23:49 -08:00
Stephen Haberman
7dfb82a992
Replace old 'master' term with 'driver'.
2013-01-25 11:03:00 -06:00
Stephen Haberman
98d0b7747d
Fix Worker logInfo about unknown executor.
2013-01-22 18:11:51 -06:00
Stephen Haberman
8c51322cd0
Don't bother creating an exception.
2013-01-22 18:09:10 -06:00
Stephen Haberman
fdec42385a
Fix SPARK_MEM in ExecutorRunner.
2013-01-22 18:01:12 -06:00
Stephen Haberman
250fe89679
Handle Master telling the Worker to kill an already-dead executor.
2013-01-22 16:29:05 -06:00
Stephen Haberman
6f2194f757
Call removeJob instead of killing the cluster.
2013-01-22 15:38:58 -06:00
Josh Rosen
ef711902c1
Don't download files to master's working directory.
This should avoid exceptions caused by existing
files with different contents.
I also removed some unused code.
2013-01-21 17:34:17 -08:00
Imran Rashid
a3f571b539
more File -> String changes
2013-01-21 11:21:52 -08:00
Imran Rashid
fe26acc482
remove unused imports
2013-01-21 11:21:46 -08:00
Imran Rashid
c73107500e
send sparkHome as String instead of File over network
2013-01-21 11:21:39 -08:00
Imran Rashid
f116d6b5c6
executor can use a different sparkHome from Worker
2013-01-21 11:21:22 -08:00
Stephen Haberman
74d3b23929
Add spark.executor.memory to differentiate executor memory from spark-shell memory.
2013-01-15 14:03:28 -06:00
Tyson
1731f1fed4
Added an optional format parameter for individual job queries and optimized the jobId query
2013-01-11 15:01:43 -05:00
Tyson
c063e8777e
Added implicit json writers for JobDescription and ExecutorRunner
2013-01-11 14:57:38 -05:00
Matei Zaharia
2e914d9983
Formatting
2013-01-10 19:13:08 -08:00
Matei Zaharia
6d1c230281
Merge pull request #357 from tysonjh/master
JSON support added to WebUI
2013-01-10 19:06:07 -08:00
Tyson
549ee388a1
Removed io.spray spray-json dependency as it is not needed.
2013-01-09 15:12:23 -05:00
Tyson
bf9d9946f9
Query parameter reformatted to be more extensible and routing more robust
2013-01-09 11:29:58 -05:00
Tyson
0da2ff102e
Added url query parameter json and handler
2013-01-09 10:40:48 -05:00
Tyson
269fe018c7
JSON object definitions
2013-01-09 10:40:43 -05:00
Shivaram Venkataraman
f8d579a0c0
Remove dependencies on sun jvm classes. Instead use reflection to infer HotSpot options and total physical memory size
2013-01-07 15:57:18 -08:00
Patrick Wendell
bfac06e1f6
SPARK-616: Logging dead workers in Web UI.
This patch keeps track of which workers have died and marks them
as such in the master web UI. It also handles workers which die and
re-register using different actor IDs.
2012-12-17 23:09:05 -08:00