Commit graph

7 commits

Author SHA1 Message Date
Aaron Davidson 0f070279e7 Address Matei's comments 2013-10-05 15:15:29 -07:00
Aaron Davidson d5a96feccb Standalone Scheduler fault recovery
Implements a basic form of Standalone Scheduler fault recovery. In particular,
this allows faults to be manually recovered from by means of restarting the
Master process on the same machine. This is the majority of the code necessary
for general fault tolerance, which will first elect a leader and then recover
the Master state.

In order to enable fault recovery, the Master will persist a small amount of state related
to the registration of Workers and Applications to disk. If the Master is started and
sees that this state is still around, it will enter Recovery mode, during which time it
will not schedule any new Executors on Workers (but it does accept the registration of
new Clients and Workers).

At this point, the Master attempts to reconnect to all Workers and Client applications
that were registered at the time of failure. After confirming either the existence
or nonexistence of all such nodes (within a certain timeout), the Master will exit
Recovery mode and resume normal scheduling.
2013-09-26 14:59:35 -07:00
Matei Zaharia 46eecd110a Initial work to rename package to org.apache.spark 2013-09-01 14:13:13 -07:00
Matei Zaharia af3c9d5042 Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
kalpit f08db010d3 added SPARK_WORKER_INSTANCES : allows spawning multiple worker instances/processes on every slave machine 2013-03-26 17:49:30 -07:00
Denny 53008c2d8a Settings variables and bugfix for stop script. 2012-08-02 15:59:39 -07:00
Denny 0ee44c225e Spark standalone mode cluster scripts.
Heavily inspired by Hadoop cluster scripts ;-)
2012-08-01 20:38:52 -07:00