recovering from 1st failure
- Made the scheduler to checkpoint after clearing old metadata which
ensures that a new checkpoint is written as soon as at least one batch
gets computed while recovering from a failure. This ensures that if
there is a 2nd failure while recovering from 1st failure, the system
start 2nd recovery from a newer checkpoint.
- Modified Checkpoint writer to write checkpoint in a different thread.
- Added a check to make sure that compute for InputDStreams gets called
only for strictly increasing times.
- Changed implementation of slice to call getOrCompute on parent DStream
in time-increasing order.
- Added testcase to test slice.
- Fixed testGroupByKeyAndWindow testcase in JavaAPISuite to verify
results with expected output in an order-independent manner.
Credit to Roland Kuhn, Akka's tech lead, for pointing out this
various obvious fix, but StandaloneExecutorBackend.preStart's
catch block would never (ever) get hit, because all of the
operation's in preStart are async.
So, the System.exit in the catch block was skipped, and instead
Akka was sending Terminated messages which, since we didn't
handle, it turned into DeathPactException, which started
a postRestart/preStart infinite loop.
No functionality changes, I think this is just more consistent
given mergePair isn't called multiple times/recursive.
Also added a comment to explain the usual case of having two parent RDDs.
This meant all system names were "spark", which worked, but didn't
lead to the most intuitive log output.
This fixes createActorSystem to use the passed system name, and
refactors Master/Worker to encapsulate their system/actor names
instead of having the clients guess at them.
Note that the driver system name, "spark", is left as is, and is
still repeated a few times, but that seems like a separate issue.
Given we have Akka logging go through SLF4j to log4j, we don't need
all the extra noise of Akka's stdout logger that is supposedly only
used during Akka init time but seems to continue logging lots of
noisy network events that we either don't care about or are in the
log4j logs anyway.
See:
http://doc.akka.io/docs/akka/2.0/general/configuration.html
# Log level for the very basic logger activated during AkkaApplication startup
# Options: ERROR, WARNING, INFO, DEBUG
# stdout-loglevel = "WARNING"
These operations used to wait for all the results to be available in an
array on the driver program before merging them. They now merge values
incrementally as they arrive.