Migration to Scala 2.10
== Below description was written by Prashant Sharma ==
This PR migrates spark to scala 2.10.
Summary of changes apart from scala 2.10 migration:
(has no implications for user.)
1. Migrated Akka to 2.2.3.
Does not use remote death watch for it has a bug, where it tries to send message to dead node infinitely.
Uses an indestructible actorsystem which tolerates errors only on executors.
(Might be useful for user.)
4. New configuration settings introduced:
System.getProperty("spark.akka.heartbeat.pauses", "600")
System.getProperty("spark.akka.failure-detector.threshold", "300.0")
System.getProperty("spark.akka.heartbeat.interval", "1000")
Defaults for these are fairly large to only disable Failure detector that comes with akka. The reason for doing so is we have our own failure detector like mechanism in place and then this is just an overhead on top of that + it leads to a lot of false positives. But with these properties it is possible to enable them. A good use case for enabling it could be when someone wants spark to be sensitive (in a controllable manner ofc.) to GC pauses/Network lags and quickly evict executors that experienced it. More information is included in configuration.md
Once we have the SPARK-544 merged, I had like to deprecate atleast these akka properties and may be others too.
This PR is duplicate of #221(where all the discussion happened.) for that one pointed to master this one points to scala-2.10 branch.
- Refactored Scheduler + JobManager to JobGenerator + JobScheduler and
added JobSet for cleaner code. Moved scheduler related code to
streaming.scheduler package.
- Added StreamingListener trait (similar to SparkListener) to enable
gathering to streaming stats like processing times and delays.
StreamingContext.addListener() to added listeners.
- Deduped some code in streaming tests by modifying TestSuiteBase, and
added StreamingListenerSuite.
Scala 2.10 migration
This PR migrates spark to scala 2.10.
Summary of changes apart from scala 2.10 migration:
(has no implications for user.)
1. Migrated Akka to 2.2.3.
Does not use remote death watch for it has a bug, where it tries to send message to dead node infinitely.
Uses an indestructible actorsystem which tolerates errors only on executors.
(Might be useful for user.)
4. New configuration settings introduced:
System.getProperty("spark.akka.heartbeat.pauses", "600")
System.getProperty("spark.akka.failure-detector.threshold", "300.0")
System.getProperty("spark.akka.heartbeat.interval", "1000")
Defaults for these are fairly large to only disable Failure detector that comes with akka. The reason for doing so is we have our own failure detector like mechanism in place and then this is just an overhead on top of that + it leads to a lot of false positives. But with these properties it is possible to enable them. A good use case for enabling it could be when someone wants spark to be sensitive (in a controllable manner ofc.) to GC pauses/Network lags and quickly evict executors that experienced it. More information is included in configuration.md
Once we have the SPARK-544 merged, I had like to deprecate atleast these akka properties and may be others too.
This PR is duplicate of #221(where all the discussion happened.) for that one pointed to master this one points to scala-2.10 branch.
README incorrectly suggests build sources spark-env.sh
This is misleading because the build doesn't source that file. IMO
it's better to force people to specify build environment variables
on the command line always, like we do in every example, so I'm
just removing this doc.
This is misleading because the build doesn't source that file. IMO
it's better to force people to specify build environment variables
on the command line always, like we do in every example.