Change the order of CLASSPATH.
SPARK_TOOLS_JAR should be placed after CLASSPATH or at least after
SPARK_CLASSPATH.
If SPARK_TOOLS_JAR is placed before CLASSPATH, all assembled classes and
resources in spark-tools-assembly.jar beat those in CLASSPATH or
SPARK_CLASSPATH, which might be replaced by customized versions.
Refactored the streaming scheduler and added StreamingListener interface
- Refactored the streaming scheduler for cleaner code. Specifically, the JobManager was renamed to JobScheduler, as it does the actual scheduling of Spark jobs to the SparkContext. The earlier Scheduler was renamed to JobGenerator, as it actually generates the jobs from the DStreams. The JobScheduler starts the JobGenerator. Also, moved all the scheduler related code from spark.streaming to spark.streaming.scheduler package.
- Implemented the StreamingListener interface, similar to SparkListener. The streaming version of StatusReportListener prints the batch processing time statistics (for now). Added StreamingListernerSuite to test it.
- Refactored streaming TestSuiteBase for deduping code in the other streaming testsuites.
Track and report task result serialisation time.
- DirectTaskResult now has a ByteBuffer valueBytes instead of a T value.
- DirectTaskResult now has a member function T value() that deserialises valueBytes.
- Executor serialises value into a ByteBuffer and passes it to DTR's ctor.
- Executor tracks the time taken to do so and puts it in a new field in TaskMetrics.
- StagePage now reports serialisation time from TaskMetrics along with the other things it reported.
Previously, we would only clean the in-memory metadata for consolidated
shuffle files.
Additionally, fixes a bug where the Metadata Cleaner was ignoring type-
specific TTLs.
Add collectPartition to JavaRDD interface.
This interface is useful for implementing `take` from other language frontends where the data is serialized. Also remove `takePartition` from PythonRDD and use `collectPartition` in rdd.py.
Thanks @concretevitamin for the original change and tests.
Change the implementation to use runJob instead of PartitionPruningRDD.
Also update the unit tests and the python take implementation
to use the new interface.
[SPARK-959] Explicitly depend on org.eclipse.jetty.orbit jar
Without this, in some cases, Ivy attempts to download the wrong file and fails, stopping the whole build. See [bug](https://spark-project.atlassian.net/browse/SPARK-959) for more details.
Note that this may not be the best solution, as I do not understand the root cause of why this only happens for some people. However, it is reported to work.
Without this, in some cases, Ivy attempts to download the wrong file
and fails, stopping the whole build. See bug for more details.
(This is probably also the beginning of the slow death of our
recently prettified dependencies. Form follow function.)