[SPARK-4601][Streaming] Set correct call site for streaming jobs so that it is displayed correctly on the Spark UI

When running NetworkWordCount, the description of the word count jobs is set to "getCallsite at DStream:xxx". It should instead point to the line in the streaming application that contains the output operation that led to the job being created. The description is wrong because the call site is incorrectly set in the thread launching the jobs. This PR fixes that.
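
As a rough illustration of the problem described above, here is a minimal sketch that does not use Spark at all. It assumes the call site is per-thread state, which is what the description suggests. `CallSiteSketch`, `runOnJobThread`, `DefaultSite`, and the string values are hypothetical stand-ins, not Spark APIs; only the names `setCallSite`, `getCallSite`, and `creationSite` mirror the diff.

```scala
object CallSiteSketch {
  // Hypothetical default, standing in for the unhelpful description seen on the UI.
  val DefaultSite = "getCallSite at DStream"

  // Hypothetical stand-in for the user's output operation location.
  val creationSite = "foreachRDD at NetworkWordCount.scala"

  // Assumption: the call site is tracked per thread.
  private val callSite = new ThreadLocal[String] {
    override def initialValue(): String = DefaultSite
  }

  def setCallSite(site: String): Unit = callSite.set(site)
  def getCallSite(): String = callSite.get()

  // Runs `body` on a fresh thread, standing in for the thread that launches jobs.
  def runOnJobThread(body: () => String): String = {
    var result: String = null
    val t = new Thread(new Runnable {
      def run(): Unit = { result = body() }
    })
    t.start()
    t.join() // join makes the write to `result` visible here
    result
  }

  // Call site set on the thread that builds the job: the job thread does not see it.
  def before(): String = {
    setCallSite(creationSite)
    runOnJobThread(() => getCallSite())
  }

  // Call site set inside the job function, so it is set on the thread that runs it.
  def after(): String = runOnJobThread { () =>
    setCallSite(creationSite)
    getCallSite()
  }
}
```

In this sketch `before()` yields the default description while `after()` yields `creationSite`, which is the same idea as moving the `setCallSite(creationSite)` call into `jobFunc`.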

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #3455 from tdas/streaming-callsite-fix and squashes the following commits:

69fc26f [Tathagata Das] Set correct call site for streaming jobs so that it is displayed correctly on the Spark UI
Tathagata Das 2014-11-25 06:50:36 -08:00
parent d240760191
commit 69cd53eae2
2 changed files with 6 additions and 1 deletions


@@ -38,6 +38,7 @@ class ForEachDStream[T: ClassTag] (
     parent.getOrCompute(time) match {
       case Some(rdd) =>
         val jobFunc = () => {
+          ssc.sparkContext.setCallSite(creationSite)
           foreachFunc(rdd, time)
         }
         Some(new Job(time, jobFunc))


@@ -336,16 +336,20 @@ package object testPackage extends Assertions {
       // Verify creation site of generated RDDs
       var rddGenerated = false
-      var rddCreationSiteCorrect = true
+      var rddCreationSiteCorrect = false
+      var foreachCallSiteCorrect = false
       inputStream.foreachRDD { rdd =>
         rddCreationSiteCorrect = rdd.creationSite == creationSite
+        foreachCallSiteCorrect =
+          rdd.sparkContext.getCallSite().shortForm.contains("StreamingContextSuite")
         rddGenerated = true
       }
       ssc.start()
       eventually(timeout(10000 millis), interval(10 millis)) {
         assert(rddGenerated && rddCreationSiteCorrect, "RDD creation site was not correct")
+        assert(rddGenerated && foreachCallSiteCorrect, "Call site in foreachRDD was not correct")
       }
     } finally {
       ssc.stop()