spark-instrumented-optimizer/external
Tathagata Das 6600786ddd [SPARK-11361][STREAMING] Show scopes of RDD operations inside DStream.foreachRDD and DStream.transform in DAG viz
Currently, when a DStream sets the scope for RDD generated by it, that scope is not allowed to be overridden by the RDD operations. So in case of `DStream.foreachRDD`, all the RDDs generated inside the foreachRDD get the same scope - `foreachRDD  <time>`, as set by the `ForeachDStream`. So it is hard to debug generated RDDs in the RDD DAG viz in the Spark UI.

This patch allows the RDD operations inside `DStream.transform` and `DStream.foreachRDD` to append their own scopes to the earlier DStream scope.

I have also slightly tweaked how callsites are set such that the short callsite reflects the RDD operation name and line number. This tweak is necessary as callsites are not managed through scopes (which support nesting and overriding) and I didnt want to add another local property to control nesting and overriding of callsites.

## Before:
![image](https://cloud.githubusercontent.com/assets/663212/10808548/fa71c0c4-7da9-11e5-9af0-5737793a146f.png)

## After:
![image](https://cloud.githubusercontent.com/assets/663212/10808659/37bc45b6-7dab-11e5-8041-c20be6a9bc26.png)

The code that was used to generate this is:
```
    val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.foreachRDD { rdd =>
      val temp = rdd.map { _ -> 1 }.reduceByKey( _ + _)
      val temp2 = temp.map { _ -> 1}.reduceByKey(_ + _)
      val count = temp2.count
      println(count)
    }
```

Note
- The inner scopes of the RDD operations map/reduceByKey inside foreachRDD is visible
- The short callsites of stages refers to the line number of the RDD ops rather than the same line number of foreachRDD in all three cases.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #9315 from tdas/SPARK-11361.
2015-11-10 16:54:06 -08:00
..
flume [SPARK-11361][STREAMING] Show scopes of RDD operations inside DStream.foreachRDD and DStream.transform in DAG viz 2015-11-10 16:54:06 -08:00
flume-assembly Update version to 1.6.0-SNAPSHOT. 2015-09-15 00:54:20 -07:00
flume-sink [SPARK-10300] [BUILD] [TESTS] Add support for test tags in run-tests.py. 2015-10-07 14:11:21 -07:00
kafka [SPARK-10300] [BUILD] [TESTS] Add support for test tags in run-tests.py. 2015-10-07 14:11:21 -07:00
kafka-assembly Update version to 1.6.0-SNAPSHOT. 2015-09-15 00:54:20 -07:00
mqtt [SPARK-10300] [BUILD] [TESTS] Add support for test tags in run-tests.py. 2015-10-07 14:11:21 -07:00
mqtt-assembly Update version to 1.6.0-SNAPSHOT. 2015-09-15 00:54:20 -07:00
twitter [SPARK-11245] update twitter4j to 4.0.4 version 2015-10-24 18:16:45 +01:00
zeromq [SPARK-10300] [BUILD] [TESTS] Add support for test tags in run-tests.py. 2015-10-07 14:11:21 -07:00