Commit graph

780 commits

Author SHA1 Message Date
Tathagata Das a17a5cb302 [SPARK-7776] [STREAMING] Added shutdown hook to StreamingContext
Shutdown hook to stop SparkContext was added recently. This results in ugly errors when a streaming application is terminated by ctrl-C.

```
Exception in thread "Thread-27" org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:736)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:735)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
	at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:735)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1468)
	at org.apache.spark.util.EventLoop.stop(EventLoop.scala:84)
	at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1403)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1642)
	at org.apache.spark.SparkContext$$anonfun$3.apply$mcV$sp(SparkContext.scala:559)
	at org.apache.spark.util.SparkShutdownHook.run(Utils.scala:2266)
	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(Utils.scala:2236)
	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(Utils.scala:2236)
	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(Utils.scala:2236)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1764)
	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(Utils.scala:2236)
	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(Utils.scala:2236)
	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(Utils.scala:2236)
	at scala.util.Try$.apply(Try.scala:161)
	at org.apache.spark.util.SparkShutdownHookManager.runAll(Utils.scala:2236)
	at org.apache.spark.util.SparkShutdownHookManager$$anon$6.run(Utils.scala:2218)
	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
```

This is because the Spark's shutdown hook stops the context, and the streaming jobs fail in the middle. The correct solution is to stop the streaming context before the spark context. This PR adds the shutdown hook to do so with a priority higher than the SparkContext's shutdown hooks priority.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #6307 from tdas/SPARK-7776 and squashes the following commits:

e3d5475 [Tathagata Das] Added conf to specify graceful shutdown
4c18652 [Tathagata Das] Added shutdown hook to StreamingContxt.

(cherry picked from commit d68ea24d60)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-21 17:45:17 -07:00
Burak Yavuz f08c6f3193 [SPARK-7745] Change asserts to requires for user input checks in Spark Streaming
Assertions can be turned off. `require` throws an `IllegalArgumentException` which makes more sense when it's a user set variable.

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #6271 from brkyvz/streaming-require and squashes the following commits:

d249484 [Burak Yavuz] fix merge conflict
264adb8 [Burak Yavuz] addressed comments v1.0
6161350 [Burak Yavuz] fix tests
16aa766 [Burak Yavuz] changed more assertions to more meaningful errors
afd923d [Burak Yavuz] changed some assertions to require

(cherry picked from commit 1ee8eb431e)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-21 00:31:16 -07:00
zsxwing b6182ce891 [SPARK-7777] [STREAMING] Fix the flaky test in org.apache.spark.streaming.BasicOperationsSuite
Just added a guard to make sure a batch has completed before moving to the next batch.

Author: zsxwing <zsxwing@gmail.com>

Closes #6306 from zsxwing/SPARK-7777 and squashes the following commits:

ecee529 [zsxwing] Fix the failure message
58634fe [zsxwing] Fix the flaky test in org.apache.spark.streaming.BasicOperationsSuite

(cherry picked from commit 895baf8f77)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-20 19:56:10 -07:00
Patrick Wendell 9b37e32c55 Preparing development version 1.4.0-SNAPSHOT 2015-05-20 17:29:00 -07:00
Patrick Wendell 1e458e3553 Preparing Spark release rc-test 2015-05-20 17:28:55 -07:00
pwendell 8d66849862 Preparing development version 1.4.0-SNAPSHOT 2015-05-20 17:26:15 -07:00
pwendell ae29aeaf8e Preparing Spark release rc-test 2015-05-20 17:26:10 -07:00
jenkins 534c787b9f Preparing development version 1.4.0-SNAPSHOT 2015-05-20 16:49:59 -07:00
jenkins 5f4d87f608 Preparing Spark release rc-test 2015-05-20 16:49:54 -07:00
Patrick Wendell 205ed15f29 Preparing development version 1.4.0-SNAPSHOT 2015-05-20 16:30:01 -07:00
Patrick Wendell 09a1c6231e Preparing Spark release rc-test 2015-05-20 16:29:52 -07:00
Tathagata Das a502e4b845 [SPARK-7767] [STREAMING] Added test for checkpoint serialization in StreamingContext.start()
Currently, the background checkpointing thread fails silently if the checkpoint is not serializable. It is hard to debug and therefore its best to fail fast at `start()` when checkpointing is enabled and the checkpoint is not serializable.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #6292 from tdas/SPARK-7767 and squashes the following commits:

51304e6 [Tathagata Das] Addressed comments.
c35237b [Tathagata Das] Added test for checkpoint serialization in StreamingContext.start()

(cherry picked from commit 3c434cbfd0)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-20 16:21:31 -07:00
Andrew Or 23356dd0d9 [SPARK-7237] [SPARK-7741] [CORE] [STREAMING] Clean more closures that need cleaning
SPARK-7741 is the equivalent of SPARK-7237 in streaming. This is an alternative to #6268.

Author: Andrew Or <andrew@databricks.com>

Closes #6269 from andrewor14/clean-moar and squashes the following commits:

c51c9ab [Andrew Or] Add periods (trivial)
6c686ac [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-moar
79a435b [Andrew Or] Fix tests
d18c9f9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-moar
65ef07b [Andrew Or] Fix tests?
4b487a3 [Andrew Or] Add tests for closures passed to DStream operations
328139b [Andrew Or] Do not forget foreachRDD
5431f61 [Andrew Or] Clean streaming closures
72b7b73 [Andrew Or] Clean core closures

(cherry picked from commit 9b84443dd4)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-20 15:39:47 -07:00
Patrick Wendell ac3197e1b9 Preparing development version 1.4.1-SNAPSHOT 2015-05-19 09:35:12 +00:00
Patrick Wendell 777a08166f Preparing Spark release v1.4.0-rc1 2015-05-19 09:35:12 +00:00
Patrick Wendell 586ede6b32 Revert "Preparing Spark release v1.4.0-rc1"
This reverts commit 79fb01a3be.
2015-05-19 02:27:14 -07:00
Patrick Wendell e7309ec729 Revert "Preparing development version 1.4.1-SNAPSHOT"
This reverts commit a1d896b85b.
2015-05-19 02:27:07 -07:00
Patrick Wendell a1d896b85b Preparing development version 1.4.1-SNAPSHOT 2015-05-19 07:13:24 +00:00
Patrick Wendell 79fb01a3be Preparing Spark release v1.4.0-rc1 2015-05-19 07:13:24 +00:00
Patrick Wendell b0c63d2413 Revert "Preparing Spark release v1.4.0-rc1"
This reverts commit 38ccef36c1.
2015-05-19 00:10:39 -07:00
Patrick Wendell 198a186ad3 Revert "Preparing development version 1.4.1-SNAPSHOT"
This reverts commit 40190ce226.
2015-05-19 00:10:37 -07:00
Patrick Wendell 40190ce226 Preparing development version 1.4.1-SNAPSHOT 2015-05-19 06:06:41 +00:00
Patrick Wendell 38ccef36c1 Preparing Spark release v1.4.0-rc1 2015-05-19 06:06:40 +00:00
Patrick Wendell 152b0291c0 Revert "Preparing Spark release v1.4.0-rc1"
This reverts commit e8e97e3a63.
2015-05-18 23:06:15 -07:00
Patrick Wendell 4d098bc049 Revert "Preparing development version 1.4.1-SNAPSHOT"
This reverts commit 758ca74bab.
2015-05-18 23:06:13 -07:00
Patrick Wendell 758ca74bab Preparing development version 1.4.1-SNAPSHOT 2015-05-19 05:01:11 +00:00
Patrick Wendell e8e97e3a63 Preparing Spark release v1.4.0-rc1 2015-05-19 05:01:11 +00:00
Andrew Or a475cbc978 [SPARK-7501] [STREAMING] DAG visualization: show DStream operations
This is similar to #5999, but for streaming. Roughly 200 lines are tests.

One thing to note here is that we already do some kind of scoping thing for call sites, so this patch adds the new RDD operation scoping logic in the same place. Also, this patch adds a `try finally` block to set the relevant variables in a safer way.

tdas zsxwing

------------------------
**Before**
<img src="https://cloud.githubusercontent.com/assets/2133137/7625996/d88211b8-f9b4-11e4-90b9-e11baa52d6d7.png" width="450px"/>

--------------------------
**After**
<img src="https://cloud.githubusercontent.com/assets/2133137/7625997/e0878f8c-f9b4-11e4-8df3-7dd611b13c87.png" width="650px"/>

Author: Andrew Or <andrew@databricks.com>

Closes #6034 from andrewor14/dag-viz-streaming and squashes the following commits:

932a64a [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
e685df9 [Andrew Or] Rename createRDDWith
84d0656 [Andrew Or] Review feedback
697c086 [Andrew Or] Fix tests
53b9936 [Andrew Or] Set scopes for foreachRDD properly
1881802 [Andrew Or] Refactor DStream scope names again
af4ba8d [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
fd07d22 [Andrew Or] Make MQTT lower case
f6de871 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
0ca1801 [Andrew Or] Remove a few unnecessary withScopes on aliases
fa4e5fb [Andrew Or] Pass in input stream name rather than defining it from within
1af0b0e [Andrew Or] Fix style
074c00b [Andrew Or] Review comments
d25a324 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
e4a93ac [Andrew Or] Fix tests?
25416dc [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
9113183 [Andrew Or] Add tests for DStream scopes
b3806ab [Andrew Or] Fix test
bb80bbb [Andrew Or] Fix MIMA?
5c30360 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
5703939 [Andrew Or] Rename operations that create InputDStreams
7c4513d [Andrew Or] Group RDDs by DStream operations and batches
bf0ab6e [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
05c2676 [Andrew Or] Wrap many more methods in withScope
c121047 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
65ef3e9 [Andrew Or] Fix NPE
a0d3263 [Andrew Or] Scope streaming operations instead of RDD operations

(cherry picked from commit b93c97d79b)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-18 14:33:45 -07:00
zsxwing 39add3dd53 [SPARK-7658] [STREAMING] [WEBUI] Update the mouse behaviors for the timeline graphs
1. If the user click one point of a batch, scroll down to the corresponding batch row and highlight it. And recovery the batch row after 3 seconds if necessary.

2. Add "#batches" in the histogram graphs.

![screen shot 2015-05-14 at 7 36 19 pm](https://cloud.githubusercontent.com/assets/1000778/7646108/84f4a014-fa73-11e4-8c13-1903d267e60f.png)

![screen shot 2015-05-14 at 7 36 53 pm](https://cloud.githubusercontent.com/assets/1000778/7646109/8b11154a-fa73-11e4-820b-8ece9fa6ee3e.png)

![screen shot 2015-05-14 at 7 36 34 pm](https://cloud.githubusercontent.com/assets/1000778/7646111/93828272-fa73-11e4-89f8-580670144d3c.png)

Author: zsxwing <zsxwing@gmail.com>

Closes #6168 from zsxwing/SPARK-7658 and squashes the following commits:

c242b00 [zsxwing] Change 5 seconds to 3 seconds
31fd0aa [zsxwing] Remove the mouseover highlight feature
06c6f6f [zsxwing] Merge branch 'master' into SPARK-7658
2eaff06 [zsxwing] Merge branch 'master' into SPARK-7658
108d56c [zsxwing] Update the mouse behaviors for the timeline graphs

(cherry picked from commit 0b6f503d53)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-18 13:34:55 -07:00
zsxwing 2a42d2d8f2 [SPARK-7693][Core] Remove "import scala.concurrent.ExecutionContext.Implicits.global"
Learnt a lesson from SPARK-7655: Spark should avoid to use `scala.concurrent.ExecutionContext.Implicits.global` because the user may submit blocking actions to `scala.concurrent.ExecutionContext.Implicits.global` and exhaust all threads in it. This could crash Spark. So Spark should always use its own thread pools for safety.

This PR removes all usages of `scala.concurrent.ExecutionContext.Implicits.global` and uses proper thread pools to replace them.

Author: zsxwing <zsxwing@gmail.com>

Closes #6223 from zsxwing/SPARK-7693 and squashes the following commits:

a33ff06 [zsxwing] Decrease the max thread number from 1024 to 128
cf4b3fc [zsxwing] Remove "import scala.concurrent.ExecutionContext.Implicits.global"

(cherry picked from commit ff71d34e00)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-17 20:37:27 -07:00
zsxwing 0ba99f061a [SPARK-7650] [STREAMING] [WEBUI] Move streaming css and js files to the streaming project
cc tdas

Author: zsxwing <zsxwing@gmail.com>

Closes #6160 from zsxwing/SPARK-7650 and squashes the following commits:

fe6ae15 [zsxwing] Fix the import order
a4ffd99 [zsxwing] Merge branch 'master' into SPARK-7650
dc402b6 [zsxwing] Move streaming css and js files to the streaming project

(cherry picked from commit cf842d42a7)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-14 23:51:51 -07:00
zsxwing 79983f17d9 [SPARK-7645] [STREAMING] [WEBUI] Show milliseconds in the UI if the batch interval < 1 second
I also updated the summary of the Streaming page.

![screen shot 2015-05-14 at 11 52 59 am](https://cloud.githubusercontent.com/assets/1000778/7640103/13cdf68e-fa36-11e4-84ec-e2a3954f4319.png)
![screen shot 2015-05-14 at 12 39 33 pm](https://cloud.githubusercontent.com/assets/1000778/7640151/4cc066ac-fa36-11e4-8494-2821d6a6f17c.png)

Author: zsxwing <zsxwing@gmail.com>

Closes #6154 from zsxwing/SPARK-7645 and squashes the following commits:

5db6ca1 [zsxwing] Add UIUtils.formatBatchTime
e4802df [zsxwing] Show milliseconds in the UI if the batch interval < 1 second

(cherry picked from commit b208f998b5)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-14 16:58:46 -07:00
Tathagata Das aec83949ab [SPARK-6752] [STREAMING] [REVISED] Allow StreamingContext to be recreated from checkpoint and existing SparkContext
This is a revision of the earlier version (see #5773) that passed the active SparkContext explicitly through a new set of Java and Scala API. The drawbacks are.

* Hard to implement in python.
* New API introduced. This is even more confusing since we are introducing getActiveOrCreate in SPARK-7553

Furthermore, there is now a direct way get an existing active SparkContext or create a new on - SparkContext.getOrCreate(conf). Its better to use this to get the SparkContext rather than have a new API to explicitly pass the context.

So in this PR I have
* Removed the new versions of StreamingContext.getOrCreate() which took SparkContext
* Added the ability to pick up existing SparkContext when the StreamingContext tries to create a SparkContext.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #6096 from tdas/SPARK-6752 and squashes the following commits:

53f4b2d [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-6752
f024b77 [Tathagata Das] Removed extra API and used SparkContext.getOrCreate

(cherry picked from commit bce00dac40)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-13 17:33:23 -07:00
Andrew Or e499a1e61b [STREAMING] [MINOR] Keep streaming.UIUtils private
zsxwing

Author: Andrew Or <andrew@databricks.com>

Closes #6134 from andrewor14/private-streaming-uiutils and squashes the following commits:

225df94 [Andrew Or] Privatize class

(cherry picked from commit bb6dec3b16)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-13 16:31:34 -07:00
zsxwing 10007fbe0b [SPARK-7589] [STREAMING] [WEBUI] Make "Input Rate" in the Streaming page consistent with other pages
This PR makes "Input Rate" in the Streaming page consistent with Job and Stage pages.

![screen shot 2015-05-12 at 5 03 35 pm](https://cloud.githubusercontent.com/assets/1000778/7601444/f943f8ac-f8ca-11e4-8280-a715d814f434.png)
![screen shot 2015-05-12 at 5 07 25 pm](https://cloud.githubusercontent.com/assets/1000778/7601445/f9571c0c-f8ca-11e4-9b12-9317cb55c002.png)

Author: zsxwing <zsxwing@gmail.com>

Closes #6102 from zsxwing/SPARK-7589 and squashes the following commits:

2745225 [zsxwing] Make "Input Rate" in the Streaming page consistent with other pages

(cherry picked from commit bec938f777)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-13 10:01:38 -07:00
Tathagata Das bb81b15005 [SPARK-7554] [STREAMING] Throw exception when an active/stopped StreamingContext is used to create DStreams and output operations
Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #6099 from tdas/SPARK-7554 and squashes the following commits:

2cd4158 [Tathagata Das] Throw exceptions on attempts to add stuff to active and stopped contexts.

(cherry picked from commit 23f7d66d51)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-12 17:07:46 -07:00
Tathagata Das 91fbd93f26 [SPARK-7553] [STREAMING] Added methods to maintain a singleton StreamingContext
In a REPL/notebook environment, its very easy to lose a reference to a StreamingContext by overriding the variable name. So if you happen to execute the following commands
```
val ssc = new StreamingContext(...) // cmd 1
ssc.start() // cmd 2
...
val ssc = new StreamingContext(...) // accidentally run cmd 1 again
```
The value of ssc will be overwritten. Now you can neither start the new context (as only one context can be started), nor stop the previous context (as the reference is lost).
Hence its best to maintain a singleton reference to the active context, so that we never loose reference for the active context.
Since this problem occurs useful in REPL environments, its best to add this as an Experimental support in the Scala API only so that it can be used in Scala REPLs and notebooks.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #6070 from tdas/SPARK-7553 and squashes the following commits:

731c9a1 [Tathagata Das] Fixed style
a797171 [Tathagata Das] Added more unit tests
19fc70b [Tathagata Das] Added :: Experimental :: in docs
64706c9 [Tathagata Das] Fixed test
634db5d [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7553
3884a25 [Tathagata Das] Fixing test bug
d37a846 [Tathagata Das] Added getActive and getActiveOrCreate

(cherry picked from commit 00e7b09a0b)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-12 16:44:27 -07:00
zsxwing 221375ee1f [SPARK-7406] [STREAMING] [WEBUI] Add tooltips for "Scheduling Delay", "Processing Time" and "Total Delay"
Screenshots:
![screen shot 2015-05-06 at 2 29 03 pm](https://cloud.githubusercontent.com/assets/1000778/7504129/9c57f710-f3fc-11e4-9c6e-1b79c17c546d.png)

![screen shot 2015-05-06 at 2 24 35 pm](https://cloud.githubusercontent.com/assets/1000778/7504140/b63bb216-f3fc-11e4-83a5-6dfc6481d192.png)

tdas as we discussed offline

Author: zsxwing <zsxwing@gmail.com>

Closes #5952 from zsxwing/SPARK-7406 and squashes the following commits:

2b004ea [zsxwing] Merge branch 'master' into SPARK-7406
e9eb506 [zsxwing] Update tooltip contents
2215b2a [zsxwing] Add tooltips for "Scheduling Delay", "Processing Time" and "Total Delay"

(cherry picked from commit 1422e79e51)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-12 14:41:30 -07:00
Tathagata Das 2bbb685f4c [SPARK-7532] [STREAMING] StreamingContext.start() made to logWarning and not throw exception
Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #6060 from tdas/SPARK-7532 and squashes the following commits:

6fe2e83 [Tathagata Das] Update docs
7dadfc3 [Tathagata Das] Fixed bug again
99c7678 [Tathagata Das] Added logInfo
65aec20 [Tathagata Das] Fix bug
5bf031b [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7532
1a9a818 [Tathagata Das] Fix scaladoc
c584313 [Tathagata Das] StreamingContext.start() made to logWarning and not throw exception

(cherry picked from commit ec6f2a9774)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-12 08:48:34 -07:00
Marcelo Vanzin afe54b76a6 [SPARK-7485] [BUILD] Remove pyspark files from assembly.
The sbt part of the build is hacky; it basically tricks sbt
into generating the zip by using a generator, but returns
an empty list for the generated files so that nothing is
actually added to the assembly.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #6022 from vanzin/SPARK-7485 and squashes the following commits:

22c1e04 [Marcelo Vanzin] Remove unneeded code.
4893622 [Marcelo Vanzin] [SPARK-7485] [build] Remove pyspark files from assembly.

(cherry picked from commit 82e890fb19)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-12 01:39:28 -07:00
Tathagata Das c16b47f9ed [SPARK-7530] [STREAMING] Added StreamingContext.getState() to expose the current state of the context
Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #6058 from tdas/SPARK-7530 and squashes the following commits:

80ee0e6 [Tathagata Das] STARTED --> ACTIVE
3da6547 [Tathagata Das] Added synchronized
dd88444 [Tathagata Das] Added more docs
e1a8505 [Tathagata Das] Fixed comment length
89f9980 [Tathagata Das] Change to Java enum and added Java test
7c57351 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7530
dd4e702 [Tathagata Das] Addressed comments.
3d56106 [Tathagata Das] Added Mima excludes
2b86ba1 [Tathagata Das] Added scala docs.
1722433 [Tathagata Das] Fixed style
976b094 [Tathagata Das] Added license
0585130 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7530
e0f0a05 [Tathagata Das] Added getState and exposed StreamingContextState

(cherry picked from commit f9c7580ada)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-11 18:54:06 -07:00
jerryshao 9e226e1c8f [STREAMING] [MINOR] Close files correctly when iterator is finished in streaming WAL recovery
Currently there's no chance to close the file correctly after the iteration is finished, change to `CompletionIterator` to avoid resource leakage.

Author: jerryshao <saisai.shao@intel.com>

Closes #6050 from jerryshao/close-file-correctly and squashes the following commits:

52dfaf5 [jerryshao] Close files correctly when iterator is finished

(cherry picked from commit 25c01c5484)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-11 14:39:08 -07:00
Tathagata Das 11648fa011 [SPARK-7361] [STREAMING] Throw unambiguous exception when attempting to start multiple StreamingContexts in the same JVM
Currently attempt to start a streamingContext while another one is started throws a confusing exception that the action name JobScheduler is already registered. Instead its best to throw a proper exception as it is not supported.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #5907 from tdas/SPARK-7361 and squashes the following commits:

fb81c4a [Tathagata Das] Fix typo
a9cd5bb [Tathagata Das] Added startSite to StreamingContext
5fdfc0d [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7361
5870e2b [Tathagata Das] Added check for multiple streaming contexts

(cherry picked from commit 1b46556999)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-11 10:59:10 -07:00
Wesley Miao da1be15cc6 [SPARK-7326] [STREAMING] Performing window() on a WindowedDStream doesn't work all the time
tdas

https://issues.apache.org/jira/browse/SPARK-7326

The problem most likely resides in DStream.slice() implementation, as shown below.

  def slice(fromTime: Time, toTime: Time): Seq[RDD[T]] = {
    if (!isInitialized) {
      throw new SparkException(this + " has not been initialized")
    }
    if (!(fromTime - zeroTime).isMultipleOf(slideDuration)) {
      logWarning("fromTime (" + fromTime + ") is not a multiple of slideDuration ("
        + slideDuration + ")")
    }
    if (!(toTime - zeroTime).isMultipleOf(slideDuration)) {
      logWarning("toTime (" + fromTime + ") is not a multiple of slideDuration ("
        + slideDuration + ")")
    }
    val alignedToTime = toTime.floor(slideDuration, zeroTime)
    val alignedFromTime = fromTime.floor(slideDuration, zeroTime)

    logInfo("Slicing from " + fromTime + " to " + toTime +
      " (aligned to " + alignedFromTime + " and " + alignedToTime + ")")

    alignedFromTime.to(alignedToTime, slideDuration).flatMap(time => {
      if (time >= zeroTime) getOrCompute(time) else None
    })
  }

Here after performing floor() on both fromTime and toTime, the result (alignedFromTime - zeroTime) and (alignedToTime - zeroTime) may no longer be multiple of the slidingDuration, thus making isTimeValid() check failed for all the remaining computation.

The fix is to add a new floor() function in Time.scala to respect the zeroTime while performing the floor :

  def floor(that: Duration, zeroTime: Time): Time = {
    val t = that.milliseconds
    new Time(((this.millis - zeroTime.milliseconds) / t) * t + zeroTime.milliseconds)
  }

And then change the DStream.slice to call this new floor function by passing in its zeroTime.

    val alignedToTime = toTime.floor(slideDuration, zeroTime)
    val alignedFromTime = fromTime.floor(slideDuration, zeroTime)

This way the alignedToTime and alignedFromTime are *really* aligned in respect to zeroTime whose value is not really a 0.

Author: Wesley Miao <wesley.miao@gmail.com>
Author: Wesley <wesley.miao@autodesk.com>

Closes #5871 from wesleymiao/spark-7326 and squashes the following commits:

82a4d8c [Wesley Miao] [SPARK-7326] [STREAMING] Performing window() on a WindowedDStream dosen't work all the time
48b4dc0 [Wesley] [SPARK-7326] [STREAMING] Performing window() on a WindowedDStream doesn't work all the time
6ade399 [Wesley] [SPARK-7326] [STREAMING] Performing window() on a WindowedDStream doesn't work all the time
2611745 [Wesley Miao] [SPARK-7326] [STREAMING] Performing window() on a WindowedDStream doesn't work all the time

(cherry picked from commit d70a076892)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-05-11 12:20:21 +01:00
zsxwing 2e8a141b5a [SPARK-7305] [STREAMING] [WEBUI] Make BatchPage show friendly information when jobs are dropped by SparkListener
If jobs are dropped by SparkListener, at least we can show the job ids in BatchPage. Screenshot:

![b1](https://cloud.githubusercontent.com/assets/1000778/7434968/f19aa784-eff3-11e4-8f86-36a073873574.png)

Author: zsxwing <zsxwing@gmail.com>

Closes #5840 from zsxwing/SPARK-7305 and squashes the following commits:

aca0ba6 [zsxwing] Fix the code style
718765e [zsxwing] Make generateNormalJobRow private
8073b03 [zsxwing] Merge branch 'master' into SPARK-7305
83dec11 [zsxwing] Make BatchPage show friendly information when jobs are dropped by SparkListener

(cherry picked from commit 22ab70e06e)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-07 17:34:59 -07:00
Tathagata Das cb13c98b1a [SPARK-7217] [STREAMING] Add configuration to control the default behavior of StreamingContext.stop() implicitly calling SparkContext.stop()
In environments like notebooks, the SparkContext is managed by the underlying infrastructure and it is expected that the SparkContext will not be stopped. However, StreamingContext.stop() calls SparkContext.stop() as a non-intuitive side-effect. This PR adds a configuration in SparkConf that sets the default StreamingContext stop behavior. It should be such that the existing behavior does not change for existing users.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #5929 from tdas/SPARK-7217 and squashes the following commits:

869a763 [Tathagata Das] Changed implementation.
685fe00 [Tathagata Das] Added configuration

(cherry picked from commit 01187f59b3)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-07 00:27:02 -07:00
Tathagata Das 065d114c6d [SPARK-7430] [STREAMING] [TEST] General improvements to streaming tests to increase debuggability
Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #5961 from tdas/SPARK-7430 and squashes the following commits:

d654978 [Tathagata Das] Fix scala style
fbf7174 [Tathagata Das] Added more verbose assert failure messages.
6aea07a [Tathagata Das] Ensure SynchronizedBuffer is used in every TestSuiteBase

(cherry picked from commit cfdadcbd2b)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-07 00:21:27 -07:00
zsxwing d6e76cb0d9 [SPARK-7405] [STREAMING] Fix the bug that ReceiverInputDStream doesn't report InputInfo
The bug is because SPARK-7139 removed some codes from SPARK-7112 unintentionally here: 1854ac326a (diff-5c8651dd78abd20439b8eb938175075dL72)

This PR just added them back and added some assertions in the tests to verify it.

Author: zsxwing <zsxwing@gmail.com>

Closes #5950 from zsxwing/SPARK-7405 and squashes the following commits:

675f5d9 [zsxwing] Fix the bug that ReceiverInputDStream doesn't report InputInfo

(cherry picked from commit 14502d5e56)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-06 18:07:11 -07:00
zsxwing 8109c9e105 [SPARK-6939] [STREAMING] [WEBUI] Add timeline and histogram graphs for streaming statistics
This is the initial work of SPARK-6939. Not yet ready for code review. Here are the screenshots:

![graph1](https://cloud.githubusercontent.com/assets/1000778/7165766/465942e0-e3dc-11e4-9b05-c184b09d75dc.png)

![graph2](https://cloud.githubusercontent.com/assets/1000778/7165779/53f13f34-e3dc-11e4-8714-a4a75b7e09ff.png)

TODOs:
- [x] Display more information on mouse hover
- [x] Align the timeline and distribution graphs
- [x] Clean up the codes

Author: zsxwing <zsxwing@gmail.com>

Closes #5533 from zsxwing/SPARK-6939 and squashes the following commits:

9f7cd19 [zsxwing] Merge branch 'master' into SPARK-6939
deacc3f [zsxwing] Remove unused import
cd03424 [zsxwing] Fix .rat-excludes
70cc87d [zsxwing] Streaming Scheduling Delay => Scheduling Delay
d457277 [zsxwing] Fix UIUtils in BatchPage
b3f303e [zsxwing] Add comments for unclear classes and methods
ff0bff8 [zsxwing] Make InputDStream.name private[streaming]
cc392c5 [zsxwing] Merge branch 'master' into SPARK-6939
e275e23 [zsxwing] Move time related methods to Streaming's UIUtils
d5d86f6 [zsxwing] Fix incorrect lastErrorTime
3be4b7a [zsxwing] Use InputInfo
b50fa32 [zsxwing] Jump to the batch page when clicking a point in the timeline graphs
203605d [zsxwing] Merge branch 'master' into SPARK-6939
74307cf [zsxwing] Reuse the data for histogram graphs to reduce the page size
2586916 [zsxwing] Merge branch 'master' into SPARK-6939
70d8533 [zsxwing] Remove BatchInfo.numRecords and a few renames
7bbdc0a [zsxwing] Hide the receiver sub table if no receiver
a2972e9 [zsxwing] Add some ui tests for StreamingPage
fd03ad0 [zsxwing] Add a test to verify no memory leak
4a8f886 [zsxwing] Merge branch 'master' into SPARK-6939
18607a1 [zsxwing] Merge branch 'master' into SPARK-6939
d0b0aec [zsxwing] Clean up the codes
a459f49 [zsxwing] Add a dash line to processing time graphs
8e4363c [zsxwing] Prepare for the demo
c81a1ee [zsxwing] Change time unit in the graphs automatically
4c0b43f [zsxwing] Update Streaming UI
04c7500 [zsxwing] Make the server and client use the same timezone
fed8219 [zsxwing] Move the x axis at the top and show a better tooltip
c23ce10 [zsxwing] Make two graphs close
d78672a [zsxwing] Make the X axis use the same range
881c907 [zsxwing] Use histogram for distribution
5688702 [zsxwing] Fix the unit test
ddf741a [zsxwing] Fix the unit test
ad93295 [zsxwing] Remove unnecessary codes
a0458f9 [zsxwing] Clean the codes
b82ed1e [zsxwing] Update the graphs as per comments
dd653a1 [zsxwing] Add timeline and histogram graphs for streaming statistics

(cherry picked from commit 489700c809)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-05 12:52:29 -07:00
Andrew Or acc877a989 [SPARK-7318] [STREAMING] DStream cleans objects that are not closures
I added a check in `ClosureCleaner#clean` to fail fast if this is detected in the future. tdas

Author: Andrew Or <andrew@databricks.com>

Closes #5860 from andrewor14/streaming-closure-cleaner and squashes the following commits:

8e971d7 [Andrew Or] Do not throw exception if object to clean is not closure
5ee4e25 [Andrew Or] Fix tests
eed3390 [Andrew Or] Merge branch 'master' of github.com:apache/spark into streaming-closure-cleaner
67eeff4 [Andrew Or] Add tests
a4fa768 [Andrew Or] Clean the closure, not the RDD

(cherry picked from commit 57e9f29e17)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-05-05 09:37:56 -07:00