spark-instrumented-optimizer/streaming
Tathagata Das 5d80d8c6a5 [SPARK-11932][STREAMING] Partition previous TrackStateRDD if partitioner not present
The reason is that TrackStateRDDs generated by trackStateByKey expect the previous batch's TrackStateRDDs to have a partitioner. However, when recovery from DStream checkpoints, the RDDs recovered from RDD checkpoints do not have a partitioner attached to it. This is because RDD checkpoints do not preserve the partitioner (SPARK-12004).

While #9983 solves SPARK-12004 by preserving the partitioner through RDD checkpoints, there may be a non-zero chance that the saving and recovery fails. To be resilient, this PR repartitions the previous state RDD if the partitioner is not detected.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #9988 from tdas/SPARK-11932.
2015-12-07 11:03:59 -08:00
..
src [SPARK-11932][STREAMING] Partition previous TrackStateRDD if partitioner not present 2015-12-07 11:03:59 -08:00
pom.xml Add mockito as an explicit test dependency to spark-streaming 2015-11-09 18:53:57 -08:00