spark-instrumented-optimizer

History

Shixiong Zhu 768b3d623c [SPARK-14579][SQL] Fix a race condition in StreamExecution.processAllAvailable	2016-04-12 17:31:47 -07:00

## What changes were proposed in this pull request?

There is a race condition in `StreamExecution.processAllAvailable`. The following execution order reproduces it.

| Time | Thread 1 | MicroBatchThread |
|:-------------:|:-------------:|:-----:|
| 1 | | `dataAvailable` in `constructNextBatch` returns false |
| 2 | `addData(newData)` | |
| 3 | `noNewData = false` in `processAllAvailable` | |
| 4 | | `noNewData = true` |
| 5 | `noNewData` is true, so just return | |

The root cause is that checking `dataAvailable` and changing `noNewData` to true are not done atomically. This PR puts these two actions into a `synchronized` block to make sure they are atomic.

In addition, this PR also has the following changes:

- Make `committedOffsets` and `availableOffsets` volatile to make sure they can be seen in other threads.
- Copy the reference of `availableOffsets` to a local variable so that `sourceStatuses` can use a snapshot of `availableOffsets`.

## How was this patch tested?

Existing unit tests.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #12339 from zsxwing/race-condition.
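The check-then-set fix described in the commit message can be illustrated with a minimal Java sketch. This is not Spark's `StreamExecution` code: the class `ProcessAllAvailableSketch`, the `long` offsets, the `sourceOffset` counter, and the `runLoop` and `stop` helpers are hypothetical simplifications chosen for illustration. It only shows the idea of doing the data check and the `noNewData = true` update inside one `synchronized` block, with volatile offsets and a local snapshot.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified stand-in for a micro-batch loop; not Spark's StreamExecution.
final class ProcessAllAvailableSketch {
    private final Object awaitBatchLock = new Object();
    private final AtomicLong sourceOffset = new AtomicLong(); // latest offset written to the source
    private volatile long availableOffset = 0L;  // volatile so other threads see updates
    private volatile long committedOffset = 0L;
    private volatile boolean running = true;
    private boolean noNewData = false;           // guarded by awaitBatchLock

    void addData(long records) { sourceOffset.addAndGet(records); }

    long committedOffset() { return committedOffset; }

    void stop() { running = false; }

    // The data check and the noNewData update happen in one critical section,
    // so a concurrent processAllAvailable() cannot reset the flag in between.
    private boolean constructNextBatch() {
        synchronized (awaitBatchLock) {
            availableOffset = sourceOffset.get();
            if (availableOffset > committedOffset) {
                return true;
            }
            noNewData = true;
            awaitBatchLock.notifyAll();
            return false;
        }
    }

    // Body of the micro-batch thread.
    void runLoop() {
        while (running) {
            if (constructNextBatch()) {
                long snapshot = availableOffset; // local copy, used as a consistent snapshot
                committedOffset = snapshot;      // "process" the batch
            } else {
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    // Blocks until the loop reports no new data after this call started.
    void processAllAvailable() {
        synchronized (awaitBatchLock) {
            noNewData = false;
            while (!noNewData) {
                try {
                    awaitBatchLock.wait(100);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                }
            }
        }
    }
}
```

In this sketch, data added before `processAllAvailable()` is called is always committed by the time the call returns, because the flag can only become true in a critical section that runs after the reset and therefore observes the new data. The call blocks indefinitely if no thread is running `runLoop`.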
java/org/apache/spark/sql	[SPARK-14465][BUILD] Checkstyle should check all Java files	2016-04-09 21:31:20 -07:00
resources	[SPARK-12902] [SQL] visualization for generated operators	2016-01-25 12:44:20 -08:00
scala/org/apache/spark/sql	[SPARK-14579][SQL] Fix a race condition in StreamExecution.processAllAvailable	2016-04-12 17:31:47 -07:00