spark-instrumented-optimizer/sql/core
Bruce Robbins f34b872aed [SPARK-26851][SQL] Fix double-checked locking in CachedRDDBuilder
## What changes were proposed in this pull request?

According to Brian Goetz et al in Java Concurrency in Practice, the double checked locking pattern has worked since Java 5, but only if the resource is declared volatile:

> Subsequent changes in the JMM (Java 5.0 and later) have enabled DCL to work if resource is made volatile, and the performance impact of this is small since volatile reads are usually only slightly more expensive than nonvolatile reads.

CachedRDDBuilder. cachedColumnBuffers and CachedRDDBuilder.clearCache both use DCL to manage the resource ``_cachedColumnBuffers``. The missing ingredient is that ``_cachedColumnBuffers`` is not volatile.

Because of this, clearCache may see ``_cachedColumnBuffers`` as null, when in fact it is not, and therefore fail to un-cache the RDD. There may be other, more subtle bugs due to visibility issues.

To avoid these issues, this PR makes ``_cachedColumnBuffers`` volatile.

## How was this patch tested?

- Existing SQL unit tests
- Existing pyspark-sql tests

Closes #23768 from bersprockets/SPARK-26851.

Authored-by: Bruce Robbins <bersprockets@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-02-14 14:57:25 +08:00
..
benchmarks [SPARK-26745][SQL] Revert count optimization in JSON datasource by SPARK-24959 2019-01-31 14:32:31 +08:00
src [SPARK-26851][SQL] Fix double-checked locking in CachedRDDBuilder 2019-02-14 14:57:25 +08:00
pom.xml [SPARK-25956] Make Scala 2.12 as default Scala version in Spark 3.0 2018-11-14 16:22:23 -08:00