[SPARK-7100] [MLLIB] Fix persisted RDD leak in GradientBoostTrees

This fixes a leak of a persisted RDD: GradientBoostedTrees persists its input when the input has no storage level set, but never unpersists it afterwards.
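The persist-then-release pattern behind this fix can be sketched as a small standalone helper. This is only an illustration, not the code in this commit: `PersistScope` and `withPersisted` are hypothetical names, and the `try`/`finally` is an added assumption (the commit itself calls `unpersist()` once, after training finishes). The sketch uses only the `getStorageLevel`, `persist`, and `unpersist` calls on `RDD`, and assumes `spark-core` is on the classpath.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical helper for illustration; not part of Spark or of this commit.
object PersistScope {
  def withPersisted[T, R](input: RDD[T])(body: RDD[T] => R): R = {
    // Persist only if the caller has not already chosen a storage level,
    // and remember whether this helper was the one that cached the RDD.
    val persistedHere = if (input.getStorageLevel == StorageLevel.NONE) {
      input.persist(StorageLevel.MEMORY_AND_DISK)
      true
    } else {
      false
    }
    try {
      body(input)
    } finally {
      // Release only what was cached here; a caller's own cache is left alone.
      if (persistedHere) input.unpersist()
    }
  }
}
```

Tracking whether the helper did the persisting is the important part: unpersisting unconditionally would drop a cache that the caller set up deliberately.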

Jira: https://issues.apache.org/jira/browse/SPARK-7100

Discussion: http://apache-spark-developers-list.1001551.n3.nabble.com/GradientBoostTrees-leaks-a-persisted-RDD-td11750.html

Author: Jim Carroll <jim@dontcallme.com>

Closes #5669 from jimfcarroll/gb-unpersist-fix and squashes the following commits:

45f4b03 [Jim Carroll] [SPARK-7100][MLLib] Fix persisted RDD leak in GradientBoostTrees
Jim Carroll 2015-04-28 07:51:02 -04:00 committed by Sean Owen
parent 7f3b3b7eb7
commit 75905c57cd

@@ -177,9 +177,10 @@ object GradientBoostedTrees extends Logging {
     treeStrategy.assertValid()
     // Cache input
-    if (input.getStorageLevel == StorageLevel.NONE) {
+    val persistedInput = if (input.getStorageLevel == StorageLevel.NONE) {
       input.persist(StorageLevel.MEMORY_AND_DISK)
-    }
+      true
+    } else false
     timer.stop("init")
@@ -265,6 +266,9 @@ object GradientBoostedTrees extends Logging {
     logInfo("Internal timing for DecisionTree:")
     logInfo(s"$timer")
+    if (persistedInput) input.unpersist()
     if (validate) {
       new GradientBoostedTreesModel(
         boostingStrategy.treeStrategy.algo,