[SPARK-30124][MLLIB] unnecessary persist in PythonMLLibAPI.scala

### What changes were proposed in this pull request?
Removed an unnecessary persist.

### Why are the changes needed?
The persist in `PythonMLLibAPI.scala` is unnecessary because `run()` of `gmmAlg` caches the data later on.

710ddab39e/mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala (L167-L171)

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Manually

Closes #26758 from amanomer/improperPersist.

Authored-by: Aman Omer <amanomer1996@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-12-05 11:54:45 -06:00 · 2019-12-05 11:54:45 -06:00 · 5892bbf447
parent 35bab33984
commit 5892bbf447
2 changed files with 2 additions and 5 deletions
--- a/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
@@ -407,11 +407,7 @@ private[python] class PythonMLLibAPI extends Serializable {

    if (seed != null) gmmAlg.setSeed(seed)

-    try {
-      new GaussianMixtureModelWrapper(gmmAlg.run(data.rdd.persist(StorageLevel.MEMORY_AND_DISK)))
-    } finally {
-      data.rdd.unpersist()
-    }
+    new GaussianMixtureModelWrapper(gmmAlg.run(data.rdd))
  }

  /**
--- a/mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala
@@ -234,6 +234,7 @@ class GaussianMixture private (
      iter += 1
      compute.destroy()
    }
+    breezeData.unpersist()

    new GaussianMixtureModel(weights, gaussians)
  }
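
### Illustrative sketch

The reasoning behind this change can be sketched roughly as follows. This is not the `GaussianMixture` code: `PersistSketch`, its `run` method, and the toy computation are hypothetical stand-ins. The sketch assumes the algorithm reads its input once to build a derived RDD, caches that derived RDD for its iterations, and releases it when finished, so a caller that also persists the input would only hold a second copy that is never re-read.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object PersistSketch {
  // Hypothetical stand-in for an iterative algorithm's run() method.
  def run(data: RDD[Array[Double]], iterations: Int): Double = {
    // The input is read once to build the derived RDD; only the derived
    // RDD is reused across iterations, so only it is cached.
    val derived: RDD[Double] = data.map(_.sum).cache()
    try {
      var acc = 0.0
      for (_ <- 0 until iterations) {
        // Each pass re-reads the cached derived RDD, not the input.
        acc += derived.sum()
      }
      acc
    } finally {
      // Release the cached blocks once the iterations are done.
      derived.unpersist()
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("persist-sketch")
      .getOrCreate()
    try {
      val input = spark.sparkContext.parallelize(
        Seq(Array(1.0, 2.0), Array(3.0, 4.0)))
      // The caller passes the RDD as-is, without persisting it first.
      println(run(input, iterations = 3))
    } finally {
      spark.stop()
    }
  }
}
```

The sketch wraps the iterations in `try`/`finally` so the cache is released even when an iteration fails. The patch itself places `breezeData.unpersist()` after the loop without a `try` block.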