[SPARK-30124][MLLIB] unnecessary persist in PythonMLLibAPI.scala

### What changes were proposed in this pull request?
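Removed the unnecessary `persist`/`unpersist` pair around `gmmAlg.run(...)` in `PythonMLLibAPI.scala`, and added `breezeData.unpersist()` in `GaussianMixture.scala` once the iterations finish.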

### Why are the changes needed?
The `persist` in `PythonMLLibAPI.scala` is unnecessary because `run()` of `gmmAlg` already caches the data itself:
710ddab39e/mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala (L167-L171)
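
A minimal sketch of the pattern, assuming `run()` caches an RDD derived from its input as the linked lines suggest. `RedundantPersistSketch`, `runLikeAlgorithm`, and the toy data are hypothetical names used only for illustration; they are not part of Spark or of this patch. The `try`/`finally` is a choice made for the sketch, not a description of `GaussianMixture.run()`.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object RedundantPersistSketch {
  // Hypothetical stand-in for an algorithm's run(): it caches its own
  // derived RDD, iterates over it, and releases it when done.
  def runLikeAlgorithm(input: RDD[Double], iterations: Int): Double = {
    val derived = input.map(_ * 2.0).cache()
    try {
      var total = 0.0
      var i = 0
      while (i < iterations) {
        // Every pass reads the cached `derived` RDD, not `input`.
        total += derived.sum()
        i += 1
      }
      total
    } finally {
      // Release the cached blocks once the iterations are finished.
      derived.unpersist()
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("redundant-persist-sketch")
      .getOrCreate()
    try {
      val input = spark.sparkContext.parallelize(Seq(1.0, 2.0, 3.0))
      // The caller passes the RDD as-is; it does not persist it first.
      val total = runLikeAlgorithm(input, iterations = 3)
      println(s"total = $total, input storage level = ${input.getStorageLevel}")
      assert(input.getStorageLevel == StorageLevel.NONE)
    } finally {
      spark.stop()
    }
  }
}
```

Since the repeated passes read only the cached derived RDD, persisting the input in the caller as well would store a second copy of the data that the iterations never read.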

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Manually

Closes #26758 from amanomer/improperPersist.

Authored-by: Aman Omer <amanomer1996@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
Aman Omer 2019-12-05 11:54:45 -06:00 committed by Sean Owen
parent 35bab33984
commit 5892bbf447
2 changed files with 2 additions and 5 deletions


@@ -407,11 +407,7 @@ private[python] class PythonMLLibAPI extends Serializable {
     if (seed != null) gmmAlg.setSeed(seed)
-    try {
-      new GaussianMixtureModelWrapper(gmmAlg.run(data.rdd.persist(StorageLevel.MEMORY_AND_DISK)))
-    } finally {
-      data.rdd.unpersist()
-    }
+    new GaussianMixtureModelWrapper(gmmAlg.run(data.rdd))
   }
   /**


@@ -234,6 +234,7 @@ class GaussianMixture private (
       iter += 1
       compute.destroy()
     }
+    breezeData.unpersist()
     new GaussianMixtureModel(weights, gaussians)
   }