spark-instrumented-optimizer/mllib
Peter Rudenko d51d6ba154 [Ml] SPARK-5804 Explicitly manage cache in Crossvalidator k-fold loop
On a big dataset, explicitly unpersisting the training and validation folds allows more data to be loaded into memory in the next loop iteration. In my environment (single node, 8 GB worker RAM, 2 GB dataset file, 3 folds for cross-validation), this saved more than 5 minutes.

Author: Peter Rudenko <petro.rudenko@gmail.com>

Closes #4595 from petro-rudenko/patch-2 and squashes the following commits:

66a7cfb [Peter Rudenko] Move validationDataset cache to declaration
c5f3265 [Peter Rudenko] [Ml] SPARK-5804 Explicitly manage cache in Crossvalidator k-fold loop
2015-02-16 00:07:23 -08:00
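
The cache handling described in the commit message above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code from the patch: `CacheTracker`, `Fold`, `k_fold`, and `cross_validate` are hypothetical names, a "dataset" is a plain Python list, and cached size is counted in rows. The model has no eviction, so it shows only how much data stays marked as cached when folds are not released. A real Spark executor would instead have to evict older blocks under memory pressure.

```python
class CacheTracker:
    """Hypothetical bookkeeping: rows currently cached and the peak seen."""

    def __init__(self):
        self.current = 0
        self.peak = 0


class Fold:
    """Hypothetical stand-in for a cached dataset; stores rows in a list."""

    def __init__(self, rows, tracker):
        self.rows = list(rows)
        self.tracker = tracker
        self.cached = False

    def cache(self):
        # Mark the fold as cached and update the tracker once.
        if not self.cached:
            self.cached = True
            self.tracker.current += len(self.rows)
            self.tracker.peak = max(self.tracker.peak, self.tracker.current)
        return self

    def unpersist(self):
        # Release the fold so its rows no longer count as cached.
        if self.cached:
            self.cached = False
            self.tracker.current -= len(self.rows)
        return self


def k_fold(rows, k):
    """Yield (training_rows, validation_rows) for each of the k folds."""
    for i in range(k):
        validation = rows[i::k]
        training = [r for j, r in enumerate(rows) if j % k != i]
        yield training, validation


def cross_validate(rows, k, tracker, explicit_unpersist):
    """Run the k-fold loop and return the peak number of cached rows."""
    for training_rows, validation_rows in k_fold(rows, k):
        training = Fold(training_rows, tracker).cache()
        validation = Fold(validation_rows, tracker).cache()
        # Model fitting and evaluation would happen here.
        if explicit_unpersist:
            training.unpersist()
            validation.unpersist()
    return tracker.peak


rows = list(range(12))
peak_with_unpersist = cross_validate(rows, 3, CacheTracker(), True)
peak_without_unpersist = cross_validate(rows, 3, CacheTracker(), False)
```

With 12 rows and 3 folds, each iteration caches 8 training rows and 4 validation rows. Releasing both folds at the end of every iteration keeps the peak at one iteration's worth of data, while leaving them cached lets the total grow with each fold.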
src [Ml] SPARK-5804 Explicitly manage cache in Crossvalidator k-fold loop 2015-02-16 00:07:23 -08:00
pom.xml [SPARK-4259][MLlib]: Add Power Iteration Clustering Algorithm with Gaussian Similarity Function 2015-01-30 14:09:49 -08:00