spark-instrumented-optimizer/project
Joseph K. Bradley 953ff897e4 [SPARK-13048][ML][MLLIB] keepLastCheckpoint option for LDA EM optimizer
## What changes were proposed in this pull request?

The EMLDAOptimizer should generally not delete its last checkpoint since that can cause failures when DistributedLDAModel methods are called (if any partitions need to be recovered from the checkpoint).

This PR adds a "keepLastCheckpoint" option which defaults to true. This is a change in behavior from Spark 1.6, in that the last checkpoint will not be removed by default.

This involves adding the keepLastCheckpoint option to both spark.ml and spark.mllib, and modifying PeriodicCheckpointer to support the option.
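
For illustration, the cleanup behavior described above can be sketched without Spark. This is a minimal model, not the actual PeriodicCheckpointer code: `Checkpoint`, `SimpleCheckpointer`, `cleanup`, and `CheckpointDemo` are hypothetical names, and the checkpoint interval of 3 is an arbitrary assumption.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a checkpoint written to disk; not a Spark class.
final case class Checkpoint(iteration: Int)

// Simplified model of a periodic checkpointer. All names are illustrative.
final class SimpleCheckpointer(interval: Int) {
  require(interval > 0, "interval must be positive")

  private val queue = mutable.Queue.empty[Checkpoint]
  private var updates = 0

  /** Record one iteration and checkpoint every `interval` iterations. */
  def update(): Unit = {
    updates += 1
    if (updates % interval == 0) {
      queue.enqueue(Checkpoint(updates))
      // Only the newest checkpoint is needed for recovery, so drop older ones.
      while (queue.size > 1) queue.dequeue()
    }
  }

  /** Checkpoints that still exist, oldest first. */
  def live: List[Checkpoint] = queue.toList

  /** Clean up after training, optionally keeping the newest checkpoint. */
  def cleanup(keepLast: Boolean): Unit = {
    val keep = if (keepLast) 1 else 0
    while (queue.size > keep) queue.dequeue()
  }
}

object CheckpointDemo {
  /** Simulate 10 iterations with a checkpoint every 3, then clean up. */
  def run(keepLast: Boolean): List[Checkpoint] = {
    val cp = new SimpleCheckpointer(interval = 3)
    (1 to 10).foreach(_ => cp.update())
    cp.cleanup(keepLast)
    cp.live
  }

  def main(args: Array[String]): Unit = {
    println(run(keepLast = true))  // the newest checkpoint survives
    println(run(keepLast = false)) // nothing survives
  }
}
```

When the last checkpoint is kept, a model whose partitions must be recomputed can still read from it. When it is deleted, any later recovery has nothing to fall back on, which is the failure mode this change avoids by default.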

This also:
* Makes MLlibTestSparkContext extend TempDirectory and set the checkpointDir to tempDir
* Updates LibSVMRelationSuite because of a name conflict with "tempDir" (and fixes a bug where it failed to delete a temp directory)
* Adds a MIMA exclude for the DistributedLDAModel constructor, which is already `private[clustering]`

## How was this patch tested?

Added 2 new unit tests to spark.ml LDASuite, which calls into spark.mllib.

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #12166 from jkbradley/emlda-save-checkpoint.
2016-04-07 19:48:33 -07:00
| Name | Last commit | Date |
| --- | --- | --- |
| project | [MINOR][BUILD] Changed the comment to reflect the plugin project is there to support SBT pom reader only. | 2015-11-30 09:30:58 +00:00 |
| build.properties | [SPARK-13834][BUILD] Update sbt and sbt plugins for 2.x. | 2016-03-13 18:47:04 -07:00 |
| MimaBuild.scala | [SPARK-13874][DOC] Remove docs of streaming-akka, streaming-zeromq, streaming-mqtt and streaming-twitter | 2016-03-26 01:47:27 -07:00 |
| MimaExcludes.scala | [SPARK-13048][ML][MLLIB] keepLastCheckpoint option for LDA EM optimizer | 2016-04-07 19:48:33 -07:00 |
| plugins.sbt | [SPARK-14366] Remove sbt-idea plugin | 2016-04-04 16:55:59 -07:00 |
| SparkBuild.scala | [SPARK-529][SQL] Modify SQLConf to use new config API from core. | 2016-04-05 15:19:51 -07:00 |