[SPARK-11608][MLLIB][DOC] Added migration guide for MLlib 1.6

No known breaking changes, but some deprecations and changes of behavior.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #10235 from jkbradley/mllib-guide-update-1.6.
Joseph K. Bradley 2015-12-16 11:53:04 -08:00
parent 6a880afa83
commit 8148cc7a5c
2 changed files with 42 additions and 15 deletions


@@ -74,7 +74,7 @@ We list major functionality from both below, with links to detailed guides.
 * [Advanced topics](ml-advanced.html)
 Some techniques are not available yet in spark.ml, most notably dimensionality reduction
-Users can seemlessly combine the implementation of these techniques found in `spark.mllib` with the rest of the algorithms found in `spark.ml`.
+Users can seamlessly combine the implementation of these techniques found in `spark.mllib` with the rest of the algorithms found in `spark.ml`.
 # Dependencies
@@ -101,24 +101,32 @@ MLlib is under active development.
 The APIs marked `Experimental`/`DeveloperApi` may change in future releases,
 and the migration guide below will explain all changes between releases.
-## From 1.4 to 1.5
+## From 1.5 to 1.6
-In the `spark.mllib` package, there are no break API changes but several behavior changes:
+There are no breaking API changes in the `spark.mllib` or `spark.ml` packages, but there are
+deprecations and changes of behavior.
-* [SPARK-9005](https://issues.apache.org/jira/browse/SPARK-9005):
-`RegressionMetrics.explainedVariance` returns the average regression sum of squares.
-* [SPARK-8600](https://issues.apache.org/jira/browse/SPARK-8600): `NaiveBayesModel.labels` become
-sorted.
-* [SPARK-3382](https://issues.apache.org/jira/browse/SPARK-3382): `GradientDescent` has a default
-convergence tolerance `1e-3`, and hence iterations might end earlier than 1.4.
+Deprecations:
-In the `spark.ml` package, there exists one break API change and one behavior change:
+* [SPARK-11358](https://issues.apache.org/jira/browse/SPARK-11358):
+In `spark.mllib.clustering.KMeans`, the `runs` parameter has been deprecated.
+* [SPARK-10592](https://issues.apache.org/jira/browse/SPARK-10592):
+In `spark.ml.classification.LogisticRegressionModel` and
+`spark.ml.regression.LinearRegressionModel`, the `weights` field has been deprecated in favor of
+the new name `coefficients`. This helps disambiguate from instance (row) "weights" given to
+algorithms.
-* [SPARK-9268](https://issues.apache.org/jira/browse/SPARK-9268): Java's varargs support is removed
-from `Params.setDefault` due to a
-[Scala compiler bug](https://issues.scala-lang.org/browse/SI-9013).
-* [SPARK-10097](https://issues.apache.org/jira/browse/SPARK-10097): `Evaluator.isLargerBetter` is
-added to indicate metric ordering. Metrics like RMSE no longer flip signs as in 1.4.
+Changes of behavior:
+* [SPARK-7770](https://issues.apache.org/jira/browse/SPARK-7770):
+`spark.mllib.tree.GradientBoostedTrees`: `validationTol` has changed semantics in 1.6.
+Previously, it was a threshold for absolute change in error. Now, it resembles the behavior of
+`GradientDescent`'s `convergenceTol`: For large errors, it uses relative error (relative to the
+previous error); for small errors (`< 0.01`), it uses absolute error.
+* [SPARK-11069](https://issues.apache.org/jira/browse/SPARK-11069):
+`spark.ml.feature.RegexTokenizer`: Previously, it did not convert strings to lowercase before
+tokenizing. Now, it converts to lowercase by default, with an option not to. This matches the
+behavior of the simpler `Tokenizer` transformer.
 ## Previous Spark versions
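
The `validationTol` change described under SPARK-7770 can be illustrated with a small plain-Python sketch. This is not Spark's implementation: the function names, the `small_error` parameter, and the choice of the previous error as the scaling reference are assumptions taken from the wording of the guide text above, so check the Spark source before relying on the exact formula.

```python
def should_stop_before_1_6(previous_error, current_error, validation_tol):
    """Old semantics (as described in the guide): stop early when the
    absolute improvement in validation error drops below the tolerance."""
    improvement = previous_error - current_error
    return improvement < validation_tol


def should_stop_in_1_6(previous_error, current_error, validation_tol,
                       small_error=0.01):
    """New semantics (assumed formula): the tolerance is scaled by the
    previous error, so it acts as a relative threshold for large errors.
    For errors below small_error the scale is clamped, which makes the
    threshold absolute again."""
    improvement = previous_error - current_error
    scale = max(previous_error, small_error)  # hypothetical clamp at 0.01
    return improvement < validation_tol * scale


# Large error: an improvement of 0.5 on an error of 100 is only 0.5%.
large_old = should_stop_before_1_6(100.0, 99.5, 0.01)
large_new = should_stop_in_1_6(100.0, 99.5, 0.01)

# Small error: the clamp keeps the threshold at validation_tol * 0.01.
small_old = should_stop_before_1_6(0.005, 0.004, 0.01)
small_new = should_stop_in_1_6(0.005, 0.004, 0.01)

print(large_old, large_new, small_old, small_new)
```

Under these assumptions the two rules can disagree in both directions: with a large error the relative rule stops where the absolute rule keeps going, and with a very small error the reverse happens. That is why a `validationTol` tuned for an earlier release may need revisiting after the upgrade.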


@@ -7,6 +7,25 @@ description: MLlib migration guides from before Spark SPARK_VERSION_SHORT
 The migration guide for the current Spark version is kept on the [MLlib Programming Guide main page](mllib-guide.html#migration-guide).
+## From 1.4 to 1.5
+In the `spark.mllib` package, there are no breaking API changes but several behavior changes:
+* [SPARK-9005](https://issues.apache.org/jira/browse/SPARK-9005):
+`RegressionMetrics.explainedVariance` returns the average regression sum of squares.
+* [SPARK-8600](https://issues.apache.org/jira/browse/SPARK-8600): `NaiveBayesModel.labels` become
+sorted.
+* [SPARK-3382](https://issues.apache.org/jira/browse/SPARK-3382): `GradientDescent` has a default
+convergence tolerance `1e-3`, and hence iterations might end earlier than 1.4.
+In the `spark.ml` package, there exists one breaking API change and one behavior change:
+* [SPARK-9268](https://issues.apache.org/jira/browse/SPARK-9268): Java's varargs support is removed
+from `Params.setDefault` due to a
+[Scala compiler bug](https://issues.scala-lang.org/browse/SI-9013).
+* [SPARK-10097](https://issues.apache.org/jira/browse/SPARK-10097): `Evaluator.isLargerBetter` is
+added to indicate metric ordering. Metrics like RMSE no longer flip signs as in 1.4.
 ## From 1.3 to 1.4
 In the `spark.mllib` package, there were several breaking changes, but all in `DeveloperApi` or `Experimental` APIs: