[SPARK-11608][MLLIB][DOC] Added migration guide for MLlib 1.6

No known breaking changes, but some deprecations and changes of behavior.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #10235 from jkbradley/mllib-guide-update-1.6.
parent 6a880afa83
commit 8148cc7a5c
@@ -74,7 +74,7 @@ We list major functionality from both below, with links to detailed guides.
 * [Advanced topics](ml-advanced.html)
 
 Some techniques are not available yet in spark.ml, most notably dimensionality reduction
-Users can seemlessly combine the implementation of these techniques found in `spark.mllib` with the rest of the algorithms found in `spark.ml`.
+Users can seamlessly combine the implementation of these techniques found in `spark.mllib` with the rest of the algorithms found in `spark.ml`.
 
 # Dependencies
 
@@ -101,24 +101,32 @@ MLlib is under active development.
 The APIs marked `Experimental`/`DeveloperApi` may change in future releases,
 and the migration guide below will explain all changes between releases.
 
-## From 1.4 to 1.5
+## From 1.5 to 1.6
 
-In the `spark.mllib` package, there are no break API changes but several behavior changes:
+There are no breaking API changes in the `spark.mllib` or `spark.ml` packages, but there are
+deprecations and changes of behavior.
 
-* [SPARK-9005](https://issues.apache.org/jira/browse/SPARK-9005):
-`RegressionMetrics.explainedVariance` returns the average regression sum of squares.
-* [SPARK-8600](https://issues.apache.org/jira/browse/SPARK-8600): `NaiveBayesModel.labels` become
-sorted.
-* [SPARK-3382](https://issues.apache.org/jira/browse/SPARK-3382): `GradientDescent` has a default
-convergence tolerance `1e-3`, and hence iterations might end earlier than 1.4.
+Deprecations:
 
-In the `spark.ml` package, there exists one break API change and one behavior change:
+* [SPARK-11358](https://issues.apache.org/jira/browse/SPARK-11358):
+In `spark.mllib.clustering.KMeans`, the `runs` parameter has been deprecated.
+* [SPARK-10592](https://issues.apache.org/jira/browse/SPARK-10592):
+In `spark.ml.classification.LogisticRegressionModel` and
+`spark.ml.regression.LinearRegressionModel`, the `weights` field has been deprecated in favor of
+the new name `coefficients`. This helps disambiguate from instance (row) "weights" given to
+algorithms.
 
-* [SPARK-9268](https://issues.apache.org/jira/browse/SPARK-9268): Java's varargs support is removed
-from `Params.setDefault` due to a
-[Scala compiler bug](https://issues.scala-lang.org/browse/SI-9013).
-* [SPARK-10097](https://issues.apache.org/jira/browse/SPARK-10097): `Evaluator.isLargerBetter` is
-added to indicate metric ordering. Metrics like RMSE no longer flip signs as in 1.4.
+Changes of behavior:
+
+* [SPARK-7770](https://issues.apache.org/jira/browse/SPARK-7770):
+`spark.mllib.tree.GradientBoostedTrees`: `validationTol` has changed semantics in 1.6.
+Previously, it was a threshold for absolute change in error. Now, it resembles the behavior of
+`GradientDescent`'s `convergenceTol`: For large errors, it uses relative error (relative to the
+previous error); for small errors (`< 0.01`), it uses absolute error.
+* [SPARK-11069](https://issues.apache.org/jira/browse/SPARK-11069):
+`spark.ml.feature.RegexTokenizer`: Previously, it did not convert strings to lowercase before
+tokenizing. Now, it converts to lowercase by default, with an option not to. This matches the
+behavior of the simpler `Tokenizer` transformer.
 
 ## Previous Spark versions
 
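Review note: the two "Changes of behavior" entries added for 1.6 (SPARK-7770 and SPARK-11069) can be illustrated with a small standalone Python sketch. It does not call Spark and is not Spark's implementation. The function names, the `small_error` parameter, the `to_lowercase` flag, and the use of `max(prev_error, small_error)` to switch between relative and absolute tolerance are all assumptions made for illustration, based only on the wording of the guide text.

```python
import re


def gbt_should_stop_old(prev_error, curr_error, validation_tol):
    """Pre-1.6 reading: validation_tol is an absolute threshold on the improvement."""
    return prev_error - curr_error < validation_tol


def gbt_should_stop_new(prev_error, curr_error, validation_tol, small_error=0.01):
    """1.6 reading: relative threshold for large errors, absolute for small ones.

    Assumption: the switch is modeled by scaling the tolerance with
    max(prev_error, small_error), so errors below 0.01 get a fixed threshold.
    """
    return prev_error - curr_error < validation_tol * max(prev_error, small_error)


def regex_tokenize(text, pattern=r"\s+", to_lowercase=True):
    """Split text on a regex, lowercasing first unless to_lowercase is False.

    Hypothetical helper that mimics the described default, not the Spark API.
    """
    source = text.lower() if to_lowercase else text
    # Drop empty strings produced by leading or trailing separators.
    return [token for token in re.split(pattern, source) if token]


# An improvement of 0.5 on an error of 100.0 with a tolerance of 0.01:
print(gbt_should_stop_old(100.0, 99.5, 0.01))  # False: 0.5 is not below 0.01
print(gbt_should_stop_new(100.0, 99.5, 0.01))  # True: 0.5 is below 0.01 * 100.0

print(regex_tokenize("Hello Spark ML"))                      # ['hello', 'spark', 'ml']
print(regex_tokenize("Hello Spark ML", to_lowercase=False))  # ['Hello', 'Spark', 'ML']
```

Under these assumptions, a `validationTol` tuned for the old absolute semantics can stop boosting much earlier on large validation errors, and token-based features built with `RegexTokenizer` before 1.6 may differ in case from those built after upgrading.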
@@ -7,6 +7,25 @@ description: MLlib migration guides from before Spark SPARK_VERSION_SHORT
 
 The migration guide for the current Spark version is kept on the [MLlib Programming Guide main page](mllib-guide.html#migration-guide).
 
+## From 1.4 to 1.5
+
+In the `spark.mllib` package, there are no breaking API changes but several behavior changes:
+
+* [SPARK-9005](https://issues.apache.org/jira/browse/SPARK-9005):
+`RegressionMetrics.explainedVariance` returns the average regression sum of squares.
+* [SPARK-8600](https://issues.apache.org/jira/browse/SPARK-8600): `NaiveBayesModel.labels` become
+sorted.
+* [SPARK-3382](https://issues.apache.org/jira/browse/SPARK-3382): `GradientDescent` has a default
+convergence tolerance `1e-3`, and hence iterations might end earlier than 1.4.
+
+In the `spark.ml` package, there exists one breaking API change and one behavior change:
+
+* [SPARK-9268](https://issues.apache.org/jira/browse/SPARK-9268): Java's varargs support is removed
+from `Params.setDefault` due to a
+[Scala compiler bug](https://issues.scala-lang.org/browse/SI-9013).
+* [SPARK-10097](https://issues.apache.org/jira/browse/SPARK-10097): `Evaluator.isLargerBetter` is
+added to indicate metric ordering. Metrics like RMSE no longer flip signs as in 1.4.
+
 ## From 1.3 to 1.4
 
 In the `spark.mllib` package, there were several breaking changes, but all in `DeveloperApi` or `Experimental` APIs: