4a17eedb16
For SPARK-5867:
* The spark.ml programming guide needs to be updated to use the new SQL DataFrame API instead of the old SchemaRDD API.
* It should also include Python examples now.

For SPARK-5892:
* Fix Python docs
* Various other cleanups

BTW, I accidentally merged this with master. If you want to compile it on your own, use this branch which is based on spark/branch-1.3 and cherry-picks the commits from this PR: [https://github.com/jkbradley/spark/tree/doc-review-1.3-check]

CC: mengxr (ML), davies (Python docs)

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #4675 from jkbradley/doc-review-1.3 and squashes the following commits:

f191bb0 [Joseph K. Bradley] small cleanups
e786efa [Joseph K. Bradley] small doc corrections
6b1ab4a [Joseph K. Bradley] fixed python lint test
946affa [Joseph K. Bradley] Added sample data for ml.MovieLensALS example. Changed spark.ml Java examples to use DataFrames API instead of sql()
da81558 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into doc-review-1.3
629dbf5 [Joseph K. Bradley] Updated based on code review:
* made new page for old migration guides
* small fixes
* moved inherit_doc in python
b9df7c4 [Joseph K. Bradley] Small cleanups: toDF to toDF(), adding s for string interpolation
34b067f [Joseph K. Bradley] small doc correction
da16aef [Joseph K. Bradley] Fixed python mllib docs
8cce91c [Joseph K. Bradley] GMM: removed old imports, added some doc
695f3f6 [Joseph K. Bradley] partly done trying to fix inherit_doc for class hierarchies in python docs
a72c018 [Joseph K. Bradley] made ChiSqTestResult appear in python docs
b05a80d [Joseph K. Bradley] organize imports. doc cleanups
e572827 [Joseph K. Bradley] updated programming guide for ml and mllib
68 lines
3.6 KiB
Markdown
---
layout: global
title: Old Migration Guides - MLlib
displayTitle: <a href="mllib-guide.html">MLlib</a> - Old Migration Guides
description: MLlib migration guides from before Spark SPARK_VERSION_SHORT
---

The migration guide for the current Spark version is kept on the [MLlib Programming Guide main page](mllib-guide.html#migration-guide).

## From 1.1 to 1.2

The only API changes in MLlib v1.2 are in
[`DecisionTree`](api/scala/index.html#org.apache.spark.mllib.tree.DecisionTree),
which continues to be an experimental API in MLlib 1.2:

1. *(Breaking change)* The Scala API for classification takes a named argument specifying the number
of classes. In MLlib v1.1, this argument was called `numClasses` in Python and
`numClassesForClassification` in Scala. In MLlib v1.2, the names are both set to `numClasses`.
This `numClasses` parameter is specified either via
[`Strategy`](api/scala/index.html#org.apache.spark.mllib.tree.configuration.Strategy)
or via [`DecisionTree`](api/scala/index.html#org.apache.spark.mllib.tree.DecisionTree)
static `trainClassifier` and `trainRegressor` methods.

2. *(Breaking change)* The API for
[`Node`](api/scala/index.html#org.apache.spark.mllib.tree.model.Node) has changed.
This should generally not affect user code, unless the user manually constructs decision trees
(instead of using the `trainClassifier` or `trainRegressor` methods).
The tree `Node` now includes more information, including the probability of the predicted label
(for classification).

3. Printing methods' output has changed. The `toString` (Scala/Java) and `__repr__` (Python) methods used to print the full model; they now print a summary. For the full model, use `toDebugString`.

Examples in the Spark distribution and examples in the
[Decision Trees Guide](mllib-decision-tree.html#examples) have been updated accordingly.

## From 1.0 to 1.1

The only API changes in MLlib v1.1 are in
[`DecisionTree`](api/scala/index.html#org.apache.spark.mllib.tree.DecisionTree),
which continues to be an experimental API in MLlib 1.1:

1. *(Breaking change)* The meaning of tree depth has been changed by 1 in order to match
the implementations of trees in
[scikit-learn](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.tree)
and in [rpart](http://cran.r-project.org/web/packages/rpart/index.html).
In MLlib v1.0, a depth-1 tree had 1 leaf node, and a depth-2 tree had 1 root node and 2 leaf nodes.
In MLlib v1.1, a depth-0 tree has 1 leaf node, and a depth-1 tree has 1 root node and 2 leaf nodes.
This depth is specified by the `maxDepth` parameter in
[`Strategy`](api/scala/index.html#org.apache.spark.mllib.tree.configuration.Strategy)
or via [`DecisionTree`](api/scala/index.html#org.apache.spark.mllib.tree.DecisionTree)
static `trainClassifier` and `trainRegressor` methods.

2. *(Non-breaking change)* We recommend using the newly added `trainClassifier` and `trainRegressor`
methods to build a [`DecisionTree`](api/scala/index.html#org.apache.spark.mllib.tree.DecisionTree),
rather than using the old parameter class `Strategy`. These new training methods explicitly
separate classification and regression, and they replace specialized parameter types with
simple `String` types.

Examples of the new, recommended `trainClassifier` and `trainRegressor` are given in the
[Decision Trees Guide](mllib-decision-tree.html#examples).

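To make the depth change from 1.0 to 1.1 concrete, the following Python sketch converts a `maxDepth` value written for MLlib v1.0 to the v1.1 convention and computes the largest number of leaf nodes a tree of that depth can have. The helper names `to_v11_max_depth` and `max_leaves` are hypothetical and are not part of MLlib; the sketch assumes binary splits, consistent with the leaf counts quoted for the depth change.

```python
def to_v11_max_depth(v10_max_depth: int) -> int:
    """Convert a maxDepth value written for MLlib v1.0 to the v1.1 convention.

    In v1.0 a single leaf counted as depth 1; in v1.1 it counts as depth 0,
    so the same tree is described by a value that is smaller by 1.
    """
    if v10_max_depth < 1:
        raise ValueError("v1.0 depths start at 1 (a single leaf node)")
    return v10_max_depth - 1


def max_leaves(v11_max_depth: int) -> int:
    """Upper bound on leaf count for a binary tree under the v1.1 convention."""
    if v11_max_depth < 0:
        raise ValueError("v1.1 depths start at 0 (a single leaf node)")
    return 2 ** v11_max_depth


# A v1.0 "depth-2" tree (1 root, 2 leaves) is a v1.1 "depth-1" tree.
for old_depth in (1, 2, 5):
    new_depth = to_v11_max_depth(old_depth)
    print(f"v1.0 maxDepth={old_depth} -> v1.1 maxDepth={new_depth}, "
          f"at most {max_leaves(new_depth)} leaves")
```

Under this mapping, code that passed `maxDepth = 5` to MLlib v1.0 would pass `maxDepth = 4` to v1.1 to keep the same limit on tree size.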
## From 0.9 to 1.0

In MLlib v1.0, we support both dense and sparse input in a unified way, which introduces a few
breaking changes. If your data is sparse, please store it in a sparse format instead of dense to
take advantage of sparsity in both storage and computation. Details are described below.

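As a rough illustration of why a sparse format helps with both storage and computation, the sketch below stores the same vector densely and as a size plus parallel index and value lists, then computes a dot product that touches only the nonzero entries. It is plain Python rather than MLlib's own vector classes, and the names `to_sparse` and `sparse_dot` are hypothetical.

```python
from typing import List, Tuple


def to_sparse(dense: List[float]) -> Tuple[int, List[int], List[float]]:
    """Return (size, indices, values), keeping only the nonzero entries."""
    indices = [i for i, x in enumerate(dense) if x != 0.0]
    values = [dense[i] for i in indices]
    return len(dense), indices, values


def sparse_dot(sparse: Tuple[int, List[int], List[float]],
               weights: List[float]) -> float:
    """Dot product that visits only the stored (nonzero) entries."""
    size, indices, values = sparse
    if size != len(weights):
        raise ValueError("dimension mismatch")
    return sum(v * weights[i] for i, v in zip(indices, values))


# A 1000-dimensional vector with only two nonzero entries.
dense = [0.0] * 1000
dense[3], dense[250] = 1.5, -2.0
weights = [0.5] * 1000

sparse = to_sparse(dense)
print(len(dense), "dense entries vs", len(sparse[2]), "stored values")
print(sparse_dot(sparse, weights))  # same result as the dense dot product
```

Here the sparse form keeps 2 values and 2 indices instead of 1000 entries, and the dot product does 2 multiplications instead of 1000.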