spark-instrumented-optimizer/docs/mllib-migration-guides.md

---
layout: global
title: Old Migration Guides - MLlib
displayTitle: <a href="mllib-guide.html">MLlib</a> - Old Migration Guides
description: MLlib migration guides from before Spark SPARK_VERSION_SHORT
---

The migration guide for the current Spark version is kept on the [MLlib Programming Guide main page](mllib-guide.html#migration-guide).

## From 1.1 to 1.2

The only API changes in MLlib v1.2 are in
[`DecisionTree`](api/scala/index.html#org.apache.spark.mllib.tree.DecisionTree),
which continues to be an experimental API in MLlib 1.2:

1. *(Breaking change)* The Scala API for classification takes a named argument specifying the number
of classes.  In MLlib v1.1, this argument was called `numClasses` in Python and
`numClassesForClassification` in Scala.  In MLlib v1.2, both APIs call it `numClasses`.
This `numClasses` parameter is specified either via
[`Strategy`](api/scala/index.html#org.apache.spark.mllib.tree.configuration.Strategy)
or via the static `trainClassifier` method of
[`DecisionTree`](api/scala/index.html#org.apache.spark.mllib.tree.DecisionTree).

2. *(Breaking change)* The API for
[`Node`](api/scala/index.html#org.apache.spark.mllib.tree.model.Node) has changed.
This should generally not affect user code, unless the user manually constructs decision trees
(instead of using the `trainClassifier` or `trainRegressor` methods).
The tree `Node` now includes more information, including the probability of the predicted label
(for classification).

3. Printing methods' output has changed.  The `toString` (Scala/Java) and `__repr__` (Python) methods used to print the full model; they now print a summary.  For the full model, use `toDebugString`.

Examples in the Spark distribution and examples in the
[Decision Trees Guide](mllib-decision-tree.html#examples) have been updated accordingly.

## From 1.0 to 1.1

The only API changes in MLlib v1.1 are in
[`DecisionTree`](api/scala/index.html#org.apache.spark.mllib.tree.DecisionTree),
which continues to be an experimental API in MLlib 1.1:

1. *(Breaking change)* The meaning of tree depth has shifted down by 1 in order to match
the implementations of trees in
[scikit-learn](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.tree)
and in [rpart](http://cran.r-project.org/web/packages/rpart/index.html).
In MLlib v1.0, a depth-1 tree had 1 leaf node, and a depth-2 tree had 1 root node and 2 leaf nodes.
In MLlib v1.1, a depth-0 tree has 1 leaf node, and a depth-1 tree has 1 root node and 2 leaf nodes.
This depth is specified by the `maxDepth` parameter in
[`Strategy`](api/scala/index.html#org.apache.spark.mllib.tree.configuration.Strategy)
or via [`DecisionTree`](api/scala/index.html#org.apache.spark.mllib.tree.DecisionTree)
static `trainClassifier` and `trainRegressor` methods.

2. *(Non-breaking change)* We recommend using the newly added `trainClassifier` and `trainRegressor`
methods to build a [`DecisionTree`](api/scala/index.html#org.apache.spark.mllib.tree.DecisionTree),
rather than using the old parameter class `Strategy`.  These new training methods explicitly
separate classification and regression, and they replace specialized parameter types with
simple `String` types.

Examples of the new, recommended `trainClassifier` and `trainRegressor` are given in the
[Decision Trees Guide](mllib-decision-tree.html#examples).
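
To make the depth shift in item 1 concrete, here is a minimal Python sketch. It is not part of the MLlib API: the helper names `to_v11_max_depth`, `max_leaves_v11`, and `max_nodes_v11` are hypothetical, and the size bounds assume binary splits, consistent with the node counts quoted above.

```python
def to_v11_max_depth(v10_max_depth: int) -> int:
    """Translate a maxDepth written for MLlib v1.0 into the v1.1 convention.

    Hypothetical helper: v1.0 counted a single leaf as depth 1,
    while v1.1 counts it as depth 0, so the value drops by one.
    """
    if v10_max_depth < 1:
        raise ValueError("MLlib v1.0 depths start at 1 (a single leaf)")
    return v10_max_depth - 1


def max_leaves_v11(max_depth: int) -> int:
    """Upper bound on leaves for a binary tree of the given v1.1 depth."""
    if max_depth < 0:
        raise ValueError("MLlib v1.1 depths start at 0 (a single leaf)")
    return 2 ** max_depth


def max_nodes_v11(max_depth: int) -> int:
    """Upper bound on total nodes for a binary tree of the given v1.1 depth."""
    return 2 * max_leaves_v11(max_depth) - 1


# A v1.0 depth-2 tree (1 root node and 2 leaf nodes) is a v1.1 depth-1 tree.
new_depth = to_v11_max_depth(2)
print(new_depth, max_leaves_v11(new_depth), max_nodes_v11(new_depth))  # 1 2 3
```

In practice this means that a `maxDepth` value tuned against v1.0 should be reduced by one to train a tree of the same size under v1.1.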

## From 0.9 to 1.0

In MLlib v1.0, we support both dense and sparse input in a unified way, which introduces a few
breaking changes.  If your data is sparse, please store it in a sparse format instead of dense to
take advantage of sparsity in both storage and computation. Details are described below.
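
As a rough illustration of why a sparse format saves both storage and computation, the following sketch keeps only the nonzero entries of a vector and computes a dot product over them. It is plain Python rather than MLlib's vector API: `to_sparse` and `sparse_dot` are hypothetical names, and the `(size, indices, values)` layout is an assumption chosen for illustration.

```python
from typing import List, Tuple

# Assumed layout for illustration: (vector length, nonzero indices, nonzero values).
SparseVector = Tuple[int, List[int], List[float]]


def to_sparse(dense: List[float]) -> SparseVector:
    """Keep only the nonzero entries of a dense vector."""
    indices = [i for i, v in enumerate(dense) if v != 0.0]
    values = [dense[i] for i in indices]
    return len(dense), indices, values


def sparse_dot(vec: SparseVector, weights: List[float]) -> float:
    """Dot product that visits only the stored (nonzero) entries."""
    size, indices, values = vec
    if size != len(weights):
        raise ValueError("dimension mismatch")
    return sum(v * weights[i] for i, v in zip(indices, values))


dense = [1.0, 0.0, 0.0, 3.0]
sparse = to_sparse(dense)  # (4, [0, 3], [1.0, 3.0])
print(sparse_dot(sparse, [0.5, 2.0, 2.0, 1.0]))  # 3.5
```

Here the sparse form stores two values instead of four, and the dot product performs two multiplications instead of four; the savings grow with the fraction of zero entries.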