---
layout: global
title: Clustering - spark.ml
displayTitle: Clustering - spark.ml
---

In this section, we introduce the pipeline API for [clustering in mllib](mllib-clustering.html).

**Table of Contents**

* This will become a table of contents (this text will be scraped).
{:toc}

## Latent Dirichlet allocation (LDA)

`LDA` is implemented as an `Estimator` that supports both `EMLDAOptimizer` and `OnlineLDAOptimizer`,
and generates an `LDAModel` as the base model. Expert users may cast an `LDAModel` generated by
`EMLDAOptimizer` to a `DistributedLDAModel` if needed.
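The following is a minimal sketch of how the optimizer choice determines the concrete model type. It assumes Spark 2.x or later, where `SparkSession` is the entry point and vectors live in `org.apache.spark.ml.linalg`. The `LDASketch` object, its helper methods, and the three-document toy corpus are hypothetical and exist only for illustration. Passing `"em"` to `setOptimizer` selects the EM optimizer, whose fitted `LDAModel` can then be matched as a `DistributedLDAModel`. The default, `"online"`, yields a local model instead.

```scala
import org.apache.spark.ml.clustering.{DistributedLDAModel, LDA, LDAModel}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object for illustration; not part of the Spark examples.
object LDASketch {
  // Toy corpus: each row is a term-count vector over a 4-term vocabulary.
  def toyCorpus(spark: SparkSession): DataFrame = {
    val data = Seq(
      Vectors.dense(1.0, 2.0, 0.0, 0.0),
      Vectors.dense(0.0, 1.0, 3.0, 1.0),
      Vectors.dense(2.0, 0.0, 0.0, 1.0)
    )
    spark.createDataFrame(data.map(Tuple1.apply)).toDF("features")
  }

  // Fits LDA with the given optimizer name: "online" (the default) or "em".
  // A subsampling rate of 1.0 makes the online optimizer use every document in
  // each iteration; the EM optimizer ignores this setting.
  def fitLDA(corpus: DataFrame, optimizer: String): LDAModel =
    new LDA()
      .setK(2)
      .setMaxIter(10)
      .setOptimizer(optimizer)
      .setSubsamplingRate(1.0)
      .fit(corpus)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LDASketch")
      .master("local[*]")
      .getOrCreate()

    val corpus = toyCorpus(spark)
    val model = fitLDA(corpus, "em")

    // Match on the runtime type instead of casting blindly.
    model match {
      case distributed: DistributedLDAModel =>
        println(s"Training log likelihood: ${distributed.trainingLogLikelihood}")
      case local =>
        println(s"Log likelihood: ${local.logLikelihood(corpus)}")
    }

    // Top 3 terms (by weight) for each topic, without truncating columns.
    model.describeTopics(3).show(false)
    spark.stop()
  }
}
```

Pattern matching, rather than an unchecked `asInstanceOf`, keeps the code safe if the optimizer is later switched back to `"online"`. In that case the fitted model is not distributed, so the fallback branch is taken.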

<div class="codetabs">

<div data-lang="scala" markdown="1">

Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.LDA) for more details.

{% include_example scala/org/apache/spark/examples/ml/LDAExample.scala %}
</div>

<div data-lang="java" markdown="1">

Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/LDA.html) for more details.

{% include_example java/org/apache/spark/examples/ml/JavaLDAExample.java %}
</div>

</div>