---
layout: global
title: Clustering - spark.ml
displayTitle: Clustering - spark.ml
---
In this section, we introduce the pipeline API for clustering in MLlib.
Table of Contents
- This will become a table of contents (this text will be scraped).
{:toc}
## K-means
k-means is one of the most commonly used clustering algorithms; it partitions the data points into a predefined number of clusters. The MLlib implementation includes a parallelized variant of the k-means++ method called kmeans||.
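kmeans|| parallelizes the k-means++ seeding step. The sequential k-means++ seeding rule is sketched below in plain Python to give some intuition. This is an illustrative sketch under stated assumptions, not MLlib code. Points are assumed to be tuples of floats, and the function names are hypothetical. kmeans|| differs in that it samples many candidate centers per pass over a small number of passes and then reduces them to k.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_plus_plus_init(points, k, seed=0):
    """Choose up to k initial centers with sequential k-means++ seeding.

    The first center is drawn uniformly at random; every later center is drawn
    with probability proportional to its squared distance from the nearest
    center chosen so far, which favors well-spread starting centers.
    """
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its closest existing center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            break  # every point already coincides with a center
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


points = [(0.0, 0.0), (0.1, 0.0), (9.0, 9.0), (9.1, 9.0)]
print(kmeans_plus_plus_init(points, k=2, seed=42))
```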
`KMeans` is implemented as an `Estimator` and generates a `KMeansModel` as the base model.
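The Estimator/Model split can be illustrated with a toy, single-machine k-means in plain Python. Here `fit` runs Lloyd's iterations and returns a separate model object that holds only the learned centers. `ToyKMeans` and `ToyKMeansModel` are hypothetical names used for this sketch. It uses random initialization rather than kmeans|| and does not mirror Spark's API.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyKMeansModel:
    """Hypothetical fitted model: it only stores the learned cluster centers."""

    def __init__(self, centers):
        self.centers = centers

    def predict(self, point):
        """Return the index of the center nearest to `point`."""
        return min(range(len(self.centers)),
                   key=lambda i: squared_distance(point, self.centers[i]))


class ToyKMeans:
    """Hypothetical estimator: fit() runs Lloyd's iterations and returns a model."""

    def __init__(self, k, max_iter=20, seed=0):
        self.k = k
        self.max_iter = max_iter
        self.seed = seed

    def fit(self, points):
        rng = random.Random(self.seed)
        centers = rng.sample(points, self.k)  # random start, for brevity only
        for _ in range(self.max_iter):
            # Assignment step: group points by their nearest current center.
            assign = ToyKMeansModel(centers).predict
            clusters = [[] for _ in range(self.k)]
            for p in points:
                clusters[assign(p)].append(p)
            # Update step: move each center to the mean of its points
            # (an empty cluster keeps its previous center).
            new_centers = [
                tuple(sum(coord) / len(members) for coord in zip(*members))
                if members else centers[i]
                for i, members in enumerate(clusters)
            ]
            if new_centers == centers:
                break  # converged: centers stopped moving
            centers = new_centers
        return ToyKMeansModel(centers)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
model = ToyKMeans(k=2, seed=1).fit(points)
print(sorted(model.centers))  # [(0.0, 0.5), (10.0, 10.5)]
```

Keeping the fitted state in its own object means the model can assign new points without refitting.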
### Input Columns
Param name | Type(s) | Default | Description |
---|---|---|---|
featuresCol | Vector | "features" | Feature vector |
### Output Columns
Param name | Type(s) | Default | Description |
---|---|---|---|
predictionCol | Int | "prediction" | Index of the predicted cluster |
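The following sketch makes the column contract in the tables above concrete. Rows are plain Python dicts, used here as a hypothetical stand-in for DataFrame rows. The helper reads a vector from the features column and appends the index of the nearest center as an int in the prediction column. The helper name and its parameters are assumptions for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def add_prediction_column(rows, centers,
                          features_col="features", prediction_col="prediction"):
    """Return new rows with the nearest-center index added as an int.

    `rows` are dicts standing in for DataFrame rows; input rows are not modified.
    """
    def nearest(vector):
        return min(range(len(centers)),
                   key=lambda i: squared_distance(vector, centers[i]))

    return [{**row, prediction_col: nearest(row[features_col])} for row in rows]


rows = [{"id": 0, "features": (0.1, 0.2)}, {"id": 1, "features": (9.8, 10.1)}]
out = add_prediction_column(rows, centers=[(0.0, 0.0), (10.0, 10.0)])
print([r["prediction"] for r in out])  # [0, 1]
```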
### Example
{% include_example scala/org/apache/spark/examples/ml/KMeansExample.scala %}
{% include_example java/org/apache/spark/examples/ml/JavaKMeansExample.java %}
{% include_example python/ml/kmeans_example.py %}
## Latent Dirichlet allocation (LDA)
`LDA` is implemented as an `Estimator` that supports both `EMLDAOptimizer` and `OnlineLDAOptimizer`, and generates an `LDAModel` as the base model. Expert users may cast an `LDAModel` generated by `EMLDAOptimizer` to a `DistributedLDAModel` if needed.
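LDA assumes that each document is a mixture of topics and that each topic is a distribution over words. The sketch below illustrates that assumption by generating toy documents from LDA's generative model in plain Python. It does not show how `EMLDAOptimizer` or `OnlineLDAOptimizer` fit the model, which works in the opposite direction. The function names, the toy vocabulary, and the symmetric Dirichlet prior `alpha` are assumptions for this example. For simplicity, the topics are given as fixed inputs rather than drawn from their own prior.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(num_docs, doc_len, topics, alpha, seed=0):
    """Generate toy documents from LDA's generative model.

    topics: list of dicts mapping word -> probability (one dict per topic).
    alpha: concentration of a symmetric Dirichlet prior on each document's topic mix.
    """
    rng = random.Random(seed)
    num_topics = len(topics)
    corpus = []
    for _ in range(num_docs):
        # Per-document topic proportions theta ~ Dirichlet(alpha, ..., alpha).
        theta = sample_dirichlet([alpha] * num_topics, rng)
        doc = []
        for _ in range(doc_len):
            z = rng.choices(range(num_topics), weights=theta, k=1)[0]  # topic for this token
            words, probs = zip(*topics[z].items())
            doc.append(rng.choices(words, weights=probs, k=1)[0])  # word drawn from topic z
        corpus.append(doc)
    return corpus


topics = [{"spark": 0.5, "cluster": 0.5}, {"topic": 0.5, "word": 0.5}]
for doc in generate_corpus(num_docs=3, doc_len=5, topics=topics, alpha=1.0, seed=7):
    print(doc)
```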
Refer to the Scala API docs for more details.
{% include_example scala/org/apache/spark/examples/ml/LDAExample.scala %}
Refer to the Java API docs for more details.
{% include_example java/org/apache/spark/examples/ml/JavaLDAExample.java %}
Refer to the Python API docs for more details.
{% include_example python/ml/lda_example.py %}
## Bisecting k-means
Bisecting k-means is a kind of hierarchical clustering using a divisive (or "top-down") approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
Bisecting k-means can often be much faster than regular k-means, but it will generally produce a different clustering.
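The divisive idea can be sketched in plain Python, assuming tuple-valued points. The sketch starts with all points in one cluster and repeatedly splits the cluster with the largest sum of squared errors (SSE) using 2-means. All names are hypothetical. The rule for picking which cluster to split and the far-apart-pair seeding are simplifications chosen for this sketch, and MLlib's `BisectingKMeans` may choose and split clusters differently.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    center = mean(points)
    return sum(squared_distance(p, center) for p in points)


def two_means(points, max_iter=10):
    """Bisect one cluster with 2-means, seeded by a far-apart pair of points."""
    a = max(points, key=lambda p: squared_distance(p, points[0]))
    b = max(points, key=lambda p: squared_distance(p, a))
    centers = [a, b]
    left, right = points, []
    for _ in range(max_iter):
        closer_to_first = [squared_distance(p, centers[0]) <= squared_distance(p, centers[1])
                           for p in points]
        new_left = [p for p, flag in zip(points, closer_to_first) if flag]
        new_right = [p for p, flag in zip(points, closer_to_first) if not flag]
        if not new_left or not new_right:
            break  # degenerate split: keep the previous one
        if new_left == left:
            break  # assignments stopped changing
        left, right = new_left, new_right
        centers = [mean(left), mean(right)]
    return left, right


def bisecting_kmeans(points, k):
    """Top-down clustering: repeatedly bisect the cluster with the largest SSE."""
    clusters = [list(points)]
    while len(clusters) < k:
        # A cluster can only be split if it holds at least two distinct points.
        splittable = [c for c in clusters if len(set(c)) > 1]
        if not splittable:
            break
        target = max(splittable, key=sse)
        left, right = two_means(target)
        clusters.remove(target)
        clusters.extend([left, right])
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0), (10.0, 1.0)]
for cluster in bisecting_kmeans(points, k=3):
    print(cluster)
```

Each split only has to run 2-means on the points of a single cluster. This is one reason the divisive approach can be cheaper than running k-means with a large k over all points at once.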
`BisectingKMeans` is implemented as an `Estimator` and generates a `BisectingKMeansModel` as the base model.
### Example
{% include_example scala/org/apache/spark/examples/ml/BisectingKMeansExample.scala %}
{% include_example java/org/apache/spark/examples/ml/JavaBisectingKMeansExample.java %}
{% include_example python/ml/bisecting_k_means_example.py %}