spark-instrumented-optimizer/mllib
Marco Gaido c0c902aedc [SPARK-22119][FOLLOWUP][ML] Use spherical KMeans with cosine distance
## What changes were proposed in this pull request?

In #19340 some comments considered needed to use spherical KMeans when cosine distance measure is specified, as Matlab does; instead of the implementation based on the behavior of other tools/libraries like Rapidminer, nltk and ELKI, ie. the centroids are computed as the mean of all the points in the clusters.

The PR introduce the approach used in spherical KMeans. This behavior has the nice feature to minimize the within-cluster cosine distance.

## How was this patch tested?

existing/improved UTs

Author: Marco Gaido <marcogaido91@gmail.com>

Closes #20518 from mgaido91/SPARK-22119_followup.
2018-02-11 20:15:30 -06:00
..
src [SPARK-22119][FOLLOWUP][ML] Use spherical KMeans with cosine distance 2018-02-11 20:15:30 -06:00
pom.xml [SPARK-23028] Bump master branch version to 2.4.0-SNAPSHOT 2018-01-13 00:37:59 +08:00