spark-instrumented-optimizer/mllib-local
zhengruifeng 0ede08bcb2 [SPARK-31007][ML] KMeans optimization based on triangle-inequality
### What changes were proposed in this pull request?
apply Lemma 1 in [Using the Triangle Inequality to Accelerate K-Means](https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf):

> Let x be a point, and let b and c be centers. If d(b,c)>=2d(x,b) then d(x,c) >= d(x,b);

It can be directly applied in EuclideanDistance, but not in CosineDistance.
However,  for CosineDistance we can luckily get a variant in the space of radian/angle.

### Why are the changes needed?
It help improving the performance of prediction and training (mostly)

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
existing testsuites

Closes #27758 from zhengruifeng/km_triangle.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-04-24 11:24:15 -05:00
..
src [SPARK-31007][ML] KMeans optimization based on triangle-inequality 2020-04-24 11:24:15 -05:00
pom.xml [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT 2020-02-25 19:44:31 -08:00