spark-instrumented-optimizer/mllib-local/src
zhengruifeng 0ede08bcb2 [SPARK-31007][ML] KMeans optimization based on triangle-inequality
### What changes were proposed in this pull request?
apply Lemma 1 in [Using the Triangle Inequality to Accelerate K-Means](https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf):

> Let x be a point, and let b and c be centers. If d(b,c)>=2d(x,b) then d(x,c) >= d(x,b);

It can be directly applied in EuclideanDistance, but not in CosineDistance.
However,  for CosineDistance we can luckily get a variant in the space of radian/angle.

### Why are the changes needed?
It help improving the performance of prediction and training (mostly)

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
existing testsuites

Closes #27758 from zhengruifeng/km_triangle.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-04-24 11:24:15 -05:00
..
main/scala/org/apache/spark/ml [SPARK-31007][ML] KMeans optimization based on triangle-inequality 2020-04-24 11:24:15 -05:00
test/scala/org/apache/spark/ml [SPARK-30773][ML] Support NativeBlas for level-1 routines 2020-03-20 10:32:58 -05:00