7f3c8fa42e
### What changes were proposed in this pull request?

1. Compute the summary and update the distributions in one pass.
2. Remove the logic related to the `shouldDistributeGaussians` check.

### Why are the changes needed?

In the current implementation, GMM needs to trigger two jobs per iteration:

1. one to compute the summary;
2. if `shouldDistributeGaussians = ((k - 1.0) / k) * numFeatures > 25.0`, another to update the distributions.

`shouldDistributeGaussians` is almost always true in practice, since `numFeatures` is likely to be greater than 25.

A single job can perform the computation above by following the logic in `KMeans`: use `reduceByKey` to compute the statistics for each center.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing test suites.

Closes #27784 from zhengruifeng/gmm_avoid_distri_gaussian.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: zhengruifeng <ruifengz@foxmail.com>
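
The one-pass idea described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this patch: it assumes a one-dimensional mixture, uses plain Scala 2.13 collections (`groupMapReduce`) as a stand-in for `reduceByKey` on an RDD of pairs, and every name in it (`OnePassGmmSketch`, `Gaussian`, `Stats`, `step`) is hypothetical.

```scala
// Hypothetical 1-D illustration of "compute summary and update distributions
// in one pass". It is NOT the Spark MLlib implementation.
object OnePassGmmSketch {
  final case class Gaussian(weight: Double, mean: Double, variance: Double) {
    def pdf(x: Double): Double = {
      val d = x - mean
      math.exp(-d * d / (2.0 * variance)) / math.sqrt(2.0 * math.Pi * variance)
    }
  }

  // Per-component statistics: sum of responsibilities, and the
  // responsibility-weighted sums of x and x^2.
  final case class Stats(w: Double, wx: Double, wxx: Double) {
    def +(o: Stats): Stats = Stats(w + o.w, wx + o.wx, wxx + o.wxx)
  }

  def step(data: Seq[Double], model: IndexedSeq[Gaussian]): IndexedSeq[Gaussian] = {
    // Each point emits one (componentIndex, Stats) pair per component.
    val keyed: Seq[(Int, Stats)] = data.flatMap { x =>
      val p = model.map(g => g.weight * g.pdf(x))
      val total = p.sum
      p.zipWithIndex.map { case (pk, k) =>
        val r = pk / total // responsibility of component k for x
        (k, Stats(r, r * x, r * x * x))
      }
    }
    // Stand-in for reduceByKey(_ + _): merge the statistics per component.
    val reduced: Map[Int, Stats] = keyed.groupMapReduce(_._1)(_._2)(_ + _)
    // Update every component from its merged statistics.
    val n = data.size.toDouble
    model.indices.map { k =>
      val s = reduced(k)
      val mean = s.wx / s.w
      Gaussian(s.w / n, mean, s.wxx / s.w - mean * mean)
    }
  }
}
```

Because the statistics are keyed by component index, the merge and the parameter update both work on one small record per component, which is why no separate job is needed to update the distributions in this sketch.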
| Name |
|---|
| benchmarks |
| src |
| pom.xml |