dd78167585
## What changes were proposed in this pull request?

This PR adds the `ClusteringEvaluator` evaluator, which provides two metrics:

- **cosineSilhouette**: the Silhouette measure using the cosine distance;
- **squaredSilhouette**: the Silhouette measure using the squared Euclidean distance.

The implementation of both metrics follows the algorithm proposed and explained [here](https://drive.google.com/file/d/0B0Hyo%5f%5fbG%5f3fdkNvSVNYX2E3ZU0/view). These algorithms were designed for a distributed, parallel environment, so they perform reasonably well, unlike a naive Silhouette implementation that follows the definition directly.

## How was this patch tested?

The patch was tested with the newly added unit tests, which compare the results against those produced by the [Python sklearn library](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html).

Author: Marco Gaido <mgaido@hortonworks.com>

Closes #18538 from mgaido91/SPARK-14516.
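
To illustrate why a naive Silhouette implementation scales poorly and how per-cluster aggregates can avoid pairwise distances, here is a minimal pure-Python sketch for the squared Euclidean case. It is not the code in this PR: the function names (`naive_silhouette`, `aggregated_silhouette`) are hypothetical, and the assumption that the linked document relies on this exact aggregation is mine. The sketch only uses the identity that the sum of squared distances from a point `x` to a cluster equals `N * |x|^2 - 2 * x . Y + P`, where `N` is the cluster size, `Y` the sum of its points, and `P` the sum of their squared norms.

```python
from collections import defaultdict


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def naive_silhouette(points, labels):
    """Mean silhouette from the definition: O(n^2) pairwise distances.

    Assumes at least two distinct cluster labels.
    """
    clusters = defaultdict(list)
    for p, label in zip(points, labels):
        clusters[label].append(p)
    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # Distance to itself is 0, so summing over the whole cluster is safe.
        a = sum(sq_dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(sq_dist(p, q) for q in other) / len(other)
                for k, other in clusters.items() if k != label)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def aggregated_silhouette(points, labels):
    """Same value, but each point only touches one summary per cluster."""
    stats = {}  # label -> [count, vector sum, sum of squared norms]
    for p, label in zip(points, labels):
        s = stats.setdefault(label, [0, [0.0] * len(p), 0.0])
        s[0] += 1
        s[1] = [u + v for u, v in zip(s[1], p)]
        s[2] += sum(v * v for v in p)

    def sum_sq_dist(p, s):
        # N * |p|^2 - 2 * p . Y + P
        n, vec_sum, norm_sum = s
        p_norm = sum(v * v for v in p)
        dot = sum(u * v for u, v in zip(p, vec_sum))
        return n * p_norm - 2 * dot + norm_sum

    total = 0.0
    for p, label in zip(points, labels):
        n_own = stats[label][0]
        if n_own == 1:
            continue
        a = sum_sq_dist(p, stats[label]) / (n_own - 1)
        b = min(sum_sq_dist(p, s) / s[0]
                for k, s in stats.items() if k != label)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]
print(naive_silhouette(points, labels), aggregated_silhouette(points, labels))
```

The aggregated version needs one pass to build the per-cluster summaries and one pass to score the points, so its cost grows with the number of points times the number of clusters rather than with the square of the number of points. That property is what makes this style of computation a reasonable fit for a distributed setting.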
| Name | | |
|---|---|---|
| .. | | |
| src | | |
| pom.xml | | |