[SPARK-35392][ML][PYTHON] Fix flaky tests in ml/clustering.py and ml/feature.py

### What changes were proposed in this pull request?

This PR removes the `summary.logLikelihood` check from the `GaussianMixture` doctest in ml/clustering.py, because this GMM test is quite flaky. It fails easily, e.g., if any of the following changes:

- the number of partitions;
- the way the sum of weights is computed;
- the underlying BLAS implementation.

It also uses a more permissive precision in the `Word2Vec` test case in ml/feature.py.

### Why are the changes needed?

To recover the build and tests.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing test cases.

Closes #32533 from zhengruifeng/SPARK_35392_disable_flaky_gmm_test.

Lead-authored-by: Ruifeng Zheng <ruifengz@foxmail.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
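
As a rough illustration of why such checks are fragile (this is not the Spark code itself), the sketch below sums the same three floats in two groupings, the way a different partitioning or BLAS routine might, and then compares the printed result against an exact and a truncated doctest expectation. The values and the `matches` helper are hypothetical; only `doctest.OutputChecker` and the `ELLIPSIS` flag are standard library.

```python
import doctest

# Hypothetical stand-in for a reduction whose grouping depends on
# partitioning or on the BLAS implementation: same inputs, different order.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

checker = doctest.OutputChecker()


def matches(want, value):
    """Return True if doctest would accept `value` against the expectation `want`."""
    return checker.check_output(want + "\n", repr(value) + "\n", doctest.ELLIPSIS)


# An expectation that pins every digit of one particular run.
exact_expectation = repr(left_to_right)
# A truncated expectation, in the spirit of the shortened Word2Vec output.
loose_expectation = "0.6..."

print(left_to_right == right_to_left)
print(matches(exact_expectation, right_to_left))
print(matches(loose_expectation, left_to_right))
print(matches(loose_expectation, right_to_left))
```

The two groupings differ only in the last digit, which is enough to fail the exact expectation while the truncated one accepts both. A shorter prefix trades some strictness for stability, and dropping the check entirely, as done for `logLikelihood`, is the extreme end of that trade-off.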
2021-05-13 22:23:51 +09:00 · f7704ece40
parent b6d57b6b99
commit f7704ece40
2 changed files with 3 additions and 5 deletions
--- a/python/pyspark/ml/clustering.py
+++ b/python/pyspark/ml/clustering.py
@@ -273,8 +273,6 @@ class GaussianMixture(JavaEstimator, _GaussianMixtureParams, JavaMLWritable, Jav
    3
    >>> summary.clusterSizes
    [2, 2, 2]
-    >>> summary.logLikelihood
-    65.02945...
    >>> weights = model.weights
    >>> len(weights)
    3
--- a/python/pyspark/ml/feature.py
+++ b/python/pyspark/ml/feature.py
@@ -4682,9 +4682,9 @@ class Word2Vec(JavaEstimator, _Word2VecParams, JavaMLReadable, JavaMLWritable):
    +----+--------------------+
    |word|              vector|
    +----+--------------------+
-    |   a|[0.09511678665876...|
-    |   b|[-1.2028766870498...|
-    |   c|[0.30153277516365...|
+    |   a|[0.0951...
+    |   b|[-1.202...
+    |   c|[0.3015...
    +----+--------------------+
    ...
    >>> model.findSynonymsArray("a", 2)