[SPARK-22332][ML][TEST] Fix NaiveBayes unit test occasionally failing (caused by non-deterministic test dataset)

## What changes were proposed in this pull request?

Fix the occasional failure of the NaiveBayes unit test: set a seed for `BrzMultinomial.sample` so that `generateNaiveBayesInput` outputs a deterministic dataset. (Without a seed, the generated dataset is random, and the fitted model can occasionally exceed the tolerance in the test, which triggers this failure.)

## How was this patch tested?

Manually ran the tests multiple times and checked that the output models contain the same values each time.

Author: WeichenXu <weichen.xu@databricks.com>

Closes #19558 from WeichenXu123/fix_nb_test_seed.
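The seeding described above can be illustrated with a minimal sketch. It assumes Breeze is on the classpath and that `Multinomial` resolves its random source from an implicit `RandBasis` in scope, which is what the one-line change in this patch relies on. The object name `SeededMultinomialSketch`, the helper `draw`, and the probability vector are hypothetical and are not part of the Spark test suite.

```scala
import breeze.linalg.{DenseVector => BDV}
import breeze.stats.distributions.{Multinomial => BrzMultinomial, RandBasis => BrzRandBasis}

// Hypothetical illustration; not code from NaiveBayesSuite.
object SeededMultinomialSketch {

  // Draw n category indices from a fixed multinomial using a seeded RandBasis.
  def draw(seed: Int, n: Int): IndexedSeq[Int] = {
    // The implicit in scope is picked up by BrzMultinomial's implicit RandBasis
    // parameter, so sampling no longer depends on Breeze's global random source.
    implicit val rng: BrzRandBasis = BrzRandBasis.withSeed(seed)
    val mult = BrzMultinomial(BDV(0.2, 0.3, 0.5)) // assumed example probabilities
    mult.sample(n)
  }

  def main(args: Array[String]): Unit = {
    val first = draw(seed = 42, n = 10)
    val second = draw(seed = 42, n = 10)
    // The same seed should yield the same sequence of samples.
    assert(first == second)
  }
}
```

Declaring the `RandBasis` as an implicit value keeps the existing `BrzMultinomial(...)` call sites unchanged, which is presumably why the patch needs only one added line inside `generateNaiveBayesInput`.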
2017-10-25 14:31:36 -07:00 · 2017-10-25 14:31:36 -07:00 · 841f1d776f
parent b377ef133c
commit 841f1d776f
1 changed files with 2 additions and 1 deletions
--- a/mllib/src/test/scala/org/apache/spark/ml/classification/NaiveBayesSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/classification/NaiveBayesSuite.scala
@@ -20,7 +20,7 @@ package org.apache.spark.ml.classification
 import scala.util.Random

 import breeze.linalg.{DenseVector => BDV, Vector => BV}
-import breeze.stats.distributions.{Multinomial => BrzMultinomial}
+import breeze.stats.distributions.{Multinomial => BrzMultinomial, RandBasis => BrzRandBasis}

 import org.apache.spark.{SparkException, SparkFunSuite}
 import org.apache.spark.ml.classification.NaiveBayes.{Bernoulli, Multinomial}
@@ -335,6 +335,7 @@ object NaiveBayesSuite {
    val _pi = pi.map(math.exp)
    val _theta = theta.map(row => row.map(math.exp))

+    implicit val rngForBrzMultinomial = BrzRandBasis.withSeed(seed)
    for (i <- 0 until nPoints) yield {
      val y = calcLabel(rnd.nextDouble(), _pi)
      val xi = modelType match {