[SPARK-19759][ML] not using blas in ALSModel.predict for optimization

## What changes were proposed in this pull request?

In `ALS.predict` currently we are using `blas.sdot` function to perform a dot product on two `Seq`s. It turns out that this is not the most efficient way.

I used the following code to compare the implementations:

```
def time[R](block: => R): Unit = {
  val t0 = System.nanoTime()
  block
  val t1 = System.nanoTime()
  println("Elapsed time: " + (t1 - t0) + "ns")
}
val r = new scala.util.Random(100)
val input = (1 to 500000).map(_ => (1 to 100).map(_ => r.nextFloat).toSeq)
def f(a:Seq[Float], b:Seq[Float]): Float = {
  var r = 0.0f
  for(i <- 0 until a.length) {
    r+=a(i)*b(i)
  }
  r
}
import com.github.fommil.netlib.BLAS.{getInstance => blas}
val b = (1 to 100).map(_ => r.nextFloat).toSeq
time { input.foreach(a=>blas.sdot(100, a.toArray, 1, b.toArray, 1)) }
// on average it takes 2968718815 ns
time { input.foreach(a=>f(a,b)) }
// on average it takes 515510185 ns
```

Thus this PR proposes the old-style for loop implementation for performance reasons.

## How was this patch tested?

existing UTs

Author: Marco Gaido <mgaido@hortonworks.com>

Closes #19685 from mgaido91/SPARK-19759.
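For readers who want to check the equivalence without a BLAS dependency, here is a minimal sketch that compares an index-based loop over `Seq[Float]` with a variant that copies both inputs to arrays first. The object name `DotProductSketch` and the helpers `dotSeq` and `dotViaArrays` are hypothetical and are not part of Spark. The array variant only imitates the `toArray` copies of the old code; it does not call `blas.sdot`.

```scala
object DotProductSketch {
  // Hypothetical helper: index-based dot product over the first `rank`
  // elements, in the same style as the loop introduced by this patch.
  // Note: a(i) is constant-time only for indexed sequences (Vector,
  // array-backed wrappers); on a List each access is linear.
  def dotSeq(a: Seq[Float], b: Seq[Float], rank: Int): Float = {
    var acc = 0.0f
    var i = 0
    while (i < rank) {
      acc += a(i) * b(i)
      i += 1
    }
    acc
  }

  // Hypothetical baseline: copy both inputs to arrays first, as the old
  // code did before calling BLAS, then run the same loop on the arrays.
  def dotViaArrays(a: Seq[Float], b: Seq[Float], rank: Int): Float = {
    val xs = a.toArray
    val ys = b.toArray
    var acc = 0.0f
    var i = 0
    while (i < rank) {
      acc += xs(i) * ys(i)
      i += 1
    }
    acc
  }

  def main(args: Array[String]): Unit = {
    val a: Seq[Float] = Vector(1.0f, 2.0f, 3.0f)
    val b: Seq[Float] = Vector(4.0f, 5.0f, 6.0f)
    println(dotSeq(a, b, 3))       // 1*4 + 2*5 + 3*6 = 32.0
    println(dotViaArrays(a, b, 3)) // same value, after two array copies
  }
}
```

Both helpers accumulate in the same order, so they return identical results; the only difference is the two extra array allocations per call in `dotViaArrays`.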
2017-11-11 04:10:54 -06:00 · 3eb315d714
parent 808e886b96
commit 3eb315d714
1 changed files with 7 additions and 3 deletions
--- a/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
@@ -289,9 +289,13 @@ class ALSModel private[ml] (

  private val predict = udf { (featuresA: Seq[Float], featuresB: Seq[Float]) =>
    if (featuresA != null && featuresB != null) {
-      // TODO(SPARK-19759): try dot-producting on Seqs or another non-converted type for
-      // potential optimization.
-      blas.sdot(rank, featuresA.toArray, 1, featuresB.toArray, 1)
+      var dotProduct = 0.0f
+      var i = 0
+      while (i < rank) {
+        dotProduct += featuresA(i) * featuresB(i)
+        i += 1
+      }
+      dotProduct
    } else {
      Float.NaN
    }