[SPARK-20670][ML] Simplify FPGrowth transform

## What changes were proposed in this pull request?

jira: https://issues.apache.org/jira/browse/SPARK-20670

As suggested by Sean Owen in https://github.com/apache/spark/pull/17130, the transform code in FPGrowthModel can be simplified.

In tests on some public datasets from http://fimi.ua.ac.be/data/, the new transform code performs on par with or better than the old implementation.

## How was this patch tested?

Existing unit tests.

Author: Yuhao Yang <yuhao.yang@intel.com>

Closes #17912 from hhbyyh/fpgrowthTransform.
2017-05-09 23:39:26 -07:00 · a819dab668
parent a90c5cd822
commit a819dab668
1 changed files with 2 additions and 6 deletions
--- a/mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala
@@ -269,12 +269,8 @@ class FPGrowthModel private[ml] (
    val predictUDF = udf((items: Seq[_]) => {
      if (items != null) {
        val itemset = items.toSet
-        brRules.value.flatMap(rule =>
-          if (items != null && rule._1.forall(item => itemset.contains(item))) {
-            rule._2.filter(item => !itemset.contains(item))
-          } else {
-            Seq.empty
-          }).distinct
+        brRules.value.filter(_._1.forall(itemset.contains))
+          .flatMap(_._2.filter(!itemset.contains(_))).distinct
      } else {
        Seq.empty
      }}, dt)
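
The equivalence of the two versions can be checked outside Spark with plain Scala collections. The following is a minimal sketch, not the code in `FPGrowth.scala`: the object name `FPGrowthTransformSketch`, the `Rule` alias, and the `predictOld`/`predictNew` helpers are hypothetical, items are assumed to be `String`, and an ordinary `Seq` stands in for the broadcast `brRules.value`. The redundant `items != null` check from the removed lines is left out because the outer `if` already covers it.

```scala
object FPGrowthTransformSketch {
  // Hypothetical alias: (antecedent, consequent) of an association rule.
  type Rule = (Seq[String], Seq[String])

  // Shape of the removed code: flatMap with an if/else per rule.
  def predictOld(rules: Seq[Rule], items: Seq[String]): Seq[String] = {
    val itemset = items.toSet
    rules.flatMap { rule =>
      if (rule._1.forall(item => itemset.contains(item))) {
        // Keep only consequent items the input does not already contain.
        rule._2.filter(item => !itemset.contains(item))
      } else {
        Seq.empty[String]
      }
    }.distinct
  }

  // Shape of the added code: filter matching rules first, then flatMap.
  def predictNew(rules: Seq[Rule], items: Seq[String]): Seq[String] = {
    val itemset = items.toSet
    rules.filter(_._1.forall(itemset.contains))
      .flatMap(_._2.filter(!itemset.contains(_))).distinct
  }

  def main(args: Array[String]): Unit = {
    // Made-up rules for illustration only.
    val rules: Seq[Rule] = Seq(
      (Seq("a"), Seq("b")),
      (Seq("a", "b"), Seq("c")),
      (Seq("d"), Seq("e")),
      (Seq("a"), Seq("c"))
    )
    val inputs = Seq(Seq("a", "b"), Seq("a"), Seq("d"), Seq.empty[String])
    // Both versions should agree on every input.
    inputs.foreach { items =>
      assert(predictOld(rules, items) == predictNew(rules, items))
    }
  }
}
```

Both helpers visit the rules in the same order and apply `distinct` last, so the order of the predicted items is unchanged as well.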