[SPARK-20670][ML] Simplify FPGrowth transform

## What changes were proposed in this pull request?

jira: https://issues.apache.org/jira/browse/SPARK-20670
As suggested by Sean Owen in https://github.com/apache/spark/pull/17130, the transform code in FPGrowthModel can be simplified.

In my tests on some public datasets from http://fimi.ua.ac.be/data/, the new transform code performs on par with or better than the old implementation.
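The simplification can be sketched outside Spark as follows. This is a minimal, illustrative sketch rather than the actual `FPGrowthModel` code: the object name `FPGrowthTransformSketch`, the `Rule` alias, and the sample rules are assumptions made for the example, and a plain `Seq` stands in for the broadcast rules. It places the old `flatMap` with an inner `if`/`else` next to the new `filter` followed by `flatMap`, so the two shapes can be compared on the same input.

```scala
object FPGrowthTransformSketch {
  // Hypothetical stand-in for the broadcast rules: (antecedent, consequent) pairs.
  type Rule = (Seq[String], Seq[String])

  // Old shape: a single flatMap that decides per rule, via if/else,
  // whether the rule applies to the given items.
  def predictOld(rules: Seq[Rule], items: Seq[String]): Seq[String] = {
    val itemset = items.toSet
    rules.flatMap { rule =>
      if (rule._1.forall(item => itemset.contains(item))) {
        rule._2.filter(item => !itemset.contains(item))
      } else {
        Seq.empty[String]
      }
    }.distinct
  }

  // New shape: keep the rules whose antecedent is contained in the items,
  // then collect the consequents that are not already present.
  def predictNew(rules: Seq[Rule], items: Seq[String]): Seq[String] = {
    val itemset = items.toSet
    rules.filter(_._1.forall(itemset.contains))
      .flatMap(_._2.filter(!itemset.contains(_)))
      .distinct
  }

  def main(args: Array[String]): Unit = {
    // Made-up rules, for illustration only.
    val rules: Seq[Rule] = Seq(
      (Seq("a"), Seq("b")),
      (Seq("a", "b"), Seq("c")),
      (Seq("d"), Seq("e")),
      (Seq("a"), Seq("c"))
    )
    val basket = Seq("a", "b")
    println(predictOld(rules, basket))
    println(predictNew(rules, basket))
  }
}
```

For the sample basket, both functions should yield the same single prediction, `c`: the rule with antecedent `d` does not apply, and the consequent `b` is dropped because it is already in the basket.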

## How was this patch tested?

Existing unit test.

Author: Yuhao Yang <yuhao.yang@intel.com>

Closes #17912 from hhbyyh/fpgrowthTransform.
Yuhao Yang 2017-05-09 23:39:26 -07:00 committed by Felix Cheung
parent a90c5cd822
commit a819dab668

@@ -269,12 +269,8 @@ class FPGrowthModel private[ml] (
     val predictUDF = udf((items: Seq[_]) => {
       if (items != null) {
         val itemset = items.toSet
-        brRules.value.flatMap(rule =>
-          if (items != null && rule._1.forall(item => itemset.contains(item))) {
-            rule._2.filter(item => !itemset.contains(item))
-          } else {
-            Seq.empty
-          }).distinct
+        brRules.value.filter(_._1.forall(itemset.contains))
+          .flatMap(_._2.filter(!itemset.contains(_))).distinct
       } else {
         Seq.empty
       }}, dt)