[SPARK-20670][ML] Simplify FPGrowth transform
## What changes were proposed in this pull request?

jira: https://issues.apache.org/jira/browse/SPARK-20670

As suggested by Sean Owen in https://github.com/apache/spark/pull/17130, the transform code in FPGrowthModel can be simplified.

In my tests on some public datasets from http://fimi.ua.ac.be/data/, the performance of the new transform code is equal to or better than that of the old implementation.

## How was this patch tested?

Existing unit tests.

Author: Yuhao Yang <yuhao.yang@intel.com>

Closes #17912 from hhbyyh/fpgrowthTransform.
parent a90c5cd822
commit a819dab668
@@ -269,12 +269,8 @@ class FPGrowthModel private[ml] (
     val predictUDF = udf((items: Seq[_]) => {
       if (items != null) {
         val itemset = items.toSet
-        brRules.value.flatMap(rule =>
-          if (items != null && rule._1.forall(item => itemset.contains(item))) {
-            rule._2.filter(item => !itemset.contains(item))
-          } else {
-            Seq.empty
-          }).distinct
+        brRules.value.filter(_._1.forall(itemset.contains))
+          .flatMap(_._2.filter(!itemset.contains(_))).distinct
       } else {
         Seq.empty
       }}, dt)
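For reviewers who want to check the equivalence without a Spark session, the hunk above can be approximated with plain Scala collections. This is only a sketch under stated assumptions: items are narrowed to `String` (the UDF itself takes `Seq[_]`), the broadcast `brRules.value` is replaced by an ordinary `Array`, and the names `FPGrowthTransformSketch`, `Rule`, `predictOld`, and `predictNew` are hypothetical and do not appear in the patch.

```scala
// Standalone sketch of the rule-matching logic; no Spark dependency.
// All names in this object are hypothetical and for illustration only.
object FPGrowthTransformSketch {
  // (antecedent, consequent), mirroring how the diff reads rule._1 and rule._2.
  // Assumption: items are Strings here; the real UDF accepts Seq[_].
  type Rule = (Seq[String], Seq[String])

  // Shape of the logic before the change: one flatMap with an if/else per rule.
  def predictOld(rules: Array[Rule], items: Seq[String]): Seq[String] =
    if (items != null) {
      val itemset = items.toSet
      rules.flatMap { rule =>
        if (rule._1.forall(item => itemset.contains(item))) {
          rule._2.filter(item => !itemset.contains(item))
        } else {
          Seq.empty[String]
        }
      }.distinct.toSeq
    } else {
      Seq.empty
    }

  // Shape of the logic after the change: keep the rules whose antecedent is
  // fully contained in the itemset, then collect consequent items not already present.
  def predictNew(rules: Array[Rule], items: Seq[String]): Seq[String] =
    if (items != null) {
      val itemset = items.toSet
      rules.filter(_._1.forall(itemset.contains))
        .flatMap(_._2.filter(!itemset.contains(_))).distinct.toSeq
    } else {
      Seq.empty
    }

  def main(args: Array[String]): Unit = {
    val rules: Array[Rule] = Array(
      (Seq("a", "b"), Seq("c")),  // antecedent matched by the input below
      (Seq("a"), Seq("d", "b")),  // matched; "b" is already in the input
      (Seq("x"), Seq("y"))        // not matched
    )
    val items = Seq("a", "b")
    println(predictOld(rules, items).mkString(", "))
    println(predictNew(rules, items).mkString(", "))
  }
}
```

Both functions visit the rules in the same order and apply the same containment checks, so they should return the same sequence for any input. The `filter` followed by `flatMap` form simply drops the explicit `else Seq.empty` branch and the redundant inner null check.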