[SPARK-9903] [MLLIB] skip local processing in PrefixSpan if there are no small prefixes

The prefixes may keep growing until they reach the maximum pattern length, in which case no small prefixes remain and the final local processing step is unnecessary. feynmanliang

Author: Xiangrui Meng <meng@databricks.com>

Closes #8136 from mengxr/SPARK-9903.
Xiangrui Meng 2015-08-12 20:44:40 -07:00
parent d2d5e7fe2d
commit d7053bea98

@@ -282,6 +282,11 @@ object PrefixSpan extends Logging {
       largePrefixes = newLargePrefixes
     }
+    var freqPatterns = sc.parallelize(localFreqPatterns, 1)
+    val numSmallPrefixes = smallPrefixes.size
+    logInfo(s"number of small prefixes for local processing: $numSmallPrefixes")
+    if (numSmallPrefixes > 0) {
       // Switch to local processing.
       val bcSmallPrefixes = sc.broadcast(smallPrefixes)
       val distributedFreqPattern = postfixes.flatMap { postfix =>
@@ -297,10 +302,10 @@ object PrefixSpan extends Logging {
             (prefix.items ++ pattern, count)
           }
       }
       // Union local frequent patterns and distributed ones.
-    val freqPatterns = (sc.parallelize(localFreqPatterns, 1) ++ distributedFreqPattern)
-      .persist(StorageLevel.MEMORY_AND_DISK)
+      freqPatterns = freqPatterns ++ distributedFreqPattern
+    }
     freqPatterns
   }
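
The control flow introduced by this change can be sketched without Spark. The following is a minimal illustration, not the PrefixSpan implementation: plain Scala `Seq`s stand in for RDDs, and `SkipLocalProcessingSketch`, `Pattern`, `mineLocally`, and `collectPatterns` are hypothetical names chosen for this example. It shows the result starting from the locally found patterns and being extended only when at least one small prefix exists.

```scala
// A sketch under stated assumptions: Seq replaces RDD, and mineLocally is a
// hypothetical placeholder for the per-prefix local mining step.
object SkipLocalProcessingSketch {
  // A frequent pattern paired with its count.
  type Pattern = (Seq[Int], Long)

  // Placeholder: pretend each small prefix yields one extended pattern.
  def mineLocally(prefix: Seq[Int]): Seq[Pattern] =
    Seq((prefix :+ 0, 1L))

  def collectPatterns(
      localFreqPatterns: Seq[Pattern],
      smallPrefixes: Seq[Seq[Int]]): Seq[Pattern] = {
    // Start from the patterns that were already found.
    var freqPatterns = localFreqPatterns
    // Run the extra step only if there is something to process.
    if (smallPrefixes.nonEmpty) {
      val distributedFreqPattern = smallPrefixes.flatMap(mineLocally)
      freqPatterns = freqPatterns ++ distributedFreqPattern
    }
    freqPatterns
  }
}
```

With an empty `smallPrefixes`, `collectPatterns` returns its first argument unchanged, which mirrors the case described in the commit message where every prefix grew to the maximum pattern length.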