[SPARK-7663] [MLLIB] Add requirement for word2vec model
JIRA issue [link](https://issues.apache.org/jira/browse/SPARK-7663). We should check the vocabulary size of the word2vec model to prevent an unexpectedly empty vocabulary.

CC srowen.

Author: Xusen Yin <yinxusen@gmail.com>

Closes #6228 from yinxusen/SPARK-7663 and squashes the following commits:

21770c5 [Xusen Yin] check the vocab size
54ae63e [Xusen Yin] add requirement for word2vec model
parent 60336e3bc0
commit b3abf0b8d9
@@ -158,6 +158,9 @@ class Word2Vec extends Serializable with Logging {
       .sortWith((a, b) => a.cn > b.cn)
 
     vocabSize = vocab.length
+    require(vocabSize > 0, "The vocabulary size should be > 0. You may need to check " +
+      "the setting of minCount, which could be large enough to remove all your words in sentences.")
+
     var a = 0
     while (a < vocabSize) {
       vocabHash += vocab(a).word -> a
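To illustrate what the added `require` guards against, here is a minimal standalone sketch in plain Scala, without Spark. The names `VocabCheckSketch`, `VocabWord`, and `buildVocab` are hypothetical, and the `>= minCount` filter is an assumption about how rare words are dropped; this is not the actual `Word2Vec` code, only the same check applied to a toy vocabulary builder.

```scala
import scala.util.Try

// Hypothetical stand-in for the vocabulary-building step; not Spark's actual code.
object VocabCheckSketch {
  final case class VocabWord(word: String, cn: Int)

  // Count words, drop those seen fewer than minCount times (assumed filter rule),
  // then sort by descending count as in the diff context above.
  def buildVocab(sentences: Seq[Seq[String]], minCount: Int): Array[VocabWord] = {
    val vocab = sentences.flatten
      .groupBy(identity)
      .map { case (w, occurrences) => VocabWord(w, occurrences.size) }
      .filter(_.cn >= minCount)
      .toArray
      .sortWith((a, b) => a.cn > b.cn)

    val vocabSize = vocab.length
    // Same check and message as the commit: fail fast on an empty vocabulary.
    require(vocabSize > 0, "The vocabulary size should be > 0. You may need to check " +
      "the setting of minCount, which could be large enough to remove all your words in sentences.")
    vocab
  }

  def main(args: Array[String]): Unit = {
    val sentences = Seq(Seq("a", "b", "a"), Seq("a", "c"))
    // With minCount = 2 only "a" (seen three times) survives the filter.
    println(buildVocab(sentences, minCount = 2).map(_.word).mkString(","))
    // With minCount = 5 every word is filtered out, so require throws.
    println(Try(buildVocab(sentences, minCount = 5)).isFailure)
  }
}
```

In this sketch an overly large `minCount` surfaces immediately as an `IllegalArgumentException` carrying the message from the commit, rather than as an empty vocabulary that fails later in a less obvious place.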