[SPARK-7663] [MLLIB] Add requirement for word2vec model

JIRA issue [link](https://issues.apache.org/jira/browse/SPARK-7663).

We should check the vocabulary size of the word2vec model to prevent an unexpectedly empty vocabulary.
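To illustrate the failure mode this check guards against, here is a minimal standalone sketch in plain Scala with no Spark dependency. `VocabCheckSketch` and `buildVocab` are hypothetical names, and the `cn >= minCount` filter is an assumption about how words are dropped. Only the `require` call and its message are taken from the patch.

```scala
// Hypothetical, Spark-free sketch of the vocabulary check; not the MLlib implementation.
object VocabCheckSketch {
  final case class VocabWord(word: String, cn: Int)

  // Counts words, drops those seen fewer than minCount times (assumed filter),
  // sorts by descending count, then applies the check added in this patch.
  def buildVocab(sentences: Seq[Seq[String]], minCount: Int): Array[VocabWord] = {
    val vocab = sentences.flatten
      .groupBy(identity)
      .map { case (w, occurrences) => VocabWord(w, occurrences.size) }
      .filter(_.cn >= minCount)
      .toArray
      .sortWith((a, b) => a.cn > b.cn)
    val vocabSize = vocab.length
    require(vocabSize > 0, "The vocabulary size should be > 0. You may need to check " +
      "the setting of minCount, which could be large enough to remove all your words in sentences.")
    vocab
  }

  def main(args: Array[String]): Unit = {
    val sentences = Seq(Seq("a", "b", "a"), Seq("b", "c", "a"))
    // "a" occurs 3 times and "b" twice, so both survive minCount = 2.
    println(buildVocab(sentences, minCount = 2).map(_.word).mkString(","))
    // No word occurs 10 times, so require throws IllegalArgumentException.
    println(scala.util.Try(buildVocab(sentences, minCount = 10)).isFailure)
  }
}
```

With an overly large `minCount`, the call fails immediately with a descriptive message rather than continuing with an empty vocabulary.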

CC srowen.

Author: Xusen Yin <yinxusen@gmail.com>

Closes #6228 from yinxusen/SPARK-7663 and squashes the following commits:

21770c5 [Xusen Yin] check the vocab size
54ae63e [Xusen Yin] add requirement for word2vec model
Xusen Yin 2015-05-20 10:41:18 +01:00 committed by Sean Owen
parent 60336e3bc0
commit b3abf0b8d9

@@ -158,6 +158,9 @@ class Word2Vec extends Serializable with Logging {
       .sortWith((a, b) => a.cn > b.cn)
     vocabSize = vocab.length
+    require(vocabSize > 0, "The vocabulary size should be > 0. You may need to check " +
+      "the setting of minCount, which could be large enough to remove all your words in sentences.")
     var a = 0
     while (a < vocabSize) {
       vocabHash += vocab(a).word -> a