spark-instrumented-optimizer

History

src	[SPARK-8703] [ML] Add CountVectorizer as a ml transformer to convert document to words count vector	2015-07-09 10:26:38 -07:00
pom.xml	[SPARK-8683] [BUILD] Depend on mockito-core instead of mockito-all	2015-06-27 23:27:52 -07:00

Yuhao Yang 0cd84c86ca [SPARK-8703] [ML] Add CountVectorizer as a ml transformer to convert document to words count vector

jira: https://issues.apache.org/jira/browse/SPARK-8703

Converts a text document to a sparse vector of token counts.
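The conversion described above can be illustrated with a minimal pure-Python sketch. It is not the Spark API and not the code from this commit. It assumes the document is already tokenized and the vocabulary is fixed and supplied up front. The output uses a `(size, indices, values)` triple, mirroring the layout of Spark's `SparseVector`. The function name `count_vectorize` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def count_vectorize(
    tokens: Sequence[str], vocabulary: Sequence[str]
) -> Tuple[int, List[int], List[float]]:
    """Hypothetical helper: map a tokenized document to a sparse count vector.

    Returns (size, indices, values), where size is the vocabulary length,
    indices are sorted vocabulary positions, and values are token counts.
    Tokens that are not in the vocabulary are silently ignored (an assumption).
    """
    # Position of each vocabulary term in the output vector.
    index: Dict[str, int] = {term: i for i, term in enumerate(vocabulary)}
    # Count occurrences only for in-vocabulary tokens.
    counts = Counter(index[t] for t in tokens if t in index)
    indices = sorted(counts)
    values = [float(counts[i]) for i in indices]
    return len(vocabulary), indices, values


if __name__ == "__main__":
    vocab = ["a", "b", "c"]
    # "d" is out of vocabulary, so it does not appear in the result.
    print(count_vectorize(["a", "b", "a", "d"], vocab))  # (3, [0, 1], [2.0, 1.0])
```

Storing only the nonzero positions keeps the vector small when the vocabulary is large and each document uses few of its terms.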

I can further add an estimator to extract vocabulary from corpus if that's appropriate.
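The proposed vocabulary-extraction estimator is not part of this commit. One plausible way it could work is sketched below, under stated assumptions:

- the vocabulary is taken to be the `vocab_size` most frequent terms across the whole corpus;
- ties are broken alphabetically.

Both the function `fit_vocabulary` and the `vocab_size` parameter are hypothetical names used only for illustration.

```python
from collections import Counter
from typing import List, Sequence


def fit_vocabulary(corpus: Sequence[Sequence[str]], vocab_size: int) -> List[str]:
    """Hypothetical estimator: pick up to `vocab_size` terms from a tokenized corpus.

    Terms are ranked by total count across all documents (most frequent first);
    ties are broken alphabetically so the result is deterministic.
    """
    # Total term counts across every document in the corpus.
    totals = Counter(token for doc in corpus for token in doc)
    # Sort by descending count, then ascending term for ties.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:vocab_size]]


if __name__ == "__main__":
    corpus = [["a", "b", "a"], ["b", "c"], ["a"]]
    print(fit_vocabulary(corpus, vocab_size=2))  # ['a', 'b']
```

The fitted list could then serve as the fixed vocabulary for the transformer that converts documents to count vectors.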

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #7084 from hhbyyh/countVectorization and squashes the following commits:

5f3f655 [Yuhao Yang] text change
24728e4 [Yuhao Yang] style improvement
576728a [Yuhao Yang] rename to model and some fix
1deca28 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into countVectorization
99b0c14 [Yuhao Yang] undo extension from HashingTF
12c2dc8 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into countVectorization
7ee1c31 [Yuhao Yang] extends HashingTF
809fb59 [Yuhao Yang] minor fix for ut
7c61fb3 [Yuhao Yang] add countVectorizer

2015-07-09 10:26:38 -07:00