spark-instrumented-optimizer/mllib
Yuhao Yang 61f9c8711c [SPARK-11069][ML] Add RegexTokenizer option to convert to lowercase
jira: https://issues.apache.org/jira/browse/SPARK-11069
quotes from jira:
Tokenizer converts strings to lowercase automatically, but RegexTokenizer does not. It would be nice to add an option to RegexTokenizer to convert to lowercase. Proposal:
call the Boolean Param "toLowercase"
set default to false (so behavior does not change)

Actually sklearn converts to lowercase before tokenizing too

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #9092 from hhbyyh/tokenLower.
2015-11-09 16:55:23 -08:00
..
src [SPARK-11069][ML] Add RegexTokenizer option to convert to lowercase 2015-11-09 16:55:23 -08:00
pom.xml [SPARK-10300] [BUILD] [TESTS] Add support for test tags in run-tests.py. 2015-10-07 14:11:21 -07:00