spark-instrumented-optimizer/mllib
Xiangrui Meng 685ddcf525 [SPARK-5886][ML] Add StringIndexer as a feature transformer
This PR adds StringIndexer, a feature transformer that takes a column of string labels and outputs a column of doubles, with labels indexed by their frequency.

TODOs:
- [x] store feature to index map in output metadata
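
As a rough illustration of the behaviour described above (not the code in this PR), frequency-based label indexing can be sketched in plain Python. The function names `fit_string_indexer` and `transform` are hypothetical. The sketch assumes that the most frequent label receives index 0.0 and that ties are broken alphabetically; the commit message specifies neither detail. The returned dictionary plays the role of the label-to-index map that the TODO item says is stored in the output metadata.

```python
from collections import Counter


def fit_string_indexer(labels):
    """Build a label -> index map ordered by label frequency.

    Assumption: the most frequent label gets 0.0, and ties are
    broken alphabetically so the result is deterministic.
    """
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


def transform(labels, index):
    """Replace each string label with its double-valued index."""
    return [index[label] for label in labels]


labels = ["a", "b", "c", "a", "a", "c"]
index = fit_string_indexer(labels)   # kept alongside the output, like column metadata
indexed = transform(labels, index)
print(index)
print(indexed)
```

Here "a" occurs three times, "c" twice and "b" once, so under the stated assumptions they map to 0.0, 1.0 and 2.0 respectively.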

Author: Xiangrui Meng <meng@databricks.com>

Closes #4735 from mengxr/SPARK-5886 and squashes the following commits:

d82575f [Xiangrui Meng] fix test
700e70f [Xiangrui Meng] rename LabelIndexer to StringIndexer
16a6f8c [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-5886
457166e [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-5886
f8b30f4 [Xiangrui Meng] update label indexer to output metadata
e81ec28 [Xiangrui Meng] Merge branch 'openhashmap-contains' into SPARK-5886-2
d6e6f1f [Xiangrui Meng] add contains to primitivekeyopenhashmap
748a69b [Xiangrui Meng] add contains to OpenHashMap
def3c5c [Xiangrui Meng] add LabelIndexer
2015-04-12 22:41:05 -07:00
src [SPARK-5886][ML] Add StringIndexer as a feature transformer 2015-04-12 22:41:05 -07:00
pom.xml [SPARK-6341][mllib] Upgrade breeze from 0.11.1 to 0.11.2 2015-03-27 00:15:02 -07:00