Added user guide sections with code examples.
Also added small Java unit tests to test Java example in guide.
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes#6127 from jkbradley/feature-guide-2 and squashes the following commits:
cd47f4b [Joseph K. Bradley] Updated based on code review
f16bcec [Joseph K. Bradley] Fixed merge issues and update Python examples print calls for Python 3
0a862f9 [Joseph K. Bradley] Added Normalizer, StandardScaler to ml-features doc, plus small Java unit tests
a21c2d6 [Joseph K. Bradley] Updated ml-features.md with IDF
(cherry picked from commit 2728c3df66)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Added VectorIndexer section to ML user guide. Also added javaCategoryMaps() method and Java unit test for it.
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes#6255 from jkbradley/vector-indexer-guide and squashes the following commits:
dbb8c4c [Joseph K. Bradley] simplified VectorIndexerModel.javaCategoryMaps
f692084 [Joseph K. Bradley] Added VectorIndexer section to ML user guide. Also added javaCategoryMaps() method and Java unit test for it.
(cherry picked from commit 6d75ed7e5c)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Author: Sandy Ryza <sandy@cloudera.com>
Closes#6126 from sryza/sandy-spark-7579 and squashes the following commits:
5af803d [Sandy Ryza] SPARK-7579 [MLLIB] User guide update for OneHotEncoder
(cherry picked from commit 829f1d95ba)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
CC jkbradley.
JIRA [issue](https://issues.apache.org/jira/browse/SPARK-7586).
Author: Xusen Yin <yinxusen@gmail.com>
Closes#6181 from yinxusen/SPARK-7586 and squashes the following commits:
77014c5 [Xusen Yin] comment fix
57a4c07 [Xusen Yin] small fix for docs
1178c8f [Xusen Yin] remove the correctness check in java suite
1c3f389 [Xusen Yin] delete sbt commit
1af152b [Xusen Yin] check python example code
1b5369e [Xusen Yin] add docs of word2vec
(cherry picked from commit 68fb2a46ed)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
JIRA [here](https://issues.apache.org/jira/browse/SPARK-7581).
CC jkbradley
Author: Xusen Yin <yinxusen@gmail.com>
Closes#6113 from yinxusen/SPARK-7581 and squashes the following commits:
1a7d80d [Xusen Yin] merge with master
892a8e9 [Xusen Yin] fix python 3 compatibility
ec935bf [Xusen Yin] small fix
3e9fa1d [Xusen Yin] delete note
69fcf85 [Xusen Yin] simplify and add python example
81d21dc [Xusen Yin] add programming guide for Polynomial Expansion
40babfb [Xusen Yin] add java test suite for PolynomialExpansion
(cherry picked from commit 6008ec14ed)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
JIRA: https://issues.apache.org/jira/browse/SPARK-7556
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes#6116 from viirya/binarizer_doc and squashes the following commits:
40cb677 [Liang-Chi Hsieh] Better print out.
5b7ef1d [Liang-Chi Hsieh] Make examples more clear.
1bf9c09 [Liang-Chi Hsieh] For comments.
6cf8cba [Liang-Chi Hsieh] Add user guide for Binarizer.
(cherry picked from commit c8696337e2)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Added feature transformer subsection to spark.ml guide, with HashingTF and Tokenizer. Added JavaHashingTFSuite to test Java examples in new guide.
I've run Scala, Python examples in the Spark/PySpark shells. I ran the Java examples via the test suite (with small modifications for printing).
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes#6093 from jkbradley/hashingtf-guide and squashes the following commits:
d5d213f [Joseph K. Bradley] small fix
dd6e91a [Joseph K. Bradley] fixes from code review of user guide
33c3ff9 [Joseph K. Bradley] small fix
bc6058c [Joseph K. Bradley] fix link
361a174 [Joseph K. Bradley] Added subsection for feature transformers to spark.ml guide, with HashingTF and Tokenizer. Added JavaHashingTFSuite to test Java examples in new guide
(cherry picked from commit f0c1bc3472)
Signed-off-by: Xiangrui Meng <meng@databricks.com>