Commit graph

57 commits

Author SHA1 Message Date
GayathriMurali be88383e15 [SPARK-15997][DOC][ML] Update user guide for HashingTF, QuantileVectorizer and CountVectorizer
## What changes were proposed in this pull request?

Made changes to HashingTF,QuantileVectorizer and CountVectorizer

Author: GayathriMurali <gayathri.m@intel.com>

Closes #13745 from GayathriMurali/SPARK-15997.
2016-06-24 13:25:40 +02:00
Yuhao Yang a58f402394 [SPARK-16045][ML][DOC] Spark 2.0 ML.feature: doc update for stopwords and binarizer
## What changes were proposed in this pull request?

jira: https://issues.apache.org/jira/browse/SPARK-16045
2.0 Audit: Update document for StopWordsRemover and Binarizer.

## How was this patch tested?

manual review for doc

Author: Yuhao Yang <hhbyyh@gmail.com>
Author: Yuhao Yang <yuhao.yang@intel.com>

Closes #13375 from hhbyyh/stopdoc.
2016-06-21 00:47:36 -07:00
sethah 5e203505f1 [SPARK-15394][ML][DOCS] User guide typos and grammar audit
## What changes were proposed in this pull request?

Correct some typos and incorrectly worded sentences.

## How was this patch tested?

Doc changes only.

Note that many of these changes were identified by whomfire01

Author: sethah <seth.hendrickson16@gmail.com>

Closes #13180 from sethah/ml_guide_audit.
2016-05-19 23:29:37 -07:00
Yuhao Yang 3308a862ba [SPARK-15182][ML] Copy MLlib doc to ML: ml.feature.tf, idf
## What changes were proposed in this pull request?

We should now begin copying algorithm details from the spark.mllib guide to spark.ml as needed, rather than just linking back to the corresponding algorithms in the spark.mllib user guide.

## How was this patch tested?

manual review for doc.

Author: Yuhao Yang <hhbyyh@gmail.com>
Author: Yuhao Yang <yuhao.yang@intel.com>

Closes #12957 from hhbyyh/tfidfdoc.
2016-05-17 20:44:19 +02:00
Bryan Cutler 5d188a6970 [DOC][MINOR] Fixed minor errors in feature.ml user guide doc
## What changes were proposed in this pull request?
Fixed some minor errors found when reviewing feature.ml user guide

## How was this patch tested?
built docs locally

Author: Bryan Cutler <cutlerb@gmail.com>

Closes #12940 from BryanCutler/feature.ml-doc_fixes-DOCS-MINOR.
2016-05-07 11:20:38 +02:00
Zheng RuiFeng 76ad04d9a0 [SPARK-14512] [DOC] Add python example for QuantileDiscretizer
## What changes were proposed in this pull request?
Add the missing python example for QuantileDiscretizer

## How was this patch tested?
manual tests

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #12281 from zhengruifeng/discret_pe.
2016-05-06 10:47:13 -07:00
Zheng RuiFeng e88476c8c6 [SPARK-14514][DOC] Add python example for VectorSlicer
## What changes were proposed in this pull request?
Add the missing python example for VectorSlicer

## How was this patch tested?
manual tests

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #12282 from zhengruifeng/vecslicer_pe.
2016-04-26 14:38:29 -07:00
Yuhao Yang ed9d803854 [SPARK-14635][ML] Documentation and Examples for TF-IDF only refer to HashingTF
## What changes were proposed in this pull request?

Currently, the docs for TF-IDF only refer to using HashingTF with IDF. However, CountVectorizer can also be used. We should probably amend the user guide and examples to show this.

## How was this patch tested?

unit tests and doc generation

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #12454 from hhbyyh/tfdoc.
2016-04-20 11:45:08 +01:00
Zheng RuiFeng 9bfb35da1e [SPARK-14515][DOC] Add python example for ChiSqSelector
## What changes were proposed in this pull request?
Add the missing python example for ChiSqSelector

## How was this patch tested?
manual tests

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #12283 from zhengruifeng/chi2_pe.
2016-04-18 17:14:22 -07:00
Zheng RuiFeng fcdd69260e [SPARK-14509][DOC] Add python CountVectorizerExample
## What changes were proposed in this pull request?
Add python CountVectorizerExample

## How was this patch tested?
manual tests

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #11917 from zhengruifeng/cv_pe.
2016-04-13 13:56:23 -07:00
Zheng RuiFeng adb9d73cd6 [SPARK-14339][DOC] Add python examples for DCT,MinMaxScaler,MaxAbsScaler
## What changes were proposed in this pull request?
add three python examples

## How was this patch tested?
manual tests

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #12063 from zhengruifeng/dct_pe.
2016-04-09 11:25:39 -07:00
Yuhao Yang 0b713e0455 [SPARK-13512][ML] add example and doc for MaxAbsScaler
## What changes were proposed in this pull request?

jira: https://issues.apache.org/jira/browse/SPARK-13512
Add example and doc for ml.feature.MaxAbsScaler.

## How was this patch tested?
 unit tests

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #11392 from hhbyyh/maxabsdoc.
2016-03-11 09:31:35 +02:00
Dongjoon Hyun 024482bf51 [MINOR][DOCS] Fix all typos in markdown files of doc and similar patterns in other comments
## What changes were proposed in this pull request?

This PR tries to fix all typos in all markdown files under `docs` module,
and fixes similar typos in other comments, too.

## How was the this patch tested?

manual tests.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11300 from dongjoon-hyun/minor_fix_typos.
2016-02-22 09:52:07 +00:00
Yanbo Liang dd2325d9a7 [SPARK-11965][ML][DOC] Update user guide for RFormula feature interactions
Update user guide for RFormula feature interactions. Meanwhile we also update other new features such as supporting string label in Spark 1.6.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10222 from yanboliang/spark-11965.
2016-01-25 11:52:26 -08:00
BenFradet e25f1fe427 [MINOR][DOC] Fix broken word2vec link
Follow-up of [SPARK-12199](https://issues.apache.org/jira/browse/SPARK-12199) and #10193 where a broken link has been left as is.

Author: BenFradet <benjamin.fradet@gmail.com>

Closes #10282 from BenFradet/SPARK-12199.
2015-12-14 13:50:30 +00:00
Xusen Yin 98b212d36b [SPARK-12199][DOC] Follow-up: Refine example code in ml-features.md
https://issues.apache.org/jira/browse/SPARK-12199

Follow-up PR of SPARK-11551. Fix some errors in ml-features.md

mengxr

Author: Xusen Yin <yinxusen@gmail.com>

Closes #10193 from yinxusen/SPARK-12199.
2015-12-12 17:47:01 -08:00
BenFradet aea676ca2d [SPARK-12217][ML] Document invalid handling for StringIndexer
Added a paragraph regarding StringIndexer#setHandleInvalid to the ml-features documentation.

I wonder if I should also add a snippet to the code example, input welcome.

Author: BenFradet <benjamin.fradet@gmail.com>

Closes #10257 from BenFradet/SPARK-12217.
2015-12-11 15:43:00 -08:00
Timothy Hunter 2ecbe02d5b [SPARK-12212][ML][DOC] Clarifies the difference between spark.ml, spark.mllib and mllib in the documentation.
Replaces a number of occurences of `MLlib` in the documentation that were meant to refer to the `spark.mllib` package instead. It should clarify for new users the difference between `spark.mllib` (the package) and MLlib (the umbrella project for ML in spark).

It also removes some files that I forgot to delete with #10207

Author: Timothy Hunter <timhunter@databricks.com>

Closes #10234 from thunterdb/12212.
2015-12-10 12:50:46 -08:00
Xusen Yin 051c6a066f [SPARK-11551][DOC] Replace example code in ml-features.md using include_example
PR on behalf of somideshmukh, thanks!

Author: Xusen Yin <yinxusen@gmail.com>
Author: somideshmukh <somilde@us.ibm.com>

Closes #10219 from yinxusen/SPARK-11551.
2015-12-09 12:00:48 -08:00
Timothy Hunter 765c67f5f2 [SPARK-8517][ML][DOC] Reorganizes the spark.ml user guide
This PR moves pieces of the spark.ml user guide to reflect suggestions in SPARK-8517. It does not introduce new content, as requested.

<img width="192" alt="screen shot 2015-12-08 at 11 36 00 am" src="https://cloud.githubusercontent.com/assets/7594753/11666166/e82b84f2-9d9f-11e5-8904-e215424d8444.png">

Author: Timothy Hunter <timhunter@databricks.com>

Closes #10207 from thunterdb/spark-8517.
2015-12-08 18:40:21 -08:00
BenFradet 06746b3005 [SPARK-12159][ML] Add user guide section for IndexToString transformer
Documentation regarding the `IndexToString` label transformer with code snippets in Scala/Java/Python.

Author: BenFradet <benjamin.fradet@gmail.com>

Closes #10166 from BenFradet/SPARK-12159.
2015-12-08 12:45:34 -08:00
Cheng Lian da2012a0e1 [SPARK-11551][DOC][EXAMPLE] Revert PR #10002
This reverts PR #10002, commit 78209b0cca.

The original PR wasn't tested on Jenkins before being merged.

Author: Cheng Lian <lian@databricks.com>

Closes #10200 from liancheng/revert-pr-10002.
2015-12-08 19:18:59 +08:00
Yanbo Liang 4a39b5a1be [SPARK-11958][SPARK-11957][ML][DOC] SQLTransformer user guide and example code
Add ```SQLTransformer``` user guide, example code and make Scala API doc more clear.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10006 from yanboliang/spark-11958.
2015-12-07 23:50:57 -08:00
somideshmukh 78209b0cca [SPARK-11551][DOC][EXAMPLE] Replace example code in ml-features.md using include_example
Made new patch contaning only markdown examples moved to exmaple/folder.
Ony three  java code were not shfted since they were contaning compliation error ,these classes are
1)StandardScale 2)NormalizerExample 3)VectorIndexer

Author: Xusen Yin <yinxusen@gmail.com>
Author: somideshmukh <somilde@us.ibm.com>

Closes #10002 from somideshmukh/SomilBranch1.33.
2015-12-07 23:26:34 -08:00
Xusen Yin 871e85d9c1 [SPARK-11963][DOC] Add docs for QuantileDiscretizer
https://issues.apache.org/jira/browse/SPARK-11963

Author: Xusen Yin <yinxusen@gmail.com>

Closes #9962 from yinxusen/SPARK-11963.
2015-12-07 13:16:47 -08:00
Jeff Zhang 7470d9edbb [DOCUMENTATION][MLLIB] typo in mllib doc
\cc mengxr

Author: Jeff Zhang <zjffdu@apache.org>

Closes #10093 from zjffdu/mllib_typo.
2015-12-03 15:36:28 +00:00
Xusen Yin e76431f886 [SPARK-11961][DOC] Add docs of ChiSqSelector
https://issues.apache.org/jira/browse/SPARK-11961

Author: Xusen Yin <yinxusen@gmail.com>

Closes #9965 from yinxusen/SPARK-11961.
2015-12-01 15:21:53 -08:00
Yanbo Liang 99693fef0a [SPARK-11723][ML][DOC] Use LibSVM data source rather than MLUtils.loadLibSVMFile to load DataFrame
Use LibSVM data source rather than MLUtils.loadLibSVMFile to load DataFrame, include:
* Use libSVM data source for all example codes under examples/ml, and remove unused import.
* Use libSVM data source for user guides under ml-*** which were omitted by #8697.
* Fix bug: We should use ```sqlContext.read().format("libsvm").load(path)``` at Java side, but the API doc and user guides misuse as ```sqlContext.read.format("libsvm").load(path)```.
* Code cleanup.

mengxr

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #9690 from yanboliang/spark-11723.
2015-11-13 08:43:05 -08:00
Xusen Yin 943d4fa204 [SPARK-11289][DOC] Substitute code examples in ML features extractors with include_example
mengxr https://issues.apache.org/jira/browse/SPARK-11289

I make some changes in ML feature extractors. I.e. TF-IDF, Word2Vec, and CountVectorizer. I add new example code in spark/examples, hope it is the right place to add those examples.

Author: Xusen Yin <yinxusen@gmail.com>

Closes #9266 from yinxusen/SPARK-11289.
2015-10-26 21:17:53 -07:00
Yuhao Yang 9b9fe5f7bf [SPARK-10670] [ML] [Doc] add api reference for ml doc
jira: https://issues.apache.org/jira/browse/SPARK-10670
In the Markdown docs for the spark.ml Programming Guide, we have code examples with codetabs for each language. We should link to each language's API docs within the corresponding codetab, but we are inconsistent about this. For an example of what we want to do, see the "Word2Vec" section in 64743870f2/docs/ml-features.md
This JIRA is just for spark.ml, not spark.mllib

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #8901 from hhbyyh/docAPI.
2015-09-28 22:40:02 -07:00
Joseph K. Bradley b921fe4dc0 [SPARK-10595] [ML] [MLLIB] [DOCS] Various ML guide cleanups
Various ML guide cleanups.

* ml-guide.md: Make it easier to access the algorithm-specific guides.
* LDA user guide: EM often begins with useless topics, but running longer generally improves them dramatically.  E.g., 10 iterations on a Wikipedia dataset produces useless topics, but 50 iterations produces very meaningful topics.
* mllib-feature-extraction.html#elementwiseproduct: “w” parameter should be “scalingVec”
* Clean up Binarizer user guide a little.
* Document in Pipeline that users should not put an instance into the Pipeline in more than 1 place.
* spark.ml Word2Vec user guide: clean up grammar/writing
* Chi Sq Feature Selector docs: Improve text in doc.

CC: mengxr feynmanliang

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #8752 from jkbradley/mlguide-fixes-1.5.
2015-09-15 19:43:26 -07:00
y-shimizu c268ca4ddd [SPARK-10518] [DOCS] Update code examples in spark.ml user guide to use LIBSVM data source instead of MLUtils
I fixed to use LIBSVM data source in the example code in spark.ml instead of MLUtils

Author: y-shimizu <y.shimizu0429@gmail.com>

Closes #8697 from y-shimizu/SPARK-10518.
2015-09-11 08:27:30 -07:00
Yuhao Yang 91a577d277 [SPARK-10249] [ML] [DOC] Add Python Code Example to StopWordsRemover User Guide
jira: https://issues.apache.org/jira/browse/SPARK-10249

update user guide since python support added.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #8620 from hhbyyh/swPyDocExample.
2015-09-08 22:33:23 -07:00
Yuhao Yang e2a843090c [SPARK-9890] [DOC] [ML] User guide for CountVectorizer
jira: https://issues.apache.org/jira/browse/SPARK-9890

document with Scala and java examples

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #8487 from hhbyyh/cvDoc.
2015-08-28 08:00:44 -07:00
Feynman Liang 5bfe9e1111 [SPARK-9680] [MLLIB] [DOC] StopWordsRemovers user guide and Java compatibility test
* Adds user guide for ml.feature.StopWordsRemovers, ran code examples on my machine
* Cleans up scaladocs for public methods
* Adds test for Java compatibility
* Follow up Python user guide code example is tracked by SPARK-10249

Author: Feynman Liang <fliang@databricks.com>

Closes #8436 from feynmanliang/SPARK-10230.
2015-08-27 16:10:37 -07:00
Yuhao Yang b37f0cc1b4 [SPARK-8531] [ML] Update ML user guide for MinMaxScaler
jira: https://issues.apache.org/jira/browse/SPARK-8531

Update ML user guide for MinMaxScaler

Author: Yuhao Yang <hhbyyh@gmail.com>
Author: unknown <yuhaoyan@yuhaoyan-MOBL1.ccr.corp.intel.com>

Closes #7211 from hhbyyh/minmaxdoc.
2015-08-25 10:54:03 -07:00
Xusen Yin 630a994e6a [SPARK-9893] User guide with Java test suite for VectorSlicer
Add user guide for `VectorSlicer`, with Java test suite and Python version VectorSlicer.

Note that Python version does not support selecting by names now.

Author: Xusen Yin <yinxusen@gmail.com>

Closes #8267 from yinxusen/SPARK-9893.
2015-08-21 16:30:12 -07:00
Eric Liang 8e0a072f78 [SPARK-9895] User Guide for RFormula Feature Transformer
mengxr

Author: Eric Liang <ekl@databricks.com>

Closes #8293 from ericl/docs-2.
2015-08-19 15:43:08 -07:00
Joseph K. Bradley 39e4ebd521 [SPARK-10060] [ML] [DOC] spark.ml DecisionTree user guide
New user guide section ml-decision-tree.md, including code examples.

I have run all examples, including the Java ones.

CC: manishamde yanboliang mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #8244 from jkbradley/ml-dt-docs.
2015-08-19 07:38:27 -07:00
lewuathe ba2a07e2b6 [SPARK-9977] [DOCS] Update documentation for StringIndexer
By using `StringIndexer`, we can obtain indexed label on new column. So a following estimator should use this new column through pipeline if it wants to use string indexed label.
I think it is better to make it explicit on documentation.

Author: lewuathe <lewuathe@me.com>

Closes #8205 from Lewuathe/SPARK-9977.
2015-08-19 09:54:03 +01:00
Sean Owen f141efeafb [SPARK-10070] [DOCS] Remove Guava dependencies in user guides
`Lists.newArrayList` -> `Arrays.asList`

CC jkbradley feynmanliang

Anybody into replacing usages of `Lists.newArrayList` in the examples / source code too? this method isn't useful in Java 7 and beyond.

Author: Sean Owen <sowen@cloudera.com>

Closes #8272 from srowen/SPARK-10070.
2015-08-19 09:41:09 +01:00
Feynman Liang badf7fa650 [SPARK-8473] [SPARK-9889] [ML] User guide and example code for DCT
mengxr jkbradley

Author: Feynman Liang <fliang@databricks.com>

Closes #8184 from feynmanliang/SPARK-9889-DCT-docs.
2015-08-18 17:54:49 -07:00
Yanbo Liang 0076e82123 [SPARK-9768] [PYSPARK] [ML] Add Python API and user guide for ml.feature.ElementwiseProduct
Add Python API, user guide and example for ml.feature.ElementwiseProduct.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8061 from yanboliang/SPARK-9768.
2015-08-17 17:25:41 -07:00
Yuhao Yang 66d87c1d76 [SPARK-7583] [MLLIB] User guide update for RegexTokenizer
jira: https://issues.apache.org/jira/browse/SPARK-7583

User guide update for RegexTokenizer

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #7828 from hhbyyh/regexTokenizerDoc.
2015-08-12 09:35:32 -07:00
Yanbo Liang 8ca287ebbd [SPARK-9191] [ML] [Doc] Add ml.PCA user guide and code examples
Add ml.PCA user guide document and code examples for Scala/Java/Python.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #7522 from yanboliang/ml-pca-md and squashes the following commits:

60dec05 [Yanbo Liang] address comments
f992abe [Yanbo Liang] Add ml.PCA doc and examples
2015-08-03 13:58:00 -07:00
Feynman Liang c5532e2fe7 [SPARK-8457] [ML] NGram Documentation
Add documentation for NGram feature transformer.

Author: Feynman Liang <fliang@databricks.com>

Closes #7244 from feynmanliang/SPARK-8457 and squashes the following commits:

5aface9 [Feynman Liang] Pretty print Scala output and add API doc to each codetab
60d5ac0 [Feynman Liang] Inline API doc and fix indentation
736ccbc [Feynman Liang] NGram feature transformer documentation
2015-07-08 14:49:52 -07:00
Xiangrui Meng 0221c7f0ef [SPARK-7582] [MLLIB] user guide for StringIndexer
This PR adds a Java unit test and user guide for `StringIndexer`. I put it before `OneHotEncoder` because they are closely related. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6561 from mengxr/SPARK-7582 and squashes the following commits:

4bba4f1 [Xiangrui Meng] fix example
ba1cd1b [Xiangrui Meng] fix style
7fa18d1 [Xiangrui Meng] add user guide for StringIndexer
136cb93 [Xiangrui Meng] add a Java unit test for StringIndexer
2015-06-01 22:03:29 -07:00
Xiangrui Meng 90c606925e [SPARK-7584] [MLLIB] User guide for VectorAssembler
This PR adds a section in the user guide for `VectorAssembler` with code examples in Python/Java/Scala. It also adds a unit test in Java.

jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6556 from mengxr/SPARK-7584 and squashes the following commits:

11313f6 [Xiangrui Meng] simplify Java example
0cd47f3 [Xiangrui Meng] update user guide
fd36292 [Xiangrui Meng] update Java unit test
ce61ca0 [Xiangrui Meng] add Java unit test for VectorAssembler
e399942 [Xiangrui Meng] scala/python example code
2015-06-01 15:05:14 -07:00
Octavian Geagla da2112aef2 [SPARK-7576] [MLLIB] Add spark.ml user guide doc/example for ElementwiseProduct
Author: Octavian Geagla <ogeagla@gmail.com>

Closes #6501 from ogeagla/ml-guide-elemwiseprod and squashes the following commits:

4ad93d5 [Octavian Geagla] [SPARK-7576] [MLLIB] Incorporate code review feedback.
f7be7ad [Octavian Geagla] [SPARK-7576] [MLLIB] Add spark.ml user guide doc/example for ElementwiseProduct.
2015-05-29 23:55:19 -07:00
Xusen Yin 1bd63e82fd [SPARK-7577] [ML] [DOC] add bucketizer doc
CC jkbradley

Author: Xusen Yin <yinxusen@gmail.com>

Closes #6451 from yinxusen/SPARK-7577 and squashes the following commits:

e2dc32e [Xusen Yin] rename colums
e350e49 [Xusen Yin] add all demos
006ddf1 [Xusen Yin] add java test
3238481 [Xusen Yin] add bucketizer
2015-05-28 17:30:12 -07:00