22 commits

Author SHA1 Message Date
actuaryzhang d743ea4c76 [MINOR][DOC] Update GLM doc to include tweedie distribution
Update GLM documentation to include the Tweedie distribution. #16344

jkbradley yanboliang

Author: actuaryzhang <actuaryzhang10@gmail.com>

Closes #17103 from actuaryzhang/doc.
2017-02-28 14:43:44 -08:00
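
For context on the Tweedie entry above: the Tweedie family is usually characterized by a power variance function, Var(Y) = dispersion * mean^power. The sketch below is plain Python rather than Spark code, and the function name `tweedie_variance` is illustrative only.

```python
def tweedie_variance(mean: float, power: float, dispersion: float = 1.0) -> float:
    """Variance implied by the Tweedie power variance function."""
    # A positive mean is required except for power 0 (the Gaussian case).
    if power != 0 and mean <= 0:
        raise ValueError("mean must be positive when power is nonzero")
    return dispersion * mean ** power


# power 0 gives constant variance (Gaussian-like),
# power 1 gives variance equal to the mean (Poisson-like),
# power 2 gives variance equal to the squared mean (gamma-like).
for power in (0, 1, 2):
    print(power, tweedie_variance(3.0, power))
```

Powers strictly between 1 and 2 are often used for data with many exact zeros and a continuous positive part, such as insurance claims.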
Yuhao Yang 280afe0ef3 [SPARK-19337][ML][DOC] Documentation and examples for LinearSVC
## What changes were proposed in this pull request?

Documentation and examples (Java, Scala, Python, R) for LinearSVC

## How was this patch tested?
local doc generation

Author: Yuhao Yang <yuhao.yang@intel.com>

Closes #16968 from hhbyyh/mlsvmdoc.
2017-02-21 09:38:14 -08:00
Yanbo Liang 9bf8f3cd4f [SPARK-18325][SPARKR][ML] SparkR ML wrappers example code and user guide
## What changes were proposed in this pull request?
* Add all R examples for ML wrappers which were added during 2.1 release cycle.
* Split the whole ```ml.R``` example file into individual examples, one per algorithm, so that users can rerun each one separately.
* Add corresponding examples to ML user guide.
* Update ML section of SparkR user guide.

Note: MLlib Scala/Java/Python examples will be consistent; however, SparkR examples may differ from them, since R users may use the algorithms in a different way, for example, using an R ```formula``` to specify ```featuresCol``` and ```labelCol```.

## How was this patch tested?
Run all examples manually.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #16148 from yanboliang/spark-18325.
2016-12-08 06:19:38 -08:00
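
The note in the SPARK-18325 entry above says that an R ```formula``` can stand in for ```featuresCol``` and ```labelCol```. The sketch below shows the idea in plain Python for the simplest formula shape only (`label ~ a + b`). The helper name `split_formula` is hypothetical, and this is not how SparkR parses formulas.

```python
from typing import List, Tuple


def split_formula(formula: str) -> Tuple[str, List[str]]:
    """Split 'label ~ f1 + f2' into a label column and feature columns."""
    label, tilde, right_side = formula.partition("~")
    label = label.strip()
    features = [term.strip() for term in right_side.split("+")]
    # Reject a missing '~', an empty label, or an empty term on the right side.
    if not tilde or not label or any(not term for term in features):
        raise ValueError("expected a formula of the form 'label ~ a + b'")
    return label, features


label_col, feature_cols = split_formula("Species ~ Sepal_Length + Sepal_Width")
print(label_col, feature_cols)
```

Real R formulas support much more (interactions, `.`, term removal), none of which this sketch handles.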
wm624@hotmail.com aad11209eb [SPARK-18633][ML][EXAMPLE] Add multiclass logistic regression summary python example and document
## What changes were proposed in this pull request?
A logistic regression summary has been added to the Python API. We need to add an example and documentation for the summary.

The newly added example is consistent with Scala and Java examples.

## How was this patch tested?

Manual tests: ran the example with spark-submit; copied and pasted the code into pyspark; built the documentation and checked it.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16064 from wangmiao1981/py.
2016-12-07 18:12:49 -08:00
Yanbo Liang eb8dd68132 [SPARK-18279][DOC][ML][SPARKR] Add R examples to ML programming guide.
## What changes were proposed in this pull request?
Add R examples to ML programming guide for the following algorithms as POC:
* spark.glm
* spark.survreg
* spark.naiveBayes
* spark.kmeans

These four algorithms have been in SparkR since 2.0.0. Docs for algorithms added during the 2.1 release cycle will be addressed in a separate follow-up PR.

## How was this patch tested?
This is a screenshot of the generated ML programming guide for ```GeneralizedLinearRegression```:
![image](https://cloud.githubusercontent.com/assets/1962026/20866403/babad856-b9e1-11e6-9984-62747801e8c4.png)

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #16136 from yanboliang/spark-18279.
2016-12-05 00:39:44 -08:00
Zheng RuiFeng cdaf4ce9fe [SPARK-18480][DOCS] Fix wrong links for ML guide docs
## What changes were proposed in this pull request?
1, There are two `[Graph.partitionBy]` in `graphx-programming-guide.md`; the first one had no effect.
2, `DataFrame`, `Transformer`, `Pipeline` and `Parameter` in `ml-pipeline.md` were linked to `ml-guide.html` by mistake.
3, `PythonMLLibAPI` in `mllib-linear-methods.md` was not accessible, because class `PythonMLLibAPI` is private.
4, Other link updates.
## How was this patch tested?
Manual tests

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #15912 from zhengruifeng/md_fix.
2016-11-17 13:40:16 +00:00
Zheng RuiFeng a75e3fe923 [SPARK-18446][ML][DOCS] Add links to API docs for ML algos
## What changes were proposed in this pull request?
Add links to API docs for ML algos
## How was this patch tested?
Manual checking for the API links

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #15890 from zhengruifeng/algo_link.
2016-11-16 10:53:23 +00:00
Zheng RuiFeng b1033fb745 [MINOR][DOC] Unify example marks
## What changes were proposed in this pull request?
1, `**Example**` => `**Examples**`, because more algos use `**Examples**`.
2, Delete `### Examples` in `Isotonic regression`, because no other algorithm in http://spark.apache.org/docs/latest/ml-classification-regression.html has such a heading.
3, Add missing marks for `LDA` and other algos.

## How was this patch tested?
No tests, since this only modifies docs.

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #15783 from zhengruifeng/doc_fix.
2016-11-08 14:04:07 +00:00
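
The first change in the "Unify example marks" entry above amounts to a literal text substitution. A minimal sketch in plain Python follows. The function name is hypothetical, and the entry does not say how the edit was actually made.

```python
def unify_example_marks(markdown: str) -> str:
    """Rewrite the singular mark `**Example**` as `**Examples**`."""
    # A plain literal replace is enough: an existing `**Examples**` does not
    # contain the substring `**Example**`, so it is left untouched.
    return markdown.replace("**Example**", "**Examples**")


page = "**Example**\n\nSome code here.\n\n**Examples**\n\nMore code here."
print(unify_example_marks(page))
```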
sethah 9df54f5325 [SPARK-17239][ML][DOC] Update user guide for multiclass logistic regression
## What changes were proposed in this pull request?
Updates user guide to reflect that LogisticRegression now supports multiclass. Also adds new examples to show multiclass training.

## How was this patch tested?
Ran locally using spark-submit, run-example, and copy/paste from user guide into shells. Generated docs and verified correct output.

Author: sethah <seth.hendrickson16@gmail.com>

Closes #15349 from sethah/SPARK-17239.
2016-10-05 18:28:21 +00:00
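
Multiclass logistic regression, the subject of the SPARK-17239 entry above, is commonly formulated as multinomial (softmax) regression. Assuming that formulation, the sketch below shows in plain Python how per-class margins become class probabilities. It is an illustration, not Spark's implementation.

```python
import math
from typing import List


def softmax(margins: List[float]) -> List[float]:
    """Turn one margin per class into probabilities that sum to 1."""
    # Subtracting the largest margin avoids overflow in exp() and does not
    # change the result.
    shift = max(margins)
    exps = [math.exp(m - shift) for m in margins]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(margins: List[float]) -> int:
    """Index of the class with the largest margin (and largest probability)."""
    return max(range(len(margins)), key=lambda k: margins[k])


margins = [1.0, 2.0, 3.0]  # made-up margins for three classes
print(softmax(margins), predict_class(margins))
```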
Joseph K. Bradley 5ffd5d3838 [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API primary in MLlib guide
## What changes were proposed in this pull request?

Made DataFrame-based API primary
* Spark doc menu bar and other places now link to ml-guide.html, not mllib-guide.html
* mllib-guide.html keeps RDD-specific list of features, with a link at the top redirecting people to ml-guide.html
* ml-guide.html includes a "maintenance mode" announcement about the RDD-based API
  * **Reviewers: please check this carefully**
* (minor) Titles for DF API no longer include "- spark.ml" suffix.  Titles for RDD API have "- RDD-based API" suffix
* Moved migration guide to ml-guide from mllib-guide
  * Also moved past guides from mllib-migration-guides to ml-migration-guides, with a redirect link on mllib-migration-guides
  * **Reviewers**: I did not change any of the content of the migration guides.

Reorganized DataFrame-based guide:
* ml-guide.html mimics the old mllib-guide.html page in terms of content: overview, migration guide, etc.
* Moved Pipeline description into ml-pipeline.html and moved tuning into ml-tuning.html
  * **Reviewers**: I did not change the content of these guides, except some intro text.
* Sidebar remains the same, but with pipeline and tuning sections added

Other:
* ml-classification-regression.html: Moved text about linear methods to new section in page

## How was this patch tested?

Generated docs locally

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #14213 from jkbradley/ml-guide-2.0.
2016-07-15 13:38:23 -07:00
WeichenXu 9040d83bc2 [SPARK-15608][ML][EXAMPLES][DOC] add examples and documents of ml.isotonic regression
## What changes were proposed in this pull request?

add ml doc for ml isotonic regression
add scala example for ml isotonic regression
add java example for ml isotonic regression
add python example for ml isotonic regression

modify scala example for mllib isotonic regression
modify java example for mllib isotonic regression
modify python example for mllib isotonic regression

add data/mllib/sample_isotonic_regression_libsvm_data.txt
delete data/mllib/sample_isotonic_regression_data.txt
## How was this patch tested?

N/A

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #13381 from WeichenXu123/add_isotonic_regression_doc.
2016-06-16 17:35:40 -07:00
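
For readers unfamiliar with isotonic regression, the topic of the SPARK-15608 entry above: it fits a nondecreasing sequence to the labels. The sketch below is the textbook pool-adjacent-violators procedure for unweighted data in plain Python. It is not Spark's implementation, and the function name is illustrative.

```python
from typing import List


def isotonic_fit(labels: List[float]) -> List[float]:
    """Least-squares nondecreasing fit; labels must be ordered by feature value."""
    blocks: List[List[float]] = []  # each block holds [sum of labels, count]
    for y in labels:
        blocks.append([y, 1])
        # While the newest block's mean is below the previous block's mean,
        # the order constraint is violated, so pool the two blocks.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted: List[float] = []
    for total, count in blocks:
        fitted.extend([total / count] * int(count))
    return fitted


print(isotonic_fit([1.0, 3.0, 2.0, 4.0]))  # [1.0, 2.5, 2.5, 4.0]
```

The middle two labels violate the ordering, so they are pooled and replaced by their mean.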
Dongjoon Hyun ad102af169 [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents
## What changes were proposed in this pull request?

This PR fixes all broken links in the Spark 2.0 preview MLlib documents. It also contains some editorial changes.

**Fix broken links**
  * mllib-data-types.md
  * mllib-decision-tree.md
  * mllib-ensembles.md
  * mllib-feature-extraction.md
  * mllib-pmml-model-export.md
  * mllib-statistics.md

**Fix malformed section header and Scala coding style**
  * mllib-linear-methods.md

**Replace indirect forward links with direct ones**
  * ml-classification-regression.md

## How was this patch tested?

Manual tests (with `cd docs; jekyll build`).

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #13608 from dongjoon-hyun/SPARK-15883.
2016-06-11 12:55:38 +01:00
Yanbo Liang 6ecedf39b4 [SPARK-13590][ML][DOC] Document spark.ml LiR, LoR and AFTSurvivalRegression behavior difference
## What changes were proposed in this pull request?
When fitting ```LinearRegressionModel``` (with the "l-bfgs" solver) and ```LogisticRegressionModel``` without an intercept on a dataset with a constant nonzero column, spark.ml produces the same model as R glmnet but a different one from LIBSVM.

When fitting ```AFTSurvivalRegressionModel``` without an intercept on a dataset with a constant nonzero column, spark.ml produces a different model from R survival::survreg.

We should output a warning message and clarify this condition in the documentation.

## How was this patch tested?
Document change, no unit test.

cc mengxr

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #12731 from yanboliang/spark-13590.
2016-06-07 15:25:36 -07:00
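
The SPARK-13590 entry above does not spell out why a constant nonzero column matters. One reason it is a special case: in plain unregularized least squares without an intercept, such a column takes over the role of the intercept. The sketch below shows only that effect, in plain Python with made-up data. It makes no claim about what spark.ml, glmnet, LIBSVM or survreg do internally.

```python
from typing import List, Tuple


def fit_two_columns(x: List[float], c: List[float], y: List[float]) -> Tuple[float, float]:
    """Least squares for y ~ a*x + b*c with no intercept (2x2 normal equations)."""
    sxx = sum(xi * xi for xi in x)
    sxc = sum(xi * ci for xi, ci in zip(x, c))
    scc = sum(ci * ci for ci in c)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    scy = sum(ci * yi for ci, yi in zip(c, y))
    det = sxx * scc - sxc * sxc  # zero when x and c are collinear
    a = (sxy * scc - scy * sxc) / det
    b = (scy * sxx - sxy * sxc) / det
    return a, b


x = [0.0, 1.0, 2.0, 3.0]
constant = [1.0, 1.0, 1.0, 1.0]      # constant nonzero column
y = [5.0, 7.0, 9.0, 11.0]            # generated as y = 2*x + 5
print(fit_two_columns(x, constant, y))  # (2.0, 5.0)
```

The coefficient on the constant column recovers the offset 5, even though no intercept was fitted. A library that treats constant columns differently will therefore return a different model on the same data.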
sethah c96244f5ac [SPARK-15186][ML][DOCS] Add user guide for generalized linear regression
## What changes were proposed in this pull request?

This patch adds a user guide section for generalized linear regression and includes the examples from [#12754](https://github.com/apache/spark/pull/12754).

## How was this patch tested?

Documentation only, no tests required.

## Approach

In general, it is a bit unclear what level of detail ought to be included in the user guide since there is a lot of variability within the current user guide. I tried to give a fairly brief mathematical introduction to GLMs, and cover what types of problems they could be used for. Additionally, I included a brief blurb on the IRLS solver. The input/output columns are given in a table as is found elsewhere in the docs (though, again, these appear rather intermittently in the current docs), as well as a table providing the supported families and their link functions.

Author: sethah <seth.hendrickson16@gmail.com>

Closes #13139 from sethah/SPARK-15186.
2016-05-27 12:55:48 -07:00
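
The SPARK-15186 entry above mentions a table of families and link functions. As a rough illustration of what a link function does, the sketch below pairs four common families with their textbook canonical links in plain Python. The dictionary and function names are hypothetical, and the set of families Spark supports should be taken from the guide itself, not from this sketch.

```python
import math
from typing import List

# family -> (link: mean to linear predictor, inverse link: predictor to mean)
CANONICAL_LINKS = {
    "gaussian": (lambda mu: mu, lambda eta: eta),                      # identity
    "binomial": (lambda mu: math.log(mu / (1.0 - mu)),
                 lambda eta: 1.0 / (1.0 + math.exp(-eta))),            # logit
    "poisson": (math.log, math.exp),                                   # log
    "gamma": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),             # inverse
}


def predict_mean(family: str, coefficients: List[float],
                 features: List[float], intercept: float = 0.0) -> float:
    """Apply the inverse link to the linear predictor to get the mean."""
    eta = intercept + sum(w * x for w, x in zip(coefficients, features))
    _, inverse_link = CANONICAL_LINKS[family]
    return inverse_link(eta)


print(predict_mean("poisson", [0.5], [2.0]))   # exp(1.0)
print(predict_mean("binomial", [0.0], [3.0]))  # 0.5
```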
sethah 5e203505f1 [SPARK-15394][ML][DOCS] User guide typos and grammar audit
## What changes were proposed in this pull request?

Correct some typos and incorrectly worded sentences.

## How was this patch tested?

Doc changes only.

Note that many of these changes were identified by whomfire01

Author: sethah <seth.hendrickson16@gmail.com>

Closes #13180 from sethah/ml_guide_audit.
2016-05-19 23:29:37 -07:00
Zheng RuiFeng ad1a8466e9 [SPARK-15141][EXAMPLE][DOC] Update OneVsRest Examples
## What changes were proposed in this pull request?
1, Add Python example for OneVsRest
2, Remove args parsing

## How was this patch tested?
manual tests
`./bin/spark-submit examples/src/main/python/ml/one_vs_rest_example.py`

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #12920 from zhengruifeng/ovr_pe.
2016-05-11 09:53:36 +02:00
Yuhao Yang 781df49983 [SPARK-13089][ML] [Doc] spark.ml Naive Bayes user guide and examples
jira: https://issues.apache.org/jira/browse/SPARK-13089

Add section in ml-classification.md for NaiveBayes DataFrame-based API, plus example code (using include_example to clip code from examples/ folder files).

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #11015 from hhbyyh/naiveBayesDoc.
2016-04-13 13:58:35 -07:00
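
The SPARK-13089 entry above relies on `include_example` to clip code out of files under examples/. The sketch below shows the general idea in plain Python, assuming the clipped region is delimited by comment lines containing `$example on$` and `$example off$`. Both the marker strings and the function name are assumptions for illustration, and this is not the actual Jekyll plugin.

```python
def clip_example(source: str) -> str:
    """Return only the lines between the on/off markers (markers excluded)."""
    kept = []
    inside = False
    for line in source.splitlines():
        if "$example on$" in line:
            inside = True
        elif "$example off$" in line:
            inside = False
        elif inside:
            kept.append(line)
    return "\n".join(kept)


example_file = """import something
// $example on$
val model = trainer.fit(data)
// $example off$
spark.stop()"""
print(clip_example(example_file))  # val model = trainer.fit(data)
```

Keeping the snippet in a runnable example file means the code shown in the guide is the same code that gets compiled and run.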
Dongjoon Hyun 024482bf51 [MINOR][DOCS] Fix all typos in markdown files of doc and similar patterns in other comments
## What changes were proposed in this pull request?

This PR tries to fix all typos in all markdown files under `docs` module,
and fixes similar typos in other comments, too.

## How was this patch tested?

manual tests.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11300 from dongjoon-hyun/minor_fix_typos.
2016-02-22 09:52:07 +00:00
Yuhao Yang c2c956bcd1 [ML][DOC] fix wrong api link in ml onevsrest
minor fix for api link in ml onevsrest

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #11068 from hhbyyh/onevsrestDoc.
2016-02-03 21:19:44 -08:00
Yanbo Liang 1c6cf1a563 [SPARK-12570][ML][DOC] DecisionTreeRegressor: provide variance of prediction: user guide update
Update user guide doc for ```DecisionTreeRegressor``` providing variance of prediction.

cc jkbradley

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10594 from yanboliang/spark-12570.
2016-01-05 14:24:32 -08:00
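
The SPARK-12570 entry above does not define the "variance of prediction". One natural reading, assumed here, is that a regression tree leaf predicts the mean of the training labels that reached it and reports their variance alongside. A plain Python sketch under that assumption, with a hypothetical function name:

```python
from typing import List, Tuple


def leaf_prediction(labels: List[float]) -> Tuple[float, float]:
    """Mean and (biased) variance of the training labels in one leaf."""
    n = len(labels)
    mean = sum(labels) / n
    variance = sum((y - mean) ** 2 for y in labels) / n
    return mean, variance


print(leaf_prediction([4.0, 4.0]))  # (4.0, 0.0)
```

A leaf whose labels are tightly clustered gives a small variance, which signals a more confident prediction than a leaf with widely spread labels.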
Timothy Hunter 2ecbe02d5b [SPARK-12212][ML][DOC] Clarifies the difference between spark.ml, spark.mllib and mllib in the documentation.
Replaces a number of occurrences of `MLlib` in the documentation that were meant to refer to the `spark.mllib` package instead. This should clarify for new users the difference between `spark.mllib` (the package) and MLlib (the umbrella project for ML in Spark).

It also removes some files that I forgot to delete with #10207

Author: Timothy Hunter <timhunter@databricks.com>

Closes #10234 from thunterdb/12212.
2015-12-10 12:50:46 -08:00
Timothy Hunter 765c67f5f2 [SPARK-8517][ML][DOC] Reorganizes the spark.ml user guide
This PR moves pieces of the spark.ml user guide to reflect suggestions in SPARK-8517. It does not introduce new content, as requested.

<img width="192" alt="screen shot 2015-12-08 at 11 36 00 am" src="https://cloud.githubusercontent.com/assets/7594753/11666166/e82b84f2-9d9f-11e5-8904-e215424d8444.png">

Author: Timothy Hunter <timhunter@databricks.com>

Closes #10207 from thunterdb/spark-8517.
2015-12-08 18:40:21 -08:00