## What changes were proposed in this pull request?
Easy fix in the documentation.
## How was this patch tested?
N/A
Closes#20948
Author: Daniel Sakuma <dsakuma@gmail.com>
Closes#20928 from dsakuma/fix_typo_configuration_docs.
## What changes were proposed in this pull request?
User guide and examples are updated to reflect multiclass logistic regression summary which was added in [SPARK-17139](https://issues.apache.org/jira/browse/SPARK-17139).
I did not make a separate summary example, but added the summary code to the multiclass example that already existed. I don't see the need for a separate example for the summary.
## How was this patch tested?
Docs and examples only. Ran all examples locally using spark-submit.
Author: sethah <shendrickson@cloudera.com>
Closes#20332 from sethah/multiclass_summary_example.
## What changes were proposed in this pull request?
1, add an example for sparkr `decisionTree`
2, document it in user guide
## How was this patch tested?
local submit
Author: Zheng RuiFeng <ruifengz@foxmail.com>
Closes#18067 from zhengruifeng/dt_example.
Update GLM documentation to include the Tweedie distribution. #16344
jkbradley yanboliang
Author: actuaryzhang <actuaryzhang10@gmail.com>
Closes#17103 from actuaryzhang/doc.
## What changes were proposed in this pull request?
Documentation and examples (Java, scala, python, R) for LinearSVC
## How was this patch tested?
local doc generation
Author: Yuhao Yang <yuhao.yang@intel.com>
Closes#16968 from hhbyyh/mlsvmdoc.
## What changes were proposed in this pull request?
* Add all R examples for ML wrappers which were added during 2.1 release cycle.
* Split the whole ```ml.R``` example file into individual example for each algorithm, which will be convenient for users to rerun them.
* Add corresponding examples to ML user guide.
* Update ML section of SparkR user guide.
Note: MLlib Scala/Java/Python examples will be consistent, however, SparkR examples may different from them, since R users may use the algorithms in a different way, for example, using R ```formula``` to specify ```featuresCol``` and ```labelCol```.
## How was this patch tested?
Run all examples manually.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes#16148 from yanboliang/spark-18325.
## What changes were proposed in this pull request?
Logistic Regression summary is added in Python API. We need to add example and document for summary.
The newly added example is consistent with Scala and Java examples.
## How was this patch tested?
Manually tests: Run the example with spark-submit; copy & paste code into pyspark; build document and check the document.
Author: wm624@hotmail.com <wm624@hotmail.com>
Closes#16064 from wangmiao1981/py.
## What changes were proposed in this pull request?
Add R examples to ML programming guide for the following algorithms as POC:
* spark.glm
* spark.survreg
* spark.naiveBayes
* spark.kmeans
The four algorithms were added to SparkR since 2.0.0, more docs for algorithms added during 2.1 release cycle will be addressed in a separate follow-up PR.
## How was this patch tested?
This is the screenshots of generated ML programming guide for ```GeneralizedLinearRegression```:
![image](https://cloud.githubusercontent.com/assets/1962026/20866403/babad856-b9e1-11e6-9984-62747801e8c4.png)
Author: Yanbo Liang <ybliang8@gmail.com>
Closes#16136 from yanboliang/spark-18279.
## What changes were proposed in this pull request?
1, There are two `[Graph.partitionBy]` in `graphx-programming-guide.md`, the first one had no effert.
2, `DataFrame`, `Transformer`, `Pipeline` and `Parameter` in `ml-pipeline.md` were linked to `ml-guide.html` by mistake.
3, `PythonMLLibAPI` in `mllib-linear-methods.md` was not accessable, because class `PythonMLLibAPI` is private.
4, Other link updates.
## How was this patch tested?
manual tests
Author: Zheng RuiFeng <ruifengz@foxmail.com>
Closes#15912 from zhengruifeng/md_fix.
## What changes were proposed in this pull request?
Add links to API docs for ML algos
## How was this patch tested?
Manual checking for the API links
Author: Zheng RuiFeng <ruifengz@foxmail.com>
Closes#15890 from zhengruifeng/algo_link.
## What changes were proposed in this pull request?
1, `**Example**` => `**Examples**`, because more algos use `**Examples**`.
2, delete `### Examples` in `Isotonic regression`, because it's not that special in http://spark.apache.org/docs/latest/ml-classification-regression.html
3, add missing marks for `LDA` and other algos.
## How was this patch tested?
No tests for it only modify doc
Author: Zheng RuiFeng <ruifengz@foxmail.com>
Closes#15783 from zhengruifeng/doc_fix.
## What changes were proposed in this pull request?
Updates user guide to reflect that LogisticRegression now supports multiclass. Also adds new examples to show multiclass training.
## How was this patch tested?
Ran locally using spark-submit, run-example, and copy/paste from user guide into shells. Generated docs and verified correct output.
Author: sethah <seth.hendrickson16@gmail.com>
Closes#15349 from sethah/SPARK-17239.
## What changes were proposed in this pull request?
Made DataFrame-based API primary
* Spark doc menu bar and other places now link to ml-guide.html, not mllib-guide.html
* mllib-guide.html keeps RDD-specific list of features, with a link at the top redirecting people to ml-guide.html
* ml-guide.html includes a "maintenance mode" announcement about the RDD-based API
* **Reviewers: please check this carefully**
* (minor) Titles for DF API no longer include "- spark.ml" suffix. Titles for RDD API have "- RDD-based API" suffix
* Moved migration guide to ml-guide from mllib-guide
* Also moved past guides from mllib-migration-guides to ml-migration-guides, with a redirect link on mllib-migration-guides
* **Reviewers**: I did not change any of the content of the migration guides.
Reorganized DataFrame-based guide:
* ml-guide.html mimics the old mllib-guide.html page in terms of content: overview, migration guide, etc.
* Moved Pipeline description into ml-pipeline.html and moved tuning into ml-tuning.html
* **Reviewers**: I did not change the content of these guides, except some intro text.
* Sidebar remains the same, but with pipeline and tuning sections added
Other:
* ml-classification-regression.html: Moved text about linear methods to new section in page
## How was this patch tested?
Generated docs locally
Author: Joseph K. Bradley <joseph@databricks.com>
Closes#14213 from jkbradley/ml-guide-2.0.
## What changes were proposed in this pull request?
add ml doc for ml isotonic regression
add scala example for ml isotonic regression
add java example for ml isotonic regression
add python example for ml isotonic regression
modify scala example for mllib isotonic regression
modify java example for mllib isotonic regression
modify python example for mllib isotonic regression
add data/mllib/sample_isotonic_regression_libsvm_data.txt
delete data/mllib/sample_isotonic_regression_data.txt
## How was this patch tested?
N/A
Author: WeichenXu <WeichenXu123@outlook.com>
Closes#13381 from WeichenXu123/add_isotonic_regression_doc.
## What changes were proposed in this pull request?
When fitting ```LinearRegressionModel```(by "l-bfgs" solver) and ```LogisticRegressionModel``` w/o intercept on dataset with constant nonzero column, spark.ml produce same model as R glmnet but different from LIBSVM.
When fitting ```AFTSurvivalRegressionModel``` w/o intercept on dataset with constant nonzero column, spark.ml produce different model compared with R survival::survreg.
We should output a warning message and clarify in document for this condition.
## How was this patch tested?
Document change, no unit test.
cc mengxr
Author: Yanbo Liang <ybliang8@gmail.com>
Closes#12731 from yanboliang/spark-13590.
## What changes were proposed in this pull request?
This patch adds a user guide section for generalized linear regression and includes the examples from [#12754](https://github.com/apache/spark/pull/12754).
## How was this patch tested?
Documentation only, no tests required.
## Approach
In general, it is a bit unclear what level of detail ought to be included in the user guide since there is a lot of variability within the current user guide. I tried to give a fairly brief mathematical introduction to GLMs, and cover what types of problems they could be used for. Additionally, I included a brief blurb on the IRLS solver. The input/output columns are given in a table as is found elsewhere in the docs (though, again, these appear rather intermittently in the current docs), as well as a table providing the supported families and their link functions.
Author: sethah <seth.hendrickson16@gmail.com>
Closes#13139 from sethah/SPARK-15186.
## What changes were proposed in this pull request?
Correct some typos and incorrectly worded sentences.
## How was this patch tested?
Doc changes only.
Note that many of these changes were identified by whomfire01
Author: sethah <seth.hendrickson16@gmail.com>
Closes#13180 from sethah/ml_guide_audit.
## What changes were proposed in this pull request?
1, Add python example for OneVsRest
2, remove args-parsing
## How was this patch tested?
manual tests
`./bin/spark-submit examples/src/main/python/ml/one_vs_rest_example.py`
Author: Zheng RuiFeng <ruifengz@foxmail.com>
Closes#12920 from zhengruifeng/ovr_pe.
jira: https://issues.apache.org/jira/browse/SPARK-13089
Add section in ml-classification.md for NaiveBayes DataFrame-based API, plus example code (using include_example to clip code from examples/ folder files).
Author: Yuhao Yang <hhbyyh@gmail.com>
Closes#11015 from hhbyyh/naiveBayesDoc.
## What changes were proposed in this pull request?
This PR tries to fix all typos in all markdown files under `docs` module,
and fixes similar typos in other comments, too.
## How was the this patch tested?
manual tests.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes#11300 from dongjoon-hyun/minor_fix_typos.
Update user guide doc for ```DecisionTreeRegressor``` providing variance of prediction.
cc jkbradley
Author: Yanbo Liang <ybliang8@gmail.com>
Closes#10594 from yanboliang/spark-12570.
Replaces a number of occurences of `MLlib` in the documentation that were meant to refer to the `spark.mllib` package instead. It should clarify for new users the difference between `spark.mllib` (the package) and MLlib (the umbrella project for ML in spark).
It also removes some files that I forgot to delete with #10207
Author: Timothy Hunter <timhunter@databricks.com>
Closes#10234 from thunterdb/12212.