This PR updates the MLlib user guide and adds migration guide for 1.4->1.5.
* merge migration guide for `spark.mllib` and `spark.ml` packages
* remove dependency section from `spark.ml` guide
* move the paragraph about `spark.mllib` and `spark.ml` to the top and recommend `spark.ml`
* move Sam's talk to footnote to make the section focus on dependencies
Minor changes to code examples and other wording will be in a separate PR.
jkbradley srowen feynmanliang
Author: Xiangrui Meng <meng@databricks.com>
Closes#8498 from mengxr/SPARK-9671.
* Adds user guide for `LinearRegressionSummary`
* Fixes unresolved issues in #8197
CC jkbradley mengxr
Author: Feynman Liang <fliang@databricks.com>
Closes#8491 from feynmanliang/SPARK-9905.
I added a small note about the different types of evaluator and the metrics used.
Author: MechCoder <manojkumarsivaraj334@gmail.com>
Closes#8304 from MechCoder/multiclass_evaluator.
https://issues.apache.org/jira/browse/SPARK-10287
After porting json to HadoopFsRelation, it seems hard to keep the behavior of picking up new files automatically for JSON. This PR removes this behavior, so JSON is consistent with others (ORC and Parquet).
Author: Yin Huai <yhuai@databricks.com>
Closes#8469 from yhuai/jsonRefresh.
* Adds user guide for ml.feature.StopWordsRemovers, ran code examples on my machine
* Cleans up scaladocs for public methods
* Adds test for Java compatibility
* Follow up Python user guide code example is tracked by SPARK-10249
Author: Feynman Liang <fliang@databricks.com>
Closes#8436 from feynmanliang/SPARK-10230.
jira: https://issues.apache.org/jira/browse/SPARK-9901
The jira covers only the document update. I can further provide example code for QR (like the ones for SVD and PCA) in a separate PR.
Author: Yuhao Yang <hhbyyh@gmail.com>
Closes#8462 from hhbyyh/qrDoc.
https://issues.apache.org/jira/browse/SPARK-10315
this parameter is not used any longer and there is some mistake in the current document , should be 'akka.remote.watch-failure-detector.threshold'
Author: CodingCat <zhunansjtu@gmail.com>
Closes#8483 from CodingCat/SPARK_10315.
* Adds two new sections to LDA's user guide; one for each optimizer/model
* Documents new features added to LDA (e.g. topXXXperXXX, asymmetric priors, hyperpam optimization)
* Cleans up a TODO and sets a default parameter in LDA code
jkbradley hhbyyh
Author: Feynman Liang <fliang@databricks.com>
Closes#8254 from feynmanliang/SPARK-9888.
jira: https://issues.apache.org/jira/browse/SPARK-8531
Update ML user guide for MinMaxScaler
Author: Yuhao Yang <hhbyyh@gmail.com>
Author: unknown <yuhaoyan@yuhaoyan-MOBL1.ccr.corp.intel.com>
Closes#7211 from hhbyyh/minmaxdoc.
User guide for spark.ml GBTs and Random Forests.
The examples are copied from the decision tree guide and modified to run.
I caught some issues I had somehow missed in the tree guide as well.
I have run all examples, including Java ones. (Of course, I thought I had previously as well...)
CC: mengxr manishamde yanboliang
Author: Joseph K. Bradley <joseph@databricks.com>
Closes#8369 from jkbradley/ml-ensemble-docs.
Update `lineLengths.persist();` to `lineLengths.persist(StorageLevel.MEMORY_ONLY());` because `JavaRDD#persist` needs a parameter of `StorageLevel`.
Author: Keiji Yoshida <yoshida.keiji.84@gmail.com>
Closes#8372 from yosssi/patch-1.
Add user guide for `VectorSlicer`, with Java test suite and Python version VectorSlicer.
Note that Python version does not support selecting by names now.
Author: Xusen Yin <yinxusen@gmail.com>
Closes#8267 from yinxusen/SPARK-9893.
Added user guide for multilayer perceptron classifier:
- Simplified description of the multilayer perceptron classifier
- Example code for Scala and Java
Author: Alexander Ulanov <nashb@yandex.ru>
Closes#8262 from avulanov/SPARK-9846-mlpc-docs.
This allows skipping the code that tries to talk to Hive and HBase to
fetch delegation tokens, in case that somehow conflicts with the application
being run.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes#8134 from vanzin/SPARK-9833.
1, Add Python example for mllib FP-growth user guide.
2, Correct mistakes of Scala and Java examples.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes#8279 from yanboliang/spark-10084.
New user guide section ml-decision-tree.md, including code examples.
I have run all examples, including the Java ones.
CC: manishamde yanboliang mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes#8244 from jkbradley/ml-dt-docs.
By using `StringIndexer`, we can obtain indexed label on new column. So a following estimator should use this new column through pipeline if it wants to use string indexed label.
I think it is better to make it explicit on documentation.
Author: lewuathe <lewuathe@me.com>
Closes#8205 from Lewuathe/SPARK-9977.
`Lists.newArrayList` -> `Arrays.asList`
CC jkbradley feynmanliang
Anybody into replacing usages of `Lists.newArrayList` in the examples / source code too? this method isn't useful in Java 7 and beyond.
Author: Sean Owen <sowen@cloudera.com>
Closes#8272 from srowen/SPARK-10070.
SPARK-9436 simplifies the Pregel code. graphx-programming-guide needs to be modified accordingly since it lists the old Pregel code
Author: Alexander Ulanov <nashb@yandex.ru>
Closes#7831 from avulanov/SPARK-9508-pregel-doc2.
Add a new test case in yarn/ClientSuite which checks how the various SparkConf
and ClientArguments propagate into the ApplicationSubmissionContext.
Author: Dennis Huo <dhuo@google.com>
Closes#8072 from dennishuo/dhuo-yarn-application-tags.
Adds user guide for `PrefixSpan`, including Scala and Java example code.
mengxr zhangjiajin
Author: Feynman Liang <fliang@databricks.com>
Closes#8253 from feynmanliang/SPARK-9898.
Add Python API, user guide and example for ml.feature.ElementwiseProduct.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes#8061 from yanboliang/SPARK-9768.
Deprecate NIO ConnectionManager in Spark 1.5.0, before removing it in Spark 1.6.0.
Author: Reynold Xin <rxin@databricks.com>
Closes#8162 from rxin/SPARK-9934.
… allocation are set. Now, dynamic allocation is set to false when num-executors is explicitly specified as an argument. Consequently, executorAllocationManager in not initialized in the SparkContext.
Author: Niranjan Padmanabhan <niranjan.padmanabhan@cloudera.com>
Closes#7657 from neurons/SPARK-9092.
Some users like to download additional files in their sandbox that they can refer to from their spark program, or even later mount these files to another directory.
Author: Timothy Chen <tnachen@gmail.com>
Closes#7195 from tnachen/mesos_files.
This documents the use of R model formulae in the SparkR guide. Also fixes some bugs in the R api doc.
mengxr
Author: Eric Liang <ekl@databricks.com>
Closes#8085 from ericl/docs.
This PR is based on #4229, thanks prabeesh.
Closes#4229
Author: Prabeesh K <prabsmails@gmail.com>
Author: zsxwing <zsxwing@gmail.com>
Author: prabs <prabsmails@gmail.com>
Author: Prabeesh K <prabeesh.k@namshi.com>
Closes#7833 from zsxwing/pr4229 and squashes the following commits:
9570bec [zsxwing] Fix the variable name and check null in finally
4a9c79e [zsxwing] Fix pom.xml indentation
abf5f18 [zsxwing] Merge branch 'master' into pr4229
935615c [zsxwing] Fix the flaky MQTT tests
47278c5 [zsxwing] Include the project class files
478f844 [zsxwing] Add unpack
5f8a1d4 [zsxwing] Make the maven build generate the test jar for Python MQTT tests
734db99 [zsxwing] Merge branch 'master' into pr4229
126608a [Prabeesh K] address the comments
b90b709 [Prabeesh K] Merge pull request #1 from zsxwing/pr4229
d07f454 [zsxwing] Register StreamingListerner before starting StreamingContext; Revert unncessary changes; fix the python unit test
a6747cb [Prabeesh K] wait for starting the receiver before publishing data
87fc677 [Prabeesh K] address the comments:
97244ec [zsxwing] Make sbt build the assembly test jar for streaming mqtt
80474d1 [Prabeesh K] fix
1f0cfe9 [Prabeesh K] python style fix
e1ee016 [Prabeesh K] scala style fix
a5a8f9f [Prabeesh K] added Python test
9767d82 [Prabeesh K] implemented Python-friendly class
a11968b [Prabeesh K] fixed python style
795ec27 [Prabeesh K] address comments
ee387ae [Prabeesh K] Fix assembly jar location of mqtt-assembly
3f4df12 [Prabeesh K] updated version
b34c3c1 [prabs] adress comments
3aa7fff [prabs] Added Python streaming mqtt word count example
b7d42ff [prabs] Mqtt streaming support in Python
Author: Mahmoud Lababidi <lababidi@gmail.com>
Closes#8076 from lababidi/master and squashes the following commits:
af4553b [Mahmoud Lababidi] Fixed AtmoicReference<> Example
Straightforward fix on doc typo
Author: Jeff Zhang <zjffdu@apache.org>
Closes#8019 from zjffdu/master and squashes the following commits:
aed6e64 [Jeff Zhang] Fix doc typo