PySpark on YARN in cluster mode is now supported, so this updates the docs accordingly.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #6040 from sarutak/update-doc-for-pyspark-on-yarn and squashes the following commits:
ad9f88c [Kousuke Saruta] Brushed up sentences
469fd2e [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into update-doc-for-pyspark-on-yarn
fcfdb92 [Kousuke Saruta] Updated doc for PySpark on YARN with cluster mode
Author: Punya Biswal <pbiswal@palantir.com>
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #6842 from punya/feature/SPARK-7515 and squashes the following commits:
0b83648 [Punya Biswal] Merge remote-tracking branch 'origin/branch-1.4' into feature/SPARK-7515
de025cd [Kousuke Saruta] [SPARK-7515] [DOC] Update documentation for PySpark on YARN with cluster mode
start-slave.sh no longer takes a worker # param in 1.4+
Author: Sean Owen <sowen@cloudera.com>
Closes #6855 from srowen/SPARK-8395 and squashes the following commits:
300278e [Sean Owen] start-slave.sh no longer takes a worker # param in 1.4+
(cherry picked from commit f005be0273)
Signed-off-by: Andrew Or <andrew@databricks.com>
[SQL][DOC] I found it a bit confusing when I came across it for the first time in the docs
Author: Radek Ostrowski <dest.hawaii@gmail.com>
Author: radek <radek@radeks-MacBook-Pro-2.local>
Closes #6332 from radek1st/master and squashes the following commits:
dae3347 [Radek Ostrowski] fixed typo
c76bb3a [radek] improved a comment
(cherry picked from commit 4bd10fd509)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Typo in thriftserver section
Author: Moussa Taifi <moutai10@gmail.com>
Closes #6847 from moutai/patch-1 and squashes the following commits:
1bd29df [Moussa Taifi] Update sql-programming-guide.md
(cherry picked from commit dc455b8833)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Author: Peter Hoffmann <ph@peter-hoffmann.com>
Closes #6815 from hoffmann/patch-1 and squashes the following commits:
2abb6da [Peter Hoffmann] fix read/write mixup
(cherry picked from commit f3f2a4397d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
This improves the Spark Streaming Guides by fixing broken links, rewording confusing sections, fixing typos, adding missing words, etc.
Author: Mike Dusenberry <dusenberrymw@gmail.com>
Closes #6801 from dusenberrymw/SPARK-8343_Improve_Spark_Streaming_Guides_MERGED and squashes the following commits:
6688090 [Mike Dusenberry] Improvements to the Spark Streaming Custom Receiver Guide, including slight rewording of confusing sections, and fixing typos & missing words.
436fbd8 [Mike Dusenberry] Bunch of improvements to the Spark Streaming Guide, including fixing broken links, slight rewording of confusing sections, fixing typos & missing words, etc.
(cherry picked from commit 35d1267cf8)
Signed-off-by: Reynold Xin <rxin@databricks.com>
- Kinesis API updated
- Kafka version updated, and Python API for Direct Kafka added
- Added SQLContext.getOrCreate()
- Added information on how to get partitionId in foreachRDD
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #6781 from tdas/SPARK-7284 and squashes the following commits:
aac7be0 [Tathagata Das] Added information on how to get partition id
a66ec22 [Tathagata Das] Complete the incomplete line
a92ca39 [Tathagata Das] Updated streaming documentation
(cherry picked from commit e9471d3414)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #6766 from vanzin/SPARK-6511 and squashes the following commits:
49f0f67 [Marcelo Vanzin] [SPARK-6511] [docs] Fix example command in hadoop-provided docs.
(cherry picked from commit 9cbdf31ec1)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Read number of threads for RBackend from configuration.
[SPARK-8282] #comment Linking with JIRA
Author: Hossein <hossein@databricks.com>
Closes #6730 from falaki/SPARK-8282 and squashes the following commits:
33b3d98 [Hossein] Documented new config parameter
70f2a9c [Hossein] Fixing import
ec44225 [Hossein] Read number of threads for RBackend from configuration
(cherry picked from commit 30ebf1a233)
Signed-off-by: Andrew Or <andrew@databricks.com>
Author: Cheng Lian <lian@databricks.com>
Closes #6749 from liancheng/java-sample-fix and squashes the following commits:
5b44585 [Cheng Lian] Fixes a minor Java example error in SQL programming guide
(cherry picked from commit 8f7308f9c4)
Signed-off-by: Reynold Xin <rxin@databricks.com>
This provides preliminary documentation pointing out how to use the
Hadoop-free builds. I am hoping that over time this list can grow to
include most of the popular Hadoop distributions.
Getting more people to use these builds will help us reduce the number
of binaries we build in the long term.
Author: Patrick Wendell <patrick@databricks.com>
Closes #6729 from pwendell/hadoop-provided and squashes the following commits:
1113b76 [Patrick Wendell] [SPARK-6511] [Documentation] Explain how to use Hadoop provided builds
(cherry picked from commit 6e4fb0c9e8)
Signed-off-by: Patrick Wendell <patrick@databricks.com>
This is a follow-up to #3621.
/cc liancheng pwendell
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #6639 from adrian-wang/kryodoc and squashes the following commits:
3c4b1cf [Daoyuan Wang] [DOC] kryo default setting in SQL Thrift server
(cherry picked from commit 10fc2f6f51)
Signed-off-by: Reynold Xin <rxin@databricks.com>
…moved if dynamic allocation is enabled.
This is a work in progress. This patch ensures that an executor that has cached RDD blocks is not removed,
but makes no attempt to find another executor to remove. This is meant to get some feedback on the current
approach; if it makes sense, I will look at choosing another executor to remove. No testing has been done yet.
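A rough illustration of the behaviour described above, written as a standalone Python sketch rather than the actual Spark code: idle executors become removable once an idle timeout expires, while executors holding cached blocks are checked against a separate timeout that is infinite unless configured. The function name, the dict fields, and the parameter names are all hypothetical and exist only for this sketch.

```python
def executors_to_remove(executors, now, idle_timeout, cached_idle_timeout=float("inf")):
    """Return ids of idle executors whose applicable timeout has expired.

    Hypothetical helper for illustration; not part of Spark.
    """
    removable = []
    for executor in executors:
        idle_for = now - executor["idle_since"]
        # Executors holding cached blocks are checked against the separate timeout.
        if executor["has_cached_blocks"]:
            timeout = cached_idle_timeout
        else:
            timeout = idle_timeout
        if idle_for >= timeout:
            removable.append(executor["id"])
    return removable


executors = [
    {"id": "exec-1", "idle_since": 0, "has_cached_blocks": False},
    {"id": "exec-2", "idle_since": 0, "has_cached_blocks": True},
    {"id": "exec-3", "idle_since": 90, "has_cached_blocks": False},
]

# With the default (infinite) cached timeout, exec-2 is never selected.
default_result = executors_to_remove(executors, now=100, idle_timeout=60)
# With a finite cached timeout, exec-2 becomes removable as well.
finite_result = executors_to_remove(executors, now=100, idle_timeout=60, cached_idle_timeout=80)
```

As in the patch description, the sketch makes no attempt to pick a replacement executor when a cached one is skipped.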
Author: Hari Shreedharan <hshreedharan@apache.org>
Closes #6508 from harishreedharan/dymanic-caching and squashes the following commits:
dddf1eb [Hari Shreedharan] Minor configuration description update.
10130e2 [Hari Shreedharan] Fix compile issue.
5417b53 [Hari Shreedharan] Add documentation for new config. Remove block from cachedBlocks when it is dropped.
875916a [Hari Shreedharan] Make some code more readable.
39940ca [Hari Shreedharan] Handle the case where the executor has not yet registered.
90ad711 [Hari Shreedharan] Remove unused imports and unused methods.
063985c [Hari Shreedharan] Send correct message instead of recursively calling same method.
ec2fd7e [Hari Shreedharan] Add file missed in last commit
5d10fad [Hari Shreedharan] Update cached blocks status using local info, rather than doing an RPC.
193af4c [Hari Shreedharan] WIP. Use local state rather than via RPC.
ae932ff [Hari Shreedharan] Fix config param name.
272969d [Hari Shreedharan] Fix seconds to millis bug.
5a1993f [Hari Shreedharan] Add timeout for cache executors. Ignore broadcast blocks while checking if there are cached blocks.
57fefc2 [Hari Shreedharan] [SPARK-7955][Core] Ensure executors with cached RDD blocks are not removed if dynamic allocation is enabled.
(cherry picked from commit 3285a51121)
Signed-off-by: Andrew Or <andrew@databricks.com>
Add documentation for spark.sql.planner.externalSort
Author: Luca Martinetti <luca@luca.io>
Closes #6272 from lucamartinetti/docs-externalsort and squashes the following commits:
985661b [Luca Martinetti] [SPARK-7747] [SQL] [DOCS] Add documentation for spark.sql.planner.externalSort
(cherry picked from commit 4060526cd3)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Fixed the broken links (Examples) in the documentation.
Author: Akhil Das <akhld@darktech.ca>
Closes #6666 from akhld/patch-2 and squashes the following commits:
2228b83 [Akhil Das] Update streaming-kafka-integration.md
(cherry picked from commit 019dc9f558)
Signed-off-by: Sean Owen <sowen@cloudera.com>
jira: https://issues.apache.org/jira/browse/SPARK-8043
I found some issues while testing the save/load examples in the Markdown documents, as part of the 1.4 QA plan.
Author: Yuhao Yang <hhbyyh@gmail.com>
Closes #6584 from hhbyyh/naiveDocExample and squashes the following commits:
a01a206 [Yuhao Yang] fix for Gaussian mixture
2fb8b96 [Yuhao Yang] update NaiveBayes and SVM examples in doc
(cherry picked from commit 43adbd5611)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Updating ML Doc's *"Estimator, Transformer, and Param"* example to use `model.extractParamMap` instead of `model.fittingParamMap`, which no longer exists.
mengxr, I believe this addresses (part of) the *update documentation* TODO list item from [PR 5820](https://github.com/apache/spark/pull/5820).
Author: Mike Dusenberry <dusenberrymw@gmail.com>
Closes #6514 from dusenberrymw/Fix_ML_Doc_Estimator_Transformer_Param_Example and squashes the following commits:
6366e1f [Mike Dusenberry] Updating instances of model.extractParamMap to model.parent.extractParamMap, since the Params of the parent Estimator could possibly differ from those of the Model.
d850e0e [Mike Dusenberry] Removing all references to "fittingParamMap" throughout Spark, since it has been removed.
0480304 [Mike Dusenberry] Updating the ML Doc "Estimator, Transformer, and Param" Java example to use model.extractParamMap() instead of model.fittingParamMap(), which no longer exists.
7d34939 [Mike Dusenberry] Updating ML Doc "Estimator, Transformer, and Param" example to use model.extractParamMap instead of model.fittingParamMap, which no longer exists.
(cherry picked from commit ad06727fe9)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
This PR adds a Java unit test and user guide for `StringIndexer`. I put it before `OneHotEncoder` because they are closely related. jkbradley
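For reviewers unfamiliar with the transformer, here is a minimal pure-Python sketch of the indexing idea: labels are indexed by descending frequency, so the most frequent label gets index 0. It does not use the Spark API. The function names and the alphabetical tie-break are assumptions of this sketch, not documented `StringIndexer` behaviour.

```python
from collections import Counter


def fit_string_indexer(labels):
    """Map each distinct label to an index, most frequent label first."""
    counts = Counter(labels)
    # Sort by descending frequency; ties are broken alphabetically so the
    # result is deterministic (the tie-break is an assumption of this sketch).
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(index) for index, label in enumerate(ordered)}


def transform(labels, mapping):
    """Replace every label with its index from the fitted mapping."""
    return [mapping[label] for label in labels]


categories = ["a", "b", "c", "a", "a", "c"]
mapping = fit_string_indexer(categories)
indexed = transform(categories, mapping)
print(indexed)  # [0.0, 2.0, 1.0, 0.0, 0.0, 1.0]
```

The indices produced this way are what a downstream one-hot encoding step would consume, which is why the two sections sit next to each other in the guide.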
Author: Xiangrui Meng <meng@databricks.com>
Closes #6561 from mengxr/SPARK-7582 and squashes the following commits:
4bba4f1 [Xiangrui Meng] fix example
ba1cd1b [Xiangrui Meng] fix style
7fa18d1 [Xiangrui Meng] add user guide for StringIndexer
136cb93 [Xiangrui Meng] add a Java unit test for StringIndexer
(cherry picked from commit 0221c7f0ef)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
This PR adds a section in the user guide for `VectorAssembler` with code examples in Python/Java/Scala. It also adds a unit test in Java.
jkbradley
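As a quick illustration of what the new section covers, the following plain-Python sketch concatenates a chosen list of columns, scalar or vector-valued, into one flat feature vector in the given order. It is not the Spark API; the `assemble` helper and the sample row are made up for this example.

```python
def assemble(row, input_cols):
    """Concatenate scalar and list-valued columns into one flat feature list."""
    features = []
    for col in input_cols:
        value = row[col]
        if isinstance(value, (list, tuple)):
            # Vector-valued column: append every component in order.
            features.extend(float(v) for v in value)
        else:
            # Scalar column: append as a single component.
            features.append(float(value))
    return features


# Hypothetical sample row; the column names are illustrative only.
row = {"hour": 18, "mobile": 1.0, "userFeatures": [0.0, 10.0, 0.5], "clicked": 1.0}
features = assemble(row, ["hour", "mobile", "userFeatures"])
print(features)  # [18.0, 1.0, 0.0, 10.0, 0.5]
```

Columns not listed in `input_cols` (here `clicked`) are left out of the assembled vector.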
Author: Xiangrui Meng <meng@databricks.com>
Closes #6556 from mengxr/SPARK-7584 and squashes the following commits:
11313f6 [Xiangrui Meng] simplify Java example
0cd47f3 [Xiangrui Meng] update user guide
fd36292 [Xiangrui Meng] update Java unit test
ce61ca0 [Xiangrui Meng] add Java unit test for VectorAssembler
e399942 [Xiangrui Meng] scala/python example code
(cherry picked from commit 90c606925e)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
pwendell tdas
Author: Nishkam Ravi <nravi@cloudera.com>
Author: nishkamravi2 <nishkamravi@gmail.com>
Author: nravi <nravi@c1704.halxg.cloudera.com>
Closes #6544 from nishkamravi2/master_nravi and squashes the following commits:
46e8c03 [Nishkam Ravi] Slight modification to streaming docs
(cherry picked from commit e7c7e51f2e)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Add save/load to the examples for:
KMeansModel
PowerIterationClusteringModel
Word2VecModel
IsotonicRegressionModel
Author: Yuhao Yang <hhbyyh@gmail.com>
Closes #6498 from hhbyyh/docSaveLoad and squashes the following commits:
7f9f06d [Yuhao Yang] add missing imports
c604cad [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into docSaveLoad
1dd77cc [Yuhao Yang] update document with some missing save/load
(cherry picked from commit 0674700303)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Author: Reynold Xin <rxin@databricks.com>
Closes #6522 from rxin/sql-doc-1.4 and squashes the following commits:
c227be7 [Reynold Xin] Updated link.
040b6d7 [Reynold Xin] Update documentation for the new DataFrame reader/writer interface.
(cherry picked from commit 00a7137900)
Signed-off-by: Reynold Xin <rxin@databricks.com>
The MLlib ChiSqSelector class is not serializable, and so the example in the ChiSqSelector documentation fails. Also, that example is missing the import of ChiSqSelector.
This PR makes ChiSqSelector extend Serializable in MLlib, and adds the ChiSqSelector import statement to the associated example in the documentation.
Author: Mike Dusenberry <dusenberrymw@gmail.com>
Closes #6462 from dusenberrymw/Make_ChiSqSelector_Serializable_and_Fix_Related_Docs_Example and squashes the following commits:
9cb2f94 [Mike Dusenberry] Make MLlib ChiSqSelector Serializable.
d9003bf [Mike Dusenberry] Add missing import in MLlib ChiSqSelector Docs Scala example.
(cherry picked from commit 1281a35188)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Author: Cheng Lian <lian@databricks.com>
Closes #6520 from liancheng/spark-7849 and squashes the following commits:
705264b [Cheng Lian] Updates SQL programming guide for 1.4
(cherry picked from commit 6e3f0c7810)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Updated the doc for the hadoop-2.6 profile, which is new to Spark 1.4
Author: Taka Shinagawa <taka.epsilon@gmail.com>
Closes #6450 from mrt/docfix2 and squashes the following commits:
db1c43b [Taka Shinagawa] Updated the hadoop versions for hadoop-2.6 profile
323710e [Taka Shinagawa] The hadoop-2.6 profile is added to the Hadoop versions table
(cherry picked from commit 3ab71eb9d5)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Remove caveat about Kafka / JDBC not being supported for Scala 2.11
Author: Sean Owen <sowen@cloudera.com>
Closes #6470 from srowen/SPARK-7890 and squashes the following commits:
4652634 [Sean Owen] One more rewording
7b7f3c8 [Sean Owen] Restore note about JDBC component
126744d [Sean Owen] Remove caveat about Kafka / JDBC not being supported for Scala 2.11
(cherry picked from commit 8c8de3ed86)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Author: Octavian Geagla <ogeagla@gmail.com>
Closes #6008 from ogeagla/elementwise-prod-doc and squashes the following commits:
72e6dc0 [Octavian Geagla] [SPARK-7459] [MLLIB] Java example import.
cf2afbd [Octavian Geagla] [SPARK-7459] [MLLIB] Update description of example.
b66431b [Octavian Geagla] [SPARK-7459] [MLLIB] Add override annotation to java example, make scala example use same data as java.
6b26b03 [Octavian Geagla] [SPARK-7459] [MLLIB] Fix line which is too long.
79af020 [Octavian Geagla] [SPARK-7459] [MLLIB] Actually don't use Java 8.
9d5b31a [Octavian Geagla] [SPARK-7459] [MLLIB] Don't use Java 8
4f0c92f [Octavian Geagla] [SPARK-7459] [MLLIB] ElementwiseProduct Java example.
(cherry picked from commit e3a4374833)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Author: Octavian Geagla <ogeagla@gmail.com>
Closes #6501 from ogeagla/ml-guide-elemwiseprod and squashes the following commits:
4ad93d5 [Octavian Geagla] [SPARK-7576] [MLLIB] Incorporate code review feedback.
f7be7ad [Octavian Geagla] [SPARK-7576] [MLLIB] Add spark.ml user guide doc/example for ElementwiseProduct.
(cherry picked from commit da2112aef2)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
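For context on the two ElementwiseProduct doc changes above: the transformer multiplies each input vector by a fixed scaling vector, component by component (a Hadamard product). A minimal sketch in plain Python, not using the Spark API; the function name is hypothetical.

```python
def elementwise_product(vector, scaling_vec):
    """Multiply two equal-length vectors component by component."""
    if len(vector) != len(scaling_vec):
        raise ValueError("vector and scaling vector must have the same length")
    return [v * w for v, w in zip(vector, scaling_vec)]


scaling_vec = [0.0, 1.0, 2.0]
data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
transformed = [elementwise_product(v, scaling_vec) for v in data]
print(transformed)  # [[0.0, 2.0, 6.0], [0.0, 5.0, 12.0]]
```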
The first line had only two dashes (--) instead of three (---). Because of the missing dash (-), the 'jekyll build' command was not converting configuration.md to _site/configuration.html.
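A check along the following lines could catch this class of mistake before the docs are built. It is only a sketch: the helper name is hypothetical, the sample front matter is illustrative rather than copied from configuration.md, and the rule it encodes (the first line must be exactly `---`, closed by a later `---` line) is a simplification of what Jekyll accepts.

```python
def has_front_matter(text):
    """Return True if text opens with a '---' line that a later '---' line closes."""
    lines = text.splitlines()
    # The very first line must be exactly three dashes (trailing spaces allowed).
    if not lines or lines[0].rstrip() != "---":
        return False
    # Some later line must close the front matter block.
    return any(line.rstrip() == "---" for line in lines[1:])


# Illustrative page contents, not the real file.
good_page = "---\nlayout: global\ntitle: Configuration\n---\nBody text\n"
bad_page = "--\nlayout: global\ntitle: Configuration\n---\nBody text\n"

print(has_front_matter(good_page))  # True
print(has_front_matter(bad_page))   # False
```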
Author: Taka Shinagawa <taka.epsilon@gmail.com>
Closes #6513 from mrt/docfix3 and squashes the following commits:
c470e2c [Taka Shinagawa] Added a missing dash(-) preventing jekyll from converting configuration.md to html format
(cherry picked from commit 3792d25836)
Signed-off-by: Reynold Xin <rxin@databricks.com>
This PR adds a new SparkR programming guide at the top level. This will be useful for R users, since our APIs don't directly match the Scala/Python APIs and we need to explain SparkR without using RDDs as examples.
cc rxin davies pwendell
cc cafreeman -- Would be great if you could also take a look at this!
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #6490 from shivaram/sparkr-guide and squashes the following commits:
d5ff360 [Shivaram Venkataraman] Add a section on HiveContext, HQL queries
408dce5 [Shivaram Venkataraman] Fix link
dbb86e3 [Shivaram Venkataraman] Fix minor typo
9aff5e0 [Shivaram Venkataraman] Address comments, use dplyr-like syntax in example
d09703c [Shivaram Venkataraman] Fix default argument in read.df
ea816a1 [Shivaram Venkataraman] Add a new SparkR programming guide Also update write.df, read.df to handle defaults better
(cherry picked from commit 5f48e5c33b)
Signed-off-by: Davies Liu <davies@databricks.com>
CC jkbradley
Author: Xusen Yin <yinxusen@gmail.com>
Closes #6451 from yinxusen/SPARK-7577 and squashes the following commits:
e2dc32e [Xusen Yin] rename columns
e350e49 [Xusen Yin] add all demos
006ddf1 [Xusen Yin] add java test
3238481 [Xusen Yin] add bucketizer
(cherry picked from commit 1bd63e82fd)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
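Judging by the commit titles, the change above adds Bucketizer examples to the guide. The idea is to map a continuous value to the index of the bucket it falls into, given a strictly increasing list of split points. A plain-Python sketch of that idea follows; the function name is hypothetical, and treating every bucket as half-open `[x, y)` and rejecting out-of-range values are assumptions of the sketch, not a statement of Spark's exact edge-case behaviour.

```python
from bisect import bisect_right


def bucketize(value, splits):
    """Return the index of the half-open bucket [splits[i], splits[i + 1]) holding value."""
    if not splits[0] <= value < splits[-1]:
        raise ValueError("value %r lies outside the splits" % (value,))
    # bisect_right counts the splits that are <= value; subtracting one
    # gives the index of the bucket whose lower bound is that last split.
    return float(bisect_right(splits, value) - 1)


# n + 1 split points define n buckets; infinite end points cover all values.
splits = [float("-inf"), -0.5, 0.0, 0.5, float("inf")]
data = [-0.5, -0.3, 0.0, 0.2]
bucketed = [bucketize(x, splits) for x in data]
print(bucketed)  # [1.0, 1.0, 2.0, 2.0]
```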
The location of the IDE setup information has changed, so this just updates the link on the Building Spark page.
Author: Mike Dusenberry <dusenberrymw@gmail.com>
Closes #6467 from dusenberrymw/Fix_Broken_Link_On_Building_Spark_Doc and squashes the following commits:
75c533a [Mike Dusenberry] Fixing broken "IDE setup" link in the Building Spark documentation by pointing to new location.
(cherry picked from commit 3e312a5ed0)
Signed-off-by: Sean Owen <sowen@cloudera.com>
This contribution is my original work and I license the work to the project under the project's open source license
Author: Matt Wise <mwise@quixey.com>
Closes #6447 from wisematthew/fix-typo-in-java-udf-registration-doc and squashes the following commits:
e7ef5f7 [Matt Wise] Fix typo in documentation for Java UDF registration
(cherry picked from commit 35410614de)
Signed-off-by: Reynold Xin <rxin@databricks.com>
I grep'ed hive-0.12.0 in the source code and removed all the profiles and doc references.
Author: Cheolsoo Park <cheolsoop@netflix.com>
Closes #6393 from piaozhexiu/SPARK-7850 and squashes the following commits:
fb429ce [Cheolsoo Park] Remove hive-0.13.1 profile
82bf09a [Cheolsoo Park] Remove hive 0.12.0 shim code
f3722da [Cheolsoo Park] Remove hive-0.12.0 profile and references from POM and build docs
(cherry picked from commit 6dd645870d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Fixing broken trainImplicit Scala example in MLlib Collaborative Filtering documentation to match one of the possible ALS.trainImplicit function signatures.
Author: Mike Dusenberry <dusenberrymw@gmail.com>
Closes #6422 from dusenberrymw/Fix_MLlib_Collab_Filtering_trainImplicit_Example and squashes the following commits:
36492f4 [Mike Dusenberry] Fixing broken trainImplicit example in MLlib Collaborative Filtering documentation to match one of the possible ALS.trainImplicit function signatures.
(cherry picked from commit 0463428b6e)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
A couple of links in the MLlib Naive Bayes documentation for v1.4 were broken due to the addition of either space or newline characters between the link title and link URL in the markdown doc. (Interestingly enough, they are rendered correctly in the GitHub viewer, but not when compiled to HTML by Jekyll.)
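Links broken in this way can be found mechanically. The sketch below is a heuristic, not part of the docs build: it flags a `[title]` that is separated from its `(url)` by whitespace or a newline, and can rejoin the two. The helper names and the example.com URLs are made up, and the pattern may flag ordinary text such as a bracketed footnote followed by a one-word parenthetical.

```python
import re

# "[title]", then at least one whitespace character (possibly a newline),
# then "(url)" where the URL itself contains no whitespace.
BROKEN_LINK = re.compile(r"\[([^\]]+)\]\s+\(([^)\s]+)\)")


def find_broken_links(markdown):
    """Return (title, url) pairs for links split by whitespace."""
    return [(m.group(1), m.group(2)) for m in BROKEN_LINK.finditer(markdown)]


def repair_links(markdown):
    """Rejoin the title and URL of every split link."""
    return BROKEN_LINK.sub(r"[\1](\2)", markdown)


text = (
    "See [multinomial naive Bayes]\n(http://example.com/mnb) and "
    "[Bernoulli naive Bayes](http://example.com/bnb)."
)
broken = find_broken_links(text)
repaired = repair_links(text)
```

Only the first link in `text` is reported; the second is already well formed and is left untouched by `repair_links`.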
Author: Mike Dusenberry <dusenberrymw@gmail.com>
Closes #6412 from dusenberrymw/Fix_Broken_Links_In_MLlib_Naive_Bayes_Docs and squashes the following commits:
91a4028 [Mike Dusenberry] Fixing misformatted links by removing space and newline characters.
(cherry picked from commit e5a63a0e39)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Adds a section in the RDD persistence section of the programming-guide docs detailing Spark-Tachyon version compatibility as discussed in [[SPARK-6391]](https://issues.apache.org/jira/browse/SPARK-6391).
Author: Calvin Jia <jia.calvin@gmail.com>
Closes #6382 from calvinjia/spark-6391 and squashes the following commits:
113e863 [Calvin Jia] Move compatibility info to the offheap storage level section.
7942dc5 [Calvin Jia] Add a section in the programming-guide docs for Tachyon compatibility.
(cherry picked from commit ce0051d6f7)
Signed-off-by: Reynold Xin <rxin@databricks.com>
sqlCtx -> sqlContext
You can check the docs by:
```
$ cd docs
$ SKIP_SCALADOC=1 jekyll serve
```
cc shivaram
Author: Davies Liu <davies@databricks.com>
Closes #5442 from davies/r_docs and squashes the following commits:
7a12ec6 [Davies Liu] remove rdd in R docs
8496b26 [Davies Liu] remove the docs related to RDD
e23b9d6 [Davies Liu] delete R docs for RDD API
222e4ff [Davies Liu] Merge branch 'master' into r_docs
89684ce [Davies Liu] Merge branch 'r_docs' of github.com:davies/spark into r_docs
f0a10e1 [Davies Liu] address comments from @shivaram
f61de71 [Davies Liu] Update pairRDD.R
3ef7cf3 [Davies Liu] use + instead of function(a,b) a+b
2f10a77 [Davies Liu] address comments from @cafreeman
9c2a062 [Davies Liu] mention R api together with Python API
23f751a [Davies Liu] Fill in SparkR examples in programming guide
(cherry picked from commit 7af3818c6b)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Added logistic regression to the list of Multiclass Classification Supported Methods in the MLlib Classification and Regression documentation, as it was missing.
Author: Mike Dusenberry <dusenberrymw@gmail.com>
Closes #6357 from dusenberrymw/Add_LR_To_List_Of_Multiclass_Classification_Methods and squashes the following commits:
7918650 [Mike Dusenberry] Updating broken link due to the "Binary Classification" section on the Linear Methods page being renamed to "Classification".
3005dc2 [Mike Dusenberry] Adding logistic regression to the list of Multiclass Classification Supported Methods in the MLlib Classification and Regression documentation, as it was missing.
(cherry picked from commit 63a5ce75ea)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
The default add time of 5s is still too slow for small jobs. Also, the current default remove time of 10 minutes seems rather high. This patch lowers both and rephrases a few log messages.
Author: Andrew Or <andrew@databricks.com>
Closes #6301 from andrewor14/da-minor and squashes the following commits:
6d614a6 [Andrew Or] Lower log level
2811492 [Andrew Or] Log information when requests are canceled
5fcd3eb [Andrew Or] Fix tests
3320710 [Andrew Or] Lower timeouts + rephrase a few log messages
(cherry picked from commit 3d8760d76e)
Signed-off-by: Andrew Or <andrew@databricks.com>
Includes the Iris dataset (after shuffling and relabeling 3 -> 0 to conform to 0 -> numClasses-1 labeling). Could not find an existing dataset in data/mllib for multiclass classification.
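The relabeling step mentioned above amounts to the following, shown as a small standalone Python sketch. The helper name and the sample labels are made up; the actual dataset file was prepared separately.

```python
def relabel(labels, old, new):
    """Replace every occurrence of one class label with another."""
    return [new if label == old else label for label in labels]


# Made-up labels using classes 1, 2, 3.
raw_labels = [1, 2, 3, 3, 1, 2]
labels = relabel(raw_labels, old=3, new=0)
num_classes = len(set(labels))

# After relabeling, the classes are exactly 0 .. num_classes - 1.
print(labels)                # [1, 2, 0, 0, 1, 2]
print(sorted(set(labels)))   # [0, 1, 2]
```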
Author: Ram Sriharsha <rsriharsha@hw11853.local>
Closes #6296 from harsha2010/SPARK-7574 and squashes the following commits:
645427c [Ram Sriharsha] cleanup
46c41b1 [Ram Sriharsha] cleanup
2f76295 [Ram Sriharsha] Code Review Fixes
ebdf103 [Ram Sriharsha] Java Example
c026613 [Ram Sriharsha] Code Review fixes
4b7d1a6 [Ram Sriharsha] minor cleanup
13bed9c [Ram Sriharsha] add wikipedia link
bb9dbfa [Ram Sriharsha] Clean up naming
6f90db1 [Ram Sriharsha] [SPARK-7574][ml][doc] User guide for OneVsRest
(cherry picked from commit 509d55ab41)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Added user guide sections with code examples.
Also added small Java unit tests to test the Java examples in the guide.
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #6127 from jkbradley/feature-guide-2 and squashes the following commits:
cd47f4b [Joseph K. Bradley] Updated based on code review
f16bcec [Joseph K. Bradley] Fixed merge issues and update Python examples print calls for Python 3
0a862f9 [Joseph K. Bradley] Added Normalizer, StandardScaler to ml-features doc, plus small Java unit tests
a21c2d6 [Joseph K. Bradley] Updated ml-features.md with IDF
(cherry picked from commit 2728c3df66)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Just a small change: fixed a broken link in the MLlib Linear Methods documentation by removing a newline character between the link title and link address.
Author: Mike Dusenberry <dusenberrymw@gmail.com>
Closes #6340 from dusenberrymw/Fix_MLlib_Linear_Methods_link and squashes the following commits:
0a57818 [Mike Dusenberry] Fixing broken link in MLlib Linear Methods documentation.
(cherry picked from commit e4136ea6c4)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Added VectorIndexer section to ML user guide. Also added javaCategoryMaps() method and Java unit test for it.
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #6255 from jkbradley/vector-indexer-guide and squashes the following commits:
dbb8c4c [Joseph K. Bradley] simplified VectorIndexerModel.javaCategoryMaps
f692084 [Joseph K. Bradley] Added VectorIndexer section to ML user guide. Also added javaCategoryMaps() method and Java unit test for it.
(cherry picked from commit 6d75ed7e5c)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
to be consistent with other string names in MLlib. This PR also updates the implementation to use vals instead of hardcoded strings. jkbradley leahmcguire
Author: Xiangrui Meng <meng@databricks.com>
Closes #6277 from mengxr/SPARK-7752 and squashes the following commits:
f38b662 [Xiangrui Meng] add another case _ back in test
ae5c66a [Xiangrui Meng] model type -> modelType
711d1c6 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7752
40ae53e [Xiangrui Meng] fix Java test suite
264a814 [Xiangrui Meng] add case _ back
3c456a8 [Xiangrui Meng] update NB user guide
17bba53 [Xiangrui Meng] update naive Bayes to use lowercase model type strings
(cherry picked from commit 13348e21b6)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
…rther extension to non-json outputs too.
Author: Hari Shreedharan <hshreedharan@apache.org>
Closes #6273 from harishreedharan/json-to-api and squashes the following commits:
e14b73b [Hari Shreedharan] Rename `getJsonServlet` to `getServletHandler` i
42f8acb [Hari Shreedharan] Import order fixes.
2ef852f [Hari Shreedharan] [SPARK-7750][WebUI] Rename endpoints from `json` to `api` to allow further extension to non-json outputs too.
(cherry picked from commit a70bf06b79)
Signed-off-by: Imran Rashid <irashid@cloudera.com>
Author: Sandy Ryza <sandy@cloudera.com>
Closes #6126 from sryza/sandy-spark-7579 and squashes the following commits:
5af803d [Sandy Ryza] SPARK-7579 [MLLIB] User guide update for OneHotEncoder
(cherry picked from commit 829f1d95ba)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
The documentation for BlockMatrix should come after RowMatrix, IndexedRowMatrix, and CoordinateMatrix, as BlockMatrix references these three types, and RowMatrix is considered the "basic" distributed matrix. This will make the "Distributed matrix" section easier to follow, especially for new readers.
Author: Mike Dusenberry <dusenberrymw@gmail.com>
Closes #6270 from dusenberrymw/Reorder_MLlib_Data_Types_Distributed_matrix_docs and squashes the following commits:
6313bab [Mike Dusenberry] The documentation for BlockMatrix should come after RowMatrix, IndexedRowMatrix, and CoordinateMatrix, as BlockMatrix references the later three types, and RowMatrix is considered the "basic" distributed matrix. This will improve comprehensibility of the "Distributed matrix" section, especially for the new reader.
(cherry picked from commit 3860520633)
Signed-off-by: Xiangrui Meng <meng@databricks.com>