jira: https://issues.apache.org/jira/browse/SPARK-8043
I found some issues during testing the save/load examples in markdown Documents, as a part of 1.4 QA plan
Author: Yuhao Yang <hhbyyh@gmail.com>
Closes#6584 from hhbyyh/naiveDocExample and squashes the following commits:
a01a206 [Yuhao Yang] fix for Gaussian mixture
2fb8b96 [Yuhao Yang] update NaiveBayes and SVM examples in doc
(cherry picked from commit 43adbd5611)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
A couple of links in the MLlib Naive Bayes documentation for v1.4 were broken due to the addition of either space or newline characters between the link title and link URL in the markdown doc. (Interestingly enough, they are rendered correctly in the GitHub viewer, but not when compiled to HTML by Jekyll.)
Author: Mike Dusenberry <dusenberrymw@gmail.com>
Closes#6412 from dusenberrymw/Fix_Broken_Links_In_MLlib_Naive_Bayes_Docs and squashes the following commits:
91a4028 [Mike Dusenberry] Fixing misformatted links by removing space and newline characters.
(cherry picked from commit e5a63a0e39)
Signed-off-by: Sean Owen <sowen@cloudera.com>
to be consistent with other string names in MLlib. This PR also updates the implementation to use vals instead of hardcoded strings. jkbradley leahmcguire
Author: Xiangrui Meng <meng@databricks.com>
Closes#6277 from mengxr/SPARK-7752 and squashes the following commits:
f38b662 [Xiangrui Meng] add another case _ back in test
ae5c66a [Xiangrui Meng] model type -> modelType
711d1c6 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7752
40ae53e [Xiangrui Meng] fix Java test suite
264a814 [Xiangrui Meng] add case _ back
3c456a8 [Xiangrui Meng] update NB user guide
17bba53 [Xiangrui Meng] update naive Bayes to use lowercase model type strings
(cherry picked from commit 13348e21b6)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Added optional model type parameter for NaiveBayes training. Can be either Multinomial or Bernoulli.
When Bernoulli is given the Bernoulli smoothing is used for fitting and for prediction as per: http://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html.
Default for model is original Multinomial fit and predict.
Added additional testing for Bernoulli and Multinomial models.
Author: leahmcguire <lmcguire@salesforce.com>
Author: Joseph K. Bradley <joseph@databricks.com>
Author: Leah McGuire <lmcguire@salesforce.com>
Closes#4087 from leahmcguire/master and squashes the following commits:
f3c8994 [leahmcguire] changed checks on model type to requires
acb69af [leahmcguire] removed enum type and replaces all modelType parameters with strings
2224b15 [Leah McGuire] Merge pull request #2 from jkbradley/leahmcguire-master
9ad89ca [Joseph K. Bradley] removed old code
6a8f383 [Joseph K. Bradley] Added new model save/load format 2.0 for NaiveBayesModel after modelType parameter was added. Updated tests. Also updated ModelType enum-like type.
852a727 [leahmcguire] merged with upstream master
a22d670 [leahmcguire] changed NaiveBayesModel modelType parameter back to NaiveBayes.ModelType, made NaiveBayes.ModelType serializable, fixed getter method in NavieBayes
18f3219 [leahmcguire] removed private from naive bayes constructor for lambda only
bea62af [leahmcguire] put back in constructor for NaiveBayes
01baad7 [leahmcguire] made fixes from code review
fb0a5c7 [leahmcguire] removed typo
e2d925e [leahmcguire] fixed nonserializable error that was causing naivebayes test failures
2d0c1ba [leahmcguire] fixed typo in NaiveBayes
c298e78 [leahmcguire] fixed scala style errors
b85b0c9 [leahmcguire] Merge remote-tracking branch 'upstream/master'
900b586 [leahmcguire] fixed model call so that uses type argument
ea09b28 [leahmcguire] Merge remote-tracking branch 'upstream/master'
e016569 [leahmcguire] updated test suite with model type fix
85f298f [leahmcguire] Merge remote-tracking branch 'upstream/master'
dc65374 [leahmcguire] integrated model type fix
7622b0c [leahmcguire] added comments and fixed style as per rb
b93aaf6 [Leah McGuire] Merge pull request #1 from jkbradley/nb-model-type
3730572 [Joseph K. Bradley] modified NB model type to be more Java-friendly
b61b5e2 [leahmcguire] added back compatable constructor to NaiveBayesModel to fix MIMA test failure
5a4a534 [leahmcguire] fixed scala style error in NaiveBayes
3891bf2 [leahmcguire] synced with apache spark and resolved merge conflict
d9477ed [leahmcguire] removed old inaccurate comment from test suite for mllib naive bayes
76e5b0f [leahmcguire] removed unnecessary sort from test
0313c0c [leahmcguire] fixed style error in NaiveBayes.scala
4a3676d [leahmcguire] Updated changes re-comments. Got rid of verbose populateMatrix method. Public api now has string instead of enumeration. Docs are updated."
ce73c63 [leahmcguire] added Bernoulli option to niave bayes model in mllib, added optional model type parameter for training. When Bernoulli is given the Bernoulli smoothing is used for fitting and for prediction http://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html
Author: Kamil Smuga <smugakamil@gmail.com>
Author: stderr <smugakamil@gmail.com>
Closes#5120 from kamilsmuga/master and squashes the following commits:
fee3281 [Kamil Smuga] more python api links fixed for docs
13240cb [Kamil Smuga] resolved merge conflicts with upstream/master
6649b3b [Kamil Smuga] fix broken docs links to Python API
92f03d7 [stderr] Fix links to pyspark api
Author: MechCoder <manojkumarsivaraj334@gmail.com>
Closes#4834 from MechCoder/spark-6083 and squashes the following commits:
1cdd7b5 [MechCoder] Add parse function
65bbbe9 [MechCoder] [SPARK-6083] Make Python API example consistent in NaiveBayes
Should pass spark context to save/load
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes#4816 from jkbradley/ml-io-doc-fix and squashes the following commits:
83d369d [Joseph K. Bradley] added comment to save,load parts of ML guide examples
2841170 [Joseph K. Bradley] Fixed save,load calls in ML guide examples
* Add GradientBoostedTrees Python examples to ML guide
* I ran these in the pyspark shell, and they worked.
* Add save/load to examples in ML guide
* Added note to python docs about predict,transform not working within RDD actions,transformations in some cases (See SPARK-5981)
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes#4750 from jkbradley/SPARK-5974 and squashes the following commits:
c410e38 [Joseph K. Bradley] Added note to LabeledPoint about attributes
bcae18b [Joseph K. Bradley] Added import of models for save/load examples in ml guide. Fixed line length for tree.py, feature.py (but not other ML Pyspark files yet).
6d81c3e [Joseph K. Bradley] completed python GBT examples
9903309 [Joseph K. Bradley] Added note to python docs about predict,transform not working within RDD actions,transformations in some cases
c7dfad8 [Joseph K. Bradley] Added model save/load to ML guide. Added GBT examples to ML guide
the filter tests Double objects by references whereas it should test their values
Author: Dariusz Kobylarz <darek.kobylarz@gmail.com>
Closes#3081 from dkobylarz/master and squashes the following commits:
5d43a39 [Dariusz Kobylarz] naive bayes example update
a304b93 [Dariusz Kobylarz] fixed MLlib Naive-Bayes java example bug
because NB treats feature values as term frequencies. jkbradley
Author: Xiangrui Meng <meng@databricks.com>
Closes#2038 from mengxr/nb-neg and squashes the following commits:
52c37c3 [Xiangrui Meng] address comments
65f892d [Xiangrui Meng] detect negative values in nb
As per discussions with Xiangrui, I've reorganized and edited the mllib documentation.
Author: Ameet Talwalkar <atalwalkar@gmail.com>
Closes#1908 from atalwalkar/master and squashes the following commits:
fe6938a [Ameet Talwalkar] made xiangruis suggested changes
840028b [Ameet Talwalkar] made xiangruis suggested changes
7ec366a [Ameet Talwalkar] reorganize and edit mllib documentation
(Just made a PR for this, mengxr was the reporter of:)
MLlib has sample data under serveral folders:
1) data/mllib
2) data/
3) mllib/data/*
Per previous discussion with Matei Zaharia, we want to put them under `data/mllib` and clean outdated files.
Author: Sean Owen <sowen@cloudera.com>
Closes#1394 from srowen/SPARK-2363 and squashes the following commits:
54313dd [Sean Owen] Move ML example data from /mllib/data/ and /data/ into /data/mllib/
This is the only occurrence of this pattern in the examples that needs to be replaced. It only addresses the example change.
Author: Sean Owen <sowen@cloudera.com>
Closes#1250 from srowen/SPARK-2293 and squashes the following commits:
6b1b28c [Sean Owen] Compute prediction-and-label RDD directly rather than by zipping, for efficiency
Some improvements to MLlib guide:
1. [SPARK-1872] Update API links for unidoc.
2. [SPARK-1783] Added `page.displayTitle` to the global layout. If it is defined, use it instead of `page.title` for title display.
3. Add more Java/Python examples.
Author: Xiangrui Meng <meng@databricks.com>
Closes#816 from mengxr/mllib-doc and squashes the following commits:
ec2e407 [Xiangrui Meng] format scala example for ALS
cd9f40b [Xiangrui Meng] add a paragraph to summarize distributed matrix types
4617f04 [Xiangrui Meng] add python example to loadLibSVMFile and fix Java example
d6509c2 [Xiangrui Meng] [SPARK-1783] update mllib titles
561fdc0 [Xiangrui Meng] add a displayTitle option to global layout
195d06f [Xiangrui Meng] add Java example for summary stats and minor fix
9f1ff89 [Xiangrui Meng] update java api links in mllib-basics
7dad18e [Xiangrui Meng] update java api links in NB
3a0f4a6 [Xiangrui Meng] api/pyspark -> api/python
35bdeb9 [Xiangrui Meng] api/mllib -> api/scala
e4afaa8 [Xiangrui Meng] explicity state what might change
While play-testing the Scala and Java code examples in the MLlib docs, I noticed a number of small compile errors, and some typos. This led to finding and fixing a few similar items in other docs.
Then in the course of building the site docs to check the result, I found a few small suggestions for the build instructions. I also found a few more formatting and markdown issues uncovered when I accidentally used maruku instead of kramdown.
Author: Sean Owen <sowen@cloudera.com>
Closes#653 from srowen/SPARK-1727 and squashes the following commits:
6e7c38a [Sean Owen] Final doc updates - one more compile error, and use of mean instead of sum and count
8f5e847 [Sean Owen] Fix markdown syntax issues that maruku flags, even though we use kramdown (but only those that do not affect kramdown's output)
99966a9 [Sean Owen] Update issue tracker URL in docs
23c9ac3 [Sean Owen] Add Scala Naive Bayes example, to use existing example data file (whose format needed a tweak)
8c81982 [Sean Owen] Fix small compile errors and typos across MLlib docs