Adds
rank, recommendUsers and RecommendProducts to MatrixFactorizationModel in PySpark.
Author: MechCoder <manojkumarsivaraj334@gmail.com>
Closes#5807 from MechCoder/spark-6257 and squashes the following commits:
09629c6 [MechCoder] doc
953b326 [MechCoder] [SPARK-6257] MLlib API missing items in Recommendation
A simple wrapper to save/load `MatrixFactorizationModel` in Python. jkbradley
Author: Xiangrui Meng <meng@databricks.com>
Closes#4811 from mengxr/SPARK-5991 and squashes the following commits:
f135dac [Xiangrui Meng] update save doc
57e5200 [Xiangrui Meng] address comments
06140a4 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-5991
282ec8d [Xiangrui Meng] support save/load in PySpark's ALS
Should pass spark context to save/load
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes#4816 from jkbradley/ml-io-doc-fix and squashes the following commits:
83d369d [Joseph K. Bradley] added comment to save,load parts of ML guide examples
2841170 [Joseph K. Bradley] Fixed save,load calls in ML guide examples
* Add GradientBoostedTrees Python examples to ML guide
* I ran these in the pyspark shell, and they worked.
* Add save/load to examples in ML guide
* Added note to python docs about predict,transform not working within RDD actions,transformations in some cases (See SPARK-5981)
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes#4750 from jkbradley/SPARK-5974 and squashes the following commits:
c410e38 [Joseph K. Bradley] Added note to LabeledPoint about attributes
bcae18b [Joseph K. Bradley] Added import of models for save/load examples in ml guide. Fixed line length for tree.py, feature.py (but not other ML Pyspark files yet).
6d81c3e [Joseph K. Bradley] completed python GBT examples
9903309 [Joseph K. Bradley] Added note to python docs about predict,transform not working within RDD actions,transformations in some cases
c7dfad8 [Joseph K. Bradley] Added model save/load to ML guide. Added GBT examples to ML guide
fix python example of ALS in guide, use Rating instead of np.array.
Author: Davies Liu <davies@databricks.com>
Closes#4226 from davies/fix_als_guide and squashes the following commits:
1433d76 [Davies Liu] fix python example of als in guide
HT to Diana, just proposing an implementation of her suggestion, which I rather agreed with. Is there a second/third for the motion?
Refer to "self-contained" rather than "standalone" apps to avoid confusion with standalone deployment mode. And fix placement of reference to this in MLlib docs.
Author: Sean Owen <sowen@cloudera.com>
Closes#2787 from srowen/SPARK-1307 and squashes the following commits:
b5b82e2 [Sean Owen] Refer to "self-contained" rather than "standalone" apps to avoid confusion with standalone deployment mode. And fix placement of reference to this in MLlib docs.
atalwalkar srowen
Author: Xiangrui Meng <meng@databricks.com>
Closes#2064 from mengxr/als-doc and squashes the following commits:
b2e20ab [Xiangrui Meng] introduced -> discussed
98abdd7 [Xiangrui Meng] add reference
339bd08 [Xiangrui Meng] add a section about regularization parameter in ALS
As per discussions with Xiangrui, I've reorganized and edited the mllib documentation.
Author: Ameet Talwalkar <atalwalkar@gmail.com>
Closes#1908 from atalwalkar/master and squashes the following commits:
fe6938a [Ameet Talwalkar] made xiangruis suggested changes
840028b [Ameet Talwalkar] made xiangruis suggested changes
7ec366a [Ameet Talwalkar] reorganize and edit mllib documentation
Standalone application examples are added to 'mllib-linear-methods.md' file written in Java.
This commit is related to the issue [Add full Java Examples in MLlib docs](https://issues.apache.org/jira/browse/SPARK-1945).
Also I changed the name of the sigmoid function from 'logit' to 'f'. This is because the logit function
is the inverse of sigmoid.
Thanks,
Michael
Author: Michael Giannakopoulos <miccagiann@gmail.com>
Closes#1311 from miccagiann/master and squashes the following commits:
8ffe5ab [Michael Giannakopoulos] Update code so as to comply with code standards.
f7ad5cc [Michael Giannakopoulos] Merge remote-tracking branch 'upstream/master'
38d92c7 [Michael Giannakopoulos] Adding PCA, SVD and LBFGS examples in Java. Performing minor updates in the already committed examples so as to eradicate the call of 'productElement' function whenever is possible.
cc0a089 [Michael Giannakopoulos] Modyfied Java examples so as to comply with coding standards.
b1141b2 [Michael Giannakopoulos] Added Java examples for Clustering and Collaborative Filtering [mllib-clustering.md & mllib-collaborative-filtering.md].
837f7a8 [Michael Giannakopoulos] Merge remote-tracking branch 'upstream/master'
15f0eb4 [Michael Giannakopoulos] Java examples included in 'mllib-linear-methods.md' file.
(Just made a PR for this, mengxr was the reporter of:)
MLlib has sample data under serveral folders:
1) data/mllib
2) data/
3) mllib/data/*
Per previous discussion with Matei Zaharia, we want to put them under `data/mllib` and clean outdated files.
Author: Sean Owen <sowen@cloudera.com>
Closes#1394 from srowen/SPARK-2363 and squashes the following commits:
54313dd [Sean Owen] Move ML example data from /mllib/data/ and /data/ into /data/mllib/
Some improvements to MLlib guide:
1. [SPARK-1872] Update API links for unidoc.
2. [SPARK-1783] Added `page.displayTitle` to the global layout. If it is defined, use it instead of `page.title` for title display.
3. Add more Java/Python examples.
Author: Xiangrui Meng <meng@databricks.com>
Closes#816 from mengxr/mllib-doc and squashes the following commits:
ec2e407 [Xiangrui Meng] format scala example for ALS
cd9f40b [Xiangrui Meng] add a paragraph to summarize distributed matrix types
4617f04 [Xiangrui Meng] add python example to loadLibSVMFile and fix Java example
d6509c2 [Xiangrui Meng] [SPARK-1783] update mllib titles
561fdc0 [Xiangrui Meng] add a displayTitle option to global layout
195d06f [Xiangrui Meng] add Java example for summary stats and minor fix
9f1ff89 [Xiangrui Meng] update java api links in mllib-basics
7dad18e [Xiangrui Meng] update java api links in NB
3a0f4a6 [Xiangrui Meng] api/pyspark -> api/python
35bdeb9 [Xiangrui Meng] api/mllib -> api/scala
e4afaa8 [Xiangrui Meng] explicity state what might change
While play-testing the Scala and Java code examples in the MLlib docs, I noticed a number of small compile errors, and some typos. This led to finding and fixing a few similar items in other docs.
Then in the course of building the site docs to check the result, I found a few small suggestions for the build instructions. I also found a few more formatting and markdown issues uncovered when I accidentally used maruku instead of kramdown.
Author: Sean Owen <sowen@cloudera.com>
Closes#653 from srowen/SPARK-1727 and squashes the following commits:
6e7c38a [Sean Owen] Final doc updates - one more compile error, and use of mean instead of sum and count
8f5e847 [Sean Owen] Fix markdown syntax issues that maruku flags, even though we use kramdown (but only those that do not affect kramdown's output)
99966a9 [Sean Owen] Update issue tracker URL in docs
23c9ac3 [Sean Owen] Add Scala Naive Bayes example, to use existing example data file (whose format needed a tweak)
8c81982 [Sean Owen] Fix small compile errors and typos across MLlib docs
I used the sbt-unidoc plugin (https://github.com/sbt/sbt-unidoc) to create a unified Scaladoc of our public packages, and generate Javadocs as well. One limitation is that I haven't found an easy way to exclude packages in the Javadoc; there is a SBT task that identifies Java sources to run javadoc on, but it's been very difficult to modify it from outside to change what is set in the unidoc package. Some SBT-savvy people should help with this. The Javadoc site also lacks package-level descriptions and things like that, so we may want to look into that. We may decide not to post these right now if it's too limited compared to the Scala one.
Example of the built doc site: http://people.csail.mit.edu/matei/spark-unified-docs/
Author: Matei Zaharia <matei@databricks.com>
This patch had conflicts when merged, resolved by
Committer: Patrick Wendell <pwendell@gmail.com>
Closes#457 from mateiz/better-docs and squashes the following commits:
a63d4a3 [Matei Zaharia] Skip Java/Scala API docs for Python package
5ea1f43 [Matei Zaharia] Fix links to Java classes in Java guide, fix some JS for scrolling to anchors on page load
f05abc0 [Matei Zaharia] Don't include java.lang package names
995e992 [Matei Zaharia] Skip internal packages and class names with $ in JavaDoc
a14a93c [Matei Zaharia] typo
76ce64d [Matei Zaharia] Add groups to Javadoc index page, and a first package-info.java
ed6f994 [Matei Zaharia] Generate JavaDoc as well, add titles, update doc site to use unified docs
acb993d [Matei Zaharia] Add Unidoc plugin for the projects we want Unidoced
tex formulas in the documentation
using mathjax.
and spliting the MLlib documentation by techniques
see jira
https://spark-project.atlassian.net/browse/MLLIB-19
and
https://github.com/shivaram/spark/compare/mathjax
Author: Martin Jaggi <m.jaggi@gmail.com>
== Merge branch commits ==
commit 0364bfabbfc347f917216057a20c39b631842481
Author: Martin Jaggi <m.jaggi@gmail.com>
Date: Fri Feb 7 03:19:38 2014 +0100
minor polishing, as suggested by @pwendell
commit dcd2142c164b2f602bf472bb152ad55bae82d31a
Author: Martin Jaggi <m.jaggi@gmail.com>
Date: Thu Feb 6 18:04:26 2014 +0100
enabling inline latex formulas with $.$
same mathjax configuration as used in math.stackexchange.com
sample usage in the linear algebra (SVD) documentation
commit bbafafd2b497a5acaa03a140bb9de1fbb7d67ffa
Author: Martin Jaggi <m.jaggi@gmail.com>
Date: Thu Feb 6 17:31:29 2014 +0100
split MLlib documentation by techniques
and linked from the main mllib-guide.md site
commit d1c5212b93c67436543c2d8ddbbf610fdf0a26eb
Author: Martin Jaggi <m.jaggi@gmail.com>
Date: Thu Feb 6 16:59:43 2014 +0100
enable mathjax formula in the .md documentation files
code by @shivaram
commit d73948db0d9bc36296054e79fec5b1a657b4eab4
Author: Martin Jaggi <m.jaggi@gmail.com>
Date: Thu Feb 6 16:57:23 2014 +0100
minor update on how to compile the documentation