Commit graph

10 commits

Author SHA1 Message Date
Paweł Kozikowski b9ef7ac98c [MLLIB] [DOC] Seed fix in mllib naive bayes example
Previous seed resulted in empty test data set.

Author: Paweł Kozikowski <mupakoz@gmail.com>

Closes #7477 from mupakoz/patch-1 and squashes the following commits:

f5d41ee [Paweł Kozikowski] Mllib Naive Bayes example data set enlarged
2015-07-18 10:12:48 -07:00
Yanbo Liang 0a468a46bf [SPARK-8758] [MLLIB] Add Python user guide for PowerIterationClustering
Add Python user guide for PowerIterationClustering

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #7155 from yanboliang/spark-8758 and squashes the following commits:

18d803b [Yanbo Liang] address comments
dd29577 [Yanbo Liang] Add Python user guide for PowerIterationClustering
2015-07-02 09:59:54 -07:00
Ram Sriharsha 509d55ab41 [SPARK-7574] [ML] [DOC] User guide for OneVsRest
Including Iris Dataset (after shuffling and relabeling 3 -> 0 to confirm to 0 -> numClasses-1 labeling). Could not find an existing dataset in data/mllib for multiclass classification.

Author: Ram Sriharsha <rsriharsha@hw11853.local>

Closes #6296 from harsha2010/SPARK-7574 and squashes the following commits:

645427c [Ram Sriharsha] cleanup
46c41b1 [Ram Sriharsha] cleanup
2f76295 [Ram Sriharsha] Code Review Fixes
ebdf103 [Ram Sriharsha] Java Example
c026613 [Ram Sriharsha] Code Review fixes
4b7d1a6 [Ram Sriharsha] minor cleanup
13bed9c [Ram Sriharsha] add wikipedia link
bb9dbfa [Ram Sriharsha] Clean up naming
6f90db1 [Ram Sriharsha] [SPARK-7574][ml][doc] User guide for OneVsRest
2015-05-22 13:18:08 -07:00
Jacky Li 651a1c019e [SPARK-5939][MLLib] make FPGrowth example app take parameters
Add parameter parsing in FPGrowth example app in Scala and Java
And a sample data file is added in data/mllib folder

Author: Jacky Li <jacky.likun@huawei.com>

Closes #4714 from jackylk/parameter and squashes the following commits:

8c478b3 [Jacky Li] fix according to comments
3bb74f6 [Jacky Li] make FPGrowth exampl app take parameters
f0e4d10 [Jacky Li] make FPGrowth exampl app take parameters
2015-02-23 08:47:28 -08:00
Joseph K. Bradley 4a17eedb16 [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] Doc cleanups for 1.3 release
For SPARK-5867:
* The spark.ml programming guide needs to be updated to use the new SQL DataFrame API instead of the old SchemaRDD API.
* It should also include Python examples now.

For SPARK-5892:
* Fix Python docs
* Various other cleanups

BTW, I accidentally merged this with master.  If you want to compile it on your own, use this branch which is based on spark/branch-1.3 and cherry-picks the commits from this PR: [https://github.com/jkbradley/spark/tree/doc-review-1.3-check]

CC: mengxr  (ML),  davies  (Python docs)

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #4675 from jkbradley/doc-review-1.3 and squashes the following commits:

f191bb0 [Joseph K. Bradley] small cleanups
e786efa [Joseph K. Bradley] small doc corrections
6b1ab4a [Joseph K. Bradley] fixed python lint test
946affa [Joseph K. Bradley] Added sample data for ml.MovieLensALS example.  Changed spark.ml Java examples to use DataFrames API instead of sql()
da81558 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into doc-review-1.3
629dbf5 [Joseph K. Bradley] Updated based on code review: * made new page for old migration guides * small fixes * moved inherit_doc in python
b9df7c4 [Joseph K. Bradley] Small cleanups: toDF to toDF(), adding s for string interpolation
34b067f [Joseph K. Bradley] small doc correction
da16aef [Joseph K. Bradley] Fixed python mllib docs
8cce91c [Joseph K. Bradley] GMM: removed old imports, added some doc
695f3f6 [Joseph K. Bradley] partly done trying to fix inherit_doc for class hierarchies in python docs
a72c018 [Joseph K. Bradley] made ChiSqTestResult appear in python docs
b05a80d [Joseph K. Bradley] organize imports. doc cleanups
e572827 [Joseph K. Bradley] updated programming guide for ml and mllib
2015-02-20 02:31:32 -08:00
martinzapletal 61eb12674b [MLLIB][SPARK-5502] User guide for isotonic regression
User guide for isotonic regression added to docs/mllib-regression.md including code examples for Scala and Java.

Author: martinzapletal <zapletal-martin@email.cz>

Closes #4536 from zapletal-martin/SPARK-5502 and squashes the following commits:

67fe773 [martinzapletal] SPARK-5502 reworded model prediction rules to use more general language rather than the code/implementation specific terms
80bd4c3 [martinzapletal] SPARK-5502 created docs page for isotonic regression, added links to the page, updated data and examples
7d8136e [martinzapletal] SPARK-5502 Added documentation for Isotonic regression including examples for Scala and Java
504b5c3 [martinzapletal] SPARK-5502 Added documentation for Isotonic regression including examples for Scala and Java
2015-02-15 09:10:03 -08:00
Xiangrui Meng 855d12ac0a [SPARK-5539][MLLIB] LDA guide
This is the LDA user guide from jkbradley with Java and Scala code example.

Author: Xiangrui Meng <meng@databricks.com>
Author: Joseph K. Bradley <joseph@databricks.com>

Closes #4465 from mengxr/lda-guide and squashes the following commits:

6dcb7d1 [Xiangrui Meng] update java example in the user guide
76169ff [Xiangrui Meng] update java example
36c3ae2 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into lda-guide
c2a1efe [Joseph K. Bradley] Added LDA programming guide, plus Java example (which is in the guide and probably should be removed).
2015-02-08 23:40:36 -08:00
Travis Galoppo 9ad56ad2a2 [SPARK-5013] [MLlib] Added documentation and sample data file for GaussianMixture
Simple description and code samples (and sample data) for GaussianMixture

Author: Travis Galoppo <tjg2107@columbia.edu>

Closes #4401 from tgaloppo/spark-5013 and squashes the following commits:

c9ff9a5 [Travis Galoppo] Fixed link in mllib-clustering.md Added Gaussian mixture and power iteration as available clustering techniques in mllib-guide
2368690 [Travis Galoppo] Minor fixes
3eb41fa [Travis Galoppo] [SPARK-5013] Added documentation and sample data file for GaussianMixture
2015-02-06 10:26:51 -08:00
Sean Owen 635888cbed SPARK-2363. Clean MLlib's sample data files
(Just made a PR for this, mengxr was the reporter of:)

MLlib has sample data under serveral folders:
1) data/mllib
2) data/
3) mllib/data/*
Per previous discussion with Matei Zaharia, we want to put them under `data/mllib` and clean outdated files.

Author: Sean Owen <sowen@cloudera.com>

Closes #1394 from srowen/SPARK-2363 and squashes the following commits:

54313dd [Sean Owen] Move ML example data from /mllib/data/ and /data/ into /data/mllib/
2014-07-13 19:27:43 -07:00
Xiangrui Meng bcb9dce6f4 [SPARK-1874][MLLIB] Clean up MLlib sample data
1. Added synthetic datasets for `MovieLensALS`, `LinearRegression`, `BinaryClassification`.
2. Embedded instructions in the help message of those example apps.

Per discussion with Matei on the JIRA page, new example data is under `data/mllib`.

Author: Xiangrui Meng <meng@databricks.com>

Closes #833 from mengxr/mllib-sample-data and squashes the following commits:

59f0a18 [Xiangrui Meng] add sample binary classification data
3c2f92f [Xiangrui Meng] add linear regression data
050f1ca [Xiangrui Meng] add a sample dataset for MovieLensALS example
2014-05-19 21:29:33 -07:00