Commit graph

15 commits

Author SHA1 Message Date
Sean Owen 754f820035 [SPARK-26918][DOCS] All .md should have ASF license header
## What changes were proposed in this pull request?

Add AL2 license to metadata of all .md files.
This seemed to be the tidiest way as it will get ignored by .md renderers and other tools. Attempts to write them as markdown comments revealed that there is no such standard thing.

## How was this patch tested?

Doc build

Closes #24243 from srowen/SPARK-26918.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-03-30 19:49:45 -05:00
Katrin Leinweber c5daccb1da [MINOR] Update all DOI links to preferred resolver
## What changes were proposed in this pull request?

The DOI foundation recommends [this new resolver](https://www.doi.org/doi_handbook/3_Resolution.html#3.8). Accordingly, this PR re`sed`s all static DOI links ;-)

## How was this patch tested?

It wasn't, since it seems as safe as a "[typo fix](https://spark.apache.org/contributing.html)".

In case any of the files is included from other projects, and should be updated there, please let me know.

Closes #23129 from katrinleinweber/resolve-DOIs-securely.

Authored-by: Katrin Leinweber <9948149+katrinleinweber@users.noreply.github.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2018-11-25 17:43:55 -06:00
Jirka Kremser 5955a2d0fb [MINOR][DOCS] s/It take/It takes/g
## What changes were proposed in this pull request?

Fixing three small typos in the docs, in particular:

It take a `RDD` -> It takes an `RDD` (twice)

It take an `JavaRDD` -> It takes a `JavaRDD`

I didn't create any Jira issue for this minor thing, I hope it's ok.

## How was this patch tested?

visually by clicking on 'preview'

Author: Jirka Kremser <jkremser@redhat.com>

Closes #20108 from Jiri-Kremser/docs-typo.
2017-12-31 15:38:10 -06:00
Yuhao Yang add9d1bba5 [SPARK-19791][ML] Add doc and example for fpgrowth
## What changes were proposed in this pull request?

Add a new section for fpm
Add Example for FPGrowth in scala and Java

updated: Rewrite transform to be more compact.

## How was this patch tested?

local doc generation.

Author: Yuhao Yang <yuhao.yang@intel.com>

Closes #17130 from hhbyyh/fpmdoc.
2017-04-29 10:51:45 -07:00
Joseph K. Bradley 5ffd5d3838 [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API primary in MLlib guide
## What changes were proposed in this pull request?

Made DataFrame-based API primary
* Spark doc menu bar and other places now link to ml-guide.html, not mllib-guide.html
* mllib-guide.html keeps RDD-specific list of features, with a link at the top redirecting people to ml-guide.html
* ml-guide.html includes a "maintenance mode" announcement about the RDD-based API
  * **Reviewers: please check this carefully**
* (minor) Titles for DF API no longer include "- spark.ml" suffix.  Titles for RDD API have "- RDD-based API" suffix
* Moved migration guide to ml-guide from mllib-guide
  * Also moved past guides from mllib-migration-guides to ml-migration-guides, with a redirect link on mllib-migration-guides
  * **Reviewers**: I did not change any of the content of the migration guides.

Reorganized DataFrame-based guide:
* ml-guide.html mimics the old mllib-guide.html page in terms of content: overview, migration guide, etc.
* Moved Pipeline description into ml-pipeline.html and moved tuning into ml-tuning.html
  * **Reviewers**: I did not change the content of these guides, except some intro text.
* Sidebar remains the same, but with pipeline and tuning sections added

Other:
* ml-classification-regression.html: Moved text about linear methods to new section in page

## How was this patch tested?

Generated docs locally

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #14213 from jkbradley/ml-guide-2.0.
2016-07-15 13:38:23 -07:00
Dongjoon Hyun 024482bf51 [MINOR][DOCS] Fix all typos in markdown files of doc and similar patterns in other comments
## What changes were proposed in this pull request?

This PR tries to fix all typos in all markdown files under `docs` module,
and fixes similar typos in other comments, too.

## How was the this patch tested?

manual tests.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11300 from dongjoon-hyun/minor_fix_typos.
2016-02-22 09:52:07 +00:00
Timothy Hunter 2ecbe02d5b [SPARK-12212][ML][DOC] Clarifies the difference between spark.ml, spark.mllib and mllib in the documentation.
Replaces a number of occurences of `MLlib` in the documentation that were meant to refer to the `spark.mllib` package instead. It should clarify for new users the difference between `spark.mllib` (the package) and MLlib (the umbrella project for ML in spark).

It also removes some files that I forgot to delete with #10207

Author: Timothy Hunter <timhunter@databricks.com>

Closes #10234 from thunterdb/12212.
2015-12-10 12:50:46 -08:00
Pravin Gadakh 820064e613 [SPARK-11380][DOCS] Replace example code in mllib-frequent-pattern-mining.md using include_example
Author: Pravin Gadakh <pravingadakh177@gmail.com>
Author: Pravin Gadakh <prgadakh@in.ibm.com>

Closes #9340 from pravingadakh/SPARK-11380.
2015-11-04 08:32:08 -08:00
Xin Ren 27cdde2ff8 [SPARK-10669] [DOCS] Link to each language's API in codetabs in ML docs: spark.mllib
In the Markdown docs for the spark.mllib Programming Guide, we have code examples with codetabs for each language. We should link to each language's API docs within the corresponding codetab, but we are inconsistent about this. For an example of what we want to do, see the "ChiSqSelector" section in 64743870f2/docs/mllib-feature-extraction.md
This JIRA is just for spark.mllib, not spark.ml.

Please let me know if more work is needed, thanks a lot.

Author: Xin Ren <iamshrek@126.com>

Closes #8977 from keypointt/SPARK-10669.
2015-10-07 15:00:19 +01:00
Yanbo Liang 802b5b8791 [SPARK-10084] [MLLIB] [DOC] Add Python example for mllib FP-growth user guide
1, Add Python example for mllib FP-growth user guide.
2, Correct mistakes of Scala and Java examples.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8279 from yanboliang/spark-10084.
2015-08-19 08:53:34 -07:00
Feynman Liang f5ea391290 [SPARK-9900] [MLLIB] User guide for Association Rules
Updates FPM user guide to include Association Rules.

Author: Feynman Liang <fliang@databricks.com>

Closes #8207 from feynmanliang/SPARK-9900-arules.
2015-08-18 12:53:57 -07:00
Feynman Liang 0b6b017613 [SPARK-9898] [MLLIB] Prefix Span user guide
Adds user guide for `PrefixSpan`, including Scala and Java example code.

mengxr zhangjiajin

Author: Feynman Liang <fliang@databricks.com>

Closes #8253 from feynmanliang/SPARK-9898.
2015-08-17 17:53:24 -07:00
FavioVazquez 490d5a72ec [SPARK-8274] [DOCUMENTATION-MLLIB] Fix wrong URLs in MLlib Frequent Pattern Mining Documentation
There is a mistake in the URLs of the Scala section of FP-Growth in the MLlib Frequent Pattern Mining documentation. The URL points to https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/fpm/FPGrowth.html which is the Java's API, the link should point to the Scala API https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.fpm.FPGrowth

There's another mistake in the FP-GrowthModel in the same section, the link points, again, to the Java's API https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/fpm/FPGrowthModel.html, the link should point to the Scala API https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.fpm.FPGrowthModel

Author: FavioVazquez <favio.vazquezp@gmail.com>

Closes #6722 from FavioVazquez/fix-wrog-urls-mllib-fpgrowth and squashes the following commits:

e1ca54d [FavioVazquez] - Fixed wrong URLs in MLlib Frequent Pattern Mining, FP-Growth Scala section
ad882a3 [FavioVazquez] Merge remote-tracking branch 'upstream/master'
f27a20b [FavioVazquez] Merge remote-tracking branch 'upstream/master'
9af7074 [FavioVazquez] Merge remote-tracking branch 'upstream/master'
edab1ef [FavioVazquez] Merge remote-tracking branch 'upstream/master'
b2e2f8c [FavioVazquez] Merge remote-tracking branch 'upstream/master'
2015-06-09 15:02:18 +01:00
Xiangrui Meng 0cfd2cebde [SPARK-5900][MLLIB] make PIC and FPGrowth Java-friendly
In the previous version, PIC stores clustering assignments as an `RDD[(Long, Int)]`. This is mapped to `RDD<Tuple2<Object, Object>>` in Java and hence Java users have to cast types manually. We should either create a new method called `javaAssignments` that returns `JavaRDD[(java.lang.Long, java.lang.Int)]` or wrap the result pair in a class. I chose the latter approach in this PR. Now assignments are stored as an `RDD[Assignment]`, where `Assignment` is a class with `id` and `cluster`.

Similarly, in FPGrowth, the frequent itemsets are stored as an `RDD[(Array[Item], Long)]`, which is mapped to `RDD<Tuple2<Object, Object>>`. Though we provide a "Java-friendly" method `javaFreqItemsets` that returns `JavaRDD[(Array[Item], java.lang.Long)]`. It doesn't really work because `Array[Item]` is mapped to `Object` in Java. So in this PR I created a class `FreqItemset` to wrap the results. It has `items` and `freq`, as well as a `javaItems` method that returns `List<Item>` in Java.

I'm not certain that the names I chose are proper: `Assignment`/`id`/`cluster` and `FreqItemset`/`items`/`freq`. Please let me know if there are better suggestions.

CC: jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #4695 from mengxr/SPARK-5900 and squashes the following commits:

865b5ca [Xiangrui Meng] make Assignment serializable
cffa96e [Xiangrui Meng] fix test
9c0e590 [Xiangrui Meng] remove unused Tuple2
1b9db3d [Xiangrui Meng] make PIC and FPGrowth Java-friendly
2015-02-19 18:06:16 -08:00
Xiangrui Meng 85e9d091d5 [SPARK-5519][MLLIB] add user guide with example code for fp-growth
The API is still not very Java-friendly because `Array[Item]` in `freqItemsets` is recognized as `Object` in Java. We might want to define a case class to wrap the return pair to make it Java friendly.

Author: Xiangrui Meng <meng@databricks.com>

Closes #4661 from mengxr/SPARK-5519 and squashes the following commits:

58ccc25 [Xiangrui Meng] add user guide with example code for fp-growth
2015-02-18 10:09:56 -08:00