1605 commits

Author SHA1 Message Date
Reynold Xin ffcb6e055a [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT
## What changes were proposed in this pull request?
After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.

## How was this patch tested?
N/A

Author: Reynold Xin <rxin@databricks.com>

Closes #14130 from rxin/SPARK-16477.
2016-07-11 09:42:56 -07:00
Xin Ren 9cb1eb7af7 [SPARK-16381][SQL][SPARKR] Update SQL examples and programming guide for R language binding
https://issues.apache.org/jira/browse/SPARK-16381

## What changes were proposed in this pull request?

Update SQL examples and programming guide for R language binding.

Following the example at https://github.com/apache/spark/compare/master...liancheng:example-snippet-extraction, I created a separate R file to store all the example code.

## How was this patch tested?

Manual test on my local machine.
Screenshot as below:

![screen shot 2016-07-06 at 4 52 25 pm](https://cloud.githubusercontent.com/assets/3925641/16638180/13925a58-439a-11e6-8d57-8451a63dcae9.png)

Author: Xin Ren <iamshrek@126.com>

Closes #14082 from keypointt/SPARK-16381.
2016-07-11 20:05:28 +08:00
Michael Gummelt b1db26acc5 [SPARK-11857][MESOS] Deprecate fine grained
## What changes were proposed in this pull request?

Documentation changes to indicate that fine-grained mode is now deprecated.  No code changes were made, and all fine-grained mode instructions were left in place.  We can remove all of that once the deprecation cycle completes (Does Spark have a standard deprecation cycle?  One major version?)

Blocked on https://github.com/apache/spark/pull/14059

## How was this patch tested?

Viewed in Github

Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #14078 from mgummelt/deprecate-fine-grained.
2016-07-08 20:20:26 -07:00
Michael Gummelt 9c041990cf [MESOS] expand coarse-grained mode docs
## What changes were proposed in this pull request?

docs

## How was this patch tested?

viewed the docs in github

Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #14059 from mgummelt/coarse-grained.
2016-07-06 15:02:45 -07:00
WeichenXu b1310425b3 [DOC][SQL] update out-of-date code snippets using SQLContext in all documents.
## What changes were proposed in this pull request?

I searched the whole docs directory for uses of SQLContext and updated the following places:

- docs/configuration.md, sparkR code snippets.
- docs/streaming-programming-guide.md, several code examples.

## How was this patch tested?

N/A

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #14025 from WeichenXu123/WIP_SQLContext_update.
2016-07-06 10:41:48 -07:00
Sean Owen 18fb57f58a [MINOR][DOCS] Remove unused images; crush PNGs that could use it for good measure
## What changes were proposed in this pull request?

Coincidentally, I discovered that a couple images were unused in `docs/`, and then searched and found more, and then realized some PNGs were pretty big and could be crushed, and before I knew it, had done the same for the ASF site (not committed yet).

No functional change at all, just less superfluous image data.

## How was this patch tested?

`jekyll serve`

Author: Sean Owen <sowen@cloudera.com>

Closes #14029 from srowen/RemoveCompressImages.
2016-07-04 09:21:58 +01:00
WeichenXu 0bd7cd18bc [SPARK-16345][DOCUMENTATION][EXAMPLES][GRAPHX] Extract graphx programming guide example snippets from source files instead of hard code them
## What changes were proposed in this pull request?

I extracted 6 example programs from the GraphX programming guide and replaced them with
`include_example` labels.

The 6 example programs are:
- AggregateMessagesExample.scala
- SSSPExample.scala
- TriangleCountingExample.scala
- ConnectedComponentsExample.scala
- ComprehensiveExample.scala
- PageRankExample.scala

All the example code can run using
`bin/run-example graphx.EXAMPLE_NAME`

## How was this patch tested?

Manual.

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #14015 from WeichenXu123/graphx_example_plugin.
2016-07-02 16:29:00 +01:00
WeichenXu 192d1f9cf3 [GRAPHX][EXAMPLES] move graphx test data directory and update graphx document
## What changes were proposed in this pull request?

Two test data files used by the GraphX examples live in the "graphx/data" directory.
I moved them into the "data/" directory, because the "graphx" directory is meant for code and the other test data files (such as the mllib and streaming test data) are all in "data/".

I also updated the GraphX documentation where it references the data files that I moved.

## How was this patch tested?

N/A

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #14010 from WeichenXu123/move_graphx_data_dir.
2016-07-02 08:40:23 +01:00
Nick Pentreath 4a981dc870 [SPARK-15643][DOC][ML] Add breaking changes to ML migration guide
This PR adds the breaking changes from [SPARK-14810](https://issues.apache.org/jira/browse/SPARK-14810) to the migration guide.

## How was this patch tested?

Built docs locally.

Author: Nick Pentreath <nickp@za.ibm.com>

Closes #13924 from MLnick/SPARK-15643-migration-guide.
2016-06-30 17:55:14 -07:00
Tathagata Das 5d00a7bc19 [SPARK-16256][DOCS] Fix window operation diagram
Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #14001 from tdas/SPARK-16256-2.
2016-06-30 14:01:34 -07:00
Tathagata Das 2c3d96134d [SPARK-16256][DOCS] Minor fixes on the Structured Streaming Programming Guide
Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #13978 from tdas/SPARK-16256-1.
2016-06-29 23:38:19 -07:00
Cheng Lian bde1d6a615 [SPARK-16294][SQL] Labelling support for the include_example Jekyll plugin
## What changes were proposed in this pull request?

This PR adds labelling support for the `include_example` Jekyll plugin, so that we may split a single source file into multiple line blocks with different labels, and include them in multiple code snippets in the generated HTML page.
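
To make the idea of labelled blocks concrete, here is a minimal Python sketch of how a single source file could be split into several named snippets. It is not the plugin's code (the plugin is a Jekyll/Ruby plugin); the marker syntax `$example on:LABEL$` / `$example off:LABEL$` and the function name `extract_labelled_blocks` are assumptions made for illustration.

```python
import re

# Assumed marker syntax: "$example on:LABEL$" ... "$example off:LABEL$".
# A marker without a label is treated as the empty label "".
MARKER = re.compile(r"\$example (on|off)(?::(\w+))?\$")


def extract_labelled_blocks(source: str) -> dict:
    """Return {label: [lines]} for every labelled region in the source text."""
    blocks = {}
    active = set()
    for line in source.splitlines():
        match = MARKER.search(line)
        if match:
            state, label = match.group(1), match.group(2) or ""
            if state == "on":
                active.add(label)
                blocks.setdefault(label, [])
            else:
                active.discard(label)
            continue  # marker lines are never copied into a snippet
        for label in active:
            blocks[label].append(line)
    return blocks


SAMPLE = """\
import spark.implicits._
// $example on:create_df$
val df = spark.read.json("people.json")
// $example off:create_df$
println("not shown in the guide")
// $example on:show_df$
df.show()
// $example off:show_df$
"""

snippets = extract_labelled_blocks(SAMPLE)
```

Each label can then be included in a different place on the generated page, while the lines outside any labelled region stay out of the documentation.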

## How was this patch tested?

Manually tested.

<img width="923" alt="screenshot at jun 29 19-53-21" src="https://cloud.githubusercontent.com/assets/230655/16451099/66a76db2-3e33-11e6-84fb-63104c2f0688.png">

Author: Cheng Lian <lian@databricks.com>

Closes #13972 from liancheng/include-example-with-labels.
2016-06-29 22:50:53 -07:00
Tathagata Das 64132a14fb [SPARK-16256][SQL][STREAMING] Added Structured Streaming Programming Guide
Title defines all.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #13945 from tdas/SPARK-16256.
2016-06-29 11:45:57 -07:00
jerryshao 272a2f78f3 [SPARK-15990][YARN] Add rolling log aggregation support for Spark on yarn
## What changes were proposed in this pull request?

YARN has supported rolling log aggregation since 2.6. Previously, logs were only aggregated to HDFS after the application finished, which is quite painful for long-running applications such as Spark Streaming and the Thrift server. An out-of-disk problem can also occur when a log file grows too large. This PR therefore proposes adding rolling log aggregation support for Spark on YARN.

One limitation is that log4j must be configured to use a file appender. Spark itself uses the console appender by default, in which case the file is not created again once it has been removed after aggregation. I think many production users have already changed their log4j configuration from the default, so this is not a big problem.

## How was this patch tested?

Manually verified with Hadoop 2.7.1.

Author: jerryshao <sshao@hortonworks.com>

Closes #13712 from jerryshao/SPARK-15990.
2016-06-29 08:17:27 -05:00
Yanbo Liang 26252f7064 [SPARK-15643][DOC][ML] Update spark.ml and spark.mllib migration guide from 1.6 to 2.0
## What changes were proposed in this pull request?
Update ```spark.ml``` and ```spark.mllib``` migration guide from 1.6 to 2.0.

## How was this patch tested?
Docs update, no tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #13378 from yanboliang/spark-13448.
2016-06-28 11:54:25 -07:00
Yin Huai dd6b7dbe70 [SPARK-15863][SQL][DOC][FOLLOW-UP] Update SQL programming guide.
## What changes were proposed in this pull request?
This PR makes several updates to SQL programming guide.

Author: Yin Huai <yhuai@databricks.com>

Closes #13938 from yhuai/doc.
2016-06-27 22:44:08 -07:00
GayathriMurali be88383e15 [SPARK-15997][DOC][ML] Update user guide for HashingTF, QuantileVectorizer and CountVectorizer
## What changes were proposed in this pull request?

Made changes to HashingTF,QuantileVectorizer and CountVectorizer

Author: GayathriMurali <gayathri.m@intel.com>

Closes #13745 from GayathriMurali/SPARK-15997.
2016-06-24 13:25:40 +02:00
Ryan Blue 738f134bf4 [SPARK-13723][YARN] Change behavior of --num-executors with dynamic allocation.
## What changes were proposed in this pull request?

This changes the behavior of --num-executors and spark.executor.instances when using dynamic allocation. Instead of turning dynamic allocation off, it uses the value for the initial number of executors.
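
A minimal sketch of the new behavior, in Python for brevity. It assumes the initial executor count is the largest of the minimum, the configured initial value, and `spark.executor.instances`; that rule and the function name `initial_executors` are illustrative assumptions, not taken from this commit message.

```python
def initial_executors(conf: dict) -> int:
    """Pick the starting executor count when dynamic allocation is enabled.

    Assumption: the largest of the three settings wins, so an explicit
    --num-executors (spark.executor.instances) only raises the start point
    and no longer switches dynamic allocation off.
    """
    candidates = [
        int(conf.get("spark.dynamicAllocation.minExecutors", 0)),
        int(conf.get("spark.dynamicAllocation.initialExecutors", 0)),
        int(conf.get("spark.executor.instances", 0)),
    ]
    return max(candidates)


conf = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "2",
    "spark.executor.instances": "10",  # what --num-executors 10 would set
}
start = initial_executors(conf)
```

With these settings the application would start with 10 executors and still scale up or down afterwards.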

This change was discussed on [SPARK-13723](https://issues.apache.org/jira/browse/SPARK-13723). I highly recommend making it while we can still change the behavior for 2.0.0. In practice, the 1.x behavior surprises users (it is not clear that it disables dynamic allocation) and wastes cluster resources because users rarely notice the log message.

## How was this patch tested?

This patch updates tests and adds a test for Utils.getDynamicAllocationInitialExecutors.

Author: Ryan Blue <blue@apache.org>

Closes #13338 from rdblue/SPARK-13723-num-executors-with-dynamic-allocation.
2016-06-23 14:03:46 -05:00
Felix Cheung b5a997667f [SPARK-16088][SPARKR] update setJobGroup, cancelJobGroup, clearJobGroup
## What changes were proposed in this pull request?

Updated setJobGroup, cancelJobGroup, clearJobGroup to not require sc/SparkContext as parameter.
Also updated roxygen2 doc and R programming guide on deprecations.

## How was this patch tested?

unit tests

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #13838 from felixcheung/rjobgroup.
2016-06-23 09:45:01 -07:00
Kai Jiang 43b04b7ecb [SPARK-15672][R][DOC] R programming guide update
## What changes were proposed in this pull request?
Guide for
- UDFs with dapply, dapplyCollect
- spark.lapply for running parallel R functions

## How was this patch tested?
build locally
<img width="654" alt="screen shot 2016-06-14 at 03 12 56" src="https://cloud.githubusercontent.com/assets/3419881/16039344/12a3b6a0-31de-11e6-8d77-fe23308075c0.png">

Author: Kai Jiang <jiangkai@gmail.com>

Closes #13660 from vectorijk/spark-15672-R-guide-update.
2016-06-22 12:50:36 -07:00
Felix Cheung 79aa1d82ca [SQL][DOC] SQL programming guide add deprecated methods in 2.0.0
## What changes were proposed in this pull request?

Doc changes

## How was this patch tested?

manual

liancheng

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #13827 from felixcheung/sqldocdeprecate.
2016-06-22 10:37:13 +08:00
Yuhao Yang a58f402394 [SPARK-16045][ML][DOC] Spark 2.0 ML.feature: doc update for stopwords and binarizer
## What changes were proposed in this pull request?

jira: https://issues.apache.org/jira/browse/SPARK-16045
2.0 Audit: Update document for StopWordsRemover and Binarizer.

## How was this patch tested?

manual review for doc

Author: Yuhao Yang <hhbyyh@gmail.com>
Author: Yuhao Yang <yuhao.yang@intel.com>

Closes #13375 from hhbyyh/stopdoc.
2016-06-21 00:47:36 -07:00
Takeshi YAMAMURO 41e0ffb19f [SPARK-15894][SQL][DOC] Update docs for controlling #partitions
## What changes were proposed in this pull request?
Update docs for two parameters, `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes`, in Other Configuration Options.
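
As a rough illustration of how these two parameters interact, the sketch below computes a target split size for file-based reads. The formula is my understanding of the file source planning in 2.0 and should be treated as an assumption, as should the function name `max_split_bytes`; the 128 MB and 4 MB values are used here as the defaults.

```python
def max_split_bytes(file_sizes, default_parallelism,
                    max_partition_bytes=128 * 1024 * 1024,
                    open_cost_in_bytes=4 * 1024 * 1024):
    """Estimate the largest number of bytes packed into one read partition.

    Assumed rule: every file is padded with the open cost, the padded total
    is spread over the available cores, and the result is clamped between
    open_cost_in_bytes and max_partition_bytes.
    """
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))


MB = 1024 * 1024
ten_small_files = max_split_bytes([1 * MB] * 10, default_parallelism=8)
many_tiny_files = max_split_bytes([1024] * 1000, default_parallelism=8)
```

Raising `openCostInBytes` makes small files look more expensive, so fewer of them are packed into one partition; lowering `maxPartitionBytes` caps the partition size and yields more partitions.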

## How was this patch tested?
N/A

Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>

Closes #13797 from maropu/SPARK-15894-2.
2016-06-21 14:27:16 +08:00
Felix Cheung 58f6e27dd7 [SPARK-15863][SQL][DOC][SPARKR] sql programming guide updates to include sparkSession in R
## What changes were proposed in this pull request?

Update doc as per discussion in PR #13592

## How was this patch tested?

manual

shivaram liancheng

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #13799 from felixcheung/rsqlprogrammingguide.
2016-06-21 13:56:37 +08:00
Eric Liang 07367533de [SPARK-16025][CORE] Document OFF_HEAP storage level in 2.0
This has changed from 1.6, and now stores data off-heap using Spark's off-heap support instead of in Tachyon.

Author: Eric Liang <ekl@databricks.com>

Closes #13744 from ericl/spark-16025.
2016-06-20 21:56:44 -07:00
Cheng Lian 6df8e38860 [SPARK-15863][SQL][DOC] Initial SQL programming guide update for Spark 2.0
## What changes were proposed in this pull request?

Initial SQL programming guide update for Spark 2.0. Contents like 1.6 to 2.0 migration guide are still incomplete.

We may also want to add more examples for Scala/Java Dataset typed transformations.

## How was this patch tested?

N/A

Author: Cheng Lian <lian@databricks.com>

Closes #13592 from liancheng/sql-programming-guide-2.0.
2016-06-20 14:50:28 -07:00
Felix Cheung 359c2e827d [SPARK-15159][SPARKR] SparkSession roxygen2 doc, programming guide, example updates
## What changes were proposed in this pull request?

roxygen2 doc, programming guide, example updates

## How was this patch tested?

manual checks
shivaram

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #13751 from felixcheung/rsparksessiondoc.
2016-06-20 13:46:24 -07:00
wm624@hotmail.com 5930d7a2e9 [SPARK-16040][MLLIB][DOC] spark.mllib PIC document extra line of reference
## What changes were proposed in this pull request?

In the 2.0 document, Line "A full example that produces the experiment described in the PIC paper can be found under examples/." is redundant.

There is already "Find full example code at "examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala" in the Spark repo.".

We should remove the first line, which is consistent with other documents.

## How was this patch tested?

Manual test

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #13755 from wangmiao1981/doc.
2016-06-19 20:19:40 +01:00
GayathriMurali af2a4b0826 [SPARK-15129][R][DOC] R API changes in ML
## What changes were proposed in this pull request?

Make user guide changes to SparkR documentation for all changes that happened in 2.0 to Machine Learning APIs

Author: GayathriMurali <gayathri.m@intel.com>

Closes #13285 from GayathriMurali/SPARK-15129.
2016-06-17 21:10:29 -07:00
Dhruve Ashar f1bf0d2f3a [SPARK-15966][DOC] Add closing tag to fix rendering issue for Spark monitoring
## What changes were proposed in this pull request?
Adds the missing closing tag for spark.ui.view.acls.groups

## How was this patch tested?
I built the docs locally and verified the change in a browser.

**Before:**
![image](https://cloud.githubusercontent.com/assets/7732317/16135005/49fc0724-33e6-11e6-9390-98711593fa5b.png)

**After:**
![image](https://cloud.githubusercontent.com/assets/7732317/16135021/62b5c4a8-33e6-11e6-8118-b22fda5c66eb.png)

Author: Dhruve Ashar <dhruveashar@gmail.com>

Closes #13719 from dhruve/doc/SPARK-15966.
2016-06-16 17:46:19 -07:00
WeichenXu 9040d83bc2 [SPARK-15608][ML][EXAMPLES][DOC] add examples and documents of ml.isotonic regression
## What changes were proposed in this pull request?

add ml doc for ml isotonic regression
add scala example for ml isotonic regression
add java example for ml isotonic regression
add python example for ml isotonic regression

modify scala example for mllib isotonic regression
modify java example for mllib isotonic regression
modify python example for mllib isotonic regression

add data/mllib/sample_isotonic_regression_libsvm_data.txt
delete data/mllib/sample_isotonic_regression_data.txt
## How was this patch tested?

N/A

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #13381 from WeichenXu123/add_isotonic_regression_doc.
2016-06-16 17:35:40 -07:00
Sean Owen 457126e420 [SPARK-15796][CORE] Reduce spark.memory.fraction default to avoid overrunning old gen in JVM default config
## What changes were proposed in this pull request?

Reduce `spark.memory.fraction` default to 0.6 in order to make it fit within default JVM old generation size (2/3 heap). See JIRA discussion. This means a full cache doesn't spill into the new gen. CC andrewor14
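
The sizing argument can be checked with quick arithmetic. The sketch below assumes a fixed 300 MB reservation that is subtracted from the heap before the fraction is applied, a JVM `NewRatio` of 2 (which gives the 2/3 old generation mentioned above), and a previous default of 0.75; the reservation and the previous default are recalled values, not stated in this commit message.

```python
RESERVED_MB = 300  # assumed fixed reservation taken off the heap first


def spark_memory_mb(heap_mb, fraction):
    """Memory shared by execution and storage for a given heap size."""
    return (heap_mb - RESERVED_MB) * fraction


def old_gen_mb(heap_mb, new_ratio=2):
    """NewRatio=N means old:young = N:1, so old gen is N/(N+1) of the heap."""
    return heap_mb * new_ratio / (new_ratio + 1)


heap = 4096  # a 4 GB executor heap, chosen only for illustration
fits_at_075 = spark_memory_mb(heap, 0.75) <= old_gen_mb(heap)
fits_at_060 = spark_memory_mb(heap, 0.60) <= old_gen_mb(heap)
```

For this heap size the 0.75 setting needs about 2847 MB against an old generation of about 2731 MB, while 0.6 needs about 2278 MB and fits.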

## How was this patch tested?

Jenkins tests.

Author: Sean Owen <sowen@cloudera.com>

Closes #13618 from srowen/SPARK-15796.
2016-06-16 23:04:10 +02:00
Nirman Narang 04d7b3d2b6 [SPARK-7848][STREAMING][UPDATE SPARKSTREAMING DOCS TO INCORPORATE IMPORTANT POINTS.]
Updated the SparkStreaming Doc with some important points.

Author: Nirman Narang <narang@us.ibm.com>

Closes #11114 from nirmannarang/SPARK-7848.
2016-06-15 15:36:31 -07:00
Mortada Mehyar a87a56f5c7 [DOCUMENTATION] fixed typos in python programming guide
## What changes were proposed in this pull request?

minor typo

## How was this patch tested?

minor typo in the doc, should be self explanatory

Author: Mortada Mehyar <mortada.mehyar@gmail.com>

Closes #13639 from mortada/typo.
2016-06-14 09:45:46 +01:00
Sean Owen f51dfe616b [SPARK-15086][CORE][STREAMING] Deprecate old Java accumulator API
## What changes were proposed in this pull request?

- Deprecate old Java accumulator API; should use Scala now
- Update Java tests and examples
- Don't bother testing old accumulator API in Java 8 (too)
- (fix a misspelling too)

## How was this patch tested?

Jenkins tests

Author: Sean Owen <sowen@cloudera.com>

Closes #13606 from srowen/SPARK-15086.
2016-06-12 11:44:33 -07:00
bomeng 50248dcfff [SPARK-15806][DOCUMENTATION] update doc for SPARK_MASTER_IP
## What changes were proposed in this pull request?

SPARK_MASTER_IP is a deprecated environment variable. It is replaced by SPARK_MASTER_HOST according to MasterArguments.scala.

## How was this patch tested?

Manually verified.

Author: bomeng <bmeng@us.ibm.com>

Closes #13543 from bomeng/SPARK-15806.
2016-06-12 14:25:48 +01:00
bomeng 3fd3ee038b [SPARK-15781][DOCUMENTATION] remove deprecated environment variable doc
## What changes were proposed in this pull request?

As with `SPARK_JAVA_OPTS` and `SPARK_CLASSPATH`, we will remove the documentation for `SPARK_WORKER_INSTANCES` to discourage users from using it. If it is actually used, SparkConf will show a warning message as before.

## How was this patch tested?

Manually tested.

Author: bomeng <bmeng@us.ibm.com>

Closes #13533 from bomeng/SPARK-15781.
2016-06-12 12:58:34 +01:00
Dongjoon Hyun ad102af169 [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents
## What changes were proposed in this pull request?

This issue fixes all broken links on Spark 2.0 preview MLLib documents. Also, this contains some editorial change.

**Fix broken links**
  * mllib-data-types.md
  * mllib-decision-tree.md
  * mllib-ensembles.md
  * mllib-feature-extraction.md
  * mllib-pmml-model-export.md
  * mllib-statistics.md

**Fix malformed section header and scala coding style**
  * mllib-linear-methods.md

**Replace indirect forward links with direct one**
  * ml-classification-regression.md

## How was this patch tested?

Manual tests (with `cd docs; jekyll build`.)

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #13608 from dongjoon-hyun/SPARK-15883.
2016-06-11 12:55:38 +01:00
Sean Owen 3761330dd0 [SPARK-15879][DOCS][UI] Update logo in UI and docs to add "Apache"
## What changes were proposed in this pull request?

Use new Spark logo including "Apache" (now, with crushed PNGs). Remove old unreferenced logo files.

## How was this patch tested?

Manual check of generated HTML site and Spark UI. I searched for references to the deleted files to make sure they were not used.

Author: Sean Owen <sowen@cloudera.com>

Closes #13609 from srowen/SPARK-15879.
2016-06-11 12:46:07 +01:00
Mortada Mehyar 675a73715d [DOCUMENTATION] fixed groupby aggregation example for pyspark
## What changes were proposed in this pull request?

fixing documentation for the groupby/agg example in python

## How was this patch tested?

the existing example in the documentation does not contain valid syntax (missing parenthesis) and does not use `Column` in the expression for `agg()`

after the fix here's how I tested it:

```
In [1]: from pyspark.sql import Row

In [2]: import pyspark.sql.functions as func

In [3]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:records = [{'age': 19, 'department': 1, 'expense': 100},
: {'age': 20, 'department': 1, 'expense': 200},
: {'age': 21, 'department': 2, 'expense': 300},
: {'age': 22, 'department': 2, 'expense': 300},
: {'age': 23, 'department': 3, 'expense': 300}]
:--

In [4]: df = sqlContext.createDataFrame([Row(**d) for d in records])

In [5]: df.groupBy("department").agg(df["department"], func.max("age"), func.sum("expense")).show()

+----------+----------+--------+------------+
|department|department|max(age)|sum(expense)|
+----------+----------+--------+------------+
|         1|         1|      20|         300|
|         2|         2|      22|         600|
|         3|         3|      23|         300|
+----------+----------+--------+------------+
```

Author: Mortada Mehyar <mortada.mehyar@gmail.com>

Closes #13587 from mortada/groupby_agg_doc_fix.
2016-06-10 00:23:34 -07:00
prabs ca70ab27cc [DOCUMENTATION] Fixed target JAR path
## What changes were proposed in this pull request?

The Scala version mentioned in the sbt configuration file is 2.11, so the path of the target JAR should be `/target/scala-2.11/simple-project_2.11-1.0.jar`
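
For reference, a small sketch of how the JAR path follows from the project settings. The name normalization (lowercase, spaces replaced by hyphens) is a simplification of what sbt does, and `sbt_jar_path` is a hypothetical helper, not part of any tool.

```python
def sbt_jar_path(name, version, scala_binary_version):
    """Build the default package path from project name, version, and Scala version."""
    # Simplified normalization: lowercase the name and join words with hyphens.
    normalized = "-".join(name.lower().split())
    return (f"target/scala-{scala_binary_version}/"
            f"{normalized}_{scala_binary_version}-{version}.jar")


path = sbt_jar_path("Simple Project", "1.0", "2.11")
```

Changing the Scala version in the build file changes both the directory and the artifact suffix, which is why the documented path has to track it.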

## How was this patch tested?

n/a

Author: prabs <prabsmails@gmail.com>
Author: Prabeesh K <prabsmails@gmail.com>

Closes #13554 from prabeesh/master.
2016-06-08 17:22:55 +01:00
Yanbo Liang 6ecedf39b4 [SPARK-13590][ML][DOC] Document spark.ml LiR, LoR and AFTSurvivalRegression behavior difference
## What changes were proposed in this pull request?
When fitting ```LinearRegressionModel``` (with the "l-bfgs" solver) and ```LogisticRegressionModel``` without an intercept on a dataset with a constant nonzero column, spark.ml produces the same model as R glmnet but a different one from LIBSVM.

When fitting ```AFTSurvivalRegressionModel``` without an intercept on a dataset with a constant nonzero column, spark.ml produces a different model from R survival::survreg.

We should output a warning message and clarify in document for this condition.

## How was this patch tested?
Document change, no unit test.

cc mengxr

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #12731 from yanboliang/spark-13590.
2016-06-07 15:25:36 -07:00
Marcelo Vanzin 200f01c8fb [SPARK-15760][DOCS] Add documentation for package-related configs.
While there, also document spark.files and spark.jars. Text is the
same as the spark-submit help text with some minor adjustments.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #13502 from vanzin/SPARK-15760.
2016-06-07 09:28:39 -07:00
WeichenXu 1e2c931187 [MINOR] fix typo in documents
## What changes were proposed in this pull request?

I used spell-check tools to find typos in the Spark documents and fixed them.

## How was this patch tested?

N/A

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #13538 from WeichenXu123/fix_doc_typo.
2016-06-07 13:29:27 +01:00
Ruifeng Zheng 2099e05f93 [SPARK-15617][ML][DOC] Clarify that fMeasure in MulticlassMetrics is "micro" f1_score
## What changes were proposed in this pull request?
1. Remove precision and recall from `ml.MulticlassClassificationEvaluator`
2. Update the user guide for `mllib.weightedFMeasure`
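
To show what "micro" F1 means here, the following standalone sketch pools true positives, false positives, and false negatives over all classes before computing the score. It is plain Python rather than the `MulticlassMetrics` API, and the labels are made up for illustration.

```python
def micro_f1(labels, predictions):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1 once."""
    classes = set(labels) | set(predictions)
    tp = fp = fn = 0
    for c in classes:
        for y, p in zip(labels, predictions):
            if p == c and y == c:
                tp += 1
            elif p == c and y != c:
                fp += 1
            elif p != c and y == c:
                fn += 1
    if tp == 0:
        return 0.0  # no correct predictions, so precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


labels = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]
accuracy = sum(y == p for y, p in zip(labels, predictions)) / len(labels)
score = micro_f1(labels, predictions)
```

For single-label multiclass data every wrong prediction is one false positive and one false negative, so micro precision, micro recall, and micro F1 all equal accuracy. That is why reporting them separately adds nothing.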

## How was this patch tested?
local build

Author: Ruifeng Zheng <ruifengz@foxmail.com>

Closes #13390 from zhengruifeng/clarify_f1.
2016-06-04 13:56:04 +01:00
Liwei Lin a0eec8e8ff [SPARK-15208][WIP][CORE][STREAMING][DOCS] Update Spark examples with AccumulatorV2
## What changes were proposed in this pull request?

The patch updates the code and docs in the example module as well as the related doc module:

- [ ] [docs] `streaming-programming-guide.md`
  - [x] scala code part
  - [ ] java code part
  - [ ] python code part
- [x] [examples] `RecoverableNetworkWordCount.scala`
- [ ] [examples] `JavaRecoverableNetworkWordCount.java`
- [ ] [examples] `recoverable_network_wordcount.py`

## How was this patch tested?

Ran the examples and verified results manually.

Author: Liwei Lin <lwlin7@gmail.com>

Closes #12981 from lw-lin/accumulatorV2-examples.
2016-06-02 11:07:15 -05:00
WeichenXu 2402b91461 [SPARK-15702][DOCUMENTATION] Update document programming-guide accumulator section
## What changes were proposed in this pull request?

Update the accumulator section of the programming guide (Scala only).
The Java and Python versions are not modified because their API is not finished yet.

## How was this patch tested?

N/A

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #13441 from WeichenXu123/update_doc_accumulatorV2_clean.
2016-06-01 12:57:02 -07:00
Matthew Wise 2d34183b27 [DOCS] fix example code issues in documentation
## What changes were proposed in this pull request?

Fixed broken java code examples in streaming documentation

Attn: tdas

Author: Matthew Wise <matthew.rs.wise@gmail.com>

Closes #13388 from mawise/fix_docs_java_streaming_example.
2016-05-30 09:12:02 -05:00
Yanbo Liang a3550e3747 [SPARK-11959][SPARK-15484][DOC][ML] Document WLS and IRLS
## What changes were proposed in this pull request?
* Document ```WeightedLeastSquares```(normal equation) and ```IterativelyReweightedLeastSquares```.
* Copy ```L-BFGS``` documents from ```spark.mllib``` to ```spark.ml```.

Because the section ```Optimization of linear methods``` is intended for developers, I think we should provide a brief introduction to each optimization method, the necessary references, and how it is implemented in Spark. It's not necessary to paste all the mathematical formulas and derivations here. If developers/users want to learn more, they can follow the references.
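
As a small illustration of the normal-equation approach, the sketch below fits a weighted least squares line for a single feature by solving the 2x2 weighted normal equations in closed form. It is a teaching example in plain Python, not Spark's ```WeightedLeastSquares``` implementation, and the data is invented.

```python
def weighted_least_squares(xs, ys, ws):
    """Fit y ~ slope * x + intercept by solving the weighted normal equations.

    The system is [[sxx, sx], [sx, sw]] @ [slope, intercept] = [sxy, sy],
    where each sum is weighted by ws.
    """
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * sxx - sx * sx
    if det == 0:
        raise ValueError("normal equations are singular (x has no spread)")
    slope = (sw * sxy - sx * sy) / det
    intercept = (sxx * sy - sx * sxy) / det
    return slope, intercept


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # exactly y = 2x + 1
ws = [1.0, 2.0, 1.0, 0.5]   # arbitrary positive weights
slope, intercept = weighted_least_squares(xs, ys, ws)
```

The solution comes from one pass over the data plus a small linear solve, which is the appeal of the normal-equation method when the number of features is modest.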

## How was this patch tested?
Document update, no tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #13262 from yanboliang/spark-15484.
2016-05-27 13:16:22 -07:00
sethah c96244f5ac [SPARK-15186][ML][DOCS] Add user guide for generalized linear regression
## What changes were proposed in this pull request?

This patch adds a user guide section for generalized linear regression and includes the examples from [#12754](https://github.com/apache/spark/pull/12754).

## How was this patch tested?

Documentation only, no tests required.

## Approach

In general, it is a bit unclear what level of detail ought to be included in the user guide since there is a lot of variability within the current user guide. I tried to give a fairly brief mathematical introduction to GLMs, and cover what types of problems they could be used for. Additionally, I included a brief blurb on the IRLS solver. The input/output columns are given in a table as is found elsewhere in the docs (though, again, these appear rather intermittently in the current docs), as well as a table providing the supported families and their link functions.
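
To illustrate the family and link function pairing, here is a sketch of the canonical links for four common families and how a fitted linear predictor maps back to a mean. The dictionary and the helper `predict_mean` are illustrative names, and the set of families shown is not claimed to match the table in the guide.

```python
import math

# family -> (link, inverse link); these are the canonical links.
CANONICAL_LINKS = {
    "gaussian": (lambda mu: mu, lambda eta: eta),
    "binomial": (lambda mu: math.log(mu / (1 - mu)),
                 lambda eta: 1 / (1 + math.exp(-eta))),
    "poisson": (math.log, math.exp),
    "gamma": (lambda mu: 1 / mu, lambda eta: 1 / eta),
}


def predict_mean(family, coefficients, intercept, features):
    """Apply the inverse link to the linear predictor eta = b + w . x."""
    eta = intercept + sum(c * x for c, x in zip(coefficients, features))
    _, inverse_link = CANONICAL_LINKS[family]
    return inverse_link(eta)


rate = predict_mean("poisson", [0.5], 0.0, [2.0])       # eta = 1.0
probability = predict_mean("binomial", [1.0], 0.0, [0.0])  # eta = 0.0
```

The model stays linear in the coefficients; only the link changes how the linear predictor relates to the expected response.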

Author: sethah <seth.hendrickson16@gmail.com>

Closes #13139 from sethah/SPARK-15186.
2016-05-27 12:55:48 -07:00