Commit graph

1615 commits

Author SHA1 Message Date
Josh Rosen 972673aca5 [SPARK-16555] Work around Jekyll error-handling bug which led to silent failures
If a custom Jekyll template tag throws Ruby's equivalent of a "file not found" exception, then Jekyll will stop the doc building process but will exit with a successful status, causing our doc publishing jobs to silently fail.

This is caused by https://github.com/jekyll/jekyll/issues/5104, a case of bad error-handling logic in Jekyll. This patch works around this by updating our `include_example.rb` plugin to catch the exception and exit rather than allowing it to bubble up and be ignored by Jekyll.

I tested this manually with

```
rm ./examples/src/main/scala/org/apache/spark/examples/sql/SparkSQLExample.scala
cd docs
SKIP_API=1 jekyll build
echo $?
```

Author: Josh Rosen <joshrosen@databricks.com>

Closes #14209 from JoshRosen/fix-doc-building.
2016-07-14 15:55:36 -07:00
Shivaram Venkataraman 01c4c1fa53 [SPARK-16553][DOCS] Fix SQL example file name in docs
## What changes were proposed in this pull request?

Fixes a typo in the sql programming guide

## How was this patch tested?

Building docs locally

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #14208 from shivaram/spark-sql-doc-fix.
2016-07-14 14:19:30 -07:00
Marcelo Vanzin b7b5e17876 [SPARK-16505][YARN] Optionally propagate error during shuffle service startup.
This prevents the NM from starting when something is wrong, which would
lead to later errors which are confusing and harder to debug.

Added a unit test to verify startup fails if something is wrong.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #14162 from vanzin/SPARK-16505.
2016-07-14 09:42:32 -05:00
Felix Cheung fb2e8eeb0b [SPARKR][DOCS][MINOR] R programming guide to include csv data source example
## What changes were proposed in this pull request?

Minor documentation update for code example, code style, and missed reference to "sparkR.init"

## How was this patch tested?

manual

shivaram

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #14178 from felixcheung/rcsvprogrammingguide.
2016-07-13 15:09:23 -07:00
James Thomas 51a6706b13 [SPARK-16114][SQL] updated structured streaming guide
## What changes were proposed in this pull request?

Updated structured streaming programming guide with new windowed example.

## How was this patch tested?

Docs

Author: James Thomas <jamesjoethomas@gmail.com>

Closes #14183 from jjthomas/ss_docs_update.
2016-07-13 13:26:23 -07:00
sandy bf107f1e65 [SPARK-16438] Add Asynchronous Actions documentation
## What changes were proposed in this pull request?

Add Asynchronous Actions documentation inside action of programming guide

## How was this patch tested?

check the documentation indentation and formatting with md preview.

Author: sandy <phalodi@gmail.com>

Closes #14104 from phalodi/SPARK-16438.
2016-07-13 11:33:46 +01:00
aokolnychyi 772c213ec7 [SPARK-16303][DOCS][EXAMPLES] Updated SQL programming guide and examples
- Hard-coded Spark SQL sample snippets were moved into source files under examples sub-project.
- Removed the inconsistency between Scala and Java Spark SQL examples
- Scala and Java Spark SQL examples were updated

The work is still in progress. All involved examples were tested manually. An additional round of testing will be done after the code review.

![image](https://cloud.githubusercontent.com/assets/6235869/16710314/51851606-462a-11e6-9fbe-0818daef65e4.png)

Author: aokolnychyi <okolnychyyanton@gmail.com>

Closes #14119 from aokolnychyi/spark_16303.
2016-07-13 16:12:11 +08:00
Lianhui Wang 5ad68ba5ce [SPARK-15752][SQL] Optimize metadata only query that has an aggregate whose children are deterministic project or filter operators.
## What changes were proposed in this pull request?
when query only use metadata (example: partition key), it can return results based on metadata without scanning files. Hive did it in HIVE-1003.

## How was this patch tested?
add unit tests

Author: Lianhui Wang <lianhuiwang09@gmail.com>
Author: Wenchen Fan <wenchen@databricks.com>
Author: Lianhui Wang <lianhuiwang@users.noreply.github.com>

Closes #13494 from lianhuiwang/metadata-only.
2016-07-12 18:52:15 +02:00
Xin Ren 05d7151ccb [MINOR][STREAMING][DOCS] Minor changes on kinesis integration
## What changes were proposed in this pull request?

Some minor changes for documentation page "Spark Streaming + Kinesis Integration".

Moved "streaming-kinesis-arch.png" before the bullet list, not in between the bullets.

## How was this patch tested?

Tested manually, on my local machine.

Author: Xin Ren <iamshrek@126.com>

Closes #14097 from keypointt/kinesisDoc.
2016-07-11 18:09:14 -07:00
Yanbo Liang 2ad031be67 [SPARKR][DOC] SparkR ML user guides update for 2.0
## What changes were proposed in this pull request?
* Update SparkR ML section to make them consistent with SparkR API docs.
* Since #13972 adds labelling support for the ```include_example``` Jekyll plugin, so that we can split the single ```ml.R``` example file into multiple line blocks with different labels, and include them in different algorithms/models in the generated HTML page.

## How was this patch tested?
Only docs update, manually check the generated docs.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #14011 from yanboliang/r-user-guide-update.
2016-07-11 14:31:11 -07:00
Reynold Xin ffcb6e055a [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT
## What changes were proposed in this pull request?
After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.

## How was this patch tested?
N/A

Author: Reynold Xin <rxin@databricks.com>

Closes #14130 from rxin/SPARK-16477.
2016-07-11 09:42:56 -07:00
Xin Ren 9cb1eb7af7 [SPARK-16381][SQL][SPARKR] Update SQL examples and programming guide for R language binding
https://issues.apache.org/jira/browse/SPARK-16381

## What changes were proposed in this pull request?

Update SQL examples and programming guide for R language binding.

Here I just follow example https://github.com/apache/spark/compare/master...liancheng:example-snippet-extraction, created a separate R file to store all the example code.

## How was this patch tested?

Manual test on my local machine.
Screenshot as below:

![screen shot 2016-07-06 at 4 52 25 pm](https://cloud.githubusercontent.com/assets/3925641/16638180/13925a58-439a-11e6-8d57-8451a63dcae9.png)

Author: Xin Ren <iamshrek@126.com>

Closes #14082 from keypointt/SPARK-16381.
2016-07-11 20:05:28 +08:00
Michael Gummelt b1db26acc5 [SPARK-11857][MESOS] Deprecate fine grained
## What changes were proposed in this pull request?

Documentation changes to indicate that fine-grained mode is now deprecated.  No code changes were made, and all fine-grained mode instructions were left in place.  We can remove all of that once the deprecation cycle completes (Does Spark have a standard deprecation cycle?  One major version?)

Blocked on https://github.com/apache/spark/pull/14059

## How was this patch tested?

Viewed in Github

Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #14078 from mgummelt/deprecate-fine-grained.
2016-07-08 20:20:26 -07:00
Michael Gummelt 9c041990cf [MESOS] expand coarse-grained mode docs
## What changes were proposed in this pull request?

docs

## How was this patch tested?

viewed the docs in github

Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #14059 from mgummelt/coarse-grained.
2016-07-06 15:02:45 -07:00
WeichenXu b1310425b3 [DOC][SQL] update out-of-date code snippets using SQLContext in all documents.
## What changes were proposed in this pull request?

I search the whole documents directory using SQLContext, and update the following places:

- docs/configuration.md, sparkR code snippets.
- docs/streaming-programming-guide.md, several example code.

## How was this patch tested?

N/A

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #14025 from WeichenXu123/WIP_SQLContext_update.
2016-07-06 10:41:48 -07:00
Sean Owen 18fb57f58a [MINOR][DOCS] Remove unused images; crush PNGs that could use it for good measure
## What changes were proposed in this pull request?

Coincidentally, I discovered that a couple images were unused in `docs/`, and then searched and found more, and then realized some PNGs were pretty big and could be crushed, and before I knew it, had done the same for the ASF site (not committed yet).

No functional change at all, just less superfluous image data.

## How was this patch tested?

`jekyll serve`

Author: Sean Owen <sowen@cloudera.com>

Closes #14029 from srowen/RemoveCompressImages.
2016-07-04 09:21:58 +01:00
WeichenXu 0bd7cd18bc [SPARK-16345][DOCUMENTATION][EXAMPLES][GRAPHX] Extract graphx programming guide example snippets from source files instead of hard code them
## What changes were proposed in this pull request?

I extract 6 example programs from GraphX programming guide and replace them with
`include_example` label.

The 6 example programs are:
- AggregateMessagesExample.scala
- SSSPExample.scala
- TriangleCountingExample.scala
- ConnectedComponentsExample.scala
- ComprehensiveExample.scala
- PageRankExample.scala

All the example code can run using
`bin/run-example graphx.EXAMPLE_NAME`

## How was this patch tested?

Manual.

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #14015 from WeichenXu123/graphx_example_plugin.
2016-07-02 16:29:00 +01:00
WeichenXu 192d1f9cf3 [GRAPHX][EXAMPLES] move graphx test data directory and update graphx document
## What changes were proposed in this pull request?

There are two test data files used for graphx examples existing in directory "graphx/data"
I move it into "data/" directory because the "graphx" directory is used for code files and other test data files (such as mllib, streaming test data) are all in there.

I also update the graphx document where reference the data files which I move place.

## How was this patch tested?

N/A

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #14010 from WeichenXu123/move_graphx_data_dir.
2016-07-02 08:40:23 +01:00
Nick Pentreath 4a981dc870 [SPARK-15643][DOC][ML] Add breaking changes to ML migration guide
This PR adds the breaking changes from [SPARK-14810](https://issues.apache.org/jira/browse/SPARK-14810) to the migration guide.

## How was this patch tested?

Built docs locally.

Author: Nick Pentreath <nickp@za.ibm.com>

Closes #13924 from MLnick/SPARK-15643-migration-guide.
2016-06-30 17:55:14 -07:00
Tathagata Das 5d00a7bc19 [SPARK-16256][DOCS] Fix window operation diagram
Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #14001 from tdas/SPARK-16256-2.
2016-06-30 14:01:34 -07:00
Tathagata Das 2c3d96134d [SPARK-16256][DOCS] Minor fixes on the Structured Streaming Programming Guide
Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #13978 from tdas/SPARK-16256-1.
2016-06-29 23:38:19 -07:00
Cheng Lian bde1d6a615 [SPARK-16294][SQL] Labelling support for the include_example Jekyll plugin
## What changes were proposed in this pull request?

This PR adds labelling support for the `include_example` Jekyll plugin, so that we may split a single source file into multiple line blocks with different labels, and include them in multiple code snippets in the generated HTML page.

## How was this patch tested?

Manually tested.

<img width="923" alt="screenshot at jun 29 19-53-21" src="https://cloud.githubusercontent.com/assets/230655/16451099/66a76db2-3e33-11e6-84fb-63104c2f0688.png">

Author: Cheng Lian <lian@databricks.com>

Closes #13972 from liancheng/include-example-with-labels.
2016-06-29 22:50:53 -07:00
Tathagata Das 64132a14fb [SPARK-16256][SQL][STREAMING] Added Structured Streaming Programming Guide
Title defines all.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #13945 from tdas/SPARK-16256.
2016-06-29 11:45:57 -07:00
jerryshao 272a2f78f3 [SPARK-15990][YARN] Add rolling log aggregation support for Spark on yarn
## What changes were proposed in this pull request?

Yarn supports rolling log aggregation since 2.6, previously log will only be aggregated to HDFS after application is finished, it is quite painful for long running applications like Spark Streaming, thriftserver. Also out of disk problem will be occurred when log file is too large. So here propose to add support of rolling log aggregation for Spark on yarn.

One limitation for this is that log4j should be set to change to file appender, now in Spark itself uses console appender by default, in which file will not be created again once removed after aggregation. But I think lots of production users should have changed their log4j configuration instead of default on, so this is not a big problem.

## How was this patch tested?

Manually verified with Hadoop 2.7.1.

Author: jerryshao <sshao@hortonworks.com>

Closes #13712 from jerryshao/SPARK-15990.
2016-06-29 08:17:27 -05:00
Yanbo Liang 26252f7064 [SPARK-15643][DOC][ML] Update spark.ml and spark.mllib migration guide from 1.6 to 2.0
## What changes were proposed in this pull request?
Update ```spark.ml``` and ```spark.mllib``` migration guide from 1.6 to 2.0.

## How was this patch tested?
Docs update, no tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #13378 from yanboliang/spark-13448.
2016-06-28 11:54:25 -07:00
Yin Huai dd6b7dbe70 [SPARK-15863][SQL][DOC][FOLLOW-UP] Update SQL programming guide.
## What changes were proposed in this pull request?
This PR makes several updates to SQL programming guide.

Author: Yin Huai <yhuai@databricks.com>

Closes #13938 from yhuai/doc.
2016-06-27 22:44:08 -07:00
GayathriMurali be88383e15 [SPARK-15997][DOC][ML] Update user guide for HashingTF, QuantileVectorizer and CountVectorizer
## What changes were proposed in this pull request?

Made changes to HashingTF,QuantileVectorizer and CountVectorizer

Author: GayathriMurali <gayathri.m@intel.com>

Closes #13745 from GayathriMurali/SPARK-15997.
2016-06-24 13:25:40 +02:00
Ryan Blue 738f134bf4 [SPARK-13723][YARN] Change behavior of --num-executors with dynamic allocation.
## What changes were proposed in this pull request?

This changes the behavior of --num-executors and spark.executor.instances when using dynamic allocation. Instead of turning dynamic allocation off, it uses the value for the initial number of executors.

This changes was discussed on [SPARK-13723](https://issues.apache.org/jira/browse/SPARK-13723). I highly recommend using it while we can change the behavior for 2.0.0. In practice, the 1.x behavior causes unexpected behavior for users (it is not clear that it disables dynamic allocation) and wastes cluster resources because users rarely notice the log message.

## How was this patch tested?

This patch updates tests and adds a test for Utils.getDynamicAllocationInitialExecutors.

Author: Ryan Blue <blue@apache.org>

Closes #13338 from rdblue/SPARK-13723-num-executors-with-dynamic-allocation.
2016-06-23 14:03:46 -05:00
Felix Cheung b5a997667f [SPARK-16088][SPARKR] update setJobGroup, cancelJobGroup, clearJobGroup
## What changes were proposed in this pull request?

Updated setJobGroup, cancelJobGroup, clearJobGroup to not require sc/SparkContext as parameter.
Also updated roxygen2 doc and R programming guide on deprecations.

## How was this patch tested?

unit tests

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #13838 from felixcheung/rjobgroup.
2016-06-23 09:45:01 -07:00
Kai Jiang 43b04b7ecb [SPARK-15672][R][DOC] R programming guide update
## What changes were proposed in this pull request?
Guide for
- UDFs with dapply, dapplyCollect
- spark.lapply for running parallel R functions

## How was this patch tested?
build locally
<img width="654" alt="screen shot 2016-06-14 at 03 12 56" src="https://cloud.githubusercontent.com/assets/3419881/16039344/12a3b6a0-31de-11e6-8d77-fe23308075c0.png">

Author: Kai Jiang <jiangkai@gmail.com>

Closes #13660 from vectorijk/spark-15672-R-guide-update.
2016-06-22 12:50:36 -07:00
Felix Cheung 79aa1d82ca [SQL][DOC] SQL programming guide add deprecated methods in 2.0.0
## What changes were proposed in this pull request?

Doc changes

## How was this patch tested?

manual

liancheng

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #13827 from felixcheung/sqldocdeprecate.
2016-06-22 10:37:13 +08:00
Yuhao Yang a58f402394 [SPARK-16045][ML][DOC] Spark 2.0 ML.feature: doc update for stopwords and binarizer
## What changes were proposed in this pull request?

jira: https://issues.apache.org/jira/browse/SPARK-16045
2.0 Audit: Update document for StopWordsRemover and Binarizer.

## How was this patch tested?

manual review for doc

Author: Yuhao Yang <hhbyyh@gmail.com>
Author: Yuhao Yang <yuhao.yang@intel.com>

Closes #13375 from hhbyyh/stopdoc.
2016-06-21 00:47:36 -07:00
Takeshi YAMAMURO 41e0ffb19f [SPARK-15894][SQL][DOC] Update docs for controlling #partitions
## What changes were proposed in this pull request?
Update docs for two parameters `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes ` in Other Configuration Options.

## How was this patch tested?
N/A

Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>

Closes #13797 from maropu/SPARK-15894-2.
2016-06-21 14:27:16 +08:00
Felix Cheung 58f6e27dd7 [SPARK-15863][SQL][DOC][SPARKR] sql programming guide updates to include sparkSession in R
## What changes were proposed in this pull request?

Update doc as per discussion in PR #13592

## How was this patch tested?

manual

shivaram liancheng

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #13799 from felixcheung/rsqlprogrammingguide.
2016-06-21 13:56:37 +08:00
Eric Liang 07367533de [SPARK-16025][CORE] Document OFF_HEAP storage level in 2.0
This has changed from 1.6, and now stores memory off-heap using spark's off-heap support instead of in tachyon.

Author: Eric Liang <ekl@databricks.com>

Closes #13744 from ericl/spark-16025.
2016-06-20 21:56:44 -07:00
Cheng Lian 6df8e38860 [SPARK-15863][SQL][DOC] Initial SQL programming guide update for Spark 2.0
## What changes were proposed in this pull request?

Initial SQL programming guide update for Spark 2.0. Contents like 1.6 to 2.0 migration guide are still incomplete.

We may also want to add more examples for Scala/Java Dataset typed transformations.

## How was this patch tested?

N/A

Author: Cheng Lian <lian@databricks.com>

Closes #13592 from liancheng/sql-programming-guide-2.0.
2016-06-20 14:50:28 -07:00
Felix Cheung 359c2e827d [SPARK-15159][SPARKR] SparkSession roxygen2 doc, programming guide, example updates
## What changes were proposed in this pull request?

roxygen2 doc, programming guide, example updates

## How was this patch tested?

manual checks
shivaram

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #13751 from felixcheung/rsparksessiondoc.
2016-06-20 13:46:24 -07:00
wm624@hotmail.com 5930d7a2e9 [SPARK-16040][MLLIB][DOC] spark.mllib PIC document extra line of refernece
## What changes were proposed in this pull request?

In the 2.0 document, Line "A full example that produces the experiment described in the PIC paper can be found under examples/." is redundant.

There is already "Find full example code at "examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala" in the Spark repo.".

We should remove the first line, which is consistent with other documents.

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

Manual test

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #13755 from wangmiao1981/doc.
2016-06-19 20:19:40 +01:00
GayathriMurali af2a4b0826 [SPARK-15129][R][DOC] R API changes in ML
## What changes were proposed in this pull request?

Make user guide changes to SparkR documentation for all changes that happened in 2.0 to Machine Learning APIs

Author: GayathriMurali <gayathri.m@intel.com>

Closes #13285 from GayathriMurali/SPARK-15129.
2016-06-17 21:10:29 -07:00
Dhruve Ashar f1bf0d2f3a [SPARK-15966][DOC] Add closing tag to fix rendering issue for Spark monitoring
## What changes were proposed in this pull request?
Adds the missing closing tag for spark.ui.view.acls.groups

## How was this patch tested?
I built the docs locally and verified the changed in browser.

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
**Before:**
![image](https://cloud.githubusercontent.com/assets/7732317/16135005/49fc0724-33e6-11e6-9390-98711593fa5b.png)

**After:**
![image](https://cloud.githubusercontent.com/assets/7732317/16135021/62b5c4a8-33e6-11e6-8118-b22fda5c66eb.png)

Author: Dhruve Ashar <dhruveashar@gmail.com>

Closes #13719 from dhruve/doc/SPARK-15966.
2016-06-16 17:46:19 -07:00
WeichenXu 9040d83bc2 [SPARK-15608][ML][EXAMPLES][DOC] add examples and documents of ml.isotonic regression
## What changes were proposed in this pull request?

add ml doc for ml isotonic regression
add scala example for ml isotonic regression
add java example for ml isotonic regression
add python example for ml isotonic regression

modify scala example for mllib isotonic regression
modify java example for mllib isotonic regression
modify python example for mllib isotonic regression

add data/mllib/sample_isotonic_regression_libsvm_data.txt
delete data/mllib/sample_isotonic_regression_data.txt
## How was this patch tested?

N/A

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #13381 from WeichenXu123/add_isotonic_regression_doc.
2016-06-16 17:35:40 -07:00
Sean Owen 457126e420 [SPARK-15796][CORE] Reduce spark.memory.fraction default to avoid overrunning old gen in JVM default config
## What changes were proposed in this pull request?

Reduce `spark.memory.fraction` default to 0.6 in order to make it fit within default JVM old generation size (2/3 heap). See JIRA discussion. This means a full cache doesn't spill into the new gen. CC andrewor14

## How was this patch tested?

Jenkins tests.

Author: Sean Owen <sowen@cloudera.com>

Closes #13618 from srowen/SPARK-15796.
2016-06-16 23:04:10 +02:00
Nirman Narang 04d7b3d2b6 [SPARK-7848][STREAMING][UPDATE SPARKSTREAMING DOCS TO INCORPORATE IMPORTANT POINTS.]
Updated the SparkStreaming Doc with some important points.

Author: Nirman Narang <narang@us.ibm.com>

Closes #11114 from nirmannarang/SPARK-7848.
2016-06-15 15:36:31 -07:00
Mortada Mehyar a87a56f5c7 [DOCUMENTATION] fixed typos in python programming guide
## What changes were proposed in this pull request?

minor typo

## How was this patch tested?

minor typo in the doc, should be self explanatory

Author: Mortada Mehyar <mortada.mehyar@gmail.com>

Closes #13639 from mortada/typo.
2016-06-14 09:45:46 +01:00
Sean Owen f51dfe616b [SPARK-15086][CORE][STREAMING] Deprecate old Java accumulator API
## What changes were proposed in this pull request?

- Deprecate old Java accumulator API; should use Scala now
- Update Java tests and examples
- Don't bother testing old accumulator API in Java 8 (too)
- (fix a misspelling too)

## How was this patch tested?

Jenkins tests

Author: Sean Owen <sowen@cloudera.com>

Closes #13606 from srowen/SPARK-15086.
2016-06-12 11:44:33 -07:00
bomeng 50248dcfff [SPARK-15806][DOCUMENTATION] update doc for SPARK_MASTER_IP
## What changes were proposed in this pull request?

SPARK_MASTER_IP is a deprecated environment variable. It is replaced by SPARK_MASTER_HOST according to MasterArguments.scala.

## How was this patch tested?

Manually verified.

Author: bomeng <bmeng@us.ibm.com>

Closes #13543 from bomeng/SPARK-15806.
2016-06-12 14:25:48 +01:00
bomeng 3fd3ee038b [SPARK-15781][DOCUMENTATION] remove deprecated environment variable doc
## What changes were proposed in this pull request?

Like `SPARK_JAVA_OPTS` and `SPARK_CLASSPATH`, we will remove the document for `SPARK_WORKER_INSTANCES` to discourage user not to use them. If they are actually used, SparkConf will show a warning message as before.

## How was this patch tested?

Manually tested.

Author: bomeng <bmeng@us.ibm.com>

Closes #13533 from bomeng/SPARK-15781.
2016-06-12 12:58:34 +01:00
Dongjoon Hyun ad102af169 [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents
## What changes were proposed in this pull request?

This issue fixes all broken links on Spark 2.0 preview MLLib documents. Also, this contains some editorial change.

**Fix broken links**
  * mllib-data-types.md
  * mllib-decision-tree.md
  * mllib-ensembles.md
  * mllib-feature-extraction.md
  * mllib-pmml-model-export.md
  * mllib-statistics.md

**Fix malformed section header and scala coding style**
  * mllib-linear-methods.md

**Replace indirect forward links with direct one**
  * ml-classification-regression.md

## How was this patch tested?

Manual tests (with `cd docs; jekyll build`.)

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #13608 from dongjoon-hyun/SPARK-15883.
2016-06-11 12:55:38 +01:00
Sean Owen 3761330dd0 [SPARK-15879][DOCS][UI] Update logo in UI and docs to add "Apache"
## What changes were proposed in this pull request?

Use new Spark logo including "Apache" (now, with crushed PNGs). Remove old unreferenced logo files.

## How was this patch tested?

Manual check of generated HTML site and Spark UI. I searched for references to the deleted files to make sure they were not used.

Author: Sean Owen <sowen@cloudera.com>

Closes #13609 from srowen/SPARK-15879.
2016-06-11 12:46:07 +01:00
Mortada Mehyar 675a73715d [DOCUMENTATION] fixed groupby aggregation example for pyspark
## What changes were proposed in this pull request?

fixing documentation for the groupby/agg example in python

## How was this patch tested?

the existing example in the documentation dose not contain valid syntax (missing parenthesis) and is not using `Column` in the expression for `agg()`

after the fix here's how I tested it:

```
In [1]: from pyspark.sql import Row

In [2]: import pyspark.sql.functions as func

In [3]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:records = [{'age': 19, 'department': 1, 'expense': 100},
: {'age': 20, 'department': 1, 'expense': 200},
: {'age': 21, 'department': 2, 'expense': 300},
: {'age': 22, 'department': 2, 'expense': 300},
: {'age': 23, 'department': 3, 'expense': 300}]
:--

In [4]: df = sqlContext.createDataFrame([Row(**d) for d in records])

In [5]: df.groupBy("department").agg(df["department"], func.max("age"), func.sum("expense")).show()

+----------+----------+--------+------------+
|department|department|max(age)|sum(expense)|
+----------+----------+--------+------------+
|         1|         1|      20|         300|
|         2|         2|      22|         600|
|         3|         3|      23|         300|
+----------+----------+--------+------------+

Author: Mortada Mehyar <mortada.mehyar@gmail.com>

Closes #13587 from mortada/groupby_agg_doc_fix.
2016-06-10 00:23:34 -07:00