Commit graph

1599 commits

Author SHA1 Message Date
Reynold Xin 9ca0760d67 [SPARK-10063][SQL] Remove DirectParquetOutputCommitter
## What changes were proposed in this pull request?
This patch removes DirectParquetOutputCommitter. This was initially created by Databricks as a faster way to write Parquet data to S3. However, given how the underlying S3 Hadoop implementation works, this committer only works when there are no failures. If there are multiple attempts of the same task (e.g. speculation or task failures or node failures), the output data can be corrupted. I don't think this performance optimization outweighs the correctness issue.

## How was this patch tested?
Removed the related tests also.

Author: Reynold Xin <rxin@databricks.com>

Closes #12229 from rxin/SPARK-10063.
2016-04-07 00:51:45 -07:00
Holden Karau 457e58befe [SPARK-14424][BUILD][DOCS] Update the build docs to switch from assembly to package and add a no…
## What changes were proposed in this pull request?

Change our build docs & shell scripts to that developers are aware of the change from "assembly" to "package"

## How was this patch tested?

Manually ran ./bin/spark-shell after ./build/sbt assembly and verified error message printed, ran new suggested build target and verified ./bin/spark-shell runs after this.

Author: Holden Karau <holden@pigscanfly.ca>
Author: Holden Karau <holden@us.ibm.com>

Closes #12197 from holdenk/SPARK-1424-spark-class-broken-fix-build-docs.
2016-04-06 16:00:29 -07:00
Devaraj K bc36df127d [SPARK-13063][YARN] Make the SPARK YARN STAGING DIR as configurable
## What changes were proposed in this pull request?
Made the SPARK YARN STAGING DIR as configurable with the configuration as 'spark.yarn.staging-dir'.

## How was this patch tested?

I have verified it manually by running applications on yarn, If the 'spark.yarn.staging-dir' is configured then the value used as staging directory otherwise uses the default value i.e. file system’s home directory for the user.

Author: Devaraj K <devaraj@apache.org>

Closes #12082 from devaraj-kavali/SPARK-13063.
2016-04-05 14:12:00 -05:00
Marcelo Vanzin 24d7d2e453 [SPARK-13579][BUILD] Stop building the main Spark assembly.
This change modifies the "assembly/" module to just copy needed
dependencies to its build directory, and modifies the packaging
script to pick those up (and remove duplicate jars packages in the
examples module).

I also made some minor adjustments to dependencies to remove some
test jars from the final packaging, and remove jars that conflict with each
other when packaged separately (e.g. servlet api).

Also note that this change restores guava in applications' classpaths, even
though it's still shaded inside Spark. This is now needed for the Hadoop
libraries that are packaged with Spark, which now are not processed by
the shade plugin.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #11796 from vanzin/SPARK-13579.
2016-04-04 16:52:22 -07:00
Liwei Lin 03d130f973 [SPARK-14342][CORE][DOCS][TESTS] Remove straggler references to Tachyon
## What changes were proposed in this pull request?

Straggler references to Tachyon were removed:
- for docs, `tachyon` has been generalized as `off-heap memory`;
- for Mesos test suits, the key-value `tachyon:true`/`tachyon:false` has been changed to `os:centos`/`os:ubuntu`, since `os` is an example constrain used by the [Mesos official docs](http://mesos.apache.org/documentation/attributes-resources/).

## How was this patch tested?

Existing test suites.

Author: Liwei Lin <lwlin7@gmail.com>

Closes #12129 from lw-lin/tachyon-cleanup.
2016-04-02 17:55:46 -07:00
jerryshao 8ba2b7f28f [SPARK-12343][YARN] Simplify Yarn client and client argument
## What changes were proposed in this pull request?

Currently in Spark on YARN, configurations can be passed through SparkConf, env and command arguments, some parts are duplicated, like client argument and SparkConf. So here propose to simplify the command arguments.

## How was this patch tested?

This patch is tested manually with unit test.

CC vanzin tgravescs , please help to suggest this proposal. The original purpose of this JIRA is to remove `ClientArguments`, through refactoring some arguments like `--class`, `--arg` are not so easy to replace, so here I remove the most part of command line arguments, only keep the minimal set.

Author: jerryshao <sshao@hortonworks.com>

Closes #11603 from jerryshao/SPARK-12343.
2016-04-01 10:52:13 -07:00
Josh Rosen a7af6cd2ea [SPARK-14281][TESTS] Fix java8-tests and simplify their build
This patch fixes a compilation / build break in Spark's `java8-tests` and refactors their POM to simplify the build. See individual commit messages for more details.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #12073 from JoshRosen/fix-java8-tests.
2016-03-31 13:52:59 -07:00
Michael Gummelt 4d93b653f7 [Docs] Update monitoring.md to accurately describe the history server
It looks like the docs were recently updated to reflect the History Server's support for incomplete applications, but they still had wording that suggested only completed applications were viewable.  This fixes that.

My editor also introduced several whitespace removal changes, that I hope are OK, as text files shouldn't have trailing whitespace.  To verify they're purely whitespace changes, add `&w=1` to your browser address.  If this isn't acceptable, let me know and I'll update the PR.

I also didn't think this required a JIRA.  Let me know if I should create one.

Not tested

Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #12045 from mgummelt/update-history-docs.
2016-03-31 12:06:21 -07:00
Shixiong Zhu d23ad7c1c9 [SPARK-13874][DOC] Remove docs of streaming-akka, streaming-zeromq, streaming-mqtt and streaming-twitter
## What changes were proposed in this pull request?

This PR removes all docs about the old streaming-akka, streaming-zeromq, streaming-mqtt and streaming-twitter projects since I have already copied them to https://github.com/spark-packages

Also remove mqtt_wordcount.py that I forgot to remove previously.

## How was this patch tested?

Jenkins PR Build.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #11824 from zsxwing/remove-doc.
2016-03-26 01:47:27 -07:00
Xin Ren d283223a5a [SPARK-13017][DOCS] Replace example code in mllib-feature-extraction.md using include_example
Replace example code in mllib-feature-extraction.md using include_example
https://issues.apache.org/jira/browse/SPARK-13017

The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6.

Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example.
`{% include_example scala/org/apache/spark/examples/mllib/TFIDFExample.scala %}`
Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/TFIDFExample.scala` and pick code blocks marked "example" and replace code block in
`{% highlight %}`
 in the markdown.

See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337

Author: Xin Ren <iamshrek@126.com>

Closes #11142 from keypointt/SPARK-13017.
2016-03-24 14:25:10 -07:00
Xin Ren dd9ca7b960 [SPARK-13019][DOCS] fix for scala-2.10 build: Replace example code in mllib-statistics.md using include_example
## What changes were proposed in this pull request?

This PR for ticket SPARK-13019 is based on previous PR(https://github.com/apache/spark/pull/11108).
Since PR(https://github.com/apache/spark/pull/11108) is breaking scala-2.10 build, more work is needed to fix build errors.

What I did new in this PR is adding keyword argument for 'fractions':
`    val approxSample = data.sampleByKey(withReplacement = false, fractions = fractions)`
`    val exactSample = data.sampleByKeyExact(withReplacement = false, fractions = fractions)`

I reopened ticket on JIRA but sorry I don't know how to reopen a GitHub pull request, so I just submitting a new pull request.
## How was this patch tested?

Manual build testing on local machine, build based on scala-2.10.

Author: Xin Ren <iamshrek@126.com>

Closes #11901 from keypointt/SPARK-13019.
2016-03-24 09:34:54 +00:00
Xiangrui Meng 43ef1e52bf Revert "[SPARK-13019][DOCS] Replace example code in mllib-statistics.md using include_example"
This reverts commit 1af8de200c.
2016-03-21 17:42:30 -07:00
Xin Ren 1af8de200c [SPARK-13019][DOCS] Replace example code in mllib-statistics.md using include_example
https://issues.apache.org/jira/browse/SPARK-13019

The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6.

Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example.
`{% include_example scala/org/apache/spark/examples/mllib/SummaryStatisticsExample.scala %}`
Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/SummaryStatisticsExample.scala` and pick code blocks marked "example" and replace code block in
`{% highlight %}`
 in the markdown.

See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337

Author: Xin Ren <iamshrek@126.com>

Closes #11108 from keypointt/SPARK-13019.
2016-03-21 16:09:34 -07:00
Dongjoon Hyun c11ea2e413 [MINOR][DOCS] Update build descriptions and commands
## What changes were proposed in this pull request?

This PR updates Scala and Hadoop versions in the build description and commands in `Building Spark` documents.

## How was this patch tested?

N/A

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11838 from dongjoon-hyun/fix_doc_building_spark.
2016-03-18 21:32:48 -07:00
Zheng RuiFeng 204c9dec2c [MINOR][DOC] Add JavaStreamingTestExample
## What changes were proposed in this pull request?

Add the java example of StreamingTest

## How was this patch tested?

manual tests in CLI: bin/run-example mllib.JavaStreamingTestExample dataDir 5 100

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #11776 from zhengruifeng/streaming_je.
2016-03-17 11:09:02 +02:00
Daoyuan Wang d1c193a2f1 [SPARK-12855][MINOR][SQL][DOC][TEST] remove spark.sql.dialect from doc and test
## What changes were proposed in this pull request?

Since developer API of plug-able parser has been removed in #10801 , docs should be updated accordingly.

## How was this patch tested?

This patch will not affect the real code path.

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #11758 from adrian-wang/spark12855.
2016-03-16 22:52:10 -07:00
Dongjoon Hyun 4ce2d24e2a [SPARK-13942][CORE][DOCS] Remove Shark-related docs for 2.x
## What changes were proposed in this pull request?

`Shark` was merged into `Spark SQL` since [July 2014](https://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html). The followings seem to be the only legacy. For Spark 2.x, we had better clean up those docs.

**Migration Guide**
```
- ## Migration Guide for Shark Users
- ...
- ### Scheduling
- ...
- ### Reducer number
- ...
- ### Caching
```

## How was this patch tested?

Pass the Jenkins test.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11770 from dongjoon-hyun/SPARK-13942.
2016-03-16 15:50:24 -07:00
Shixiong Zhu 43304b1758 [SPARK-13888][DOC] Remove Akka Receiver doc and refer to the DStream Akka project
## What changes were proposed in this pull request?

I have copied the docs of Streaming Akka to https://github.com/spark-packages/dstream-akka/blob/master/README.md

So we can remove them from Spark now.

## How was this patch tested?

Only document changes.

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #11711 from zsxwing/remove-akka-doc.
2016-03-14 23:21:30 -07:00
Daniel Santana 9f13f0fc17 [MINOR][DOCS] Added Missing back slashes
## What changes were proposed in this pull request?

When studying spark many users just copy examples on the documentation and paste on their terminals
and because of that the missing backlashes lead them run into some shell errors.

The added backslashes avoid that problem for spark users with that behavior.

## How was this patch tested?

I generated the documentation locally using jekyll and checked the generated pages

Author: Daniel Santana <mestresan@gmail.com>

Closes #11699 from danielsan/master.
2016-03-14 12:26:08 -07:00
Sean Owen 1840852841 [SPARK-13823][CORE][STREAMING][SQL] Always specify Charset in String <-> byte[] conversions (and remaining Coverity items)
## What changes were proposed in this pull request?

- Fixes calls to `new String(byte[])` or `String.getBytes()` that rely on platform default encoding, to use UTF-8
- Same for `InputStreamReader` and `OutputStreamWriter` constructors
- Standardizes on UTF-8 everywhere
- Standardizes specifying the encoding with `StandardCharsets.UTF-8`, not the Guava constant or "UTF-8" (which means handling `UnuspportedEncodingException`)
- (also addresses the other remaining Coverity scan issues, which are pretty trivial; these are separated into commit 1deecd8d9c )

## How was this patch tested?

Jenkins tests

Author: Sean Owen <sowen@cloudera.com>

Closes #11657 from srowen/SPARK-13823.
2016-03-13 21:03:49 -07:00
Marcelo Vanzin 07f1c54477 [SPARK-13577][YARN] Allow Spark jar to be multiple jars, archive.
In preparation for the demise of assemblies, this change allows the
YARN backend to use multiple jars and globs as the "Spark jar". The
config option has been renamed to "spark.yarn.jars" to reflect that.

A second option "spark.yarn.archive" was also added; if set, this
takes precedence and uploads an archive expected to contain the jar
files with the Spark code and its dependencies.

Existing deployments should keep working, mostly. This change drops
support for the "SPARK_JAR" environment variable, and also does not
fall back to using "jarOfClass" if no configuration is set, falling
back to finding files under SPARK_HOME instead. This should be fine
since "jarOfClass" probably wouldn't work unless you were using
spark-submit anyway.

Tested with the unit tests, and trying the different config options
on a YARN cluster.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #11500 from vanzin/SPARK-13577.
2016-03-11 07:54:57 -06:00
Yuhao Yang 0b713e0455 [SPARK-13512][ML] add example and doc for MaxAbsScaler
## What changes were proposed in this pull request?

jira: https://issues.apache.org/jira/browse/SPARK-13512
Add example and doc for ml.feature.MaxAbsScaler.

## How was this patch tested?
 unit tests

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #11392 from hhbyyh/maxabsdoc.
2016-03-11 09:31:35 +02:00
Zheng RuiFeng d18276cb1d [SPARK-13672][ML] Add python examples of BisectingKMeans in ML and MLLIB
JIRA: https://issues.apache.org/jira/browse/SPARK-13672

## What changes were proposed in this pull request?

add two python examples of BisectingKMeans for ml and mllib

## How was this patch tested?

manual tests

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #11515 from zhengruifeng/mllib_bkm_pe.
2016-03-11 09:21:12 +02:00
Dongjoon Hyun 88fa866620 [MINOR][DOC] Fix supported hive version in doc
## What changes were proposed in this pull request?

Today, Spark 1.6.1 and updated docs are release. Unfortunately, there is obsolete hive version information on docs: [Building Spark](http://spark.apache.org/docs/latest/building-spark.html#building-with-hive-and-jdbc-support). This PR fixes the following two lines.
```
-By default Spark will build with Hive 0.13.1 bindings.
+By default Spark will build with Hive 1.2.1 bindings.
-# Apache Hadoop 2.4.X with Hive 13 support
+# Apache Hadoop 2.4.X with Hive 1.2.1 support
```
`sql/README.md` file also describe

## How was this patch tested?

Manual.

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11639 from dongjoon-hyun/fix_doc_hive_version.
2016-03-10 17:07:18 -08:00
JeremyNixon 3e3c3d58d8 [SPARK-13706][ML] Add Python Example for Train Validation Split
## What changes were proposed in this pull request?

This pull request adds a python example for train validation split.

## How was this patch tested?

This was style tested through lint-python, generally tested with ./dev/run-tests, and run in notebook and shell environments. It was viewed in docs locally with jekyll serve.

This contribution is my original work and I license it to Spark under its open source license.

Author: JeremyNixon <jnixon2@gmail.com>

Closes #11547 from JeremyNixon/tvs_example.
2016-03-10 09:18:15 +02:00
Sergiusz Urbaniak a4a0addccf [SPARK-13492][MESOS] Configurable Mesos framework webui URL.
## What changes were proposed in this pull request?

Previously the Mesos framework webui URL was being derived only from the Spark UI address leaving no possibility to configure it. This commit makes it configurable. If unset it falls back to the previous behavior.

Motivation:
This change is necessary in order to be able to install Spark on DCOS and to be able to give it a custom service link. The configured `webui_url` is configured to point to a reverse proxy in the DCOS environment.

## How was this patch tested?

Locally, using unit tests and on DCOS testing and stable revision.

Author: Sergiusz Urbaniak <sur@mesosphere.io>

Closes #11369 from s-urbaniak/sur-webui-url.
2016-03-09 18:10:01 -08:00
Sean Owen 256704c771 [SPARK-13595][BUILD] Move docker, extras modules into external
## What changes were proposed in this pull request?

Move `docker` dirs out of top level into `external/`; move `extras/*` into `external/`

## How was this patch tested?

This is tested with Jenkins tests.

Author: Sean Owen <sowen@cloudera.com>

Closes #11523 from srowen/SPARK-13595.
2016-03-09 18:27:44 +00:00
Dongjoon Hyun c3689bc24e [SPARK-13702][CORE][SQL][MLLIB] Use diamond operator for generic instance creation in Java code.
## What changes were proposed in this pull request?

In order to make `docs/examples` (and other related code) more simple/readable/user-friendly, this PR replaces existing codes like the followings by using `diamond` operator.

```
-    final ArrayList<Product2<Object, Object>> dataToWrite =
-      new ArrayList<Product2<Object, Object>>();
+    final ArrayList<Product2<Object, Object>> dataToWrite = new ArrayList<>();
```

Java 7 or higher supports **diamond** operator which replaces the type arguments required to invoke the constructor of a generic class with an empty set of type parameters (<>). Currently, Spark Java code use mixed usage of this.

## How was this patch tested?

Manual.
Pass the existing tests.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11541 from dongjoon-hyun/SPARK-13702.
2016-03-09 10:31:26 +00:00
Sean Owen 54040f8d35 [SPARK-13715][MLLIB] Remove last usages of jblas in tests
## What changes were proposed in this pull request?

Remove last usage of jblas, in tests

## How was this patch tested?

Jenkins tests -- the same ones that are being modified.

Author: Sean Owen <sowen@cloudera.com>

Closes #11560 from srowen/SPARK-13715.
2016-03-08 17:47:55 +00:00
Sean Owen 0eea12a3d9 [SPARK-13596][BUILD] Move misc top-level build files into appropriate subdirs
## What changes were proposed in this pull request?

Move many top-level files in dev/ or other appropriate directory. In particular, put `make-distribution.sh` in `dev` and update docs accordingly. Remove deprecated `sbt/sbt`.

I was (so far) unable to figure out how to move `tox.ini`. `scalastyle-config.xml` should be movable but edits to the project `.sbt` files didn't work; config file location is updatable for compile but not test scope.

## How was this patch tested?

`./dev/run-tests` to verify RAT and checkstyle work. Jenkins tests for the rest.

Author: Sean Owen <sowen@cloudera.com>

Closes #11522 from srowen/SPARK-13596.
2016-03-07 14:48:02 -08:00
CodingCat a3ec50a4bc [MINOR][DOC] improve the doc for "spark.memory.offHeap.size"
The description of "spark.memory.offHeap.size" in the current document does not clearly state that memory is counted with bytes....

This PR contains a small fix for this tiny issue

document fix

Author: CodingCat <zhunansjtu@gmail.com>

Closes #11561 from CodingCat/master.
2016-03-07 12:08:26 -08:00
rmishra 4b13896ebf [SPARK-13705][DOCS] UpdateStateByKey Operation documentation incorrectly refers to StatefulNetworkWordCount
## What changes were proposed in this pull request?
The reference to StatefulNetworkWordCount.scala from updateStatesByKey documentation should be removed, till there is a example for updateStatesByKey.

## How was this patch tested?
Have tested the new documentation with jekyll build.

Author: rmishra <rmishra@pivotal.io>

Closes #11545 from rishitesh/SPARK-13705.
2016-03-07 09:55:49 +00:00
Xin Ren 70f6f9649b [SPARK-13013][DOCS] Replace example code in mllib-clustering.md using include_example
Replace example code in mllib-clustering.md using include_example
https://issues.apache.org/jira/browse/SPARK-13013

The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6.

Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example.
`{% include_example scala/org/apache/spark/examples/mllib/KMeansExample.scala %}`
Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/KMeansExample.scala` and pick code blocks marked "example" and replace code block in
`{% highlight %}`
 in the markdown.

See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337

Author: Xin Ren <iamshrek@126.com>

Closes #11116 from keypointt/SPARK-13013.
2016-03-03 09:32:47 -08:00
Reynold Xin 9e01dcc644 [SPARK-13529][BUILD] Move network/* modules into common/network-*
## What changes were proposed in this pull request?
As the title says, this moves the three modules currently in network/ into common/network-*. This removes one top level, non-user-facing folder.

## How was this patch tested?
Compilation and existing tests. We should run both SBT and Maven.

Author: Reynold Xin <rxin@databricks.com>

Closes #11409 from rxin/SPARK-13529.
2016-02-28 17:25:07 -08:00
Reynold Xin 59e3e10be2 [SPARK-13521][BUILD] Remove reference to Tachyon in cluster & release scripts
## What changes were proposed in this pull request?
We provide a very limited set of cluster management script in Spark for Tachyon, although Tachyon itself provides a much better version of it. Given now Spark users can simply use Tachyon as a normal file system and does not require extensive configurations, we can remove this management capabilities to simplify Spark bash scripts.

Note that this also reduces coupling between a 3rd party external system and Spark's release scripts, and would eliminate possibility for failures such as Tachyon being renamed or the tar balls being relocated.

## How was this patch tested?
N/A

Author: Reynold Xin <rxin@databricks.com>

Closes #11400 from rxin/release-script.
2016-02-26 22:35:12 -08:00
Dongjoon Hyun 7af0de076f [SPARK-11381][DOCS] Replace example code in mllib-linear-methods.md using include_example
## What changes were proposed in this pull request?

This PR replaces example codes in `mllib-linear-methods.md` using `include_example`
by doing the followings:
  * Extracts the example codes(Scala,Java,Python) as files in `example` module.
  * Merges some dialog-style examples into a single file.
  * Hide redundant codes in HTML for the consistency with other docs.

## How was the this patch tested?

manual test.
This PR can be tested by document generations, `SKIP_API=1 jekyll build`.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11320 from dongjoon-hyun/SPARK-11381.
2016-02-26 08:31:55 -08:00
Bryan Cutler b33261f913 [SPARK-12634][PYSPARK][DOC] PySpark tree parameter desc to consistent format
Part of task for [SPARK-11219](https://issues.apache.org/jira/browse/SPARK-11219) to make PySpark MLlib parameter description formatting consistent.  This is for the tree module.

closes #10601

Author: Bryan Cutler <cutlerb@gmail.com>
Author: vijaykiran <mail@vijaykiran.com>

Closes #11353 from BryanCutler/param-desc-consistent-tree-SPARK-12634.
2016-02-26 08:30:32 -08:00
Michael Gummelt c98a93ded3 [SPARK-13439][MESOS] Document that spark.mesos.uris is comma-separated
Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #11311 from mgummelt/document_csv.
2016-02-25 13:32:09 +00:00
JeremyNixon 230bbeaa61 [SPARK-10759][ML] update cross validator with include_example
This pull request uses {%include_example%} to add an example for the python cross validator to ml-guide.

Author: JeremyNixon <jnixon2@gmail.com>

Closes #11240 from JeremyNixon/pipeline_include_example.
2016-02-23 15:57:29 -08:00
Lianhui Wang 9f4263392e [SPARK-7729][UI] Executor which has been killed should also be displayed on Executor Tab
andrewor14 squito Dead Executors should also be displayed on Executor Tab.
as following:
![image](https://cloud.githubusercontent.com/assets/545478/11492707/ae55d7f6-982b-11e5-919a-b62cd84684b2.png)

Author: Lianhui Wang <lianhuiwang09@gmail.com>

This patch had conflicts when merged, resolved by
Committer: Andrew Or <andrew@databricks.com>

Closes #10058 from lianhuiwang/SPARK-7729.
2016-02-23 11:08:39 -08:00
jerryshao e99d017098 [SPARK-13220][CORE] deprecate yarn-client and yarn-cluster mode
Author: jerryshao <sshao@hortonworks.com>

Closes #11229 from jerryshao/SPARK-13220.
2016-02-23 12:30:57 +00:00
Devaraj K 02b1fefffb [SPARK-13012][DOCUMENTATION] Replace example code in ml-guide.md using include_example
Replaced example code in ml-guide.md using include_example

Author: Devaraj K <devaraj@apache.org>

Closes #11053 from devaraj-kavali/SPARK-13012.
2016-02-22 17:21:37 -08:00
Devaraj K 9f410871ca [SPARK-13016][DOCUMENTATION] Replace example code in mllib-dimensionality-reduction.md using include_example
Replaced example example code in mllib-dimensionality-reduction.md using
include_example

Author: Devaraj K <devaraj@apache.org>

Closes #11132 from devaraj-kavali/SPARK-13016.
2016-02-22 17:16:56 -08:00
Bryan Cutler e298ac91e3 [SPARK-12632][PYSPARK][DOC] PySpark fpm and als parameter desc to consistent format
Part of task for [SPARK-11219](https://issues.apache.org/jira/browse/SPARK-11219) to make PySpark MLlib parameter description formatting consistent.  This is for the fpm and recommendation modules.

Closes #10602
Closes #10897

Author: Bryan Cutler <cutlerb@gmail.com>
Author: somideshmukh <somilde@us.ibm.com>

Closes #11186 from BryanCutler/param-desc-consistent-fpmrecc-SPARK-12632.
2016-02-22 12:48:37 +02:00
Dongjoon Hyun 024482bf51 [MINOR][DOCS] Fix all typos in markdown files of doc and similar patterns in other comments
## What changes were proposed in this pull request?

This PR tries to fix all typos in all markdown files under `docs` module,
and fixes similar typos in other comments, too.

## How was the this patch tested?

manual tests.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11300 from dongjoon-hyun/minor_fix_typos.
2016-02-22 09:52:07 +00:00
Dongjoon Hyun 03e62aa3f6 [MINOR][DOCS] Fix typos in configuration.md and hardware-provisioning.md
## What changes were proposed in this pull request?

This PR fixes some typos in the following documentation files.
 * `NOTICE`, `configuration.md`, and `hardware-provisioning.md`.

## How was the this patch tested?

manual tests

Author: Dongjoon Hyun <dongjoonapache.org>

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11289 from dongjoon-hyun/minor_fix_typos_notice_and_confdoc.
2016-02-21 15:27:07 -08:00
Iulian Dragos 6915cc23b3 [MINOR][DOCS][MESOS] Clarify that Mesos version is a lower bound.
## What changes were proposed in this pull request?

Clarify that 0.21 is only a **minimum** requirement.

## How was the this patch tested?

It's a doc change, so no tests.

Author: Iulian Dragos <jaguarul@gmail.com>

Closes #11271 from dragos/patch-1.
2016-02-19 11:47:36 -08:00
Sean Owen fb7e21797e [SPARK-13339][DOCS] Clarify commutative / associative operator requirements for reduce, fold
Clarify that reduce functions need to be commutative, and fold functions do not

See https://github.com/apache/spark/pull/11091

Author: Sean Owen <sowen@cloudera.com>

Closes #11217 from srowen/SPARK-13339.
2016-02-19 10:26:38 +00:00
Sean Owen b84404865b [SPARK-13324][CORE][BUILD] Update plugin, test, example dependencies for 2.x
Phase 1: update plugin versions, test dependencies, some example and third-party versions

Author: Sean Owen <sowen@cloudera.com>

Closes #11206 from srowen/SPARK-13324.
2016-02-17 19:03:29 -08:00
Christopher C. Aycock a7c74d7563 [SPARK-13350][DOCS] Config doc updated to state that PYSPARK_PYTHON's default is "python2.7"
Author: Christopher C. Aycock <chris@chrisaycock.com>

Closes #11239 from chrisaycock/master.
2016-02-17 11:24:18 -08:00
junhao 7218c0eba9 [SPARK-11627] Add initial input rate limit for spark streaming backpressure mechanism.
https://issues.apache.org/jira/browse/SPARK-11627

Spark Streaming backpressure mechanism has no initial input rate limit, it might cause OOM exception.
In the firest batch task ,receivers receive data at the maximum speed they can reach,it might exhaust executors memory resources. Add a initial input rate limit value can make sure the Streaming job execute  success in the first batch,then the backpressure mechanism can adjust receiving rate adaptively.

Author: junhao <junhao@mogujie.com>

Closes #9593 from junhaoMg/junhao-dev.
2016-02-16 19:43:17 -08:00
BenFradet 00c72d27bf [SPARK-12247][ML][DOC] Documentation for spark.ml's ALS and collaborative filtering in general
This documents the implementation of ALS in `spark.ml` with example code in scala, java and python.

Author: BenFradet <benjamin.fradet@gmail.com>

Closes #10411 from BenFradet/SPARK-12247.
2016-02-16 13:03:28 +00:00
Xin Ren e4675c2402 [SPARK-13018][DOCS] Replace example code in mllib-pmml-model-export.md using include_example
Replace example code in mllib-pmml-model-export.md using include_example
https://issues.apache.org/jira/browse/SPARK-13018

The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6.

Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example.
`{% include_example scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala %}`
Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala` and pick code blocks marked "example" and replace code block in
`{% highlight %}`
 in the markdown.

See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337

Author: Xin Ren <iamshrek@126.com>

Closes #11126 from keypointt/SPARK-13018.
2016-02-15 20:17:21 -08:00
JeremyNixon adb5483650 [SPARK-13312][MLLIB] Update java train-validation-split example in ml-guide
Response to JIRA https://issues.apache.org/jira/browse/SPARK-13312.

This contribution is my original work and I license the work to this project.

Author: JeremyNixon <jnixon2@gmail.com>

Closes #11199 from JeremyNixon/update_train_val_split_example.
2016-02-15 09:25:13 +00:00
Amit Dev 331293c302 [SPARK-13300][DOCUMENTATION] Added pygments.rb dependancy
Looks like pygments.rb gem is also required for jekyll build to work. At least on Ubuntu/RHEL I could not do build without this dependency. So added this to steps.

Author: Amit Dev <amitdev@gmail.com>

Closes #11180 from amitdev/master.
2016-02-14 11:41:27 +00:00
Sanket 894921d813 [SPARK-6166] Limit number of in flight outbound requests
This JIRA is related to
https://github.com/apache/spark/pull/5852
Had to do some minor rework and test to make sure it
works with current version of spark.

Author: Sanket <schintap@untilservice-lm>

Closes #10838 from redsanket/limit-outbound-connections.
2016-02-11 22:40:00 -08:00
Steve Loughran a2c7dcf61f [SPARK-7889][WEBUI] HistoryServer updates UI for incomplete apps
When the HistoryServer is showing an incomplete app, it needs to check if there is a newer version of the app available.  It does this by checking if a version of the app has been loaded with a larger *filesize*.  If so, it detaches the current UI, attaches the new one, and redirects back to the same URL to show the new UI.

https://issues.apache.org/jira/browse/SPARK-7889

Author: Steve Loughran <stevel@hortonworks.com>
Author: Imran Rashid <irashid@cloudera.com>

Closes #11118 from squito/SPARK-7889-alternate.
2016-02-11 21:37:53 -06:00
Sasaki Toru c2f21d8898 [SPARK-13264][DOC] Removed multi-byte characters in spark-env.sh.template
In spark-env.sh.template, there are multi-byte characters, this PR will remove it.

Author: Sasaki Toru <sasakitoa@nttdata.co.jp>

Closes #11149 from sasakitoa/remove_multibyte_in_sparkenv.
2016-02-11 09:30:36 +00:00
Sean Owen 29c547303f [SPARK-12414][CORE] Remove closure serializer
Remove spark.closure.serializer option and use JavaSerializer always

CC andrewor14 rxin I see there's a discussion in the JIRA but just thought I'd offer this for a look at what the change would be.

Author: Sean Owen <sowen@cloudera.com>

Closes #11150 from srowen/SPARK-12414.
2016-02-10 13:34:53 -08:00
Michael Gummelt 80cb963ad9 [SPARK-5095][MESOS] Support launching multiple mesos executors in coarse grained mesos mode.
This is the next iteration of tnachen's previous PR: https://github.com/apache/spark/pull/4027

In that PR, we resolved with andrewor14 and pwendell to implement the Mesos scheduler's support of `spark.executor.cores` to be consistent with YARN and Standalone.  This PR implements that resolution.

This PR implements two high-level features.  These two features are co-dependent, so they're implemented both here:
- Mesos support for spark.executor.cores
- Multiple executors per slave

We at Mesosphere have been working with Typesafe on a Spark/Mesos integration test suite: https://github.com/typesafehub/mesos-spark-integration-tests, which passes for this PR.

The contribution is my original work and I license the work to the project under the project's open source license.

Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #10993 from mgummelt/executor_sizing.
2016-02-10 10:53:33 -08:00
Luciano Resende 2dbb916440 [SPARK-13189] Cleanup build references to Scala 2.10
Author: Luciano Resende <lresende@apache.org>

Closes #11092 from lresende/SPARK-13189.
2016-02-09 11:56:25 -08:00
Sebastián Ramírez c882ec57de [SPARK-13040][DOCS] Update JDBC deprecated SPARK_CLASSPATH documentation
Update JDBC documentation based on http://stackoverflow.com/a/30947090/219530 as SPARK_CLASSPATH is deprecated.

Also, that's how it worked, it didn't work with the SPARK_CLASSPATH or the --jars alone.

This would solve issue: https://issues.apache.org/jira/browse/SPARK-13040

Author: Sebastián Ramírez <tiangolo@gmail.com>

Closes #10948 from tiangolo/patch-docs-jdbc.
2016-02-09 08:49:34 +00:00
Luc Bourlier 0bb5b73387 [SPARK-13002][MESOS] Send initial request of executors for dyn allocation
Fix for [SPARK-13002](https://issues.apache.org/jira/browse/SPARK-13002) about the initial number of executors when running with dynamic allocation on Mesos.
Instead of fixing it just for the Mesos case, made the change in `ExecutorAllocationManager`. It is already driving the number of executors running on Mesos, only no the initial value.

The `None` and `Some(0)` are internal details on the computation of resources to reserved, in the Mesos backend scheduler. `executorLimitOption` has to be initialized correctly, otherwise the Mesos backend scheduler will, either, create to many executors at launch, or not create any executors and not be able to recover from this state.

Removed the 'special case' description in the doc. It was not totally accurate, and is not needed anymore.

This doesn't fix the same problem visible with Spark standalone. There is no straightforward way to send the initial value in standalone mode.

Somebody knowing this part of the yarn support should review this change.

Author: Luc Bourlier <luc.bourlier@typesafe.com>

Closes #11047 from skyluc/issue/initial-dyn-alloc-2.
2016-02-05 14:37:42 -08:00
Bill Chambers 66e1383de2 [SPARK-13214][DOCS] update dynamicAllocation documentation
Author: Bill Chambers <bill@databricks.com>

Closes #11094 from anabranch/dynamic-docs.
2016-02-05 14:35:39 -08:00
Yuhao Yang c2c956bcd1 [ML][DOC] fix wrong api link in ml onevsrest
minor fix for api link in ml onevsrest

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #11068 from hhbyyh/onevsrestDoc.
2016-02-03 21:19:44 -08:00
Timothy Chen 51b03b71ff [SPARK-12463][SPARK-12464][SPARK-12465][SPARK-10647][MESOS] Fix zookeeper dir with mesos conf and add docs.
Fix zookeeper dir configuration used in cluster mode, and also add documentation around these settings.

Author: Timothy Chen <tnachen@gmail.com>

Closes #10057 from tnachen/fix_mesos_dir.
2016-02-01 12:45:02 -08:00
Lewuathe 711ce048a2 [ML][MINOR] Invalid MulticlassClassification reference in ml-guide
In [ml-guide](https://spark.apache.org/docs/latest/ml-guide.html#example-model-selection-via-cross-validation), there is invalid reference to `MulticlassClassificationEvaluator` apidoc.

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.evaluation.MultiClassClassificationEvaluator

Author: Lewuathe <lewuathe@me.com>

Closes #10996 from Lewuathe/fix-typo-in-ml-guide.
2016-02-01 12:21:21 -08:00
Takeshi YAMAMURO da9146c91a [DOCS] Fix the jar location of datanucleus in sql-programming-guid.md
ISTM `lib` is better because `datanucleus` jars are located in `lib` for release builds.

Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>

Closes #10901 from maropu/DocFix.
2016-02-01 12:02:06 -08:00
Josh Rosen 289373b28c [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version
This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).

The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).

After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10608 from JoshRosen/SPARK-6363.
2016-01-30 00:20:28 -08:00
James Lohse c2204436a1 Provide same info as in spark-submit --help
this is stated for --packages and --repositories. Without stating it for --jars, people expect a standard java classpath to work, with expansion and using a different delimiter than a comma. Currently this is only state in the --help for spark-submit "Comma-separated list of local jars to include on the driver and executor classpaths."

Author: James Lohse <jimlohse@users.noreply.github.com>

Closes #10890 from jimlohse/patch-1.
2016-01-28 10:50:50 +00:00
Andrew 093291cf9b [SPARK-1680][DOCS] Explain environment variables for running on YARN in cluster mode
JIRA 1680 added a property called spark.yarn.appMasterEnv.  This PR draws users' attention to this special case by adding an explanation in configuration.html#environment-variables

Author: Andrew <weiner.andrew.j@gmail.com>

Closes #10869 from weineran/branch-yarn-docs.
2016-01-27 09:31:44 +00:00
Shixiong Zhu cbd507d69c [SPARK-7799][STREAMING][DOCUMENT] Add the linking and deploying instructions for streaming-akka project
Since `actorStream` is an external project, we should add the linking and deploying instructions for it.

A follow up PR of #10744

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10856 from zsxwing/akka-link-instruction.
2016-01-26 11:31:54 -08:00
Sean Owen 649e9d0f5b [SPARK-3369][CORE][STREAMING] Java mapPartitions Iterator->Iterable is inconsistent with Scala's Iterator->Iterator
Fix Java function API methods for flatMap and mapPartitions to require producing only an Iterator, not Iterable. Also fix DStream.flatMap to require a function producing TraversableOnce only, not Traversable.

CC rxin pwendell for API change; tdas since it also touches streaming.

Author: Sean Owen <sowen@cloudera.com>

Closes #10413 from srowen/SPARK-3369.
2016-01-26 11:55:28 +00:00
Yanbo Liang dd2325d9a7 [SPARK-11965][ML][DOC] Update user guide for RFormula feature interactions
Update user guide for RFormula feature interactions. Meanwhile we also update other new features such as supporting string label in Spark 1.6.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10222 from yanboliang/spark-11965.
2016-01-25 11:52:26 -08:00
Sean Owen aca2a01654 [SPARK-12760][DOCS] inaccurate description for difference between local vs cluster mode in closure handling
Clarify that modifying a driver local variable won't have the desired effect in cluster modes, and may or may not work as intended in local mode

Author: Sean Owen <sowen@cloudera.com>

Closes #10866 from srowen/SPARK-12760.
2016-01-23 11:45:12 +00:00
Mortada Mehyar 56f57f894e [SPARK-12760][DOCS] invalid lambda expression in python example for …
…local vs cluster

srowen thanks for the PR at https://github.com/apache/spark/pull/10866! sorry it took me a while.

This is related to https://github.com/apache/spark/pull/10866, basically the assignment in the lambda expression in the python example is actually invalid

```
In [1]: data = [1, 2, 3, 4, 5]
In [2]: counter = 0
In [3]: rdd = sc.parallelize(data)
In [4]: rdd.foreach(lambda x: counter += x)
  File "<ipython-input-4-fcb86c182bad>", line 1
    rdd.foreach(lambda x: counter += x)
                                   ^
SyntaxError: invalid syntax
```

Author: Mortada Mehyar <mortada.mehyar@gmail.com>

Closes #10867 from mortada/doc_python_fix.
2016-01-23 11:36:33 +00:00
Shixiong Zhu bc1babd63d [SPARK-7997][CORE] Remove Akka from Spark Core and Streaming
- Remove Akka dependency from core. Note: the streaming-akka project still uses Akka.
- Remove HttpFileServer
- Remove Akka configs from SparkConf and SSLOptions
- Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. I think it's still worth to keep this config because using `DirectTaskResult` or `IndirectTaskResult`  depends on it.
- Update comments and docs

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10854 from zsxwing/remove-akka.
2016-01-22 21:20:04 -08:00
felixcheung 85200c09ad [SPARK-12534][DOC] update documentation to list command line equivalent to properties
Several Spark properties equivalent to Spark submit command line options are missing.

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #10491 from felixcheung/sparksubmitdoc.
2016-01-21 16:30:20 +01:00
Sun Rui 1b2a918e59 [SPARK-12204][SPARKR] Implement drop method for DataFrame in SparkR.
Author: Sun Rui <rui.sun@intel.com>

Closes #10201 from sun-rui/SPARK-12204.
2016-01-20 21:08:15 -08:00
Shixiong Zhu b7d74a602f [SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" project
Include the following changes:

1. Add "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream
2. Remove "StreamingContext.actorStream" and "JavaStreamingContext.actorStream"
3. Update the ActorWordCount example and add the JavaActorWordCount example
4. Make "streaming-zeromq" depend on "streaming-akka" and update the codes accordingly

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10744 from zsxwing/streaming-akka-2.
2016-01-20 13:55:41 -08:00
felixcheung 488bbb216c [SPARK-12232][SPARKR] New R API for read.table to avoid name conflict
shivaram sorry it took longer to fix some conflicts, this is the change to add an alias for `table`

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #10406 from felixcheung/readtable.
2016-01-19 18:31:03 -08:00
scwf 43f1d59e17 [SPARK-2750][WEB UI] Add https support to the Web UI
Author: scwf <wangfei1@huawei.com>
Author: Marcelo Vanzin <vanzin@cloudera.com>
Author: WangTaoTheTonic <wangtao111@huawei.com>
Author: w00228970 <wangfei1@huawei.com>

Closes #10238 from vanzin/SPARK-2750.
2016-01-19 14:49:55 -08:00
Shixiong Zhu 721845c1b6 [SPARK-12894][DOCUMENT] Add deploy instructions for Python in Kinesis integration doc
This PR added instructions to get Kinesis assembly jar for Python users in the Kinesis integration page like Kafka doc.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10822 from zsxwing/kinesis-doc.
2016-01-18 16:50:05 -08:00
Shixiong Zhu a973f483f6 [SPARK-12814][DOCUMENT] Add deploy instructions for Python in flume integration doc
This PR added instructions to get flume assembly jar for Python users in the flume integration page like Kafka doc.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10746 from zsxwing/flume-doc.
2016-01-18 15:38:03 -08:00
Jeff Lam 86972fa521 [SPARK-12722][DOCS] Fixed typo in Pipeline example
http://spark.apache.org/docs/latest/ml-guide.html#example-pipeline
```
val sameModel = Pipeline.load("/tmp/spark-logistic-regression-model")
```
should be
```
val sameModel = PipelineModel.load("/tmp/spark-logistic-regression-model")
```
cc: jkbradley

Author: Jeff Lam <sha0lin@alumni.carnegiemellon.edu>

Closes #10769 from Agent007/SPARK-12722.
2016-01-16 10:41:40 +00:00
Josh Rosen 8dbbf3e75e [SPARK-12842][TEST-HADOOP2.7] Add Hadoop 2.7 build profile
This patch adds a Hadoop 2.7 build profile in order to let us automate tests against that version.

/cc rxin srowen

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10775 from JoshRosen/add-hadoop-2.7-profile.
2016-01-15 17:07:24 -08:00
Tom Graves 96fb894d4b [SPARK-2930] clarify docs on using webhdfs with spark.yarn.access.nam…
…enodes

Author: Tom Graves <tgraves@yahoo-inc.com>

Closes #10699 from tgravescs/SPARK-2930.
2016-01-15 13:11:27 +00:00
Joseph K. Bradley 20d8ef858a [SPARK-12703][MLLIB][DOC][PYTHON] Fixed pyspark.mllib.clustering.KMeans user guide example
Fixed WSSSE computeCost in Python mllib KMeans user guide example by using new computeCost method API in Python.

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #10707 from jkbradley/kmeans-doc-fix.
2016-01-13 18:01:29 -08:00
Luc Bourlier cc91e21879 [SPARK-12805][MESOS] Fixes documentation on Mesos run modes
The default run has changed, but the documentation didn't fully reflect the change.

Author: Luc Bourlier <luc.bourlier@typesafe.com>

Closes #10740 from skyluc/issue/mesos-modes-doc.
2016-01-13 11:45:13 -08:00
Sean Owen 9c7f34af37 [SPARK-5273][MLLIB][DOCS] Improve documentation examples for LinearRegression
Use a much smaller step size in LinearRegressionWithSGD MLlib examples to achieve a reasonable RMSE.

Our training folks hit this exact same issue when concocting an example and had the same solution.

Author: Sean Owen <sowen@cloudera.com>

Closes #10675 from srowen/SPARK-5273.
2016-01-12 12:13:32 +00:00
Brandon Bradley a767ee8a05 [SPARK-12758][SQL] add note to Spark SQL Migration guide about TimestampType casting
Warning users about casting changes.

Author: Brandon Bradley <bradleytastic@gmail.com>

Closes #10708 from blbradley/spark-12758.
2016-01-11 14:21:50 -08:00
Reynold Xin 5b0d544339 [SPARK-12735] Consolidate & move spark-ec2 to AMPLab managed repository.
Author: Reynold Xin <rxin@databricks.com>

Closes #10673 from rxin/SPARK-12735.
2016-01-09 20:28:20 -08:00
Sean Owen 659fd9d04b [SPARK-4819] Remove Guava's "Optional" from public API
Replace Guava `Optional` with (an API clone of) Java 8 `java.util.Optional` (edit: and a clone of Guava `Optional`)

See also https://github.com/apache/spark/pull/10512

Author: Sean Owen <sowen@cloudera.com>

Closes #10513 from srowen/SPARK-4819.
2016-01-08 13:02:30 -08:00
Jeff Zhang 00d9261724 [DOCUMENTATION] doc fix of job scheduling
spark.shuffle.service.enabled is spark application related configuration, it is not necessary to set it in yarn-site.xml

Author: Jeff Zhang <zjffdu@apache.org>

Closes #10657 from zjffdu/doc-fix.
2016-01-08 11:38:46 -08:00
Shixiong Zhu c94199e977 [SPARK-12507][STREAMING][DOCUMENT] Expose closeFileAfterWrite and allowBatching configurations for Streaming
/cc tdas brkyvz

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10453 from zsxwing/streaming-conf.
2016-01-07 17:37:46 -08:00
Jacek Laskowski 8113dbda0b [STREAMING][DOCS][EXAMPLES] Minor fixes
Author: Jacek Laskowski <jacek@japila.pl>

Closes #10603 from jaceklaskowski/streaming-actor-custom-receiver.
2016-01-07 00:27:13 -08:00
zzcclp 84e77a15df [DOC] fix 'spark.memory.offHeap.enabled' default value to false
modify 'spark.memory.offHeap.enabled' default value to false

Author: zzcclp <xm_zzc@sina.com>

Closes #10633 from zzcclp/fix_spark.memory.offHeap.enabled_default_value.
2016-01-06 23:06:21 -08:00
Josh Rosen 8e19c7663a [SPARK-7689] Remove TTL-based metadata cleaning in Spark 2.0
This PR removes `spark.cleaner.ttl` and the associated TTL-based metadata cleaning code.

Now that we have the `ContextCleaner` and a timer to trigger periodic GCs, I don't think that `spark.cleaner.ttl` is necessary anymore. The TTL-based cleaning isn't enabled by default, isn't included in our end-to-end tests, and has been a source of user confusion when it is misconfigured. If the TTL is set too low, data which is still being used may be evicted / deleted, leading to hard to diagnose bugs.

For all of these reasons, I think that we should remove this functionality in Spark 2.0. Additional benefits of doing this include marginally reduced memory usage, since we no longer need to store timetsamps in hashmaps, and a handful fewer threads.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10534 from JoshRosen/remove-ttl-based-cleaning.
2016-01-06 20:50:31 -08:00
BenFradet f82ebb1522 [SPARK-12368][ML][DOC] Better doc for the binary classification evaluator' metricName
For the BinaryClassificationEvaluator, the scaladoc doesn't mention that "areaUnderPR" is supported, only that the default is "areadUnderROC".
Also, in the documentation, it is said that:
"The default metric used to choose the best ParamMap can be overriden by the setMetric method in each of these evaluators."
However, the method is called setMetricName.

This PR aims to fix both issues.

Author: BenFradet <benjamin.fradet@gmail.com>

Closes #10328 from BenFradet/SPARK-12368.
2016-01-06 12:01:05 -08:00
Yanbo Liang 1c6cf1a563 [SPARK-12570][ML][DOC] DecisionTreeRegressor: provide variance of prediction: user guide update
Update user guide doc for ```DecisionTreeRegressor``` providing variance of prediction.

cc jkbradley

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10594 from yanboliang/spark-12570.
2016-01-05 14:24:32 -08:00
felixcheung 8896ec9f02 [SPARKR][DOC] minor doc update for version in migration guide
checked that the change is in Spark 1.6.0.
shivaram

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #10574 from felixcheung/rwritemodedoc.
2016-01-05 08:39:58 +05:30
Josh Rosen 6c83d938cc [SPARK-12579][SQL] Force user-specified JDBC driver to take precedence
Spark SQL's JDBC data source allows users to specify an explicit JDBC driver to load (using the `driver` argument), but in the current code it's possible that the user-specified driver will not be used when it comes time to actually create a JDBC connection.

In a nutshell, the problem is that you might have multiple JDBC drivers on the classpath that claim to be able to handle the same subprotocol, so simply registering the user-provided driver class with the our `DriverRegistry` and JDBC's `DriverManager` is not sufficient to ensure that it's actually used when creating the JDBC connection.

This patch addresses this issue by first registering the user-specified driver with the DriverManager, then iterating over the driver manager's loaded drivers in order to obtain the correct driver and use it to create a connection (previously, we just called `DriverManager.getConnection()` directly).

If a user did not specify a JDBC driver to use, then we call `DriverManager.getDriver` to figure out the class of the driver to use, then pass that class's name to executors; this guards against corner-case bugs in situations where the driver and executor JVMs might have different sets of JDBC drivers on their classpaths (previously, there was the (rare) potential for `DriverManager.getConnection()` to use different drivers on the driver and executors if the user had not explicitly specified a JDBC driver class and the classpaths were different).

This patch is inspired by a similar patch that I made to the `spark-redshift` library (https://github.com/databricks/spark-redshift/pull/143), which contains its own modified fork of some of Spark's JDBC data source code (for cross-Spark-version compatibility reasons).

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10519 from JoshRosen/jdbc-driver-precedence.
2016-01-04 10:39:42 -08:00
Reynold Xin ee8f8d3184 [SPARK-12588] Remove HttpBroadcast in Spark 2.0.
We switched to TorrentBroadcast in Spark 1.1, and HttpBroadcast has been undocumented since then. It's time to remove it in Spark 2.0.

Author: Reynold Xin <rxin@databricks.com>

Closes #10531 from rxin/SPARK-12588.
2015-12-30 18:07:07 -08:00
Shixiong Zhu 20591afd79 [SPARK-12429][STREAMING][DOC] Add Accumulator and Broadcast example for Streaming
This PR adds Scala, Java and Python examples to show how to use Accumulator and Broadcast in Spark Streaming to support checkpointing.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10385 from zsxwing/accumulator-broadcast-example.
2015-12-22 16:39:10 -08:00
Shixiong Zhu 93db50d1c2 [SPARK-12487][STREAMING][DOCUMENT] Add docs for Kafka message handler
Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10439 from zsxwing/kafka-message-handler-doc.
2015-12-22 15:33:30 -08:00
Reynold Xin 0a38637d05 [SPARK-11807] Remove support for Hadoop < 2.2
i.e. Hadoop 1 and Hadoop 2.0

Author: Reynold Xin <rxin@databricks.com>

Closes #10404 from rxin/SPARK-11807.
2015-12-21 22:15:52 -08:00
Davies Liu 29cecd4a42 [SPARK-12388] change default compression to lz4
According the benchmark [1], LZ4-java could be 80% (or 30%) faster than Snappy.

After changing the compressor to LZ4, I saw 20% improvement on end-to-end time for a TPCDS query (Q4).

[1] https://github.com/ning/jvm-compressor-benchmark/wiki

cc rxin

Author: Davies Liu <davies@databricks.com>

Closes #10342 from davies/lz4.
2015-12-21 14:21:43 -08:00
Reynold Xin 284e29a870 [SPARK-11808] Remove Bagel.
Author: Reynold Xin <rxin@databricks.com>

Closes #10395 from rxin/SPARK-11808.
2015-12-19 22:40:35 -08:00
Reynold Xin f496031bd2 Bump master version to 2.0.0-SNAPSHOT.
Author: Reynold Xin <rxin@databricks.com>

Closes #10387 from rxin/version-bump.
2015-12-19 15:13:05 -08:00
gatorsmile 499ac3e69a [SPARK-12091] [PYSPARK] Deprecate the JAVA-specific deserialized storage levels
The current default storage level of Python persist API is MEMORY_ONLY_SER. This is different from the default level MEMORY_ONLY in the official document and RDD APIs.

davies Is this inconsistency intentional? Thanks!

Updates: Since the data is always serialized on the Python side, the storage levels of JAVA-specific deserialization are not removed, such as MEMORY_ONLY.

Updates: Based on the reviewers' feedback. In Python, stored objects will always be serialized with the [Pickle](https://docs.python.org/2/library/pickle.html) library, so it does not matter whether you choose a serialized level. The available storage levels in Python include `MEMORY_ONLY`, `MEMORY_ONLY_2`, `MEMORY_AND_DISK`, `MEMORY_AND_DISK_2`, `DISK_ONLY`, `DISK_ONLY_2` and `OFF_HEAP`.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #10092 from gatorsmile/persistStorageLevel.
2015-12-18 20:06:05 -08:00
Burak Yavuz 2377b707f2 [SPARK-11985][STREAMING][KINESIS][DOCS] Update Kinesis docs
- Provide example on `message handler`
 - Provide bit on KPL record de-aggregation
 - Fix typos

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #9970 from brkyvz/kinesis-docs.
2015-12-18 15:24:41 -08:00
Joseph K. Bradley 8148cc7a5c [SPARK-11608][MLLIB][DOC] Added migration guide for MLlib 1.6
No known breaking changes, but some deprecations and changes of behavior.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #10235 from jkbradley/mllib-guide-update-1.6.
2015-12-16 11:53:04 -08:00
Yu ISHIKAWA 7b6dc29d0e [SPARK-6518][MLLIB][EXAMPLE][DOC] Add example code and user guide for bisecting k-means
This PR includes only an example code in order to finish it quickly.
I'll send another PR for the docs soon.

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #9952 from yu-iskw/SPARK-6518.
2015-12-16 10:55:42 -08:00
Yu ISHIKAWA 26d70bd2b4 [SPARK-12215][ML][DOC] User guide section for KMeans in spark.ml
cc jkbradley

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #10244 from yu-iskw/SPARK-12215.
2015-12-16 10:43:45 -08:00
Jeff Zhang 2eb5af5f0d [SPARK-12318][SPARKR] Save mode in SparkR should be error by default
shivaram  Please help review.

Author: Jeff Zhang <zjffdu@apache.org>

Closes #10290 from zjffdu/SPARK-12318.
2015-12-16 10:32:32 -08:00
Timothy Hunter a6325fc401 [SPARK-12324][MLLIB][DOC] Fixes the sidebar in the ML documentation
This fixes the sidebar, using a pure CSS mechanism to hide it when the browser's viewport is too narrow.
Credit goes to the original author Titan-C (mentioned in the NOTICE).

Note that I am not a CSS expert, so I can only address comments up to some extent.

Default view:
<img width="936" alt="screen shot 2015-12-14 at 12 46 39 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793597/6d1d6eda-a261-11e5-836b-6eb2054e9054.png">

When collapsed manually by the user:
<img width="1004" alt="screen shot 2015-12-14 at 12 54 02 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793669/c991989e-a261-11e5-8bf6-aecf3bdb6319.png">

Disappears when column is too narrow:
<img width="697" alt="screen shot 2015-12-14 at 12 47 22 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793607/7754dbcc-a261-11e5-8b15-e0d074b0e47c.png">

Can still be opened by the user if necessary:
<img width="651" alt="screen shot 2015-12-14 at 12 51 15 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793612/7bf82968-a261-11e5-9cc3-e827a7a6b2b0.png">

Author: Timothy Hunter <timhunter@databricks.com>

Closes #10297 from thunterdb/12324.
2015-12-16 10:12:33 -08:00
jerryshao 63ccdef813 [SPARK-10123][DEPLOY] Support specifying deploy mode from configuration
Please help to review, thanks a lot.

Author: jerryshao <sshao@hortonworks.com>

Closes #10195 from jerryshao/SPARK-10123.
2015-12-15 18:24:23 -08:00
Timothy Chen c2de99a7c3 [SPARK-12351][MESOS] Add documentation about submitting Spark with mesos cluster mode.
Adding more documentation about submitting jobs with mesos cluster mode.

Author: Timothy Chen <tnachen@gmail.com>

Closes #10086 from tnachen/mesos_supervise_docs.
2015-12-15 18:20:00 -08:00
BenFradet e25f1fe427 [MINOR][DOC] Fix broken word2vec link
Follow-up of [SPARK-12199](https://issues.apache.org/jira/browse/SPARK-12199) and #10193 where a broken link has been left as is.

Author: BenFradet <benjamin.fradet@gmail.com>

Closes #10282 from BenFradet/SPARK-12199.
2015-12-14 13:50:30 +00:00
Xusen Yin 98b212d36b [SPARK-12199][DOC] Follow-up: Refine example code in ml-features.md
https://issues.apache.org/jira/browse/SPARK-12199

Follow-up PR of SPARK-11551. Fix some errors in ml-features.md

mengxr

Author: Xusen Yin <yinxusen@gmail.com>

Closes #10193 from yinxusen/SPARK-12199.
2015-12-12 17:47:01 -08:00
BenFradet aea676ca2d [SPARK-12217][ML] Document invalid handling for StringIndexer
Added a paragraph regarding StringIndexer#setHandleInvalid to the ml-features documentation.

I wonder if I should also add a snippet to the code example, input welcome.

Author: BenFradet <benjamin.fradet@gmail.com>

Closes #10257 from BenFradet/SPARK-12217.
2015-12-11 15:43:00 -08:00
anabranch aa305dcaf5 [SPARK-11964][DOCS][ML] Add in Pipeline Import/Export Documentation
Adding in Pipeline Import and Export Documentation.

Author: anabranch <wac.chambers@gmail.com>
Author: Bill Chambers <wchambers@ischool.berkeley.edu>

Closes #10179 from anabranch/master.
2015-12-11 12:55:56 -08:00
jerryshao 24d3357d66 [STREAMING][DOC][MINOR] Update the description of direct Kafka stream doc
With the merge of [SPARK-8337](https://issues.apache.org/jira/browse/SPARK-8337), now the Python API has the same functionalities compared to Scala/Java, so here changing the description to make it more precise.

zsxwing tdas , please review, thanks a lot.

Author: jerryshao <sshao@hortonworks.com>

Closes #10246 from jerryshao/direct-kafka-doc-update.
2015-12-10 15:31:46 -08:00
Josh Rosen 23a9e62bad [SPARK-12251] Document and improve off-heap memory configurations
This patch adds documentation for Spark configurations that affect off-heap memory and makes some naming and validation improvements for those configs.

- Change `spark.memory.offHeapSize` to `spark.memory.offHeap.size`. This is fine because this configuration has not shipped in any Spark release yet (it's new in Spark 1.6).
- Deprecated `spark.unsafe.offHeap` in favor of a new `spark.memory.offHeap.enabled` configuration. The motivation behind this change is to gather all memory-related configurations under the same prefix.
- Add a check which prevents users from setting `spark.memory.offHeap.enabled=true` when `spark.memory.offHeap.size == 0`. After SPARK-11389 (#9344), which was committed in Spark 1.6, Spark enforces a hard limit on the amount of off-heap memory that it will allocate to tasks. As a result, enabling off-heap execution memory without setting `spark.memory.offHeap.size` will lead to immediate OOMs. The new configuration validation makes this scenario easier to diagnose, helping to avoid user confusion.
- Document these configurations on the configuration page.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10237 from JoshRosen/SPARK-12251.
2015-12-10 15:29:04 -08:00
Marcelo Vanzin 4a46b8859d [SPARK-11563][CORE][REPL] Use RpcEnv to transfer REPL-generated classes.
This avoids bringing up yet another HTTP server on the driver, and
instead reuses the file server already managed by the driver's
RpcEnv. As a bonus, the repl now inherits the security features of
the network library.

There's also a small change to create the directory for storing classes
under the root temp dir for the application (instead of directly
under java.io.tmpdir).

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #9923 from vanzin/SPARK-11563.
2015-12-10 13:26:30 -08:00
Timothy Hunter 2ecbe02d5b [SPARK-12212][ML][DOC] Clarifies the difference between spark.ml, spark.mllib and mllib in the documentation.
Replaces a number of occurences of `MLlib` in the documentation that were meant to refer to the `spark.mllib` package instead. It should clarify for new users the difference between `spark.mllib` (the package) and MLlib (the umbrella project for ML in spark).

It also removes some files that I forgot to delete with #10207

Author: Timothy Hunter <timhunter@databricks.com>

Closes #10234 from thunterdb/12212.
2015-12-10 12:50:46 -08:00
Yin Huai ac8cdf1cdc [SPARK-11678][SQL][DOCS] Document basePath in the programming guide.
This PR adds document for `basePath`, which is a new parameter used by `HadoopFsRelation`.

The compiled doc is shown below.
![image](https://cloud.githubusercontent.com/assets/2072857/11673132/1ba01192-9dcb-11e5-98d9-ac0b4e92e98c.png)

JIRA: https://issues.apache.org/jira/browse/SPARK-11678

Author: Yin Huai <yhuai@databricks.com>

Closes #10211 from yhuai/basePathDoc.
2015-12-09 18:09:36 -08:00
Andrew Ray 7a8e587dc0 [SPARK-12211][DOC][GRAPHX] Fix version number in graphx doc for migration from 1.1
Migration from 1.1 section added to the GraphX doc in 1.2.0 (see https://spark.apache.org/docs/1.2.0/graphx-programming-guide.html#migrating-from-spark-11) uses \{{site.SPARK_VERSION}} as the version where changes were introduced, it should be just 1.2.

Author: Andrew Ray <ray.andrew@gmail.com>

Closes #10206 from aray/graphx-doc-1.1-migration.
2015-12-09 17:16:01 -08:00
Xusen Yin 051c6a066f [SPARK-11551][DOC] Replace example code in ml-features.md using include_example
PR on behalf of somideshmukh, thanks!

Author: Xusen Yin <yinxusen@gmail.com>
Author: somideshmukh <somilde@us.ibm.com>

Closes #10219 from yinxusen/SPARK-11551.
2015-12-09 12:00:48 -08:00
Timothy Hunter 765c67f5f2 [SPARK-8517][ML][DOC] Reorganizes the spark.ml user guide
This PR moves pieces of the spark.ml user guide to reflect suggestions in SPARK-8517. It does not introduce new content, as requested.

<img width="192" alt="screen shot 2015-12-08 at 11 36 00 am" src="https://cloud.githubusercontent.com/assets/7594753/11666166/e82b84f2-9d9f-11e5-8904-e215424d8444.png">

Author: Timothy Hunter <timhunter@databricks.com>

Closes #10207 from thunterdb/spark-8517.
2015-12-08 18:40:21 -08:00
Michael Armbrust 3959489423 [SPARK-12069][SQL] Update documentation with Datasets
Author: Michael Armbrust <michael@databricks.com>

Closes #10060 from marmbrus/docs.
2015-12-08 15:58:35 -08:00
BenFradet 06746b3005 [SPARK-12159][ML] Add user guide section for IndexToString transformer
Documentation regarding the `IndexToString` label transformer with code snippets in Scala/Java/Python.

Author: BenFradet <benjamin.fradet@gmail.com>

Closes #10166 from BenFradet/SPARK-12159.
2015-12-08 12:45:34 -08:00
Cheng Lian da2012a0e1 [SPARK-11551][DOC][EXAMPLE] Revert PR #10002
This reverts PR #10002, commit 78209b0cca.

The original PR wasn't tested on Jenkins before being merged.

Author: Cheng Lian <lian@databricks.com>

Closes #10200 from liancheng/revert-pr-10002.
2015-12-08 19:18:59 +08:00
Yanbo Liang 4a39b5a1be [SPARK-11958][SPARK-11957][ML][DOC] SQLTransformer user guide and example code
Add ```SQLTransformer``` user guide, example code and make Scala API doc more clear.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10006 from yanboliang/spark-11958.
2015-12-07 23:50:57 -08:00
somideshmukh 78209b0cca [SPARK-11551][DOC][EXAMPLE] Replace example code in ml-features.md using include_example
Made new patch contaning only markdown examples moved to exmaple/folder.
Ony three  java code were not shfted since they were contaning compliation error ,these classes are
1)StandardScale 2)NormalizerExample 3)VectorIndexer

Author: Xusen Yin <yinxusen@gmail.com>
Author: somideshmukh <somilde@us.ibm.com>

Closes #10002 from somideshmukh/SomilBranch1.33.
2015-12-07 23:26:34 -08:00
Xusen Yin 871e85d9c1 [SPARK-11963][DOC] Add docs for QuantileDiscretizer
https://issues.apache.org/jira/browse/SPARK-11963

Author: Xusen Yin <yinxusen@gmail.com>

Closes #9962 from yinxusen/SPARK-11963.
2015-12-07 13:16:47 -08:00
rotems f30373f5ee [SPARK-12080][CORE] Kryo - Support multiple user registrators
Author: rotems <roter>

Closes #10078 from Botnaim/KryoMultipleCustomRegistrators.
2015-12-04 16:58:34 -08:00
felixcheung 43c575cb17 [SPARK-12116][SPARKR][DOCS] document how to workaround function name conflicts with dplyr
shivaram

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #10119 from felixcheung/rdocdplyrmasked.
2015-12-03 09:22:21 -08:00
Jeff Zhang 7470d9edbb [DOCUMENTATION][MLLIB] typo in mllib doc
\cc mengxr

Author: Jeff Zhang <zjffdu@apache.org>

Closes #10093 from zjffdu/mllib_typo.
2015-12-03 15:36:28 +00:00
Andrew Or d96f8c997b [SPARK-12081] Make unified memory manager work with small heaps
The existing `spark.memory.fraction` (default 0.75) gives the system 25% of the space to work with. For small heaps, this is not enough: e.g. default 1GB leaves only 250MB system memory. This is especially a problem in local mode, where the driver and executor are crammed in the same JVM. Members of the community have reported driver OOM's in such cases.

**New proposal.** We now reserve 300MB before taking the 75%. For 1GB JVMs, this leaves `(1024 - 300) * 0.75 = 543MB` for execution and storage. This is proposal (1) listed in the [JIRA](https://issues.apache.org/jira/browse/SPARK-12081).

Author: Andrew Or <andrew@databricks.com>

Closes #10081 from andrewor14/unified-memory-small-heaps.
2015-12-01 19:51:12 -08:00
Xusen Yin e76431f886 [SPARK-11961][DOC] Add docs of ChiSqSelector
https://issues.apache.org/jira/browse/SPARK-11961

Author: Xusen Yin <yinxusen@gmail.com>

Closes #9965 from yinxusen/SPARK-11961.
2015-12-01 15:21:53 -08:00
woj-i 6a8cf80cc8 [SPARK-11821] Propagate Kerberos keytab for all environments
andrewor14 the same PR as in branch 1.5
harishreedharan

Author: woj-i <wojciechindyk@gmail.com>

Closes #9859 from woj-i/master.
2015-12-01 11:05:45 -08:00
Josh Rosen f73379be2b [HOTFIX][SPARK-12000] Add missing quotes in Jekyll API docs plugin.
I accidentally omitted these as part of #10049.
2015-11-30 18:25:59 -08:00
Xusen Yin e6dc89a339 [SPARK-12035] Add more debug information in include_example tag of Jekyll
https://issues.apache.org/jira/browse/SPARK-12035

When we debuging lots of example code files, like in https://github.com/apache/spark/pull/10002, it's hard to know which file causes errors due to limited information in `include_example.rb`. With their filenames, we can locate bugs easily.

Author: Xusen Yin <yinxusen@gmail.com>

Closes #10026 from yinxusen/SPARK-12035.
2015-11-30 17:18:44 -08:00
Josh Rosen d3ca8cfac2 [SPARK-12000] Fix API doc generation issues
This pull request fixes multiple issues with API doc generation.

- Modify the Jekyll plugin so that the entire doc build fails if API docs cannot be generated. This will make it easy to detect when the doc build breaks, since this will now trigger Jenkins failures.
- Change how we handle the `-target` compiler option flag in order to fix `javadoc` generation.
- Incorporate doc changes from thunterdb (in #10048).

Closes #10048.

Author: Josh Rosen <joshrosen@databricks.com>
Author: Timothy Hunter <timhunter@databricks.com>

Closes #10049 from JoshRosen/fix-doc-build.
2015-11-30 16:37:27 -08:00
Feynman Liang 5535888930 [SPARK-11960][MLLIB][DOC] User guide for streaming tests
CC jkbradley mengxr josepablocam

Author: Feynman Liang <feynman.liang@gmail.com>

Closes #10005 from feynmanliang/streaming-test-user-guide.
2015-11-30 15:38:44 -08:00
Yuhao Yang e232720a65 [SPARK-11689][ML] Add user guide and example code for LDA under spark.ml
jira: https://issues.apache.org/jira/browse/SPARK-11689

Add simple user guide for LDA under spark.ml and example code under examples/. Use include_example to include example code in the user guide markdown. Check SPARK-11606 for instructions.

Original PR is reverted due to document build error. https://github.com/apache/spark/pull/9722

mengxr feynmanliang yinxusen  Sorry for the troubling.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #9974 from hhbyyh/ldaMLExample.
2015-11-30 14:56:51 -08:00
BenFradet f2fbfa444f [MINOR][DOCS] fixed list display in ml-ensembles
The list in ml-ensembles.md wasn't properly formatted and, as a result, was looking like this:
![old](http://i.imgur.com/2ZhELLR.png)

This PR aims to make it look like this:
![new](http://i.imgur.com/0Xriwd2.png)

Author: BenFradet <benjamin.fradet@gmail.com>

Closes #10025 from BenFradet/ml-ensembles-doc.
2015-11-30 13:02:08 -08:00
muxator 4376b5bea8 doc typo: "classificaion" -> "classification"
Author: muxator <muxator@users.noreply.github.com>

Closes #10008 from muxator/patch-1.
2015-11-26 18:52:20 -08:00
Jeff Zhang 67b6732088 [DOCUMENTATION] Fix minor doc error
Author: Jeff Zhang <zjffdu@apache.org>

Closes #9956 from zjffdu/dev_typo.
2015-11-25 11:37:42 -08:00