Commit graph

1341 commits

Author SHA1 Message Date
vundela 2f6dd634c1 [SPARK-11105] [YARN] Distribute log4j.properties to executors
Currently log4j.properties file is not uploaded to executor's which is leading them to use the default values. This fix will make sure that file is always uploaded to distributed cache so that executor will use the latest settings.

If user specifies log configurations through --files then executors will be picking configs from --files instead of $SPARK_CONF_DIR/log4j.properties

Author: vundela <vsr@cloudera.com>
Author: Srinivasa Reddy Vundela <vsr@cloudera.com>

Closes #9118 from vundela/master.
2015-10-20 11:12:28 -07:00
Lukasz Piepiora a112d69fdc [SPARK-11174] [DOCS] Fix typo in the GraphX programming guide
This patch fixes a small typo in the GraphX programming guide

Author: Lukasz Piepiora <lpiepiora@gmail.com>

Closes #9160 from lpiepiora/11174-fix-typo-in-graphx-programming-guide.
2015-10-18 14:25:57 +01:00
Britta Weber 723aa75a9d fix typo bellow -> below
Author: Britta Weber <britta.weber@elasticsearch.com>

Closes #9136 from brwe/typo-bellow.
2015-10-15 14:47:11 -07:00
Nick Pritchard b591de7c07 [SPARK-11039][Documentation][Web UI] Document additional ui configurations
Add documentation for configuration:
- spark.sql.ui.retainedExecutions
- spark.streaming.ui.retainedBatches

Author: Nick Pritchard <nicholas.pritchard@falkonry.com>

Closes #9052 from pnpritchard/SPARK-11039.
2015-10-15 12:45:37 -07:00
Andrew Or b3ffac5178 [SPARK-10983] Unified memory manager
This patch unifies the memory management of the storage and execution regions such that either side can borrow memory from each other. When memory pressure arises, storage will be evicted in favor of execution. To avoid regressions in cases where storage is crucial, we dynamically allocate a fraction of space for storage that execution cannot evict. Several configurations are introduced:

- **spark.memory.fraction (default 0.75)**: ​fraction of the heap space used for execution and storage. The lower this is, the more frequently spills and cached data eviction occur. The purpose of this config is to set aside memory for internal metadata, user data structures, and imprecise size estimation in the case of sparse, unusually large records.

- **spark.memory.storageFraction (default 0.5)**: size of the storage region within the space set aside by `s​park.memory.fraction`. ​Cached data may only be evicted if total storage exceeds this region.

- **spark.memory.useLegacyMode (default false)**: whether to use the memory management that existed in Spark 1.5 and before. This is mainly for backward compatibility.

For a detailed description of the design, see [SPARK-10000](https://issues.apache.org/jira/browse/SPARK-10000). This patch builds on top of the `MemoryManager` interface introduced in #9000.

Author: Andrew Or <andrew@databricks.com>

Closes #9084 from andrewor14/unified-memory-manager.
2015-10-13 13:49:59 -07:00
jerryshao f97e9323b5 [SPARK-10739] [YARN] Add application attempt window for Spark on Yarn
Add application attempt window for Spark on Yarn to ignore old out of window failures, this is useful for long running applications to recover from failures.

Author: jerryshao <sshao@hortonworks.com>

Closes #8857 from jerryshao/SPARK-10739 and squashes the following commits:

36eabdc [jerryshao] change the doc
7f9b77d [jerryshao] Style change
1c9afd0 [jerryshao] Address the comments
caca695 [jerryshao] Add application attempt window for Spark on Yarn
2015-10-12 18:18:19 -07:00
Kay Ousterhout 091c2c3ecd [SPARK-11056] Improve documentation of SBT build.
This commit improves the documentation around building Spark to
(1) recommend using SBT interactive mode to avoid the overhead of
launching SBT and (2) refer to the wiki page that documents using
SPARK_PREPEND_CLASSES to avoid creating the assembly jar for each
compile.

cc srowen

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #9068 from kayousterhout/SPARK-11056.
2015-10-12 14:23:29 -07:00
Jean-Baptiste Onofré 60150cf00a [SPARK-10883] Add a note about how to build Spark sub-modules (reactor)
Author: Jean-Baptiste Onofré <jbonofre@apache.org>

Closes #8993 from jbonofre/SPARK-10883-2.
2015-10-08 11:38:39 +01:00
admackin cd28139c9b Akka framesize units should be specified
1.4 docs noted that the units were MB - i have assumed this is still the case

Author: admackin <admackin@users.noreply.github.com>

Closes #9025 from admackin/master.
2015-10-08 00:01:23 -07:00
Xin Ren 27cdde2ff8 [SPARK-10669] [DOCS] Link to each language's API in codetabs in ML docs: spark.mllib
In the Markdown docs for the spark.mllib Programming Guide, we have code examples with codetabs for each language. We should link to each language's API docs within the corresponding codetab, but we are inconsistent about this. For an example of what we want to do, see the "ChiSqSelector" section in 64743870f2/docs/mllib-feature-extraction.md
This JIRA is just for spark.mllib, not spark.ml.

Please let me know if more work is needed, thanks a lot.

Author: Xin Ren <iamshrek@126.com>

Closes #8977 from keypointt/SPARK-10669.
2015-10-07 15:00:19 +01:00
Sean Owen 82bbc2a5f2 [SPARK-9570] [DOCS] Consistent recommendation for submitting spark apps to YARN, -master yarn --deploy-mode x vs -master yarn-x'.
Recommend `--master yarn --deploy-mode {cluster,client}` consistently in docs.
Follow-on to https://github.com/apache/spark/pull/8385
CC nssalian

Author: Sean Owen <sowen@cloudera.com>

Closes #8968 from srowen/SPARK-9570.
2015-10-04 09:31:52 +01:00
Yuhao Yang 9b9fe5f7bf [SPARK-10670] [ML] [Doc] add api reference for ml doc
jira: https://issues.apache.org/jira/browse/SPARK-10670
In the Markdown docs for the spark.ml Programming Guide, we have code examples with codetabs for each language. We should link to each language's API docs within the corresponding codetab, but we are inconsistent about this. For an example of what we want to do, see the "Word2Vec" section in 64743870f2/docs/ml-features.md
This JIRA is just for spark.ml, not spark.mllib

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #8901 from hhbyyh/docAPI.
2015-09-28 22:40:02 -07:00
David Martin b58249930d Fix two mistakes in programming-guide page
seperate -> separate
sees -> see

Author: David Martin <dmartinpro@users.noreply.github.com>

Closes #8928 from dmartinpro/patch-1.
2015-09-28 10:41:39 +01:00
Bin Wang fb4c7be747 add doc for spark.streaming.stopGracefullyOnShutdown
Author: Bin Wang <wbin00@gmail.com>

Closes #8898 from wb14123/doc.
2015-09-27 21:26:54 +01:00
Matt Hagen 558e9c7e60 [SPARK-10663] Removed unnecessary invocation of DataFrame.toDF method.
The Scala example under the "Example: Pipeline" heading in this
document initializes the "test" variable to a DataFrame. Because test
is already a DF, there is not need to call test.toDF as the example
does in a subsequent line: model.transform(test.toDF). So, I removed
the extraneous toDF invocation.

Author: Matt Hagen <anonz3000@gmail.com>

Closes #8875 from hagenhaus/SPARK-10663.
2015-09-22 21:14:25 -07:00
Akash Mishra 0bd0e5bed2 [SPARK-10695] [DOCUMENTATION] [MESOS] Fixing incorrect value informati…
…on for spark.mesos.constraints parameter.

Author: Akash Mishra <akash.mishra20@gmail.com>

Closes #8816 from SleepyThread/constraint-fix.
2015-09-22 00:14:27 -07:00
Marcelo Vanzin 97a99dde6e [SPARK-10676] [DOCS] Add documentation for SASL encryption options.
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #8803 from vanzin/SPARK-10676.
2015-09-21 13:15:44 -07:00
Jacek Laskowski ca9fe540fe [SPARK-10662] [DOCS] Code snippets are not properly formatted in tables
* Backticks are processed properly in Spark Properties table
* Removed unnecessary spaces
* See http://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/running-on-yarn.html

Author: Jacek Laskowski <jacek.laskowski@deepsense.io>

Closes #8795 from jaceklaskowski/docs-yarn-formatting.
2015-09-21 19:46:39 +01:00
Josh Rosen 2117eea71e [SPARK-10710] Remove ability to disable spilling in core and SQL
It does not make much sense to set `spark.shuffle.spill` or `spark.sql.planner.externalSort` to false: I believe that these configurations were initially added as "escape hatches" to guard against bugs in the external operators, but these operators are now mature and well-tested. In addition, these configurations are not handled in a consistent way anymore: SQL's Tungsten codepath ignores these configurations and will continue to use spilling operators. Similarly, Spark Core's `tungsten-sort` shuffle manager does not respect `spark.shuffle.spill=false`.

This pull request removes these configurations, adds warnings at the appropriate places, and deletes a large amount of code which was only used in code paths that did not support spilling.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #8831 from JoshRosen/remove-ability-to-disable-spilling.
2015-09-19 21:40:21 -07:00
Alexis Seigneurin d83b6aae8b Fixed links to the API
Submitting this change on the master branch as requested in https://github.com/apache/spark/pull/8819#issuecomment-141505941

Author: Alexis Seigneurin <alexis.seigneurin@gmail.com>

Closes #8838 from aseigneurin/patch-2.
2015-09-19 12:01:22 +01:00
Kousuke Saruta d507f9c0b7 [SPARK-10584] [SQL] [DOC] Documentation about the compatible Hive version is wrong.
In Spark 1.5.0, Spark SQL is compatible with Hive 0.12.0 through 1.2.1 but the documentation is wrong.

/CC yhuai

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #8776 from sarutak/SPARK-10584-2.
2015-09-19 01:59:36 -07:00
Reynold Xin 348d7c9a93 [SPARK-9808] Remove hash shuffle file consolidation.
Author: Reynold Xin <rxin@databricks.com>

Closes #8812 from rxin/SPARK-9808-1.
2015-09-18 13:48:41 -07:00
Reynold Xin 74d8f7dda8 Added <code> tag to documentation. 2015-09-17 22:46:13 -07:00
Felix Bechstein 9a56dcdf7f docs/running-on-mesos.md: state default values in default column
This PR simply uses the default value column for defaults.

Author: Felix Bechstein <felix.bechstein@otto.de>

Closes #8810 from felixb/fix_mesos_doc.
2015-09-17 22:42:46 -07:00
Michael Armbrust e0dc2bc232 [SPARK-10650] Clean before building docs
The [published docs for 1.5.0](http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/) have a bunch of test classes in them.  The only way I can reproduce this is to `test:compile` before running `unidoc`.  To prevent this from happening again, I've added a clean before doc generation.

Author: Michael Armbrust <michael@databricks.com>

Closes #8787 from marmbrus/testsInDocs.
2015-09-17 11:05:30 -07:00
yangping.wu c88bb5df94 [SPARK-10660] Doc describe error in the "Running Spark on YARN" page
In the Configuration section, the **spark.yarn.driver.memoryOverhead** and **spark.yarn.am.memoryOverhead**‘s default value should be "driverMemory * 0.10, with minimum of 384" and "AM memory * 0.10, with minimum of 384" respectively. Because from Spark 1.4.0, the **MEMORY_OVERHEAD_FACTOR** is set to 0.1.0, not 0.07.

Author: yangping.wu <wyphao.2007@163.com>

Closes #8797 from 397090770/SparkOnYarnDocError.
2015-09-17 09:52:40 -07:00
Joseph K. Bradley b921fe4dc0 [SPARK-10595] [ML] [MLLIB] [DOCS] Various ML guide cleanups
Various ML guide cleanups.

* ml-guide.md: Make it easier to access the algorithm-specific guides.
* LDA user guide: EM often begins with useless topics, but running longer generally improves them dramatically.  E.g., 10 iterations on a Wikipedia dataset produces useless topics, but 50 iterations produces very meaningful topics.
* mllib-feature-extraction.html#elementwiseproduct: “w” parameter should be “scalingVec”
* Clean up Binarizer user guide a little.
* Document in Pipeline that users should not put an instance into the Pipeline in more than 1 place.
* spark.ml Word2Vec user guide: clean up grammar/writing
* Chi Sq Feature Selector docs: Improve text in doc.

CC: mengxr feynmanliang

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #8752 from jkbradley/mlguide-fixes-1.5.
2015-09-15 19:43:26 -07:00
Jacek Laskowski 416003b264 [DOCS] Small fixes to Spark on Yarn doc
* a follow-up to 16b6d18613 as `--num-executors` flag is not suppported.
* links + formatting

Author: Jacek Laskowski <jacek.laskowski@deepsense.io>

Closes #8762 from jaceklaskowski/docs-spark-on-yarn.
2015-09-15 20:42:33 +01:00
Reynold Xin 09b7e7c198 Update version to 1.6.0-SNAPSHOT.
Author: Reynold Xin <rxin@databricks.com>

Closes #8350 from rxin/1.6.
2015-09-15 00:54:20 -07:00
Jacek Laskowski 833be73314 Small fixes to docs
Links work now properly + consistent use of *Spark standalone cluster* (Spark uppercase + lowercase the rest -- seems agreed in the other places in the docs).

Author: Jacek Laskowski <jacek.laskowski@deepsense.io>

Closes #8759 from jaceklaskowski/docs-submitting-apps.
2015-09-14 23:40:29 -07:00
Kousuke Saruta cf2821ef5f [SPARK-10584] [DOC] [SQL] Documentation about spark.sql.hive.metastore.version is wrong.
The default value of hive metastore version is 1.2.1 but the documentation says the value of `spark.sql.hive.metastore.version` is 0.13.1.
Also, we cannot get the default value by `sqlContext.getConf("spark.sql.hive.metastore.version")`.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #8739 from sarutak/SPARK-10584.
2015-09-14 12:06:23 -07:00
Sean Owen 1dc614b874 [SPARK-10222] [GRAPHX] [DOCS] More thoroughly deprecate Bagel in favor of GraphX
Finish deprecating Bagel; remove reference to nonexistent example

Author: Sean Owen <sowen@cloudera.com>

Closes #8731 from srowen/SPARK-10222.
2015-09-13 08:36:46 +01:00
y-shimizu c268ca4ddd [SPARK-10518] [DOCS] Update code examples in spark.ml user guide to use LIBSVM data source instead of MLUtils
I fixed to use LIBSVM data source in the example code in spark.ml instead of MLUtils

Author: y-shimizu <y.shimizu0429@gmail.com>

Closes #8697 from y-shimizu/SPARK-10518.
2015-09-11 08:27:30 -07:00
Akash Mishra a5ef2d0600 [SPARK-10514] [MESOS] waiting for min no of total cores acquired by Spark by implementing the sufficientResourcesRegistered method
spark.scheduler.minRegisteredResourcesRatio configuration parameter works for YARN mode but not for Mesos Coarse grained mode.

If the parameter specified default value of 0 will be set for spark.scheduler.minRegisteredResourcesRatio in base class and this method will always return true.

There are no existing test for YARN mode too. Hence not added test for the same.

Author: Akash Mishra <akash.mishra20@gmail.com>

Closes #8672 from SleepyThread/master.
2015-09-10 12:04:02 -07:00
Holden Karau a76bde9dae [SPARK-10469] [DOC] Try and document the three options
From JIRA:
Add documentation for tungsten-sort.
From the mailing list "I saw a new "spark.shuffle.manager=tungsten-sort" implemented in
https://issues.apache.org/jira/browse/SPARK-7081, but it can't be found its
corresponding description in
http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc3-docs/configuration.html(Currenlty
there are only 'sort' and 'hash' two options)."

Author: Holden Karau <holden@pigscanfly.ca>

Closes #8638 from holdenk/SPARK-10469-document-tungsten-sort.
2015-09-10 11:49:53 -07:00
Sean Paradiso 1dc7548c59 [MINOR] [MLLIB] [ML] [DOC] fixed typo: label for negative result should be 0.0 (original: 1.0)
Small typo in the example for `LabelledPoint` in the MLLib docs.

Author: Sean Paradiso <seanparadiso@gmail.com>

Closes #8680 from sparadiso/docs_mllib_smalltypo.
2015-09-09 22:09:33 -07:00
Yuhao Yang 91a577d277 [SPARK-10249] [ML] [DOC] Add Python Code Example to StopWordsRemover User Guide
jira: https://issues.apache.org/jira/browse/SPARK-10249

update user guide since python support added.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #8620 from hhbyyh/swPyDocExample.
2015-09-08 22:33:23 -07:00
Tathagata Das 52b24a602a [SPARK-10492] [STREAMING] [DOCUMENTATION] Update Streaming documentation about rate limiting and backpressure
Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #8656 from tdas/SPARK-10492 and squashes the following commits:

986cdd6 [Tathagata Das] Added information on backpressure
2015-09-08 14:54:43 -07:00
Jacek Laskowski 6ceed852ab Docs small fixes
Author: Jacek Laskowski <jacek@japila.pl>

Closes #8629 from jaceklaskowski/docs-fixes.
2015-09-08 14:38:10 +01:00
Stephen Hopper 9d8e838d88 [DOC] Added R to the list of languages with "high-level API" support in the…
… main README.

Author: Stephen Hopper <shopper@shopper-osx.local>

Closes #8646 from enragedginger/master.
2015-09-08 14:36:34 +01:00
Reynold Xin 5ffe752b59 [SPARK-9767] Remove ConnectionManager.
We introduced the Netty network module for shuffle in Spark 1.2, and has turned it on by default for 3 releases. The old ConnectionManager is difficult to maintain. If we merge the patch now, by the time it is released, it would be 1 yr for which ConnectionManager is off by default. It's time to remove it.

Author: Reynold Xin <rxin@databricks.com>

Closes #8161 from rxin/SPARK-9767.
2015-09-07 10:42:30 -10:00
Tathagata Das 7a4f326c00 [SPARK-10440] [STREAMING] [DOCS] Update python API stuff in the programming guides and python docs
- Fixed information around Python API tags in streaming programming guides
- Added missing stuff in python docs

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #8595 from tdas/SPARK-10440.
2015-09-04 23:16:39 -10:00
Timothy Chen b087d23e28 [SPARK-9669] [MESOS] Support PySpark on Mesos cluster mode.
Support running pyspark with cluster mode on Mesos!
This doesn't upload any scripts, so if running in a remote Mesos requires the user to specify the script from a available URI.

Author: Timothy Chen <tnachen@gmail.com>

Closes #8349 from tnachen/mesos_python.
2015-09-04 15:21:31 -07:00
Tom Graves 49aff7b9ad [SPARK-10432] spark.port.maxRetries documentation is unclear
Author: Tom Graves <tgraves@yahoo-inc.com>

Closes #8585 from tgravescs/SPARK-10432.
2015-09-03 13:46:16 -07:00
zhuol ec01280533 [SPARK-4223] [CORE] Support * in acls.
SPARK-4223.

Currently we support setting view and modify acls but you have to specify a list of users. It would be nice to support * meaning all users have access.

Manual tests to verify that: "*" works for any user in:
a. Spark ui: view and kill stage.     Done.
b. Spark history server.                  Done.
c. Yarn application killing.  Done.

Author: zhuol <zhuol@yahoo-inc.com>

Closes #8398 from zhuoliu/4223.
2015-09-01 11:14:59 -10:00
Sean Owen 3f63bd6023 [SPARK-10398] [DOCS] Migrate Spark download page to use new lua mirroring scripts
Migrate Apache download closer.cgi refs to new closer.lua

This is the bit of the change that affects the project docs; I'm implementing the changes to the Apache site separately.

Author: Sean Owen <sowen@cloudera.com>

Closes #8557 from srowen/SPARK-10398.
2015-09-01 20:06:01 +01:00
Xiangrui Meng ca69fc8efd [SPARK-10331] [MLLIB] Update example code in ml-guide
* The example code was added in 1.2, before `createDataFrame`. This PR switches to `createDataFrame`. Java code still uses JavaBean.
* assume `sqlContext` is available
* fix some minor issues from previous code review

jkbradley srowen feynmanliang

Author: Xiangrui Meng <meng@databricks.com>

Closes #8518 from mengxr/SPARK-10331.
2015-08-29 23:57:09 -07:00
Xiangrui Meng 905fbe498b [SPARK-10348] [MLLIB] updates ml-guide
* replace `ML Dataset` by `DataFrame` to unify the abstraction
* ML algorithms -> pipeline components to describe the main concept
* remove Scala API doc links from the main guide
* `Section Title` -> `Section tile` to be consistent with other section titles in MLlib guide
* modified lines break at 100 chars or periods

jkbradley feynmanliang

Author: Xiangrui Meng <meng@databricks.com>

Closes #8517 from mengxr/SPARK-10348.
2015-08-29 23:26:23 -07:00
GuoQiang Li 5369be8068 [SPARK-10350] [DOC] [SQL] Removed duplicated option description from SQL guide
Author: GuoQiang Li <witgo@qq.com>

Closes #8520 from witgo/SPARK-10350.
2015-08-29 13:20:27 -07:00
martinzapletal e8ea5bafee [SPARK-9910] [ML] User guide for train validation split
Author: martinzapletal <zapletal-martin@email.cz>

Closes #8377 from zapletal-martin/SPARK-9910.
2015-08-28 21:03:48 -07:00
Xiangrui Meng 88032ecaf0 [SPARK-9671] [MLLIB] re-org user guide and add migration guide
This PR updates the MLlib user guide and adds migration guide for 1.4->1.5.

* merge migration guide for `spark.mllib` and `spark.ml` packages
* remove dependency section from `spark.ml` guide
* move the paragraph about `spark.mllib` and `spark.ml` to the top and recommend `spark.ml`
* move Sam's talk to footnote to make the section focus on dependencies

Minor changes to code examples and other wording will be in a separate PR.

jkbradley srowen feynmanliang

Author: Xiangrui Meng <meng@databricks.com>

Closes #8498 from mengxr/SPARK-9671.
2015-08-28 13:53:31 -07:00
Yuhao Yang e2a843090c [SPARK-9890] [DOC] [ML] User guide for CountVectorizer
jira: https://issues.apache.org/jira/browse/SPARK-9890

document with Scala and java examples

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #8487 from hhbyyh/cvDoc.
2015-08-28 08:00:44 -07:00
Keiji Yoshida 18294cd871 Fix DynamodDB/DynamoDB typo in Kinesis Integration doc
Fix DynamodDB/DynamoDB typo in Kinesis Integration doc

Author: Keiji Yoshida <yoshida.keiji.84@gmail.com>

Closes #8501 from yosssi/patch-1.
2015-08-28 09:36:50 +01:00
Feynman Liang af0e1249b1 [SPARK-9905] [ML] [DOC] Adds LinearRegressionSummary user guide
* Adds user guide for `LinearRegressionSummary`
* Fixes unresolved issues in  #8197

CC jkbradley mengxr

Author: Feynman Liang <fliang@databricks.com>

Closes #8491 from feynmanliang/SPARK-9905.
2015-08-27 21:55:20 -07:00
MechCoder 30734d45fb [SPARK-9911] [DOC] [ML] Update Userguide for Evaluator
I added a small note about the different types of evaluator and the metrics used.

Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes #8304 from MechCoder/multiclass_evaluator.
2015-08-27 21:44:06 -07:00
Yin Huai b3dd569ad4 [SPARK-10287] [SQL] Fixes JSONRelation refreshing on read path
https://issues.apache.org/jira/browse/SPARK-10287

After porting json to HadoopFsRelation, it seems hard to keep the behavior of picking up new files automatically for JSON. This PR removes this behavior, so JSON is consistent with others (ORC and Parquet).

Author: Yin Huai <yhuai@databricks.com>

Closes #8469 from yhuai/jsonRefresh.
2015-08-27 16:11:25 -07:00
Feynman Liang 5bfe9e1111 [SPARK-9680] [MLLIB] [DOC] StopWordsRemovers user guide and Java compatibility test
* Adds user guide for ml.feature.StopWordsRemovers, ran code examples on my machine
* Cleans up scaladocs for public methods
* Adds test for Java compatibility
* Follow up Python user guide code example is tracked by SPARK-10249

Author: Feynman Liang <fliang@databricks.com>

Closes #8436 from feynmanliang/SPARK-10230.
2015-08-27 16:10:37 -07:00
MechCoder c94ecdfc5b [SPARK-9906] [ML] User guide for LogisticRegressionSummary
User guide for LogisticRegression summaries

Author: MechCoder <manojkumarsivaraj334@gmail.com>
Author: Manoj Kumar <mks542@nyu.edu>
Author: Feynman Liang <fliang@databricks.com>

Closes #8197 from MechCoder/log_summary_user_guide.
2015-08-27 15:33:43 -07:00
Yuhao Yang 6185cdd2af [SPARK-9901] User guide for RowMatrix Tall-and-skinny QR
jira: https://issues.apache.org/jira/browse/SPARK-9901

The jira covers only the document update. I can further provide example code for QR (like the ones for SVD and PCA) in a separate PR.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #8462 from hhbyyh/qrDoc.
2015-08-27 13:57:20 -07:00
CodingCat 84baa5e9b5 [SPARK-10315] remove document on spark.akka.failure-detector.threshold
https://issues.apache.org/jira/browse/SPARK-10315

this parameter is not used any longer and there is some mistake in the current document , should be 'akka.remote.watch-failure-detector.threshold'

Author: CodingCat <zhunansjtu@gmail.com>

Closes #8483 from CodingCat/SPARK_10315.
2015-08-27 20:19:09 +01:00
Michael Armbrust dc86a227e4 [SPARK-9148] [SPARK-10252] [SQL] Update SQL Programming Guide
Author: Michael Armbrust <michael@databricks.com>

Closes #8441 from marmbrus/documentation.
2015-08-27 11:45:15 -07:00
Moussa Taifi 9625d13d57 [DOCS] [STREAMING] [KAFKA] Fix typo in exactly once semantics
Fix Typo in exactly once semantics
[Semantics of output operations] link

Author: Moussa Taifi <moutai10@gmail.com>

Closes #8468 from moutai/patch-3.
2015-08-27 10:34:47 +01:00
Cheng Lian 0fac144f6b [SPARK-9424] [SQL] Parquet programming guide updates for 1.5
Author: Cheng Lian <lian@databricks.com>

Closes #8467 from liancheng/spark-9424/parquet-docs-for-1.5.
2015-08-26 18:14:54 -07:00
Feynman Liang 125205cdb3 [SPARK-9888] [MLLIB] User guide for new LDA features
* Adds two new sections to LDA's user guide; one for each optimizer/model
 * Documents new features added to LDA (e.g. topXXXperXXX, asymmetric priors, hyperpam optimization)
 * Cleans up a TODO and sets a default parameter in LDA code

jkbradley hhbyyh

Author: Feynman Liang <fliang@databricks.com>

Closes #8254 from feynmanliang/SPARK-9888.
2015-08-25 17:39:20 -07:00
Yuhao Yang b37f0cc1b4 [SPARK-8531] [ML] Update ML user guide for MinMaxScaler
jira: https://issues.apache.org/jira/browse/SPARK-8531

Update ML user guide for MinMaxScaler

Author: Yuhao Yang <hhbyyh@gmail.com>
Author: unknown <yuhaoyan@yuhaoyan-MOBL1.ccr.corp.intel.com>

Closes #7211 from hhbyyh/minmaxdoc.
2015-08-25 10:54:03 -07:00
Joseph K. Bradley 13db11cb08 [SPARK-10061] [DOC] ML ensemble docs
User guide for spark.ml GBTs and Random Forests.
The examples are copied from the decision tree guide and modified to run.

I caught some issues I had somehow missed in the tree guide as well.

I have run all examples, including Java ones.  (Of course, I thought I had previously as well...)

CC: mengxr manishamde yanboliang

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #8369 from jkbradley/ml-ensemble-docs.
2015-08-24 15:38:54 -07:00
Keiji Yoshida 623c675fde Update streaming-programming-guide.md
Update `See the Scala example` to `See the Java example`.

Author: Keiji Yoshida <yoshida.keiji.84@gmail.com>

Closes #8376 from yosssi/patch-1.
2015-08-23 11:04:29 +01:00
Keiji Yoshida 46fcb9e0db Update programming-guide.md
Update `lineLengths.persist();` to `lineLengths.persist(StorageLevel.MEMORY_ONLY());` because `JavaRDD#persist` needs a parameter of `StorageLevel`.

Author: Keiji Yoshida <yoshida.keiji.84@gmail.com>

Closes #8372 from yosssi/patch-1.
2015-08-22 02:38:10 -07:00
Xusen Yin 630a994e6a [SPARK-9893] User guide with Java test suite for VectorSlicer
Add user guide for `VectorSlicer`, with Java test suite and Python version VectorSlicer.

Note that Python version does not support selecting by names now.

Author: Xusen Yin <yinxusen@gmail.com>

Closes #8267 from yinxusen/SPARK-9893.
2015-08-21 16:30:12 -07:00
Alexander Ulanov dcfe0c5cde [SPARK-9846] [DOCS] User guide for Multilayer Perceptron Classifier
Added user guide for multilayer perceptron classifier:
  - Simplified description of the multilayer perceptron classifier
  - Example code for Scala and Java

Author: Alexander Ulanov <nashb@yandex.ru>

Closes #8262 from avulanov/SPARK-9846-mlpc-docs.
2015-08-20 20:02:27 -07:00
Eric Liang 8e0a072f78 [SPARK-9895] User Guide for RFormula Feature Transformer
mengxr

Author: Eric Liang <ekl@databricks.com>

Closes #8293 from ericl/docs-2.
2015-08-19 15:43:08 -07:00
Marcelo Vanzin 5fd53c64bb [SPARK-9833] [YARN] Add options to disable delegation token retrieval.
This allows skipping the code that tries to talk to Hive and HBase to
fetch delegation tokens, in case that somehow conflicts with the application
being run.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #8134 from vanzin/SPARK-9833.
2015-08-19 10:51:59 -07:00
Yanbo Liang 802b5b8791 [SPARK-10084] [MLLIB] [DOC] Add Python example for mllib FP-growth user guide
1, Add Python example for mllib FP-growth user guide.
2, Correct mistakes of Scala and Java examples.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8279 from yanboliang/spark-10084.
2015-08-19 08:53:34 -07:00
Joseph K. Bradley 39e4ebd521 [SPARK-10060] [ML] [DOC] spark.ml DecisionTree user guide
New user guide section ml-decision-tree.md, including code examples.

I have run all examples, including the Java ones.

CC: manishamde yanboliang mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #8244 from jkbradley/ml-dt-docs.
2015-08-19 07:38:27 -07:00
lewuathe ba2a07e2b6 [SPARK-9977] [DOCS] Update documentation for StringIndexer
By using `StringIndexer`, we can obtain indexed label on new column. So a following estimator should use this new column through pipeline if it wants to use string indexed label.
I think it is better to make it explicit on documentation.

Author: lewuathe <lewuathe@me.com>

Closes #8205 from Lewuathe/SPARK-9977.
2015-08-19 09:54:03 +01:00
Sean Owen f141efeafb [SPARK-10070] [DOCS] Remove Guava dependencies in user guides
`Lists.newArrayList` -> `Arrays.asList`

CC jkbradley feynmanliang

Anybody into replacing usages of `Lists.newArrayList` in the examples / source code too? this method isn't useful in Java 7 and beyond.

Author: Sean Owen <sowen@cloudera.com>

Closes #8272 from srowen/SPARK-10070.
2015-08-19 09:41:09 +01:00
Bill Chambers b23c4d3ffc Fix Broken Link
Link was broken because it included tick marks.

Author: Bill Chambers <wchambers@ischool.berkeley.edu>

Closes #8302 from anabranch/patch-1.
2015-08-19 00:05:01 -07:00
Alexander Ulanov 1c843e2848 [SPARK-9508] GraphX Pregel docs update with new Pregel code
SPARK-9436 simplifies the Pregel code. graphx-programming-guide needs to be modified accordingly since it lists the old Pregel code

Author: Alexander Ulanov <nashb@yandex.ru>

Closes #7831 from avulanov/SPARK-9508-pregel-doc2.
2015-08-18 22:13:52 -07:00
Davies Liu de3223872a [SPARK-9705] [DOC] fix docs about Python version
cc JoshRosen

Author: Davies Liu <davies@databricks.com>

Closes #8245 from davies/python_doc.
2015-08-18 22:11:27 -07:00
Feynman Liang badf7fa650 [SPARK-8473] [SPARK-9889] [ML] User guide and example code for DCT
mengxr jkbradley

Author: Feynman Liang <fliang@databricks.com>

Closes #8184 from feynmanliang/SPARK-9889-DCT-docs.
2015-08-18 17:54:49 -07:00
Dennis Huo 9b731fad2b [SPARK-9782] [YARN] Support YARN application tags via SparkConf
Add a new test case in yarn/ClientSuite which checks how the various SparkConf
and ClientArguments propagate into the ApplicationSubmissionContext.

Author: Dennis Huo <dhuo@google.com>

Closes #8072 from dennishuo/dhuo-yarn-application-tags.
2015-08-18 14:34:20 -07:00
Piotr Migdal 8bae9015b7 [SPARK-10085] [MLLIB] [DOCS] removed unnecessary numpy array import
See https://issues.apache.org/jira/browse/SPARK-10085

Author: Piotr Migdal <pmigdal@gmail.com>

Closes #8284 from stared/spark-10085.
2015-08-18 12:59:28 -07:00
Yanbo Liang 747c2ba800 [SPARK-10032] [PYSPARK] [DOC] Add Python example for mllib LDAModel user guide
Add Python example for mllib LDAModel user guide

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8227 from yanboliang/spark-10032.
2015-08-18 12:56:36 -07:00
Yanbo Liang f4fa61effe [SPARK-10029] [MLLIB] [DOC] Add Python examples for mllib IsotonicRegression user guide
Add Python examples for mllib IsotonicRegression user guide

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8225 from yanboliang/spark-10029.
2015-08-18 12:55:36 -07:00
Feynman Liang f5ea391290 [SPARK-9900] [MLLIB] User guide for Association Rules
Updates FPM user guide to include Association Rules.

Author: Feynman Liang <fliang@databricks.com>

Closes #8207 from feynmanliang/SPARK-9900-arules.
2015-08-18 12:53:57 -07:00
jose.cambronero c90c605dc6 [SPARK-9902] [MLLIB] Add Java and Python examples to user guide for 1-sample KS test
added doc examples for python.

Author: jose.cambronero <jose.cambronero@cloudera.com>

Closes #8154 from josepablocam/spark_9902.
2015-08-17 19:09:45 -07:00
Sandy Ryza f9d1a92aa1 [SPARK-7707] User guide and example code for KernelDensity
Author: Sandy Ryza <sandy@cloudera.com>

Closes #8230 from sryza/sandy-spark-7707.
2015-08-17 17:57:51 -07:00
Feynman Liang 0b6b017613 [SPARK-9898] [MLLIB] Prefix Span user guide
Adds user guide for `PrefixSpan`, including Scala and Java example code.

mengxr zhangjiajin

Author: Feynman Liang <fliang@databricks.com>

Closes #8253 from feynmanliang/SPARK-9898.
2015-08-17 17:53:24 -07:00
Yanbo Liang 0076e82123 [SPARK-9768] [PYSPARK] [ML] Add Python API and user guide for ml.feature.ElementwiseProduct
Add Python API, user guide and example for ml.feature.ElementwiseProduct.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8061 from yanboliang/SPARK-9768.
2015-08-17 17:25:41 -07:00
Feynman Liang fdaf17f63f [SPARK-10068] [MLLIB] Adds links to MLlib types, algos, utilities listing
mengxr jkbradley

Author: Feynman Liang <fliang@databricks.com>

Closes #8255 from feynmanliang/SPARK-10068.
2015-08-17 15:42:14 -07:00
Reynold Xin e5fd60415f [SPARK-9934] Deprecate NIO ConnectionManager.
Deprecate NIO ConnectionManager in Spark 1.5.0, before removing it in Spark 1.6.0.

Author: Reynold Xin <rxin@databricks.com>

Closes #8162 from rxin/SPARK-9934.
2015-08-14 20:55:32 -07:00
Rosstin 7a539ef3b1 [SPARK-8965] [DOCS] Add ml-guide Python Example: Estimator, Transformer, and Param
Added ml-guide Python Example: Estimator, Transformer, and Param
/docs/_site/ml-guide.html

Author: Rosstin <asterazul@gmail.com>

Closes #8081 from Rosstin/SPARK-8965.
2015-08-13 09:18:39 -07:00
Niranjan Padmanabhan 738f353988 [SPARK-9092] Fixed incompatibility when both num-executors and dynamic...
… allocation are set. Now, dynamic allocation is set to false when num-executors is explicitly specified as an argument. Consequently, executorAllocationManager in not initialized in the SparkContext.

Author: Niranjan Padmanabhan <niranjan.padmanabhan@cloudera.com>

Closes #7657 from neurons/SPARK-9092.
2015-08-12 16:10:21 -07:00
Yuhao Yang 66d87c1d76 [SPARK-7583] [MLLIB] User guide update for RegexTokenizer
jira: https://issues.apache.org/jira/browse/SPARK-7583

User guide update for RegexTokenizer

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #7828 from hhbyyh/regexTokenizerDoc.
2015-08-12 09:35:32 -07:00
Timothy Chen 741a29f989 [SPARK-9575] [MESOS] Add docuemntation around Mesos shuffle service.
andrewor14

Author: Timothy Chen <tnachen@gmail.com>

Closes #7907 from tnachen/mesos_shuffle.
2015-08-11 23:33:22 -07:00
Timothy Chen 5c99d8bf98 [SPARK-8798] [MESOS] Allow additional uris to be fetched with mesos
Some users like to download additional files in their sandbox that they can refer to from their spark program, or even later mount these files to another directory.

Author: Timothy Chen <tnachen@gmail.com>

Closes #7195 from tnachen/mesos_files.
2015-08-11 23:26:33 -07:00
Eric Liang 74a293f453 [SPARK-9713] [ML] Document SparkR MLlib glm() integration in Spark 1.5
This documents the use of R model formulae in the SparkR guide. Also fixes some bugs in the R api doc.

mengxr

Author: Eric Liang <ekl@databricks.com>

Closes #8085 from ericl/docs.
2015-08-11 21:26:03 -07:00
Prabeesh K 853809e948 [SPARK-5155] [PYSPARK] [STREAMING] Mqtt streaming support in Python
This PR is based on #4229, thanks prabeesh.

Closes #4229

Author: Prabeesh K <prabsmails@gmail.com>
Author: zsxwing <zsxwing@gmail.com>
Author: prabs <prabsmails@gmail.com>
Author: Prabeesh K <prabeesh.k@namshi.com>

Closes #7833 from zsxwing/pr4229 and squashes the following commits:

9570bec [zsxwing] Fix the variable name and check null in finally
4a9c79e [zsxwing] Fix pom.xml indentation
abf5f18 [zsxwing] Merge branch 'master' into pr4229
935615c [zsxwing] Fix the flaky MQTT tests
47278c5 [zsxwing] Include the project class files
478f844 [zsxwing] Add unpack
5f8a1d4 [zsxwing] Make the maven build generate the test jar for Python MQTT tests
734db99 [zsxwing] Merge branch 'master' into pr4229
126608a [Prabeesh K] address the comments
b90b709 [Prabeesh K] Merge pull request #1 from zsxwing/pr4229
d07f454 [zsxwing] Register StreamingListerner before starting StreamingContext; Revert unncessary changes; fix the python unit test
a6747cb [Prabeesh K] wait for starting the receiver before publishing data
87fc677 [Prabeesh K] address the comments:
97244ec [zsxwing] Make sbt build the assembly test jar for streaming mqtt
80474d1 [Prabeesh K] fix
1f0cfe9 [Prabeesh K] python style fix
e1ee016 [Prabeesh K] scala style fix
a5a8f9f [Prabeesh K] added Python test
9767d82 [Prabeesh K] implemented Python-friendly class
a11968b [Prabeesh K] fixed python style
795ec27 [Prabeesh K] address comments
ee387ae [Prabeesh K] Fix assembly jar location of mqtt-assembly
3f4df12 [Prabeesh K] updated version
b34c3c1 [prabs] adress comments
3aa7fff [prabs] Added Python streaming mqtt word count example
b7d42ff [prabs] Mqtt streaming support in Python
2015-08-10 16:33:23 -07:00
Mahmoud Lababidi d285212756 Fixed AtmoicReference<> Example
Author: Mahmoud Lababidi <lababidi@gmail.com>

Closes #8076 from lababidi/master and squashes the following commits:

af4553b [Mahmoud Lababidi] Fixed AtmoicReference<> Example
2015-08-10 13:02:01 -07:00
Jeff Zhang fe12277b40 Fix doc typo
Straightforward fix on doc typo

Author: Jeff Zhang <zjffdu@apache.org>

Closes #8019 from zjffdu/master and squashes the following commits:

aed6e64 [Jeff Zhang] Fix doc typo
2015-08-06 21:03:47 -07:00