Fixed some places in cluster-overview that are obsolete (i.e. not mentioning Kubernetes), and also fixed the Yarn spark-submit sample command in submitting-applications.
### What changes were proposed in this pull request?
This is to fix the docs in "Cluster Overview" and "Submitting Applications" for places where Kubernetes is missed (mostly due to obsolete docs that haven't got updated) and where Yarn sample spark-submit command is incorrectly written.
### Why are the changes needed?
To help the Spark users who uses Kubernetes as cluster manager to have a correct idea when reading the "Cluster Overview" doc page. Also to make the sample spark-submit command for Yarn actually runnable in the "Submitting Applications" doc page, by removing the invalid comment after line continuation char `\`.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
No test, as this is doc fix.
Closes#32701 from huskysun/doc-fix.
Authored-by: Shiqi Sun <s.sun@salesforce.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
While explaining pySpark usage, use of repeated synonymous words were causing confusion.
Removed "instead of a JAR" word, to keep it more readable.
### Why are the changes needed?
To keep the docs more readable and easy to understand.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
No code changes, minor documentation change only. No tests added.
Closes#29956 from manubatham20/patch-1.
Authored-by: manubatham20 <manubatham2006@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
## What changes were proposed in this pull request?
Add AL2 license to metadata of all .md files.
This seemed to be the tidiest way as it will get ignored by .md renderers and other tools. Attempts to write them as markdown comments revealed that there is no such standard thing.
## How was this patch tested?
Doc build
Closes#24243 from srowen/SPARK-26918.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request?
While submitting the spark application, passing multiple configurations not documented clearly, no examples given.it will be better if it can be documented since clarity is less from spark documentation side.
Even when i was browsing i could see few queries raised by users, below provided the reference.
https://community.hortonworks.com/questions/105022/spark-submit-multiple-configurations.html
As part of fixing i had documented the above scenario with an example.
## How was this patch tested?
Manual inspection of the updated document.
Closes#24191 from sujith71955/master_conf.
Authored-by: s71955 <sujithchacko.2010@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Easy fix in the documentation.
## How was this patch tested?
N/A
Closes#20948
Author: Daniel Sakuma <dsakuma@gmail.com>
Closes#20928 from dsakuma/fix_typo_configuration_docs.
## What changes were proposed in this pull request?
Fix spelling in quick-start doc.
## How was this patch tested?
Doc only.
Author: Shashwat Anand <me@shashwat.me>
Closes#20336 from ashashwat/SPARK-23165.
What changes were proposed in this pull request?
This PR contains documentation on the usage of Kubernetes scheduler in Spark 2.3, and a shell script to make it easier to build docker images required to use the integration. The changes detailed here are covered by https://github.com/apache/spark/pull/19717 and https://github.com/apache/spark/pull/19468 which have merged already.
How was this patch tested?
The script has been in use for releases on our fork. Rest is documentation.
cc rxin mateiz (shepherd)
k8s-big-data SIG members & contributors: foxish ash211 mccheah liyinan926 erikerlandson ssuchter varunkatta kimoonkim tnachen ifilonenko
reviewers: vanzin felixcheung jiangxb1987 mridulm
TODO:
- [x] Add dockerfiles directory to built distribution. (https://github.com/apache/spark/pull/20007)
- [x] Change references to docker to instead say "container" (https://github.com/apache/spark/pull/19995)
- [x] Update configuration table.
- [x] Modify spark.kubernetes.allocation.batch.delay to take time instead of int (#20032)
Author: foxish <ramanathana@google.com>
Closes#19946 from foxish/update-k8s-docs.
## What changes were proposed in this pull request?
Add documentation for adding master url in multi host, port format for standalone cluster with high availability with zookeeper.
Referring documentation [Standby Masters with ZooKeeper](http://spark.apache.org/docs/latest/spark-standalone.html#standby-masters-with-zookeeper)
## How was this patch tested?
Documenting the functionality already present.
Author: MirrorZ <chandrika3437@gmail.com>
Closes#17584 from MirrorZ/master.
## What changes were proposed in this pull request?
Fix typo in docs
## How was this patch tested?
Author: uncleGen <hustyugm@gmail.com>
Closes#16658 from uncleGen/typo-issue.
## What changes were proposed in this pull request?
core/src/main/scala/org/apache/spark/SparkContext.scala contains LOCAL_N_FAILURES_REGEX master mode, but this was never documented, so do so.
## How was this patch tested?
By using the Github Markdown preview feature.
Author: Maurus Cuelenaere <mcuelenaere@gmail.com>
Closes#16562 from mcuelenaere/patch-1.
## What changes were proposed in this pull request?
Document `user:password` syntax as possible means of specifying credentials for password-protected `--repositories`
## How was this patch tested?
Doc build
Author: Sean Owen <sowen@cloudera.com>
Closes#15584 from srowen/SPARK-17898.
Docs change to remove the sentence about Mesos not supporting cluster mode.
It was not.
Author: Michael Gummelt <mgummelt@mesosphere.io>
Closes#12249 from mgummelt/fix-mesos-cluster-docs.
## What changes were proposed in this pull request?
When studying spark many users just copy examples on the documentation and paste on their terminals
and because of that the missing backlashes lead them run into some shell errors.
The added backslashes avoid that problem for spark users with that behavior.
## How was this patch tested?
I generated the documentation locally using jekyll and checked the generated pages
Author: Daniel Santana <mestresan@gmail.com>
Closes#11699 from danielsan/master.
this is stated for --packages and --repositories. Without stating it for --jars, people expect a standard java classpath to work, with expansion and using a different delimiter than a comma. Currently this is only state in the --help for spark-submit "Comma-separated list of local jars to include on the driver and executor classpaths."
Author: James Lohse <jimlohse@users.noreply.github.com>
Closes#10890 from jimlohse/patch-1.
Adding more documentation about submitting jobs with mesos cluster mode.
Author: Timothy Chen <tnachen@gmail.com>
Closes#10086 from tnachen/mesos_supervise_docs.
Recommend `--master yarn --deploy-mode {cluster,client}` consistently in docs.
Follow-on to https://github.com/apache/spark/pull/8385
CC nssalian
Author: Sean Owen <sowen@cloudera.com>
Closes#8968 from srowen/SPARK-9570.
Links work now properly + consistent use of *Spark standalone cluster* (Spark uppercase + lowercase the rest -- seems agreed in the other places in the docs).
Author: Jacek Laskowski <jacek.laskowski@deepsense.io>
Closes#8759 from jaceklaskowski/docs-submitting-apps.
Now PySpark on YARN with cluster mode is supported so let's update doc.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes#6040 from sarutak/update-doc-for-pyspark-on-yarn and squashes the following commits:
ad9f88c [Kousuke Saruta] Brushed up sentences
469fd2e [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into update-doc-for-pyspark-on-yarn
fcfdb92 [Kousuke Saruta] Updated doc for PySpark on YARN with cluster mode
...R
https://issues.apache.org/jira/browse/SPARK-6426
Author: WangTaoTheTonic <wangtao111@huawei.com>
Closes#5103 from WangTaoTheTonic/SPARK-6426 and squashes the following commits:
e6dd78d [WangTaoTheTonic] User could also point the yarn cluster config directory via YARN_CONF_DIR
This is a trivial addendum to SPARK-4506, which was already resolved. noted by Asim Jalis in SPARK-4506.
Author: Sean Owen <sowen@cloudera.com>
Closes#4160 from srowen/SPARK-4506 and squashes the following commits:
5f5f7df [Sean Owen] Update more docs to reflect that standalone works in cluster mode
Author: Nicholas Chammas <nicholas.chammas@gmail.com>
Closes#3772 from nchammas/patch-1 and squashes the following commits:
b7d9083 [Nicholas Chammas] [Docs] Minor typo fixes
Important updates to the streaming programming guide
- Make the fault-tolerance properties easier to understand, with information about write ahead logs
- Update the information about deploying the spark streaming app with information about Driver HA
- Update Receiver guide to discuss reliable vs unreliable receivers.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Author: Josh Rosen <joshrosen@databricks.com>
Author: Josh Rosen <rosenville@gmail.com>
Closes#3653 from tdas/streaming-doc-update-1.2 and squashes the following commits:
f53154a [Tathagata Das] Addressed Josh's comments.
ce299e4 [Tathagata Das] Minor update.
ca19078 [Tathagata Das] Minor change
f746951 [Tathagata Das] Mentioned performance problem with WAL
7787209 [Tathagata Das] Merge branch 'streaming-doc-update-1.2' of github.com:tdas/spark into streaming-doc-update-1.2
2184729 [Tathagata Das] Updated Kafka and Flume guides with reliability information.
2f3178c [Tathagata Das] Added more information about writing reliable receivers in the custom receiver guide.
91aa5aa [Tathagata Das] Improved API Docs menu
5707581 [Tathagata Das] Added Pythn API badge
b9c8c24 [Tathagata Das] Merge pull request #26 from JoshRosen/streaming-programming-guide
b8c8382 [Josh Rosen] minor fixes
a4ef126 [Josh Rosen] Restructure parts of the fault-tolerance section to read a bit nicer when skipping over the headings
65f66cd [Josh Rosen] Fix broken link to fault-tolerance semantics section.
f015397 [Josh Rosen] Minor grammar / pluralization fixes.
3019f3a [Josh Rosen] Fix minor Markdown formatting issues
aa8bb87 [Tathagata Das] Small update.
195852c [Tathagata Das] Updated based on Josh's comments, updated receiver reliability and deploying section, and also updated configuration.
17b99fb [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-doc-update-1.2
a0217c0 [Tathagata Das] Changed Deploying menu layout
67fcffc [Tathagata Das] Added cluster mode + supervise example to submitting application guide.
e45453b [Tathagata Das] Update streaming guide, added deploying section.
192c7a7 [Tathagata Das] Added more info about Python API, and rewrote the checkpointing section.
...spark-submit
The PR allows invocations like
spark-submit --class org.MyClass --spark.shuffle.spill false myjar.jar
Author: Sandy Ryza <sandy@cloudera.com>
Closes#1253 from sryza/sandy-spark-2310 and squashes the following commits:
1dc9855 [Sandy Ryza] More doc and cleanup
00edfb9 [Sandy Ryza] Review comments
91b244a [Sandy Ryza] Change format to --conf PROP=VALUE
8fabe77 [Sandy Ryza] SPARK-2310. Support arbitrary Spark properties on the command line with spark-submit
The existing docs are highly misleading. For standalone mode, for example, it encourages the user to use standalone-cluster mode, which is not officially supported. The safeguards have been added in Spark submit itself to prevent bad documentation from leading users down the wrong path in the future.
This PR is prompted by countless headaches users of Spark have run into on the mailing list.
Author: Andrew Or <andrewor14@gmail.com>
Closes#1200 from andrewor14/submit-docs and squashes the following commits:
5ea2460 [Andrew Or] Rephrase cluster vs client explanation
c827f32 [Andrew Or] Clarify spark submit messages
9f7ed8f [Andrew Or] Clarify client vs cluster deploy mode + add safeguards
This is a fairly large PR to clean up and update the docs for 1.0. The major changes are:
* A unified programming guide for all languages replaces language-specific ones and shows language-specific info in tabs
* New programming guide sections on key-value pairs, unit testing, input formats beyond text, migrating from 0.9, and passing functions to Spark
* Spark-submit guide moved to a separate page and expanded slightly
* Various cleanups of the menu system, security docs, and others
* Updated look of title bar to differentiate the docs from previous Spark versions
You can find the updated docs at http://people.apache.org/~matei/1.0-docs/_site/ and in particular http://people.apache.org/~matei/1.0-docs/_site/programming-guide.html.
Author: Matei Zaharia <matei@databricks.com>
Closes#896 from mateiz/1.0-docs and squashes the following commits:
03e6853 [Matei Zaharia] Some tweaks to configuration and YARN docs
0779508 [Matei Zaharia] tweak
ef671d4 [Matei Zaharia] Keep frames in JavaDoc links, and other small tweaks
1bf4112 [Matei Zaharia] Review comments
4414f88 [Matei Zaharia] tweaks
d04e979 [Matei Zaharia] Fix some old links to Java guide
a34ed33 [Matei Zaharia] tweak
541bb3b [Matei Zaharia] miscellaneous changes
fcefdec [Matei Zaharia] Moved submitting apps to separate doc
61d72b4 [Matei Zaharia] stuff
181f217 [Matei Zaharia] migration guide, remove old language guides
e11a0da [Matei Zaharia] Add more API functions
6a030a9 [Matei Zaharia] tweaks
8db0ae3 [Matei Zaharia] Added key-value pairs section
318d2c9 [Matei Zaharia] tweaks
1c81477 [Matei Zaharia] New section on basics and function syntax
e38f559 [Matei Zaharia] Actually added programming guide to Git
a33d6fe [Matei Zaharia] First pass at updating programming guide to support all languages, plus other tweaks throughout
3b6a876 [Matei Zaharia] More CSS tweaks
01ec8bf [Matei Zaharia] More CSS tweaks
e6d252e [Matei Zaharia] Change color of doc title bar to differentiate from 0.9.0