Commit graph

167 commits

Author SHA1 Message Date
Matei Zaharia c8bf4131bc [SPARK-1566] consolidate programming guide, and general doc updates
This is a fairly large PR to clean up and update the docs for 1.0. The major changes are:

* A unified programming guide for all languages replaces language-specific ones and shows language-specific info in tabs
* New programming guide sections on key-value pairs, unit testing, input formats beyond text, migrating from 0.9, and passing functions to Spark
* Spark-submit guide moved to a separate page and expanded slightly
* Various cleanups of the menu system, security docs, and others
* Updated look of title bar to differentiate the docs from previous Spark versions

You can find the updated docs at http://people.apache.org/~matei/1.0-docs/_site/ and in particular http://people.apache.org/~matei/1.0-docs/_site/programming-guide.html.

Author: Matei Zaharia <matei@databricks.com>

Closes #896 from mateiz/1.0-docs and squashes the following commits:

03e6853 [Matei Zaharia] Some tweaks to configuration and YARN docs
0779508 [Matei Zaharia] tweak
ef671d4 [Matei Zaharia] Keep frames in JavaDoc links, and other small tweaks
1bf4112 [Matei Zaharia] Review comments
4414f88 [Matei Zaharia] tweaks
d04e979 [Matei Zaharia] Fix some old links to Java guide
a34ed33 [Matei Zaharia] tweak
541bb3b [Matei Zaharia] miscellaneous changes
fcefdec [Matei Zaharia] Moved submitting apps to separate doc
61d72b4 [Matei Zaharia] stuff
181f217 [Matei Zaharia] migration guide, remove old language guides
e11a0da [Matei Zaharia] Add more API functions
6a030a9 [Matei Zaharia] tweaks
8db0ae3 [Matei Zaharia] Added key-value pairs section
318d2c9 [Matei Zaharia] tweaks
1c81477 [Matei Zaharia] New section on basics and function syntax
e38f559 [Matei Zaharia] Actually added programming guide to Git
a33d6fe [Matei Zaharia] First pass at updating programming guide to support all languages, plus other tweaks throughout
3b6a876 [Matei Zaharia] More CSS tweaks
01ec8bf [Matei Zaharia] More CSS tweaks
e6d252e [Matei Zaharia] Change color of doc title bar to differentiate from 0.9.0
2014-05-30 00:34:33 -07:00
Andrew Or 2ffd1eafd2 [SPARK-1753 / 1773 / 1814] Update outdated docs for spark-submit, YARN, standalone etc.
YARN
- SparkPi was updated to not take in master as an argument; we should update the docs to reflect that.
- The default YARN build guide should be in maven, not sbt.
- This PR also adds a paragraph on steps to debug a YARN application.

Standalone
- Emphasize spark-submit more. Right now it's one small paragraph preceding the legacy way of launching through `org.apache.spark.deploy.Client`.
- The way we set configurations / environment variables according to the old docs is outdated. This needs to reflect changes introduced by the Spark configuration changes we made.

In general, this PR also adds a little more documentation on the new spark-shell, spark-submit, spark-defaults.conf etc here and there.

Author: Andrew Or <andrewor14@gmail.com>

Closes #701 from andrewor14/yarn-docs and squashes the following commits:

e2c2312 [Andrew Or] Merge in changes in #752 (SPARK-1814)
25cfe7b [Andrew Or] Merge in the warning from SPARK-1753
a8c39c5 [Andrew Or] Minor changes
336bbd9 [Andrew Or] Tabs -> spaces
4d9d8f7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
041017a [Andrew Or] Abstract Spark submit documentation to cluster-overview.html
3cc0649 [Andrew Or] Detail how to set configurations + remove legacy instructions
5b7140a [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
85a51fc [Andrew Or] Update run-example, spark-shell, configuration etc.
c10e8c7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
381fe32 [Andrew Or] Update docs for standalone mode
757c184 [Andrew Or] Add a note about the requirements for the debugging trick
f8ca990 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
924f04c [Andrew Or] Revert addition of --deploy-mode
d5fe17b [Andrew Or] Update the YARN docs
2014-05-12 19:44:14 -07:00
Patrick Wendell 06b15baab2 SPARK-1565 (Addendum): Replace run-example with spark-submit.
Gives a nicely formatted message to the user when `run-example` is run to
tell them to use `spark-submit`.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #704 from pwendell/examples and squashes the following commits:

1996ee8 [Patrick Wendell] Feedback form Andrew
3eb7803 [Patrick Wendell] Suggestions from TD
2474668 [Patrick Wendell] SPARK-1565 (Addendum): Replace `run-example` with `spark-submit`.
2014-05-08 22:26:36 -07:00
Sandy Ryza 2b961d8807 SPARK-1492. Update Spark YARN docs to use spark-submit
Author: Sandy Ryza <sandy@cloudera.com>

Closes #601 from sryza/sandy-spark-1492 and squashes the following commits:

5df1634 [Sandy Ryza] Address additional comments from Patrick.
be46d1f [Sandy Ryza] Address feedback from Marcelo and Patrick
867a3ea [Sandy Ryza] SPARK-1492. Update Spark YARN docs to use spark-submit
2014-05-02 21:42:58 -07:00
Thomas Graves 0058b5d2c7 SPARK-1408 Modify Spark on Yarn to point to the history server when app ...
...finishes

Note this is dependent on https://github.com/apache/spark/pull/204 to have a working history server, but there are no code dependencies.

This also fixes SPARK-1288 yarn stable finishApplicationMaster incomplete. Since I was in there I made the diagnostic message be passed properly.

Author: Thomas Graves <tgraves@apache.org>

Closes #362 from tgravescs/SPARK-1408 and squashes the following commits:

ec89705 [Thomas Graves] Fix typo.
446122d [Thomas Graves] Make config yarn specific
f5d5373 [Thomas Graves] SPARK-1408 Modify Spark on Yarn to point to the history server when app finishes
2014-04-17 16:36:37 -05:00
Sandy Ryza 564f1c137c SPARK-1376. In the yarn-cluster submitter, rename "args" option to "arg"
Author: Sandy Ryza <sandy@cloudera.com>

Closes #279 from sryza/sandy-spark-1376 and squashes the following commits:

d8aebfa [Sandy Ryza] SPARK-1376. In the yarn-cluster submitter, rename "args" option to "arg"
2014-04-01 08:26:31 +05:30
Sandy Ryza 1617816090 SPARK-1126. spark-app preliminary
This is a starting version of the spark-app script for running compiled binaries against Spark.  It still needs tests and some polish.  The only testing I've done so far has been using it to launch jobs in yarn-standalone mode against a pseudo-distributed cluster.

This leaves out the changes required for launching python scripts.  I think it might be best to save those for another JIRA/PR (while keeping to the design so that they won't require backwards-incompatible changes).

Author: Sandy Ryza <sandy@cloudera.com>

Closes #86 from sryza/sandy-spark-1126 and squashes the following commits:

d428d85 [Sandy Ryza] Commenting, doc, and import fixes from Patrick's comments
e7315c6 [Sandy Ryza] Fix failing tests
34de899 [Sandy Ryza] Change --more-jars to --jars and fix docs
299ddca [Sandy Ryza] Fix scalastyle
a94c627 [Sandy Ryza] Add newline at end of SparkSubmit
04bc4e2 [Sandy Ryza] SPARK-1126. spark-submit script
2014-03-29 14:41:36 -07:00
Sandy Ryza 698373211e SPARK-1183. Don't use "worker" to mean executor
Author: Sandy Ryza <sandy@cloudera.com>

Closes #120 from sryza/sandy-spark-1183 and squashes the following commits:

5066a4a [Sandy Ryza] Remove "worker" in a couple comments
0bd1e46 [Sandy Ryza] Remove --am-class from usage
bfc8fe0 [Sandy Ryza] Remove am-class from doc and fix yarn-alpha
607539f [Sandy Ryza] Address review comments
74d087a [Sandy Ryza] SPARK-1183. Don't use "worker" to mean executor
2014-03-13 12:11:33 -07:00
Sandy Ryza 328c73d037 SPARK-1197. Change yarn-standalone to yarn-cluster and fix up running on YARN docs
This patch changes "yarn-standalone" to "yarn-cluster" (but still supports the former).  It also cleans up the Running on YARN docs and adds a section on how to view logs.

Author: Sandy Ryza <sandy@cloudera.com>

Closes #95 from sryza/sandy-spark-1197 and squashes the following commits:

563ef3a [Sandy Ryza] Review feedback
6ad06d4 [Sandy Ryza] Change yarn-standalone to yarn-cluster and fix up running on YARN docs
2014-03-06 17:12:58 -08:00
Sandy Ryza b8a1871953 SPARK-1053. Don't require SPARK_YARN_APP_JAR
It looks this just requires taking out the checks.

I verified that, with the patch, I was able to run spark-shell through yarn without setting the environment variable.

Author: Sandy Ryza <sandy@cloudera.com>

Closes #553 from sryza/sandy-spark-1053 and squashes the following commits:

b037676 [Sandy Ryza] SPARK-1053.  Don't require SPARK_YARN_APP_JAR
2014-02-26 10:00:02 -06:00
CodingCat 7b012c9397 [SPARK-1105] fix site scala version error in docs
https://spark-project.atlassian.net/browse/SPARK-1105

fix site scala version error

Author: CodingCat <zhunansjtu@gmail.com>

Closes #618 from CodingCat/doc_version and squashes the following commits:

39bb8aa [CodingCat] more fixes
65bedb0 [CodingCat] fix site scala version error in doc
2014-02-19 15:54:03 -08:00
Sandy Ryza adf42611f1 Incorporate Tom's comments - update doc and code to reflect that core requests may not always be honored 2014-01-21 00:38:02 -08:00
Sandy Ryza 3e85b87d90 SPARK-1033. Ask for cores in Yarn container requests 2014-01-20 14:42:32 -08:00
Thomas Graves c617083e47 yarn-client addJar fix and misc other 2014-01-09 10:24:35 -06:00
Thomas Graves 6eef78d769 Merge pull request #345 from colorant/yarn
support distributing extra files to worker for yarn client mode

So that user doesn't need to package all dependency into one assemble jar as spark app jar
2014-01-08 08:49:20 -06:00
Raymond Liu 67af803136 Export --file for YarnClient mode to support sending extra files to worker on yarn cluster 2014-01-07 10:24:11 +08:00
Holden Karau d86dc74d79 Code review feedback 2014-01-05 22:05:30 -08:00
Patrick Wendell 604fad9c39 Merge remote-tracking branch 'apache-github/master' into remove-binaries
Conflicts:
	core/src/test/scala/org/apache/spark/DriverSuite.scala
	docs/python-programming-guide.md
2014-01-03 21:29:33 -08:00
Patrick Wendell 4ae101ff38 Merge pull request #317 from ScrapCodes/spark-915-segregate-scripts
Spark-915 segregate scripts
2014-01-03 11:24:35 -08:00
Prashant Sharma 74ba97fcf7 sbin/spark-class* -> bin/spark-class* 2014-01-03 15:08:01 +05:30
Prashant Sharma 94f2fffa23 fixed review comments 2014-01-03 14:43:37 +05:30
Raymond Liu f442afc22e fix docs for yarn 2014-01-03 14:14:35 +08:00
Raymond Liu 7815a3ace9 Update maven build documentation 2014-01-03 12:12:38 +08:00
Raymond Liu be343d2a56 Fix yarn/README.md and update docs/running-on-yarn.md 2014-01-03 12:12:38 +08:00
Prashant Sharma 94b7a7fe37 run-example -> bin/run-example 2014-01-02 18:41:21 +05:30
Prashant Sharma b810a85cdd spark-shell -> bin/spark-shell 2014-01-02 18:37:40 +05:30
Prashant Sharma 980afd280a Merge branch 'scripts-reorg' of github.com:shane-huang/incubator-spark into spark-915-segregate-scripts
Conflicts:
	bin/spark-shell
	core/pom.xml
	core/src/main/scala/org/apache/spark/SparkContext.scala
	core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala
	core/src/main/scala/org/apache/spark/ui/UIWorkloadGenerator.scala
	core/src/test/scala/org/apache/spark/DriverSuite.scala
	python/run-tests
	sbin/compute-classpath.sh
	sbin/spark-class
	sbin/stop-slaves.sh
2014-01-02 17:55:21 +05:30
Prashant Sharma 6be4c11194 Removed sbt folder and changed docs accordingly 2014-01-02 14:09:37 +05:30
Patrick Wendell 1291dd4dce Fix list rendering in YARN markdown docs. 2013-12-10 16:38:33 -08:00
Patrick Wendell 41c60b337a Various broken links in documentation 2013-12-07 22:31:44 -08:00
Ali Ghodsi e2c2914faa more docs 2013-12-06 16:54:06 -08:00
Ali Ghodsi f2fb4b4228 Updated documentation about the YARN v2.2 build process 2013-12-06 16:31:26 -08:00
Raymond Liu ab3cefde53 Add YarnClientClusterScheduler and Backend.
With this scheduler, the user application is launched locally,
While the executor will be launched by YARN on remote nodes.

This enables spark-shell to run upon YARN.
2013-11-22 09:23:27 +08:00
tgravescs 4093e9393a Impove Spark on Yarn Error handling 2013-11-19 12:44:00 -06:00
tgravescs a35472e1dd Allow spark on yarn to be run from HDFS. Allows the spark.jar, app.jar, and log4j.properties to be put into hdfs. 2013-11-04 16:16:28 -06:00
Matei Zaharia 8f11c36fe1 Merge remote-tracking branch 'tgravescs/sparkYarnDistCache'
Closes #11

Conflicts:
	docs/running-on-yarn.md
	yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala
2013-10-10 19:34:33 -07:00
tgravescs 0fff4ee852 Adding in the --addJars option to make SparkContext.addJar work on yarn and cleanup
the classpaths
2013-10-03 11:52:16 -05:00
tgravescs bc3b20abdc Allow users to set the application name for Spark on Yarn 2013-10-02 12:54:17 -05:00
Y.CORP.YAHOO.COM\tgraves 9d4246863a Support distributed cache files and archives on spark on yarn and attempt to cleanup the staging directory on exit 2013-09-23 09:09:59 -05:00
shane-huang dfbdc9ddb7 added spark-class and spark-executor to sbin
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-09-23 11:28:58 +08:00
Jey Kottalam 35ed09f1d1 Clarify YARN example 2013-09-06 11:31:16 -07:00
Y.CORP.YAHOO.COM\tgraves c8cc276110 Review comment changes and update to org.apache packaging 2013-09-03 10:50:21 -05:00
Y.CORP.YAHOO.COM\tgraves 547fc4a412 Merge remote-tracking branch 'mesos/master' into yarnUILink
Conflicts:
	core/src/main/scala/org/apache/spark/ui/UIUtils.scala
	core/src/main/scala/org/apache/spark/ui/jobs/PoolTable.scala
	core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala
	docs/running-on-yarn.md
2013-09-03 08:36:59 -05:00
Y.CORP.YAHOO.COM\tgraves 96452eea56 fix up minor things 2013-08-30 16:04:31 -05:00
Y.CORP.YAHOO.COM\tgraves bac46266a9 Link the Spark UI to the Yarn UI 2013-08-30 15:55:32 -05:00
Matei Zaharia f3a964848d More doc improvements + better warnings when you haven't built Spark 2013-08-30 12:41:25 -07:00
Matei Zaharia 53cd50c069 Change build and run instructions to use assemblies
This commit makes Spark invocation saner by using an assembly JAR to
find all of Spark's dependencies instead of adding all the JARs in
lib_managed. It also packages the examples into an assembly and uses
that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script
with two better-named scripts: "run-examples" for examples, and
"spark-class" for Spark internal classes (e.g. REPL, master, etc). This
is also designed to minimize the confusion people have in trying to use
"run" to run their own classes; it's not meant to do that, but now at
least if they look at it, they can modify run-examples to do a decent
job for them.

As part of this, Bagel's examples are also now properly moved to the
examples package instead of bagel.
2013-08-29 21:19:04 -07:00
Matei Zaharia baa84e7e4c Merge pull request #865 from tgravescs/fixtmpdir
Spark on Yarn should use yarn approved directories for spark.local.dir and tmp
2013-08-28 12:44:46 -07:00
Y.CORP.YAHOO.COM\tgraves 63dc635de6 fix typos 2013-08-26 17:06:20 -05:00
Y.CORP.YAHOO.COM\tgraves c9464c74a1 Add ability for user to specify environment variables 2013-08-26 16:44:27 -05:00
Y.CORP.YAHOO.COM\tgraves 6dd64e8bb2 Update docs and remove old reference to --user option 2013-08-26 14:29:24 -05:00
Jey Kottalam 6585f49841 Update build docs 2013-08-21 14:51:56 -07:00
Jey Kottalam 14b6bcdf93 update YARN docs 2013-08-15 16:50:37 -07:00
Reynold Xin 0eab7a78b9 Fixed a couple typos and formating problems in the YARN documentation. 2013-05-17 18:05:46 -07:00
Mridul Muralidharan da2642bead Fix example jar name 2013-05-17 06:58:46 +05:30
Mridul Muralidharan f16c781709 Fix documentation to use yarn-standalone as master 2013-05-16 17:50:22 +05:30
Mridul Muralidharan 87540a7b38 Fix running on yarn documentation 2013-05-16 15:27:58 +05:30
Mridul Muralidharan ee37612bc9 1) Add support for HADOOP_CONF_DIR (and/or YARN_CONF_DIR - use either) : which is used to specify the client side configuration directory : which needs to be part of the CLASSPATH.
2) Move from var+=".." to var="$var.." : the former does not work on older bash shells unfortunately.
2013-05-11 11:12:22 +05:30
Mridul Muralidharan ac2e8e8720 Add some basic documentation 2013-04-19 00:13:19 +05:30
Andy Konwinski c9097628fc Fix broken link to YARN documentation. 2013-03-13 14:51:13 -07:00
Patrick Wendell 8321e7f0c2 Fixing YARN instructions 2012-10-09 22:39:28 -07:00
Andy Konwinski e1a724f39c Updating lots of docs to use the new special version number variables,
also adding the version to the navbar so it is easy to tell which
version of Spark these docs were compiled for.
2012-10-08 17:17:17 -07:00
Matei Zaharia bf18e0994e Minor typos 2012-09-26 23:53:38 -07:00
Matei Zaharia ea05fc130b Updates to standalone cluster, web UI and deploy docs. 2012-09-26 22:54:39 -07:00
Matei Zaharia d51d5e0582 Doc fixes 2012-09-25 23:59:04 -07:00
Matei Zaharia f1246cc7c1 Various enhancements to the programming guide and HTML/CSS 2012-09-25 23:26:56 -07:00
Denny 6d53b971b9 Added standalone and YARN docs. Merged standalone cluster into standalone doc 2012-09-13 09:47:54 -07:00