Commit graph

46 commits

Author SHA1 Message Date
Matei Zaharia f10de042b8 Add language tabs and Python version to interactive part of quick-start
This is an addition of some stuff that was missed in https://issues.apache.org/jira/browse/SPARK-1567. I've also updated the doc to show submitting the Python application with spark-submit.

Author: Matei Zaharia <matei@databricks.com>

Closes #782 from mateiz/spark-1567-extra and squashes the following commits:

6f8f2aa [Matei Zaharia] tweaks
9ed9874 [Matei Zaharia] tweaks
ae67c3e [Matei Zaharia] tweak
b303ba3 [Matei Zaharia] tweak
1433a4d [Matei Zaharia] Add language tabs and Python version to interactive part of quick-start guide
2014-05-14 21:45:20 -07:00
Andrew Or 2ffd1eafd2 [SPARK-1753 / 1773 / 1814] Update outdated docs for spark-submit, YARN, standalone etc.
YARN
- SparkPi was updated to not take in master as an argument; we should update the docs to reflect that.
- The default YARN build guide should be in maven, not sbt.
- This PR also adds a paragraph on steps to debug a YARN application.

Standalone
- Emphasize spark-submit more. Right now it's one small paragraph preceding the legacy way of launching through `org.apache.spark.deploy.Client`.
- The way we set configurations / environment variables according to the old docs is outdated. This needs to reflect changes introduced by the Spark configuration changes we made.

In general, this PR also adds a little more documentation on the new spark-shell, spark-submit, spark-defaults.conf etc here and there.

Author: Andrew Or <andrewor14@gmail.com>

Closes #701 from andrewor14/yarn-docs and squashes the following commits:

e2c2312 [Andrew Or] Merge in changes in #752 (SPARK-1814)
25cfe7b [Andrew Or] Merge in the warning from SPARK-1753
a8c39c5 [Andrew Or] Minor changes
336bbd9 [Andrew Or] Tabs -> spaces
4d9d8f7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
041017a [Andrew Or] Abstract Spark submit documentation to cluster-overview.html
3cc0649 [Andrew Or] Detail how to set configurations + remove legacy instructions
5b7140a [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
85a51fc [Andrew Or] Update run-example, spark-shell, configuration etc.
c10e8c7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
381fe32 [Andrew Or] Update docs for standalone mode
757c184 [Andrew Or] Add a note about the requirements for the debugging trick
f8ca990 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
924f04c [Andrew Or] Revert addition of --deploy-mode
d5fe17b [Andrew Or] Update the YARN docs
2014-05-12 19:44:14 -07:00
Patrick Wendell 7b978c1ac5 Fix two download suggestions in the docs:
1) On the quick start page provide a direct link to the downloads (suggested by @pbailis).
2) On the index page, don't suggest users always have to build Spark, since many won't.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #662 from pwendell/quick-start and squashes the following commits:

0622f27 [Patrick Wendell] Fix two download suggestions in the docs:
2014-05-06 12:07:46 -07:00
Patrick Wendell aa9a7f5db7 SPARK-1606: Infer user application arguments instead of requiring --arg.
This modifies spark-submit to do something more like the Hadoop `jar`
command. Now we have the following syntax:

./bin/spark-submit [options] user.jar [user options]

Author: Patrick Wendell <pwendell@gmail.com>

Closes #563 from pwendell/spark-submit and squashes the following commits:

32241fc [Patrick Wendell] Review feedback
3adfb69 [Patrick Wendell] Small fix
bc48139 [Patrick Wendell] SPARK-1606: Infer user application arguments instead of requiring --arg.
2014-04-26 19:24:29 -07:00
Matei Zaharia fc78384704 [SPARK-1439, SPARK-1440] Generate unified Scaladoc across projects and Javadocs
I used the sbt-unidoc plugin (https://github.com/sbt/sbt-unidoc) to create a unified Scaladoc of our public packages, and generate Javadocs as well. One limitation is that I haven't found an easy way to exclude packages in the Javadoc; there is a SBT task that identifies Java sources to run javadoc on, but it's been very difficult to modify it from outside to change what is set in the unidoc package. Some SBT-savvy people should help with this. The Javadoc site also lacks package-level descriptions and things like that, so we may want to look into that. We may decide not to post these right now if it's too limited compared to the Scala one.

Example of the built doc site: http://people.csail.mit.edu/matei/spark-unified-docs/

Author: Matei Zaharia <matei@databricks.com>

This patch had conflicts when merged, resolved by
Committer: Patrick Wendell <pwendell@gmail.com>

Closes #457 from mateiz/better-docs and squashes the following commits:

a63d4a3 [Matei Zaharia] Skip Java/Scala API docs for Python package
5ea1f43 [Matei Zaharia] Fix links to Java classes in Java guide, fix some JS for scrolling to anchors on page load
f05abc0 [Matei Zaharia] Don't include java.lang package names
995e992 [Matei Zaharia] Skip internal packages and class names with $ in JavaDoc
a14a93c [Matei Zaharia] typo
76ce64d [Matei Zaharia] Add groups to Javadoc index page, and a first package-info.java
ed6f994 [Matei Zaharia] Generate JavaDoc as well, add titles, update doc site to use unified docs
acb993d [Matei Zaharia] Add Unidoc plugin for the projects we want Unidoced
2014-04-21 21:57:40 -07:00
Patrick Wendell fb98488fc8 Clean up and simplify Spark configuration
Over time as we've added more deployment modes, this have gotten a bit unwieldy with user-facing configuration options in Spark. Going forward we'll advise all users to run `spark-submit` to launch applications. This is a WIP patch but it makes the following improvements:

1. Improved `spark-env.sh.template` which was missing a lot of things users now set in that file.
2. Removes the shipping of SPARK_CLASSPATH, SPARK_JAVA_OPTS, and SPARK_LIBRARY_PATH to the executors on the cluster. This was an ugly hack. Instead it introduces config variables spark.executor.extraJavaOpts, spark.executor.extraLibraryPath, and spark.executor.extraClassPath.
3. Adds ability to set these same variables for the driver using `spark-submit`.
4. Allows you to load system properties from a `spark-defaults.conf` file when running `spark-submit`. This will allow setting both SparkConf options and other system properties utilized by `spark-submit`.
5. Made `SPARK_LOCAL_IP` an environment variable rather than a SparkConf property. This is more consistent with it being set on each node.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #299 from pwendell/config-cleanup and squashes the following commits:

127f301 [Patrick Wendell] Improvements to testing
a006464 [Patrick Wendell] Moving properties file template.
b4b496c [Patrick Wendell] spark-defaults.properties -> spark-defaults.conf
0086939 [Patrick Wendell] Minor style fixes
af09e3e [Patrick Wendell] Mention config file in docs and clean-up docs
b16e6a2 [Patrick Wendell] Cleanup of spark-submit script and Scala quick start guide
af0adf7 [Patrick Wendell] Automatically add user jar
a56b125 [Patrick Wendell] Responses to Tom's review
d50c388 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
a762901 [Patrick Wendell] Fixing test failures
ffa00fe [Patrick Wendell] Review feedback
fda0301 [Patrick Wendell] Note
308f1f6 [Patrick Wendell] Properly escape quotes and other clean-up for YARN
e83cd8f [Patrick Wendell] Changes to allow re-use of test applications
be42f35 [Patrick Wendell] Handle case where SPARK_HOME is not set
c2a2909 [Patrick Wendell] Test compile fixes
4ee6f9d [Patrick Wendell] Making YARN doc changes consistent
afc9ed8 [Patrick Wendell] Cleaning up line limits and two compile errors.
b08893b [Patrick Wendell] Additional improvements.
ace4ead [Patrick Wendell] Responses to review feedback.
b72d183 [Patrick Wendell] Review feedback for spark env file
46555c1 [Patrick Wendell] Review feedback and import clean-ups
437aed1 [Patrick Wendell] Small fix
761ebcd [Patrick Wendell] Library path and classpath for drivers
7cc70e4 [Patrick Wendell] Clean up terminology inside of spark-env script
5b0ba8e [Patrick Wendell] Don't ship executor envs
84cc5e5 [Patrick Wendell] Small clean-up
1f75238 [Patrick Wendell] SPARK_JAVA_OPTS --> SPARK_MASTER_OPTS for master settings
4982331 [Patrick Wendell] Remove SPARK_LIBRARY_PATH
6eaf7d0 [Patrick Wendell] executorJavaOpts
0faa3b6 [Patrick Wendell] Stash of adding config options in submit script and YARN
ac2d65e [Patrick Wendell] Change spark.local.dir -> SPARK_LOCAL_DIRS
2014-04-21 10:26:33 -07:00
Prabeesh K 0acc7a02b4 small fix ( proogram -> program )
Author: Prabeesh K <prabsmails@gmail.com>

Closes #331 from prabeesh/patch-3 and squashes the following commits:

9399eb5 [Prabeesh K] small fix(proogram -> program)
2014-04-04 21:32:00 -07:00
CodingCat 7b012c9397 [SPARK-1105] fix site scala version error in docs
https://spark-project.atlassian.net/browse/SPARK-1105

fix site scala version error

Author: CodingCat <zhunansjtu@gmail.com>

Closes #618 from CodingCat/doc_version and squashes the following commits:

39bb8aa [CodingCat] more fixes
65bedb0 [CodingCat] fix site scala version error in doc
2014-02-19 15:54:03 -08:00
Holden Karau d86dc74d79 Code review feedback 2014-01-05 22:05:30 -08:00
Patrick Wendell 604fad9c39 Merge remote-tracking branch 'apache-github/master' into remove-binaries
Conflicts:
	core/src/test/scala/org/apache/spark/DriverSuite.scala
	docs/python-programming-guide.md
2014-01-03 21:29:33 -08:00
Prashant Sharma b4bb80002b Merge branch 'master' into spark-1002-remove-jars 2014-01-03 12:12:04 +05:30
Prashant Sharma a3f90a2ecf pyspark -> bin/pyspark 2014-01-02 18:50:12 +05:30
Prashant Sharma b810a85cdd spark-shell -> bin/spark-shell 2014-01-02 18:37:40 +05:30
Prashant Sharma 6be4c11194 Removed sbt folder and changed docs accordingly 2014-01-02 14:09:37 +05:30
Matei Zaharia 0fa5809768 Updated docs for SparkConf and handled review comments 2013-12-30 22:17:28 -05:00
Matei Zaharia b458854977 Fix some review comments 2013-09-08 21:25:49 -07:00
Matei Zaharia 651a96adf7 More fair scheduler docs and property names.
Also changed uses of "job" terminology to "application" when they
referred to an entire Spark program, to avoid confusion.
2013-09-08 00:29:11 -07:00
Matei Zaharia 0a8cc30921 Move some classes to more appropriate packages:
* RDD, *RDDFunctions -> org.apache.spark.rdd
* Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util
* JavaSerializer, KryoSerializer -> org.apache.spark.serializer
2013-09-01 14:13:16 -07:00
Matei Zaharia 4f422032e5 Update docs for new package 2013-09-01 14:13:15 -07:00
Patrick Wendell 0e375a3cc2 Add assmebly plug in links 2013-09-01 09:43:42 -07:00
Patrick Wendell 6371febe18 Better docs 2013-08-31 19:09:06 -07:00
Matei Zaharia 9ddad0dcb4 Fixes suggested by Patrick 2013-08-31 17:40:33 -07:00
Matei Zaharia 4293533032 Update docs about HDFS versions 2013-08-30 15:04:43 -07:00
Matei Zaharia 2de756ff19 Update some build instructions because only sbt assembly and mvn package
are now needed
2013-08-29 21:19:06 -07:00
Patrick Wendell a72134a6ac SPARK-739 Have quickstart standlone job use README 2013-04-25 10:39:28 -07:00
Andy Konwinski 60a91b3b59 Update quick-start.md heading on Operations (not just Transformations). 2013-04-12 12:34:51 -07:00
Andrew Ash 4e2c965383 Don't use deprecated Application in example
As of 2.9.0 extending from Application is not recommended

http://www.scala-lang.org/api/2.9.3/index.html#scala.Application
2013-03-28 17:47:37 -03:00
Andy Konwinski cf73fbd305 Fix another broken link in quick start. 2013-03-13 02:23:44 -07:00
Andy Konwinski b63109763b Fix broken link in Quick Start. 2013-03-13 02:02:34 -07:00
Matei Zaharia 9a046e30ac Switch docs to use Akka repo instead of Typesafe 2013-02-25 22:18:47 -08:00
Matei Zaharia fbb3fc4143 Merge pull request #346 from JoshRosen/python-api
Python API (PySpark)
2013-01-12 23:49:36 -08:00
Michael Heuer 480c4139bb add repositories section to simple job pom.xml 2013-01-11 11:40:17 -06:00
Josh Rosen ce9f1bbe20 Add pyspark script to replace the other scripts.
Expand the PySpark programming guide.
2013-01-01 21:25:49 -08:00
Josh Rosen c5cee53f20 Merge remote-tracking branch 'origin/master' into python-api
Conflicts:
	docs/quick-start.md
2012-12-29 16:00:51 -08:00
Josh Rosen c2b105af34 Add documentation for Python API. 2012-12-28 22:51:28 -08:00
Patrick Wendell 6ceb559994 Adding multi-jar constructor in quickstart 2012-11-27 23:32:24 -08:00
Josh Rosen c4aa10154e Fix minor typos in quick start guide. 2012-10-23 13:49:52 -07:00
Patrick Wendell 7a03a0e35d Adding dependency repos in quickstart example 2012-10-14 11:48:24 -07:00
Patrick Wendell 4de5cc1ad4 Removing reference to publish-local in the quickstart 2012-10-09 22:39:29 -07:00
Matei Zaharia bc0bc672d0 Updates to documentation:
- Edited quick start and tuning guide to simplify them a little
- Simplified top menu bar
- Made private a SparkContext constructor parameter that was left as
  public
- Various small fixes
2012-10-09 14:30:23 -07:00
Andy Konwinski e1a724f39c Updating lots of docs to use the new special version number variables,
also adding the version to the navbar so it is easy to tell which
version of Spark these docs were compiled for.
2012-10-08 17:17:17 -07:00
Andy Konwinski 89f8e1c2e7 Merge remote-tracking branch 'public-spark/dev' into add-version-vars-to-docs
Conflicts:
	docs/quick-start.md
2012-10-08 12:15:38 -07:00
Andy Konwinski 45d03231d0 Adds liquid variables to docs templating system so that they can be used
throughout the docs: SPARK_VERSION, SCALA_VERSION, and MESOS_VERSION.

To use them, e.g. use {{site.SPARK_VERSION}}.

Also removes uses of {{HOME_PATH}} which were being resolved to ""
by the templating system anyway.
2012-10-08 10:30:38 -07:00
Patrick Wendell 99aac2f6c4 Removing one link in quickstart 2012-10-08 08:53:27 -07:00
Patrick Wendell 35b767f478 Responding to Matei's comments 2012-10-02 23:54:03 -07:00
Patrick Wendell f78edf94cf A Spark "Quick Start" example
This commit includes a quick start example that covers:
1) Basic usage of the Spark shell
2) A simple Spark job in Scala
3) A simple Spark job in Java
2012-10-01 17:24:09 -07:00