Imran Rashid ff8b449958 [SPARK-3454] separate json endpoints for data in the UI
Exposes data available in the UI as json over http.  Key points:

* new endpoints, handled independently of existing XyzPage classes.  Root entrypoint is `JsonRootResource`
* Uses jersey + jackson for routing & converting POJOs into json
* tests against known results in `HistoryServerSuite`
* also fixes some minor issues w/ the UI -- synchronizing on access to `StorageListener` & `StorageStatusListener`, and fixing some inconsistencies w/ the way we handle retained jobs & stages.

Author: Imran Rashid <irashid@cloudera.com>

Closes #4435 from squito/SPARK-3454 and squashes the following commits:

da1e35f [Imran Rashid] typos etc.
5e78b4f [Imran Rashid] fix rendering problems
5ae02ad [Imran Rashid] Merge branch 'master' into SPARK-3454
f016182 [Imran Rashid] change all constructors json-pojo class constructors to be private[spark] to protect us from mima-false-positives if we add fields
3347b72 [Imran Rashid] mark EnumUtil as @Private
ec140a2 [Imran Rashid] create @Private
cc1febf [Imran Rashid] add docs on the metrics-as-json api
cbaf287 [Imran Rashid] Merge branch 'master' into SPARK-3454
56db31e [Imran Rashid] update tests for mulit-attempt
7f3bc4e [Imran Rashid] Revert "add sbt-revolved plugin, to make it easier to start & stop http servers in sbt"
67008b4 [Imran Rashid] rats
9e51400 [Imran Rashid] style
c9bae1c [Imran Rashid] handle multiple attempts per app
b87cd63 [Imran Rashid] add sbt-revolved plugin, to make it easier to start & stop http servers in sbt
188762c [Imran Rashid] multi-attempt
2af11e5 [Imran Rashid] Merge branch 'master' into SPARK-3454
befff0c [Imran Rashid] review feedback
14ac3ed [Imran Rashid] jersey-core needs to be explicit; move version & scope to parent pom.xml
f90680e [Imran Rashid] Merge branch 'master' into SPARK-3454
dc8a7fe [Imran Rashid] style, fix errant comments
acb7ef6 [Imran Rashid] fix indentation
7bf1811 [Imran Rashid] move MetricHelper so mima doesnt think its exposed; comments
9d889d6 [Imran Rashid] undo some unnecessary changes
f48a7b0 [Imran Rashid] docs
52bbae8 [Imran Rashid] StorageListener & StorageStatusListener needs to synchronize internally to be thread-safe
31c79ce [Imran Rashid] asm no longer needed for SPARK_PREPEND_CLASSES
b2f8b91 [Imran Rashid] @DeveloperApi
2e19be2 [Imran Rashid] lazily convert ApplicationInfo to avoid memory overhead
ba3d9d2 [Imran Rashid] upper case enums
39ac29c [Imran Rashid] move EnumUtil
d2bde77 [Imran Rashid] update error handling & scoping
4a234d3 [Imran Rashid] avoid jersey-media-json-jackson b/c of potential version conflicts
a157a2f [Imran Rashid] style
7bd4d15 [Imran Rashid] delete security test, since it doesnt do anything
a325563 [Imran Rashid] style
a9c5cf1 [Imran Rashid] undo changes superceeded by master
0c6f968 [Imran Rashid] update deps
1ed0d07 [Imran Rashid] Merge branch 'master' into SPARK-3454
4c92af6 [Imran Rashid] style
f2e63ad [Imran Rashid] Merge branch 'master' into SPARK-3454
c22b11f [Imran Rashid] fix compile error
9ea682c [Imran Rashid] go back to good ol' java enums
cf86175 [Imran Rashid] style
d493b38 [Imran Rashid] Merge branch 'master' into SPARK-3454
f05ae89 [Imran Rashid] add in ExecutorSummaryInfo for MiMa :(
101a698 [Imran Rashid] style
d2ef58d [Imran Rashid] revert changes that had HistoryServer refresh the application listing more often
b136e39b [Imran Rashid] Revert "add sbt-revolved plugin, to make it easier to start & stop http servers in sbt"
e031719 [Imran Rashid] fixes from review
1f53a66 [Imran Rashid] style
b4a7863 [Imran Rashid] fix compile error
2c8b7ee [Imran Rashid] rats
1578a4a [Imran Rashid] doc
674f8dc [Imran Rashid] more explicit about total numbers of jobs & stages vs. number retained
9922be0 [Imran Rashid] Merge branch 'master' into stage_distributions
f5a5196 [Imran Rashid] undo removal of renderJson from MasterPage, since there is no substitute yet
db61211 [Imran Rashid] get JobProgressListener directly from UI
fdfc181 [Imran Rashid] stage/taskList
63eb4a6 [Imran Rashid] tests for taskSummary
ad27de8 [Imran Rashid] error handling on quantile values
b2efcaf [Imran Rashid] cleanup, combine stage-related paths into one resource
aaba896 [Imran Rashid] wire up task summary
a4b1397 [Imran Rashid] stage metric distributions
e48ba32 [Imran Rashid] rename
eaf3bbb [Imran Rashid] style
25cd894 [Imran Rashid] if only given day, assume GMT
51eaedb [Imran Rashid] more visibility fixes
9f28b7e [Imran Rashid] ack, more cleanup
99764e1 [Imran Rashid] Merge branch 'SPARK-3454_w_jersey' into SPARK-3454
a61a43c [Imran Rashid] oops, remove accidental checkin
a066055 [Imran Rashid] set visibility on a lot of classes
1f361c8 [Imran Rashid] update rat-excludes
0be5120 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey
2382bef [Imran Rashid] switch to using new "enum"
fef6605 [Imran Rashid] some utils for working w/ new "enum" format
dbfc7bf [Imran Rashid] style
b86bcb0 [Imran Rashid] update test to look at one stage attempt
5f9df24 [Imran Rashid] style
7fd156a [Imran Rashid] refactor jsonDiff to avoid code duplication
73f1378 [Imran Rashid] test json; also add test cases for cleaned stages & jobs
97d411f [Imran Rashid] json endpoint for one job
0c96147 [Imran Rashid] better error msgs for bad stageId vs bad attemptId
dddbd29 [Imran Rashid] stages have attempt; jobs are sorted; resource for all attempts for one stage
190c17a [Imran Rashid] StagePage should distinguish no task data, from unknown stage
84cd497 [Imran Rashid] AllJobsPage should still report correct completed & failed job count, even if some have been cleaned, to make it consistent w/ AllStagesPage
36e4062 [Imran Rashid] SparkUI needs to know about startTime, so it can list its own applicationInfo
b4c75ed [Imran Rashid] fix merge conflicts; need to widen visibility in a few cases
e91750a [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey
56d2fc7 [Imran Rashid] jersey needs asm for SPARK_PREPEND_CLASSES to work
f7df095 [Imran Rashid] add test for accumulables, and discover that I need update after all
9c0c125 [Imran Rashid] add accumulableInfo
00e9cc5 [Imran Rashid] more style
3377e61 [Imran Rashid] scaladoc
d05f7a9 [Imran Rashid] dont use case classes for status api POJOs, since they have binary compatibility issues
654cecf [Imran Rashid] move all the status api POJOs to one file
b86e2b0 [Imran Rashid] style
18a8c45 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey
5598f19 [Imran Rashid] delete some unnecessary code, more to go
56edce0 [Imran Rashid] style
017c755 [Imran Rashid] add in metrics now available
1b78cb7 [Imran Rashid] fix some import ordering
0dc3ea7 [Imran Rashid] if app isnt found, reload apps from FS before giving up
c7d884f [Imran Rashid] fix merge conflicts
0c12b50 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey
b6a96a8 [Imran Rashid] compare json by AST, not string
cd37845 [Imran Rashid] switch to using java.util.Dates for times
a4ab5aa [Imran Rashid] add in explicit dependency on jersey 1.9 -- maven wasn't happy before this
4fdc39f [Imran Rashid] refactor case insensitive enum parsing
cba1ef6 [Imran Rashid] add security (maybe?) for metrics json
f0264a7 [Imran Rashid] switch to using jersey for metrics json
bceb3a9 [Imran Rashid] set http response code on error, some testing
e0356b6 [Imran Rashid] put new test expectation files in rat excludes (is this OK?)
b252e7a [Imran Rashid] small cleanup of accidental changes
d1a8c92 [Imran Rashid] add sbt-revolved plugin, to make it easier to start & stop http servers in sbt
4b398d0 [Imran Rashid] expose UI data as json in new endpoints

(cherry picked from commit d49735800d)
Signed-off-by: Imran Rashid <irashid@cloudera.com>
2015-05-05 07:26:12 -05:00
_layouts [SPARK-5654] Integrate SparkR 2015-04-08 22:45:40 -07:00
_plugins [SPARK-5654] Integrate SparkR 2015-04-08 22:45:40 -07:00
css [SPARK-1566] consolidate programming guide, and general doc updates 2014-05-30 00:34:33 -07:00
img [SPARK-6343] Doc driver-worker network reqs 2015-04-09 06:37:20 -04:00
js [SPARK-1566] consolidate programming guide, and general doc updates 2014-05-30 00:34:33 -07:00
_config.yml [SPARK-6371] [build] Update version to 1.4.0-SNAPSHOT. 2015-03-20 18:43:57 +00:00
api.md [SPARK-1439, SPARK-1440] Generate unified Scaladoc across projects and Javadocs 2014-04-21 21:57:40 -07:00
bagel-programming-guide.md [SPARK-5608] Improve SEO of Spark documentation pages 2015-02-05 11:12:50 -08:00
building-spark.md [SPARK-7302] [DOCS] SPARK building documentation still mentions building for yarn 0.23 2015-05-03 21:22:31 +01:00
cluster-overview.md [SPARK-6343] Doc driver-worker network reqs 2015-04-09 06:37:20 -04:00
configuration.md [SPARK-7255] [STREAMING] [DOCUMENTATION] Added documentation for spark.streaming.kafka.maxRetries 2015-05-02 23:41:14 +01:00
contributing-to-spark.md Work in progress: 2013-09-08 00:29:11 -07:00
ec2-scripts.md [SPARK-6402][DOC] - Remove some refererences to shark in docs and ec2 2015-03-19 08:02:06 -04:00
graphx-programming-guide.md [SPARK-6510][GraphX]: Add Graph#minus method to act as Set#difference 2015-03-26 19:08:09 -07:00
hadoop-third-party-distributions.md [SPARK-7302] [DOCS] SPARK building documentation still mentions building for yarn 0.23 2015-05-03 21:22:31 +01:00
hardware-provisioning.md Change port from 3030 to 4040 2013-09-11 10:01:38 -07:00
index.md [SPARK-5310][Doc] Update SQL Programming Guide to include DataFrames. 2015-03-09 16:16:16 -07:00
java-programming-guide.md [SPARK-1566] consolidate programming guide, and general doc updates 2014-05-30 00:34:33 -07:00
job-scheduling.md [SPARK-4286] Add an external shuffle service that can be run as a daemon. 2015-04-28 12:08:18 -07:00
ml-guide.md [SPARK-6781] [SQL] use sqlContext in python shell 2015-04-08 13:31:45 -07:00
mllib-classification-regression.md [SPARK-5974] [SPARK-5980] [mllib] [python] [docs] Update ML guide with save/load, Python GBT 2015-02-25 16:13:17 -08:00
mllib-clustering.md [SPARK-5987] [MLlib] Save/load for GaussianMixtureModels 2015-03-25 14:45:23 -07:00
mllib-collaborative-filtering.md [SPARK-6257] [PYSPARK] [MLLIB] MLlib API missing items in Recommendation 2015-04-30 23:51:00 -07:00
mllib-data-types.md SPARK-6454 [DOCS] Fix links to pyspark api 2015-03-22 15:56:25 +00:00
mllib-decision-tree.md [SPARK-6097][MLLIB] Support tree model save/load in PySpark/MLlib 2015-03-02 22:27:01 -08:00
mllib-dimensionality-reduction.md SPARK-1307 [DOCS] Don't use term 'standalone' to refer to a Spark Application 2014-10-14 21:37:51 -07:00
mllib-ensembles.md [SPARK-6025] [MLlib] Add helper method evaluateEachIteration to extract learning curve 2015-03-20 17:14:09 -07:00
mllib-feature-extraction.md [SPARK-5912] [docs] [mllib] Small fixes to ChiSqSelector docs 2015-02-23 16:15:57 -08:00
mllib-frequent-pattern-mining.md [SPARK-5900][MLLIB] make PIC and FPGrowth Java-friendly 2015-02-19 18:06:16 -08:00
mllib-guide.md [SPARK-6278][MLLIB] Mention the change of objective in linear regression 2015-03-13 10:27:28 -07:00
mllib-isotonic-regression.md [doc][mllib] Fix typo of the page title in Isotonic regression documents 2015-04-20 00:03:23 -04:00
mllib-linear-methods.md Fixed doc 2015-04-18 17:20:46 -07:00
mllib-migration-guides.md [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] Doc cleanups for 1.3 release 2015-02-20 02:31:32 -08:00
mllib-naive-bayes.md [SPARK-4894][mllib] Added Bernoulli option to NaiveBayes model in mllib 2015-03-31 11:16:55 -07:00
mllib-optimization.md [SPARK-6336] LBFGS should document what convergenceTol means 2015-03-17 12:11:57 -07:00
mllib-statistics.md SPARK-6454 [DOCS] Fix links to pyspark api 2015-03-22 15:56:25 +00:00
monitoring.md [SPARK-3454] separate json endpoints for data in the UI 2015-05-05 07:26:12 -05:00
programming-guide.md fixed doc 2015-04-20 13:11:21 -07:00
python-programming-guide.md [SPARK-1566] consolidate programming guide, and general doc updates 2014-05-30 00:34:33 -07:00
quick-start.md [SPARK-5608] Improve SEO of Spark documentation pages 2015-02-05 11:12:50 -08:00
README.md [SPARK-5654] Integrate SparkR 2015-04-08 22:45:40 -07:00
running-on-mesos.md [SPARK-2691] [MESOS] Support for Mesos DockerInfo 2015-05-01 18:41:22 -07:00
running-on-yarn.md [SPARK-6653] [YARN] New config to specify port for sparkYarnAM actor system 2015-05-05 11:10:27 +01:00
scala-programming-guide.md [SPARK-1566] consolidate programming guide, and general doc updates 2014-05-30 00:34:33 -07:00
security.md [SPARK-5342] [YARN] Allow long running Spark apps to run on secure YARN/HDFS 2015-05-01 15:32:09 -05:00
spark-standalone.md [SPARK-6552][Deploy][Doc]expose start-slave.sh to user and update outdated doc 2015-03-28 12:32:35 +00:00
sql-programming-guide.md [SPARK-7136][Docs] Spark SQL and DataFrame Guide fix example file and paths 2015-04-24 20:25:07 -07:00
storage-openstack-swift.md [SPARK-938][doc] Add OpenStack Swift support 2014-09-07 20:56:04 -07:00
streaming-custom-receivers.md [SPARK-4806] Streaming doc update for 1.2 2014-12-11 06:21:23 -08:00
streaming-flume-integration.md [SPARK-6128][Streaming][Documentation] Updates to Spark Streaming Programming Guide 2015-03-11 18:48:21 -07:00
streaming-kafka-integration.md [SPARK-6128][Streaming][Documentation] Updates to Spark Streaming Programming Guide 2015-03-11 18:48:21 -07:00
streaming-kinesis-integration.md SPARK-3069 [DOCS] Build instructions in README are outdated 2014-09-16 09:18:03 -07:00
streaming-programming-guide.md [doc][streaming] Fixed broken link in mllib section 2015-04-20 13:46:55 -07:00
submitting-applications.md [SPARK-6426][Doc]User could also point the yarn cluster config directory via YARN_CONF_DI... 2015-03-20 18:42:18 +00:00
tuning.md [SPARK-5112] Expose SizeEstimator as a developer api 2015-05-05 12:38:54 +01:00

Welcome to the Spark documentation!

This readme will walk you through navigating and building the Spark documentation, which is included here with the Spark source code. You can also find documentation specific to release versions of Spark at http://spark.apache.org/documentation.html.

Read on to learn more about viewing documentation in plain text (i.e., markdown) or building the documentation yourself. Why build it yourself? So that you have the docs that correspond to whichever version of Spark you currently have checked out of revision control.

Generating the Documentation HTML

We include the Spark documentation as part of the source (as opposed to using a hosted wiki, such as the github wiki, as the definitive documentation) to enable the documentation to evolve along with the source code and be captured by revision control (currently git). This way the code automatically includes the version of the documentation that is relevant regardless of which version or release you have checked out or downloaded.

In this directory you will find text files formatted using Markdown, with an ".md" suffix. You can read those text files directly if you want. Start with index.md.

The markdown code can be compiled to HTML using the Jekyll tool. Jekyll and a few dependencies must be installed for this to work. We recommend installing via the Ruby Gem dependency manager. Since the exact HTML output varies between versions of Jekyll and its dependencies, we list specific versions here in some cases:

$ sudo gem install jekyll
$ sudo gem install jekyll-redirect-from
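
Before building, it can help to confirm that the Ruby tooling is actually on your PATH. The following is a minimal sketch, not part of the Spark build: missing_tools is a hypothetical helper introduced here for illustration, and the list of tools passed to it is an assumption based on the install commands above.

```shell
# Hypothetical helper (not part of Spark): print each named tool
# that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Assumed prerequisites for the Jekyll build described in this README.
# Prints nothing when all three are installed.
missing_tools ruby gem jekyll
```

Any name it prints still needs to be installed before Jekyll will run.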

Execute jekyll from the docs/ directory. Compiling the site with Jekyll will create a directory called _site containing index.html as well as the rest of the compiled files.

You can modify the default Jekyll build as follows:

# Skip generating API docs (which takes a while)
$ SKIP_API=1 jekyll build
# Serve content locally on port 4000
$ jekyll serve --watch
# Build the site with extra features used on the live page
$ PRODUCTION=1 jekyll build
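
The SKIP_API=1 and PRODUCTION=1 prefixes above rely on standard shell behavior: a variable assignment placed in front of a command is passed to that one command only and does not persist in your shell. A small sketch of that behavior, using sh -c as a stand-in for jekyll (the stand-in is an assumption made so the example runs without Jekyll installed):

```shell
# Start from a clean state so the comparison is meaningful.
unset SKIP_API

# The child process sees the variable when it is given as a prefix...
with_prefix=$(SKIP_API=1 sh -c 'echo "${SKIP_API:-unset}"')

# ...but a later command started without the prefix does not.
without_prefix=$(sh -c 'echo "${SKIP_API:-unset}"')

echo "with prefix: $with_prefix"
echo "without prefix: $without_prefix"
```

This is why the prefix has to be repeated on every jekyll invocation where you want the API docs skipped.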

Pygments

We also use pygments (http://pygments.org) for syntax highlighting in documentation markdown pages, so you will also need to install that (it requires Python) by running sudo pip install Pygments.

To mark a block of code in your markdown to be syntax highlighted by jekyll during the compile phase, use the following syntax:

{% highlight scala %}
// Your scala code goes here, you can replace scala with many other
// supported languages too.
{% endhighlight %}

Sphinx

We use Sphinx to generate Python API docs, so you will need to install it by running sudo pip install sphinx.

knitr, devtools

SparkR documentation is written using roxygen2, and we use knitr and devtools to generate it. To install these packages, run install.packages(c("knitr", "devtools")) from an R console.

API Docs (Scaladoc, Sphinx, roxygen2)

You can build just the Spark scaladoc by running build/sbt unidoc from the SPARK_PROJECT_ROOT directory.

Similarly, you can build just the PySpark docs by running make html from the SPARK_PROJECT_ROOT/python/docs directory. Documentation is only generated for classes that are listed as public in __init__.py. The SparkR docs can be built by running SPARK_PROJECT_ROOT/R/create-docs.sh.
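
The three standalone API doc builds described above can be summarized as a dry run. This is only a sketch: plan is a hypothetical helper that prints each command instead of executing it, and the fallback of SPARK_PROJECT_ROOT to the current directory is an assumption for illustration.

```shell
# Assumption: SPARK_PROJECT_ROOT points at your Spark checkout;
# fall back to the current directory if it is not set.
SPARK_PROJECT_ROOT="${SPARK_PROJECT_ROOT:-$PWD}"

# Hypothetical dry-run helper: print the working directory and the
# command that would be run there, without running anything.
plan() {
  dir=$1
  shift
  echo "(cd $dir && $*)"
}

plan "$SPARK_PROJECT_ROOT" build/sbt unidoc        # Scaladoc
plan "$SPARK_PROJECT_ROOT/python/docs" make html   # PySpark docs (Sphinx)
plan "$SPARK_PROJECT_ROOT" R/create-docs.sh        # SparkR docs
```

Copy a printed line into your shell to run that build on its own.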

When you run jekyll in the docs directory, it will also copy over the scaladoc for the various Spark subprojects into the docs directory (and then also into the _site directory). We use a jekyll plugin to run build/sbt unidoc before building the site, so if you haven't run it recently it may take some time as it generates all of the scaladoc. The jekyll plugin also generates the PySpark docs using Sphinx.

NOTE: To skip the step of building and copying over the Scala, Python, and R API docs, run SKIP_API=1 jekyll build.