spark-instrumented-optimizer/docs/README.md
“attilapiros” bdcad33d8b [SPARK-34433][DOCS] Lock Jekyll version by Gemfile and Bundler
### What changes were proposed in this pull request?

Improving the documentation and release process by pinning Jekyll version by Gemfile and Bundler.

Some files and their responsibilities within this PR:
- `docs/.bundle/config` is used to specify a directory "docs/.local_ruby_bundle" which will be used as destination to install the ruby packages into instead of the global one which requires root access
- `docs/Gemfile` is specifying the required Jekyll version and other top level gem versions
- `docs/Gemfile.lock` is generated by the "bundle install". This file contains the exact resolved versions of all the gems including the top level gems and all the direct and transitive dependencies of those gems. When this file is generated it contains a platform related section "PLATFORMS" (in my case after the generation it was "universal-darwin-19"). Still this file must be under version control as when the version of a gem does not fit to the one specified in `Gemfile` an error comes (i.e. if the `Gemfile.lock` was generated for Jekyll 4.1.0 and its version is updated in the `Gemfile` to 4.2.0 then it triggers the error: "The bundle currently has jekyll locked at 4.1.0."). This is solution is also suggested officially in [its documentation](https://bundler.io/rationale.html#checking-your-code-into-version-control). To get rid of the specific platform (like "universal-darwin-19") first we have to add "ruby" as platform [which means this should work on every platform where Ruby runs](https://guides.rubygems.org/what-is-a-gem/)) by running "bundle lock --add-platform ruby" then the specific platform can be removed by "bundle lock --remove-platform universal-darwin-19".

After this the correct process to update Jekyll version is the following:
1. update the version in `Gemfile`
2. run "bundle update" which updates the `Gemfile.lock`
3. commit both files

This process for version update is tested for details please check the testing section.

### Why are the changes needed?

Using different Jekyll versions can generate different output documents.
This PR standardize the process.

### Does this PR introduce _any_ user-facing change?

No, assuming the release was done via docker by using `do-release-docker.sh`.
In that case  there should be no difference at all as the same Jekyll version is specified in the Gemfile.

### How was this patch tested?

#### Testing document generation

Doc generation step was triggered via  the docker release:

```
$ ./do-release-docker.sh -d ~/working -n -s docs
...
========================
= Building documentation...
Command: /opt/spark-rm/release-build.sh docs
Log file: docs.log
Skipping publish step.
```

The docs.log contains the followings:
```
Building Spark docs
Fetching gem metadata from https://rubygems.org/.........
Using bundler 2.2.9
Fetching rb-fsevent 0.10.4
Fetching forwardable-extended 2.6.0
Fetching public_suffix 4.0.6
Fetching colorator 1.1.0
Fetching eventmachine 1.2.7
Fetching http_parser.rb 0.6.0
Fetching ffi 1.14.2
Fetching concurrent-ruby 1.1.8
Installing colorator 1.1.0
Installing forwardable-extended 2.6.0
Installing rb-fsevent 0.10.4
Installing public_suffix 4.0.6
Installing http_parser.rb 0.6.0 with native extensions
Installing eventmachine 1.2.7 with native extensions
Installing concurrent-ruby 1.1.8
Fetching rexml 3.2.4
Fetching liquid 4.0.3
Installing ffi 1.14.2 with native extensions
Installing rexml 3.2.4
Installing liquid 4.0.3
Fetching mercenary 0.4.0
Installing mercenary 0.4.0
Fetching rouge 3.26.0
Installing rouge 3.26.0
Fetching safe_yaml 1.0.5
Installing safe_yaml 1.0.5
Fetching unicode-display_width 1.7.0
Installing unicode-display_width 1.7.0
Fetching webrick 1.7.0
Installing webrick 1.7.0
Fetching pathutil 0.16.2
Fetching kramdown 2.3.0
Fetching terminal-table 2.0.0
Fetching addressable 2.7.0
Fetching i18n 1.8.9
Installing terminal-table 2.0.0
Installing pathutil 0.16.2
Installing i18n 1.8.9
Installing addressable 2.7.0
Installing kramdown 2.3.0
Fetching kramdown-parser-gfm 1.1.0
Installing kramdown-parser-gfm 1.1.0
Fetching rb-inotify 0.10.1
Fetching sassc 2.4.0
Fetching em-websocket 0.5.2
Installing rb-inotify 0.10.1
Installing em-websocket 0.5.2
Installing sassc 2.4.0 with native extensions
Fetching listen 3.4.1
Installing listen 3.4.1
Fetching jekyll-watch 2.2.1
Installing jekyll-watch 2.2.1
Fetching jekyll-sass-converter 2.1.0
Installing jekyll-sass-converter 2.1.0
Fetching jekyll 4.2.0
Installing jekyll 4.2.0
Fetching jekyll-redirect-from 0.16.0
Installing jekyll-redirect-from 0.16.0
Bundle complete! 4 Gemfile dependencies, 30 gems now installed.
Bundled gems are installed into `./.local_ruby_bundle`
```

#### Testing Jekyll (or other gem) update

First locally I reverted Jekyll to 4.1.0:
```
$ rm Gemfile.lock
$ rm -rf .local_ruby_bundle

# edited Gemfile to use version 4.1.0
$ cat Gemfile
source "https://rubygems.org"

gem "jekyll", "4.1.0"
gem "rouge", "3.26.0"
gem "jekyll-redirect-from", "0.16.0"
gem "webrick", "1.7"
$ bundle install
...
```

Testing Jekyll version before the update:

```
$ bundle exec jekyll --version
jekyll 4.1.0
```

Imitating Jekyll update coming from git by reverting my local changes:

```
$ git checkout Gemfile
Updated 1 path from the index
$ cat Gemfile
source "https://rubygems.org"

gem "jekyll", "4.2.0"
gem "rouge", "3.26.0"
gem "jekyll-redirect-from", "0.16.0"
gem "webrick", "1.7"

$ git checkout Gemfile.lock
Updated 1 path from the index
```

Run the install:

```
$ bundle install
...
```

Checking the updated Jekyll version:
```
$ bundle exec jekyll --version
jekyll 4.2.0
```

Closes #31559 from attilapiros/pin-jekyll-version.

Lead-authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Co-authored-by: Attila Zsolt Piros <2017933+attilapiros@users.noreply.github.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2021-02-18 12:17:57 +09:00

144 lines
6.4 KiB
Markdown

---
license: |
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
---
Welcome to the Spark documentation!
This readme will walk you through navigating and building the Spark documentation, which is included
here with the Spark source code. You can also find documentation specific to release versions of
Spark at https://spark.apache.org/documentation.html.
Read on to learn more about viewing documentation in plain text (i.e., markdown) or building the
documentation yourself. Why build it yourself? So that you have the docs that correspond to
whichever version of Spark you currently have checked out of revision control.
## Prerequisites
The Spark documentation build uses a number of tools to build HTML docs and API docs in Scala, Java,
Python, R and SQL.
You need to have [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and
[Python](https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python)
installed. Make sure the `bundle` command is available, if not install the Gem containing it:
```sh
$ sudo gem install bundler
```
After this all the required ruby dependencies can be installed from the `docs/` directory via the Bundler:
```sh
$ cd docs
$ bundle install
```
Note: If you are on a system with both Ruby 1.9 and Ruby 2.0 you may need to replace gem with gem2.0.
### R Documentation
If you'd like to generate R documentation, you'll need to [install Pandoc](https://pandoc.org/installing.html)
and install these libraries:
```sh
$ sudo Rscript -e 'install.packages(c("knitr", "devtools", "testthat", "rmarkdown"), repos="https://cloud.r-project.org/")'
$ sudo Rscript -e 'devtools::install_version("roxygen2", version = "7.1.1", repos="https://cloud.r-project.org/")'
```
Note: Other versions of roxygen2 might work in SparkR documentation generation but `RoxygenNote` field in `$SPARK_HOME/R/pkg/DESCRIPTION` is 7.1.1, which is updated if the version is mismatched.
### API Documentation
To generate API docs for any language, you'll need to install these libraries:
<!--
TODO(SPARK-32407): Sphinx 3.1+ does not correctly index nested classes.
See also https://github.com/sphinx-doc/sphinx/issues/7551.
-->
```sh
$ sudo pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsphinx numpydoc
```
## Generating the Documentation HTML
We include the Spark documentation as part of the source (as opposed to using a hosted wiki, such as
the github wiki, as the definitive documentation) to enable the documentation to evolve along with
the source code and be captured by revision control (currently git). This way the code automatically
includes the version of the documentation that is relevant regardless of which version or release
you have checked out or downloaded.
In this directory you will find text files formatted using Markdown, with an ".md" suffix. You can
read those text files directly if you want. Start with `index.md`.
Execute `bundle exec jekyll build` from the `docs/` directory to compile the site. Compiling the site with
Jekyll will create a directory called `_site` containing `index.html` as well as the rest of the
compiled files.
```sh
$ cd docs
$ bundle exec jekyll build
```
You can modify the default Jekyll build as follows:
```sh
# Skip generating API docs (which takes a while)
$ SKIP_API=1 bundle exec jekyll build
# Serve content locally on port 4000
$ bundle exec jekyll serve --watch
# Build the site with extra features used on the live page
$ PRODUCTION=1 bundle exec jekyll build
```
## API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs)
You can build just the Spark scaladoc and javadoc by running `./build/sbt unidoc` from the `$SPARK_HOME` directory.
Similarly, you can build just the PySpark docs by running `make html` from the
`$SPARK_HOME/python/docs` directory. Documentation is only generated for classes that are listed as
public in `__init__.py`. The SparkR docs can be built by running `$SPARK_HOME/R/create-docs.sh`, and
the SQL docs can be built by running `$SPARK_HOME/sql/create-docs.sh`
after [building Spark](https://github.com/apache/spark#building-spark) first.
When you run `bundle exec jekyll build` in the `docs` directory, it will also copy over the scaladoc and javadoc for the various
Spark subprojects into the `docs` directory (and then also into the `_site` directory). We use a
jekyll plugin to run `./build/sbt unidoc` before building the site so if you haven't run it (recently) it
may take some time as it generates all of the scaladoc and javadoc using [Unidoc](https://github.com/sbt/sbt-unidoc).
The jekyll plugin also generates the PySpark docs using [Sphinx](http://sphinx-doc.org/), SparkR docs
using [roxygen2](https://cran.r-project.org/web/packages/roxygen2/index.html) and SQL docs
using [MkDocs](https://www.mkdocs.org/).
NOTE: To skip the step of building and copying over the Scala, Java, Python, R and SQL API docs, run `SKIP_API=1
bundle exec jekyll build`. In addition, `SKIP_SCALADOC=1`, `SKIP_PYTHONDOC=1`, `SKIP_RDOC=1` and `SKIP_SQLDOC=1` can be used
to skip a single step of the corresponding language. `SKIP_SCALADOC` indicates skipping both the Scala and Java docs.
### Automatically Rebuilding API Docs
`bundle exec jekyll serve --watch` will only watch what's in `docs/`, and it won't follow symlinks. That means it won't monitor your API docs under `python/docs` or elsewhere.
To work around this limitation for Python, install [`entr`](http://eradman.com/entrproject/) and run the following in a separate shell:
```sh
cd "$SPARK_HOME/python/docs"
find .. -type f -name '*.py' \
| entr -s 'make html && cp -r _build/html/. ../../docs/api/python'
```
Whenever there is a change to your Python code, `entr` will automatically rebuild the Python API docs and copy them to `docs/`, thus triggering a Jekyll update.