[SPARK-34433][DOCS] Lock Jekyll version by Gemfile and Bundler

### What changes were proposed in this pull request?

Improving the documentation and release process by pinning Jekyll version by Gemfile and Bundler.

Some files and their responsibilities within this PR:
- `docs/.bundle/config` is used to specify a directory "docs/.local_ruby_bundle" which will be used as destination to install the ruby packages into instead of the global one which requires root access
- `docs/Gemfile` is specifying the required Jekyll version and other top level gem versions
- `docs/Gemfile.lock` is generated by the "bundle install". This file contains the exact resolved versions of all the gems including the top level gems and all the direct and transitive dependencies of those gems. When this file is generated it contains a platform related section "PLATFORMS" (in my case after the generation it was "universal-darwin-19"). Still this file must be under version control as when the version of a gem does not fit to the one specified in `Gemfile` an error comes (i.e. if the `Gemfile.lock` was generated for Jekyll 4.1.0 and its version is updated in the `Gemfile` to 4.2.0 then it triggers the error: "The bundle currently has jekyll locked at 4.1.0."). This is solution is also suggested officially in [its documentation](https://bundler.io/rationale.html#checking-your-code-into-version-control). To get rid of the specific platform (like "universal-darwin-19") first we have to add "ruby" as platform [which means this should work on every platform where Ruby runs](https://guides.rubygems.org/what-is-a-gem/)) by running "bundle lock --add-platform ruby" then the specific platform can be removed by "bundle lock --remove-platform universal-darwin-19".

After this the correct process to update Jekyll version is the following:
1. update the version in `Gemfile`
2. run "bundle update" which updates the `Gemfile.lock`
3. commit both files

This process for version update is tested for details please check the testing section.

### Why are the changes needed?

Using different Jekyll versions can generate different output documents.
This PR standardize the process.

### Does this PR introduce _any_ user-facing change?

No, assuming the release was done via docker by using `do-release-docker.sh`.
In that case  there should be no difference at all as the same Jekyll version is specified in the Gemfile.

### How was this patch tested?

#### Testing document generation

Doc generation step was triggered via  the docker release:

```
$ ./do-release-docker.sh -d ~/working -n -s docs
...
========================
= Building documentation...
Command: /opt/spark-rm/release-build.sh docs
Log file: docs.log
Skipping publish step.
```

The docs.log contains the followings:
```
Building Spark docs
Fetching gem metadata from https://rubygems.org/.........
Using bundler 2.2.9
Fetching rb-fsevent 0.10.4
Fetching forwardable-extended 2.6.0
Fetching public_suffix 4.0.6
Fetching colorator 1.1.0
Fetching eventmachine 1.2.7
Fetching http_parser.rb 0.6.0
Fetching ffi 1.14.2
Fetching concurrent-ruby 1.1.8
Installing colorator 1.1.0
Installing forwardable-extended 2.6.0
Installing rb-fsevent 0.10.4
Installing public_suffix 4.0.6
Installing http_parser.rb 0.6.0 with native extensions
Installing eventmachine 1.2.7 with native extensions
Installing concurrent-ruby 1.1.8
Fetching rexml 3.2.4
Fetching liquid 4.0.3
Installing ffi 1.14.2 with native extensions
Installing rexml 3.2.4
Installing liquid 4.0.3
Fetching mercenary 0.4.0
Installing mercenary 0.4.0
Fetching rouge 3.26.0
Installing rouge 3.26.0
Fetching safe_yaml 1.0.5
Installing safe_yaml 1.0.5
Fetching unicode-display_width 1.7.0
Installing unicode-display_width 1.7.0
Fetching webrick 1.7.0
Installing webrick 1.7.0
Fetching pathutil 0.16.2
Fetching kramdown 2.3.0
Fetching terminal-table 2.0.0
Fetching addressable 2.7.0
Fetching i18n 1.8.9
Installing terminal-table 2.0.0
Installing pathutil 0.16.2
Installing i18n 1.8.9
Installing addressable 2.7.0
Installing kramdown 2.3.0
Fetching kramdown-parser-gfm 1.1.0
Installing kramdown-parser-gfm 1.1.0
Fetching rb-inotify 0.10.1
Fetching sassc 2.4.0
Fetching em-websocket 0.5.2
Installing rb-inotify 0.10.1
Installing em-websocket 0.5.2
Installing sassc 2.4.0 with native extensions
Fetching listen 3.4.1
Installing listen 3.4.1
Fetching jekyll-watch 2.2.1
Installing jekyll-watch 2.2.1
Fetching jekyll-sass-converter 2.1.0
Installing jekyll-sass-converter 2.1.0
Fetching jekyll 4.2.0
Installing jekyll 4.2.0
Fetching jekyll-redirect-from 0.16.0
Installing jekyll-redirect-from 0.16.0
Bundle complete! 4 Gemfile dependencies, 30 gems now installed.
Bundled gems are installed into `./.local_ruby_bundle`
```

#### Testing Jekyll (or other gem) update

First locally I reverted Jekyll to 4.1.0:
```
$ rm Gemfile.lock
$ rm -rf .local_ruby_bundle

# edited Gemfile to use version 4.1.0
$ cat Gemfile
source "https://rubygems.org"

gem "jekyll", "4.1.0"
gem "rouge", "3.26.0"
gem "jekyll-redirect-from", "0.16.0"
gem "webrick", "1.7"
$ bundle install
...
```

Testing Jekyll version before the update:

```
$ bundle exec jekyll --version
jekyll 4.1.0
```

Imitating Jekyll update coming from git by reverting my local changes:

```
$ git checkout Gemfile
Updated 1 path from the index
$ cat Gemfile
source "https://rubygems.org"

gem "jekyll", "4.2.0"
gem "rouge", "3.26.0"
gem "jekyll-redirect-from", "0.16.0"
gem "webrick", "1.7"

$ git checkout Gemfile.lock
Updated 1 path from the index
```

Run the install:

```
$ bundle install
...
```

Checking the updated Jekyll version:
```
$ bundle exec jekyll --version
jekyll 4.2.0
```

Closes #31559 from attilapiros/pin-jekyll-version.

Lead-authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Co-authored-by: Attila Zsolt Piros <2017933+attilapiros@users.noreply.github.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
This commit is contained in:
“attilapiros” 2021-02-18 12:17:57 +09:00 committed by HyukjinKwon
parent 1ad343238c
commit bdcad33d8b
12 changed files with 140 additions and 25 deletions

View file

@ -342,8 +342,10 @@ jobs:
python3.6 -m pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsphinx numpydoc
apt-get update -y
apt-get install -y ruby ruby-dev
gem install jekyll jekyll-redirect-from rouge
Rscript -e "install.packages(c('devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2'), repos='https://cloud.r-project.org/')"
gem install bundler
cd docs
bundle install
- name: Scala linter
run: ./dev/lint-scala
- name: Java linter
@ -361,7 +363,7 @@ jobs:
cd docs
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
jekyll build
bundle exec jekyll build
java-11:
name: Java 11 build with Maven

1
.gitignore vendored
View file

@ -48,6 +48,7 @@ dev/pr-deps/
dist/
docs/_site/
docs/api
docs/.local_ruby_bundle
sql/docs
sql/site
lib_managed/

View file

@ -91,6 +91,11 @@ for f in "$SELF"/*; do
fi
done
# Add the fallback version of Gemfile, Gemfile.lock and .bundle/config to the local directory.
cp "$SELF/../../docs/Gemfile" "$WORKDIR"
cp "$SELF/../../docs/Gemfile.lock" "$WORKDIR"
cp -r "$SELF/../../docs/.bundle" "$WORKDIR"
GPG_KEY_FILE="$WORKDIR/gpg.key"
fcreate_secure "$GPG_KEY_FILE"
$GPG --export-secret-key --armor --pinentry-mode loopback --passphrase "$GPG_PASSPHRASE" "$GPG_KEY" > "$GPG_KEY_FILE"

View file

@ -333,7 +333,13 @@ if [[ "$1" == "docs" ]]; then
echo "Building Spark docs"
cd docs
# TODO: Make configurable to add this: PRODUCTION=1
PRODUCTION=1 RELEASE_VERSION="$SPARK_VERSION" jekyll build
if [ ! -f "Gemfile" ]; then
cp "$SELF/Gemfile" .
cp "$SELF/Gemfile.lock" .
cp -r "$SELF/.bundle" .
fi
bundle install
PRODUCTION=1 RELEASE_VERSION="$SPARK_VERSION" bundle exec jekyll build
cd ..
cd ..

View file

@ -41,7 +41,7 @@ ARG APT_INSTALL="apt-get install --no-install-recommends -y"
# See also https://github.com/sphinx-doc/sphinx/issues/7551.
# We should use the latest Sphinx version once this is fixed.
ARG PIP_PKGS="sphinx==3.0.4 mkdocs==1.1.2 numpy==1.19.4 pydata_sphinx_theme==0.4.1 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0"
ARG GEM_PKGS="jekyll:4.2.0 jekyll-redirect-from:0.16.0 rouge:3.26.0"
ARG GEM_PKGS="bundler:2.2.9"
# Install extra needed repos and refresh.
# - CRAN repo

View file

@ -274,7 +274,8 @@ SPARK_ROOT_DIR="$(dirname "${SCRIPT_DIR}")"
pushd "$SPARK_ROOT_DIR" &> /dev/null
PYTHON_SOURCE="$(find . -name "*.py")"
# skipping local ruby bundle directory from the search
PYTHON_SOURCE="$(find . -path ./docs/.local_ruby_bundle -prune -false -o -name "*.py")"
compile_python_test "$PYTHON_SOURCE"
pycodestyle_test "$PYTHON_SOURCE"

View file

@ -240,18 +240,19 @@ def run_sparkr_style_checks():
def build_spark_documentation():
set_title_and_block("Building Spark Documentation", "BLOCK_DOCUMENTATION")
os.environ["PRODUCTION"] = "1 jekyll build"
os.environ["PRODUCTION"] = "1"
os.chdir(os.path.join(SPARK_HOME, "docs"))
jekyll_bin = which("jekyll")
bundle_bin = which("bundle")
if not jekyll_bin:
print("[error] Cannot find a version of `jekyll` on the system; please",
" install one and retry to build documentation.")
if not bundle_bin:
print("[error] Cannot find a version of `bundle` on the system; please",
" install one with `gem install bundler` and retry to build documentation.")
sys.exit(int(os.environ.get("CURRENT_BLOCK", 255)))
else:
run_cmd([jekyll_bin, "build"])
run_cmd([bundle_bin, "install"])
run_cmd([bundle_bin, "exec", "jekyll", "build"])
os.chdir(SPARK_HOME)
@ -754,7 +755,7 @@ def main():
run_sparkr_style_checks()
# determine if docs were changed and if we're inside the amplab environment
# note - the below commented out until *all* Jenkins workers can get `jekyll` installed
# note - the below commented out until *all* Jenkins workers can get the Bundler gem installed
# if "DOCS" in changed_modules and test_env == "amplab_jenkins":
# build_spark_documentation()

2
docs/.bundle/config Normal file
View file

@ -0,0 +1,2 @@
---
BUNDLE_PATH: ".local_ruby_bundle"

23
docs/Gemfile Normal file
View file

@ -0,0 +1,23 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
source "https://rubygems.org"
gem "jekyll", "4.2.0"
gem "rouge", "3.26.0"
gem "jekyll-redirect-from", "0.16.0"
gem "webrick", "1.7"

73
docs/Gemfile.lock Normal file
View file

@ -0,0 +1,73 @@
GEM
remote: https://rubygems.org/
specs:
addressable (2.7.0)
public_suffix (>= 2.0.2, < 5.0)
colorator (1.1.0)
concurrent-ruby (1.1.8)
em-websocket (0.5.2)
eventmachine (>= 0.12.9)
http_parser.rb (~> 0.6.0)
eventmachine (1.2.7)
ffi (1.14.2)
forwardable-extended (2.6.0)
http_parser.rb (0.6.0)
i18n (1.8.9)
concurrent-ruby (~> 1.0)
jekyll (4.2.0)
addressable (~> 2.4)
colorator (~> 1.0)
em-websocket (~> 0.5)
i18n (~> 1.0)
jekyll-sass-converter (~> 2.0)
jekyll-watch (~> 2.0)
kramdown (~> 2.3)
kramdown-parser-gfm (~> 1.0)
liquid (~> 4.0)
mercenary (~> 0.4.0)
pathutil (~> 0.9)
rouge (~> 3.0)
safe_yaml (~> 1.0)
terminal-table (~> 2.0)
jekyll-redirect-from (0.16.0)
jekyll (>= 3.3, < 5.0)
jekyll-sass-converter (2.1.0)
sassc (> 2.0.1, < 3.0)
jekyll-watch (2.2.1)
listen (~> 3.0)
kramdown (2.3.0)
rexml
kramdown-parser-gfm (1.1.0)
kramdown (~> 2.0)
liquid (4.0.3)
listen (3.4.1)
rb-fsevent (~> 0.10, >= 0.10.3)
rb-inotify (~> 0.9, >= 0.9.10)
mercenary (0.4.0)
pathutil (0.16.2)
forwardable-extended (~> 2.6)
public_suffix (4.0.6)
rb-fsevent (0.10.4)
rb-inotify (0.10.1)
ffi (~> 1.0)
rexml (3.2.4)
rouge (3.26.0)
safe_yaml (1.0.5)
sassc (2.4.0)
ffi (~> 1.9)
terminal-table (2.0.0)
unicode-display_width (~> 1.1, >= 1.1.1)
unicode-display_width (1.7.0)
webrick (1.7.0)
PLATFORMS
ruby
DEPENDENCIES
jekyll (= 4.2.0)
jekyll-redirect-from (= 0.16.0)
rouge (= 3.26.0)
webrick (= 1.7)
BUNDLED WITH
2.2.9

View file

@ -33,16 +33,17 @@ Python, R and SQL.
You need to have [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and
[Python](https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python)
installed. Also install the following libraries:
installed. Make sure the `bundle` command is available, if not install the Gem containing it:
```sh
$ sudo gem install jekyll jekyll-redirect-from rouge
$ sudo gem install bundler
```
If your ruby version is 3.0 or higher, you should also install `webrick`.
After this all the required ruby dependencies can be installed from the `docs/` directory via the Bundler:
```sh
$ sudo gem install jekyll jekyll-redirect-from webrick
$ cd docs
$ bundle install
```
Note: If you are on a system with both Ruby 1.9 and Ruby 2.0 you may need to replace gem with gem2.0.
@ -83,26 +84,26 @@ you have checked out or downloaded.
In this directory you will find text files formatted using Markdown, with an ".md" suffix. You can
read those text files directly if you want. Start with `index.md`.
Execute `jekyll build` from the `docs/` directory to compile the site. Compiling the site with
Execute `bundle exec jekyll build` from the `docs/` directory to compile the site. Compiling the site with
Jekyll will create a directory called `_site` containing `index.html` as well as the rest of the
compiled files.
```sh
$ cd docs
$ jekyll build
$ bundle exec jekyll build
```
You can modify the default Jekyll build as follows:
```sh
# Skip generating API docs (which takes a while)
$ SKIP_API=1 jekyll build
$ SKIP_API=1 bundle exec jekyll build
# Serve content locally on port 4000
$ jekyll serve --watch
$ bundle exec jekyll serve --watch
# Build the site with extra features used on the live page
$ PRODUCTION=1 jekyll build
$ PRODUCTION=1 bundle exec jekyll build
```
## API Docs (Scaladoc, Javadoc, Sphinx, roxygen2, MkDocs)
@ -115,7 +116,7 @@ public in `__init__.py`. The SparkR docs can be built by running `$SPARK_HOME/R/
the SQL docs can be built by running `$SPARK_HOME/sql/create-docs.sh`
after [building Spark](https://github.com/apache/spark#building-spark) first.
When you run `jekyll build` in the `docs` directory, it will also copy over the scaladoc and javadoc for the various
When you run `bundle exec jekyll build` in the `docs` directory, it will also copy over the scaladoc and javadoc for the various
Spark subprojects into the `docs` directory (and then also into the `_site` directory). We use a
jekyll plugin to run `./build/sbt unidoc` before building the site so if you haven't run it (recently) it
may take some time as it generates all of the scaladoc and javadoc using [Unidoc](https://github.com/sbt/sbt-unidoc).
@ -124,12 +125,12 @@ using [roxygen2](https://cran.r-project.org/web/packages/roxygen2/index.html) an
using [MkDocs](https://www.mkdocs.org/).
NOTE: To skip the step of building and copying over the Scala, Java, Python, R and SQL API docs, run `SKIP_API=1
jekyll build`. In addition, `SKIP_SCALADOC=1`, `SKIP_PYTHONDOC=1`, `SKIP_RDOC=1` and `SKIP_SQLDOC=1` can be used
bundle exec jekyll build`. In addition, `SKIP_SCALADOC=1`, `SKIP_PYTHONDOC=1`, `SKIP_RDOC=1` and `SKIP_SQLDOC=1` can be used
to skip a single step of the corresponding language. `SKIP_SCALADOC` indicates skipping both the Scala and Java docs.
### Automatically Rebuilding API Docs
`jekyll serve --watch` will only watch what's in `docs/`, and it won't follow symlinks. That means it won't monitor your API docs under `python/docs` or elsewhere.
`bundle exec jekyll serve --watch` will only watch what's in `docs/`, and it won't follow symlinks. That means it won't monitor your API docs under `python/docs` or elsewhere.
To work around this limitation for Python, install [`entr`](http://eradman.com/entrproject/) and run the following in a separate shell:

View file

@ -53,7 +53,7 @@ under the `docs <https://github.com/apache/spark/tree/master/docs>`_ directory:
.. code-block:: bash
SKIP_SCALADOC=1 SKIP_RDOC=1 SKIP_SQLDOC=1 jekyll serve --watch
SKIP_SCALADOC=1 SKIP_RDOC=1 SKIP_SQLDOC=1 bundle exec jekyll serve --watch
PySpark uses Sphinx to generate its release PySpark documentation. Therefore, if you want to build only PySpark documentation alone,
you can build under `python/docs <https://github.com/apache/spark/tree/master/python>`_ directory by: