6ab29b37cf
### What changes were proposed in this pull request?

This PR proposes to redesign the PySpark documentation.

I made a demo site to make it easier to review: https://hyukjin-spark.readthedocs.io/en/stable/reference/index.html.

Here is the initial draft for the final PySpark docs shape: https://hyukjin-spark.readthedocs.io/en/latest/index.html.

In more detail, this PR proposes:

1. Use the [pydata_sphinx_theme](https://github.com/pandas-dev/pydata-sphinx-theme) theme. [pandas](https://pandas.pydata.org/docs/) and [Koalas](https://koalas.readthedocs.io/en/latest/) use this theme. The CSS overwrite is ported from Koalas. The colours in the CSS were chosen by designers for use in Spark.
2. Use the Sphinx option to separate the `source` and `build` directories, as the documentation pages will likely grow.
3. Port the current API documentation into the new style. It mimics Koalas and pandas to use the theme most effectively. One disadvantage of this approach is that APIs and classes have to be listed explicitly; however, I think this isn't a big issue in PySpark since we're conservative about adding APIs. I also intentionally listed only classes, not functions, in ML and MLlib to make them relatively easier to manage.

### Why are the changes needed?

I often hear complaints from users that the current PySpark documentation (https://spark.apache.org/docs/latest/api/python/index.html) is pretty messy to read compared to other projects such as [pandas](https://pandas.pydata.org/docs/) and [Koalas](https://koalas.readthedocs.io/en/latest/).

It would be nicer to make it more organised, instead of just listing all classes, methods and attributes, so that it is easier to navigate.

Also, the documentation has been there since almost the very first version of PySpark. Maybe it's time to update it.

### Does this PR introduce _any_ user-facing change?

Yes, the PySpark API documentation will be redesigned.

### How was this patch tested?

Manually tested, and the demo site was made to show the result.
Closes #29188 from HyukjinKwon/SPARK-32179.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
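
The lint script that follows compares installed tool versions against minimums through a small Python helper (`satisfies_min_version`). As a rough illustration of the same idea in plain shell, here is a minimal sketch. It assumes a `sort` that supports `-V` (GNU coreutils does), and `version_at_least` is a hypothetical name that does not appear in the script; it is not what the script itself does.

```bash
#!/usr/bin/env bash

# Hypothetical helper, not part of lint-python: succeeds (exit status 0)
# when the provided version is greater than or equal to the minimum.
# Assumption: `sort -V` (version sort) is available.
version_at_least() {
    local provided="$1"
    local minimum="$2"
    local smallest

    # Version-sort the two values; the minimum must come out first
    # (or tie) for the provided version to be acceptable.
    smallest="$(printf '%s\n%s\n' "$minimum" "$provided" | sort -V | head -n 1)"
    [ "$smallest" = "$minimum" ]
}

# Usage sketch with the minimum flake8 version the script declares.
if version_at_least "3.8.4" "3.5.0"; then
    echo "flake8 version is new enough"
else
    echo "flake8 version is too old"
fi
```

Unlike the Python helper, this signals the result through the exit status rather than by printing `True` or `False`, so it composes directly with `if`.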
240 lines
8 KiB
Bash
Executable file
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# define test binaries + versions
FLAKE8_BUILD="flake8"
MINIMUM_FLAKE8="3.5.0"

PYCODESTYLE_BUILD="pycodestyle"
MINIMUM_PYCODESTYLE="2.6.0"

SPHINX_BUILD="sphinx-build"

PYTHON_EXECUTABLE="python3"

function satisfies_min_version {
    local provided_version="$1"
    local expected_version="$2"
    echo "$(
        "$PYTHON_EXECUTABLE" << EOM
from setuptools.extern.packaging import version
print(version.parse('$provided_version') >= version.parse('$expected_version'))
EOM
    )"
}

function compile_python_test {
    local COMPILE_STATUS=
    local COMPILE_REPORT=

    if [[ ! "$1" ]]; then
        echo "No python files found! Something is very wrong -- exiting."
        exit 1;
    fi

    # compileall: https://docs.python.org/3/library/compileall.html
    echo "starting python compilation test..."
    COMPILE_REPORT=$( ("$PYTHON_EXECUTABLE" -B -mcompileall -q -l -x "[/\\\\][.]git" $1) 2>&1)
    COMPILE_STATUS=$?

    if [ $COMPILE_STATUS -ne 0 ]; then
        echo "Python compilation failed with the following errors:"
        echo "$COMPILE_REPORT"
        echo "$COMPILE_STATUS"
        exit "$COMPILE_STATUS"
    else
        echo "python compilation succeeded."
        echo
    fi
}

function pycodestyle_test {
    local PYCODESTYLE_STATUS=
    local PYCODESTYLE_REPORT=
    local RUN_LOCAL_PYCODESTYLE=
    local PYCODESTYLE_VERSION=
    local EXPECTED_PYCODESTYLE=
    local PYCODESTYLE_SCRIPT_PATH="$SPARK_ROOT_DIR/dev/pycodestyle-$MINIMUM_PYCODESTYLE.py"
    local PYCODESTYLE_SCRIPT_REMOTE_PATH="https://raw.githubusercontent.com/PyCQA/pycodestyle/$MINIMUM_PYCODESTYLE/pycodestyle.py"

    if [[ ! "$1" ]]; then
        echo "No python files found! Something is very wrong -- exiting."
        exit 1;
    fi

    # check for locally installed pycodestyle & version
    RUN_LOCAL_PYCODESTYLE="False"
    if hash "$PYCODESTYLE_BUILD" 2> /dev/null; then
        PYCODESTYLE_VERSION="$($PYCODESTYLE_BUILD --version)"
        EXPECTED_PYCODESTYLE="$(satisfies_min_version $PYCODESTYLE_VERSION $MINIMUM_PYCODESTYLE)"
        if [ "$EXPECTED_PYCODESTYLE" == "True" ]; then
            RUN_LOCAL_PYCODESTYLE="True"
        fi
    fi

    # download the right version or run locally
    if [ $RUN_LOCAL_PYCODESTYLE == "False" ]; then
        # Get pycodestyle at runtime so that we don't rely on it being installed on the build server.
        # See: https://github.com/apache/spark/pull/1744#issuecomment-50982162
        # Updated to the latest official version of pep8. pep8 is formally renamed to pycodestyle.
        echo "downloading pycodestyle from $PYCODESTYLE_SCRIPT_REMOTE_PATH..."
        if [ ! -e "$PYCODESTYLE_SCRIPT_PATH" ]; then
            curl --silent -o "$PYCODESTYLE_SCRIPT_PATH" "$PYCODESTYLE_SCRIPT_REMOTE_PATH"
            local curl_status="$?"

            if [ "$curl_status" -ne 0 ]; then
                echo "Failed to download pycodestyle.py from $PYCODESTYLE_SCRIPT_REMOTE_PATH"
                exit "$curl_status"
            fi
        fi

        echo "starting pycodestyle test..."
        PYCODESTYLE_REPORT=$( ("$PYTHON_EXECUTABLE" "$PYCODESTYLE_SCRIPT_PATH" --config=dev/tox.ini $1) 2>&1)
        PYCODESTYLE_STATUS=$?
    else
        # we have the right version installed, so run locally
        echo "starting pycodestyle test..."
        PYCODESTYLE_REPORT=$( ($PYCODESTYLE_BUILD --config=dev/tox.ini $1) 2>&1)
        PYCODESTYLE_STATUS=$?
    fi

    if [ $PYCODESTYLE_STATUS -ne 0 ]; then
        echo "pycodestyle checks failed:"
        echo "$PYCODESTYLE_REPORT"
        exit "$PYCODESTYLE_STATUS"
    else
        echo "pycodestyle checks passed."
        echo
    fi
}

function flake8_test {
    local FLAKE8_VERSION=
    local EXPECTED_FLAKE8=
    local FLAKE8_REPORT=
    local FLAKE8_STATUS=

    if ! hash "$FLAKE8_BUILD" 2> /dev/null; then
        echo "The flake8 command was not found."
        echo "flake8 checks failed."
        exit 1
    fi

    _FLAKE8_VERSION=($($FLAKE8_BUILD --version))
    FLAKE8_VERSION="${_FLAKE8_VERSION[0]}"
    EXPECTED_FLAKE8="$(satisfies_min_version $FLAKE8_VERSION $MINIMUM_FLAKE8)"

    if [[ "$EXPECTED_FLAKE8" == "False" ]]; then
        echo "\
The minimum flake8 version needs to be $MINIMUM_FLAKE8. Your current version is $FLAKE8_VERSION

flake8 checks failed."
        exit 1
    fi

    echo "starting $FLAKE8_BUILD test..."
    FLAKE8_REPORT=$( ($FLAKE8_BUILD . --count --select=E901,E999,F821,F822,F823 \
                     --max-line-length=100 --show-source --statistics) 2>&1)
    FLAKE8_STATUS=$?

    if [ "$FLAKE8_STATUS" -ne 0 ]; then
        echo "flake8 checks failed:"
        echo "$FLAKE8_REPORT"
        echo "$FLAKE8_STATUS"
        exit "$FLAKE8_STATUS"
    else
        echo "flake8 checks passed."
        echo
    fi
}

function sphinx_test {
    local SPHINX_REPORT=
    local SPHINX_STATUS=

    # Check that the documentation builds acceptably, skip check if sphinx is not installed.
    if ! hash "$SPHINX_BUILD" 2> /dev/null; then
        echo "The $SPHINX_BUILD command was not found. Skipping Sphinx build for now."
        echo
        return
    fi

    PYTHON_HAS_SPHINX=$("$PYTHON_EXECUTABLE" -c 'import importlib.util; print(importlib.util.find_spec("sphinx") is not None)')
    if [[ "$PYTHON_HAS_SPHINX" == "False" ]]; then
        echo "$PYTHON_EXECUTABLE does not have Sphinx installed. Skipping Sphinx build for now."
        echo
        return
    fi

    # TODO(SPARK-32407): Sphinx 3.1+ does not correctly index nested classes.
    # See also https://github.com/sphinx-doc/sphinx/issues/7551.
    PYTHON_HAS_SPHINX_3_0=$("$PYTHON_EXECUTABLE" -c 'from distutils.version import LooseVersion; import sphinx; print(LooseVersion(sphinx.__version__) < LooseVersion("3.1.0"))')
    if [[ "$PYTHON_HAS_SPHINX_3_0" == "False" ]]; then
        echo "$PYTHON_EXECUTABLE has Sphinx 3.1+ installed but it requires lower than 3.1. Skipping Sphinx build for now."
        echo
        return
    fi

    # TODO(SPARK-32391): Install pydata_sphinx_theme in Jenkins machines
    PYTHON_HAS_THEME=$("$PYTHON_EXECUTABLE" -c 'import importlib.util; print(importlib.util.find_spec("pydata_sphinx_theme") is not None)')
    if [[ "$PYTHON_HAS_THEME" == "False" ]]; then
        echo "$PYTHON_EXECUTABLE does not have pydata_sphinx_theme installed. Skipping Sphinx build for now."
        echo
        return
    fi

    echo "starting $SPHINX_BUILD tests..."
    pushd python/docs &> /dev/null
    make clean &> /dev/null
    # Treat warnings as errors so we stop correctly
    SPHINX_REPORT=$( (SPHINXOPTS="-a -W" make html) 2>&1)
    SPHINX_STATUS=$?

    if [ "$SPHINX_STATUS" -ne 0 ]; then
        echo "$SPHINX_BUILD checks failed:"
        echo "$SPHINX_REPORT"
        echo
        echo "re-running make html to print full warning list:"
        make clean &> /dev/null
        SPHINX_REPORT=$( (SPHINXOPTS="-a" make html) 2>&1)
        echo "$SPHINX_REPORT"
        exit "$SPHINX_STATUS"
    else
        echo "$SPHINX_BUILD checks passed."
        echo
    fi

    popd &> /dev/null
}

SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
SPARK_ROOT_DIR="$(dirname "${SCRIPT_DIR}")"

pushd "$SPARK_ROOT_DIR" &> /dev/null

PYTHON_SOURCE="$(find . -name "*.py")"

compile_python_test "$PYTHON_SOURCE"
pycodestyle_test "$PYTHON_SOURCE"
flake8_test
sphinx_test

echo
echo "all lint-python tests passed!"

popd &> /dev/null