spark-instrumented-optimizer/dev/lint-python
Holden Karau a36a76ac43 [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed
## What changes were proposed in this pull request?

This PR aims to provide a pip installable PySpark package. This does a bunch of work to copy the jars over and package them with the Python code (to prevent challenges from trying to use different versions of the Python code with different versions of the JAR). It does not currently publish to PyPI but that is the natural follow up (SPARK-18129).

Done:
- pip installable on conda [manual tested]
- setup.py installed on a non-pip managed system (RHEL) with YARN [manual tested]
- Automated testing of this (virtualenv)
- packaging and signing with release-build*

Possible follow up work:
- release-build update to publish to PyPI (SPARK-18128)
- figure out who owns the pyspark package name on prod PyPI (is it someone with in the project or should we ask PyPI or should we choose a different name to publish with like ApachePySpark?)
- Windows support and or testing ( SPARK-18136 )
- investigate details of wheel caching and see if we can avoid cleaning the wheel cache during our test
- consider how we want to number our dev/snapshot versions

Explicitly out of scope:
- Using pip installed PySpark to start a standalone cluster
- Using pip installed PySpark for non-Python Spark programs

*I've done some work to test release-build locally but as a non-committer I've just done local testing.
## How was this patch tested?

Automated testing with virtualenv, manual testing with conda, a system wide install, and YARN integration.

release-build changes tested locally as a non-committer (no testing of upload artifacts to Apache staging websites)

Author: Holden Karau <holden@us.ibm.com>
Author: Juliet Hougland <juliet@cloudera.com>
Author: Juliet Hougland <not@myemail.com>

Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.
2016-11-16 14:22:15 -08:00

110 lines
4 KiB
Bash
Executable file

#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")"
PATHS_TO_CHECK="./python/pyspark/ ./examples/src/main/python/ ./dev/sparktestsupport"
# TODO: fix pep8 errors with the rest of the Python scripts under dev
PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/run-tests.py ./python/*.py ./dev/run-tests-jenkins.py"
PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/pip-sanity-check.py"
PEP8_REPORT_PATH="$SPARK_ROOT_DIR/dev/pep8-report.txt"
PYLINT_REPORT_PATH="$SPARK_ROOT_DIR/dev/pylint-report.txt"
PYLINT_INSTALL_INFO="$SPARK_ROOT_DIR/dev/pylint-info.txt"
SPHINXBUILD=${SPHINXBUILD:=sphinx-build}
SPHINX_REPORT_PATH="$SPARK_ROOT_DIR/dev/sphinx-report.txt"
cd "$SPARK_ROOT_DIR"
# compileall: https://docs.python.org/2/library/compileall.html
python -B -m compileall -q -l $PATHS_TO_CHECK > "$PEP8_REPORT_PATH"
compile_status="${PIPESTATUS[0]}"
# Get pep8 at runtime so that we don't rely on it being installed on the build server.
#+ See: https://github.com/apache/spark/pull/1744#issuecomment-50982162
#+ TODOs:
#+ - Download pep8 from PyPI. It's more "official".
PEP8_VERSION="1.7.0"
PEP8_SCRIPT_PATH="$SPARK_ROOT_DIR/dev/pep8-$PEP8_VERSION.py"
PEP8_SCRIPT_REMOTE_PATH="https://raw.githubusercontent.com/jcrocholl/pep8/$PEP8_VERSION/pep8.py"
if [ ! -e "$PEP8_SCRIPT_PATH" ]; then
curl --silent -o "$PEP8_SCRIPT_PATH" "$PEP8_SCRIPT_REMOTE_PATH"
curl_status="$?"
if [ "$curl_status" -ne 0 ]; then
echo "Failed to download pep8.py from \"$PEP8_SCRIPT_REMOTE_PATH\"."
exit "$curl_status"
fi
fi
# Easy install pylint in /dev/pylint. To easy_install into a directory, the PYTHONPATH should
# be set to the directory.
# dev/pylint should be appended to the PATH variable as well.
# Jenkins by default installs the pylint3 version, so for now this just checks the code quality
# of python3.
export "PYTHONPATH=$SPARK_ROOT_DIR/dev/pylint"
export "PYLINT_HOME=$PYTHONPATH"
export "PATH=$PYTHONPATH:$PATH"
# There is no need to write this output to a file
#+ first, but we do so so that the check status can
#+ be output before the report, like with the
#+ scalastyle and RAT checks.
python "$PEP8_SCRIPT_PATH" --ignore=E402,E731,E241,W503,E226 --config=dev/tox.ini $PATHS_TO_CHECK >> "$PEP8_REPORT_PATH"
pep8_status="${PIPESTATUS[0]}"
if [ "$compile_status" -eq 0 -a "$pep8_status" -eq 0 ]; then
lint_status=0
else
lint_status=1
fi
if [ "$lint_status" -ne 0 ]; then
echo "PEP8 checks failed."
cat "$PEP8_REPORT_PATH"
rm "$PEP8_REPORT_PATH"
exit "$lint_status"
else
echo "PEP8 checks passed."
rm "$PEP8_REPORT_PATH"
fi
# Check that the documentation builds acceptably, skip check if sphinx is not installed.
if hash "$SPHINXBUILD" 2> /dev/null; then
cd python/docs
make clean
# Treat warnings as errors so we stop correctly
SPHINXOPTS="-a -W" make html &> "$SPHINX_REPORT_PATH" || lint_status=1
if [ "$lint_status" -ne 0 ]; then
echo "pydoc checks failed."
cat "$SPHINX_REPORT_PATH"
echo "re-running make html to print full warning list"
make clean
SPHINXOPTS="-a" make html
rm "$SPHINX_REPORT_PATH"
exit "$lint_status"
else
echo "pydoc checks passed."
rm "$SPHINX_REPORT_PATH"
fi
cd ../..
else
echo >&2 "The $SPHINXBUILD command was not found. Skipping pydoc checks for now"
fi