spark-instrumented-optimizer/python/run-tests-with-coverage
Hyukjin Kwon c0d1860f25 [SPARK-36092][INFRA][BUILD][PYTHON] Migrate to GitHub Actions with Codecov from Jenkins
### What changes were proposed in this pull request?

This PR proposes to migrate Coverage report from Jenkins to GitHub Actions by setting a dailly cron job.

### Why are the changes needed?

For some background, currently PySpark code coverage is being reported in this specific Jenkins job: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/

Because of the security issue between [Codecov service](https://app.codecov.io/gh/) and Jenkins machines, we had to work around by manually hosting a coverage site via GitHub pages, see also https://spark-test.github.io/pyspark-coverage-site/ by spark-test account (which is shared to only subset of PMC members).

Since we now run the build via GitHub Actions, we can leverage [Codecov plugin](https://github.com/codecov/codecov-action), and remove the workaround we used.

### Does this PR introduce _any_ user-facing change?

Virtually no. Coverage site (UI) might change but the information it holds should be virtually the same.

### How was this patch tested?

I manually tested:
- Scheduled run: https://github.com/HyukjinKwon/spark/actions/runs/1082261484
- Coverage report: 73f0291a7d/python/pyspark
- Run against a PR: https://github.com/HyukjinKwon/spark/actions/runs/1082367175

Closes #33591 from HyukjinKwon/SPARK-36092.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-08-01 21:37:19 +09:00

70 lines
2.6 KiB
Bash
Executable file

#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
set -o pipefail
set -e
# This variable indicates which coverage executable to run to combine coverages
# and generate HTMLs, for example, 'coverage3' in Python 3.
COV_EXEC="${COV_EXEC:-coverage}"
FWDIR="$(cd "`dirname $0`"; pwd)"
pushd "$FWDIR" > /dev/null
# Ensure that coverage executable is installed.
if ! hash $COV_EXEC 2>/dev/null; then
echo "Missing coverage executable in your path, skipping PySpark coverage"
exit 1
fi
# Set up the directories for coverage results.
export COVERAGE_DIR="$FWDIR/test_coverage"
rm -fr "$COVERAGE_DIR/coverage_data"
rm -fr "$COVERAGE_DIR/htmlcov"
mkdir -p "$COVERAGE_DIR/coverage_data"
# Current directory are added in the python path so that it doesn't refer our built
# pyspark zip library first.
export PYTHONPATH="$FWDIR:$PYTHONPATH"
# Also, our sitecustomize.py and coverage_daemon.py are included in the path.
export PYTHONPATH="$COVERAGE_DIR:$PYTHONPATH"
# We use 'spark.python.daemon.module' configuration to insert the coverage supported workers.
export SPARK_CONF_DIR="$COVERAGE_DIR/conf"
# This environment variable enables the coverage.
export COVERAGE_PROCESS_START="$FWDIR/.coveragerc"
./run-tests "$@"
# Don't run coverage for the coverage command itself
unset COVERAGE_PROCESS_START
# Coverage could generate empty coverage data files. Remove it to get rid of warnings when combining.
find $COVERAGE_DIR/coverage_data -size 0 -print0 | xargs -0 rm -fr
echo "Combining collected coverage data under $COVERAGE_DIR/coverage_data"
$COV_EXEC combine
echo "Creating XML report file at python/coverage.xml"
$COV_EXEC xml --ignore-errors --include "pyspark/*"
echo "Reporting the coverage data at $COVERAGE_DIR/coverage_data/coverage"
$COV_EXEC report --include "pyspark/*"
echo "Generating HTML files for PySpark coverage under $COVERAGE_DIR/htmlcov"
$COV_EXEC html --ignore-errors --include "pyspark/*" --directory "$COVERAGE_DIR/htmlcov"
popd