spark-instrumented-optimizer/.github/workflows/benchmark.yml
Yikun Jiang f3dc549d9c [SPARK-35668][INFRA] Use "concurrency" syntax on Github Actions workflow
### What changes were proposed in this pull request?

This patch uses the "concurrency" syntax to replace the "cancel job" workflow:
- .github/workflows/benchmark.yml
- .github/workflows/labeler.yml
- .github/workflows/notify_test_workflow.yml
- .github/workflows/test_report.yml

Remove the .github/workflows/cancel_duplicate_workflow_runs.yml

Note that the push/schedule based job are not changed to keep the same config in a4b70758d3:
- .github/workflows/build_and_test.yml
- .github/workflows/publish_snapshot.yml
- .github/workflows/stale.yml
- .github/workflows/update_build_status.yml

### Why are the changes needed?
We are using [cancel_duplicate_workflow_runs](a70e66ecfa/.github/workflows/cancel_duplicate_workflow_runs.yml (L1)) job to cancel previous jobs when a new job is queued. Now, it has been supported by the github action by using ["concurrency"](https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#concurrency) syntax to make sure only a single job or workflow using the same concurrency group.

Related: https://github.com/apache/arrow/pull/10416 and https://github.com/potiuk/cancel-workflow-runs

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
triger the PR manaully

Closes #32806 from Yikun/SPARK-X.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-06-08 12:10:40 +09:00

105 lines
4 KiB
YAML

name: Run benchmarks
on:
workflow_dispatch:
inputs:
class:
description: 'Benchmark class'
required: true
default: '*'
jdk:
description: 'JDK version: 8 or 11'
required: true
default: '8'
failfast:
description: 'Failfast: true or false'
required: true
default: 'true'
num-splits:
description: 'Number of job splits'
required: true
default: '1'
concurrency:
group: ${{ github.repository }}-${{ github.ref }}-${{ github.workflow }}
cancel-in-progress: true
jobs:
matrix-gen:
name: Generate matrix for job splits
runs-on: ubuntu-20.04
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
env:
SPARK_BENCHMARK_NUM_SPLITS: ${{ github.event.inputs.num-splits }}
steps:
- name: Generate matrix
id: set-matrix
run: echo "::set-output name=matrix::["`seq -s, 1 $SPARK_BENCHMARK_NUM_SPLITS`"]"
benchmark:
name: "Run benchmarks: ${{ github.event.inputs.class }} (JDK ${{ github.event.inputs.jdk }}, ${{ matrix.split }} out of ${{ github.event.inputs.num-splits }} splits)"
needs: matrix-gen
# Ubuntu 20.04 is the latest LTS. The next LTS is 22.04.
runs-on: ubuntu-20.04
strategy:
fail-fast: false
matrix:
split: ${{fromJSON(needs.matrix-gen.outputs.matrix)}}
env:
SPARK_BENCHMARK_FAILFAST: ${{ github.event.inputs.failfast }}
SPARK_BENCHMARK_NUM_SPLITS: ${{ github.event.inputs.num-splits }}
SPARK_BENCHMARK_CUR_SPLIT: ${{ matrix.split }}
SPARK_GENERATE_BENCHMARK_FILES: 1
SPARK_LOCAL_IP: localhost
steps:
- name: Checkout Spark repository
uses: actions/checkout@v2
# In order to get diff files
with:
fetch-depth: 0
- name: Cache Scala, SBT and Maven
uses: actions/cache@v2
with:
path: |
build/apache-maven-*
build/scala-*
build/*.jar
~/.sbt
key: build-${{ hashFiles('**/pom.xml', 'project/build.properties', 'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 'build/spark-build-info') }}
restore-keys: |
build-
- name: Cache Coursier local repository
uses: actions/cache@v2
with:
path: ~/.cache/coursier
key: benchmark-coursier-${{ github.event.inputs.jdk }}-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
restore-keys: |
benchmark-coursier-${{ github.event.inputs.jdk }}
- name: Install Java ${{ github.event.inputs.jdk }}
uses: actions/setup-java@v1
with:
java-version: ${{ github.event.inputs.jdk }}
- name: Run benchmarks
run: |
./build/sbt -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pspark-ganglia-lgpl test:package
# Make less noisy
cp conf/log4j.properties.template conf/log4j.properties
sed -i 's/log4j.rootCategory=INFO, console/log4j.rootCategory=WARN, console/g' conf/log4j.properties
# In benchmark, we use local as master so set driver memory only. Note that GitHub Actions has 7 GB memory limit.
bin/spark-submit \
--driver-memory 6g --class org.apache.spark.benchmark.Benchmarks \
--jars "`find . -name '*-SNAPSHOT-tests.jar' -o -name '*avro*-SNAPSHOT.jar' | paste -sd ',' -`" \
"`find . -name 'spark-core*-SNAPSHOT-tests.jar'`" \
"${{ github.event.inputs.class }}"
# To keep the directory structure and file permissions, tar them
# See also https://github.com/actions/upload-artifact#maintaining-file-permissions-and-case-sensitive-files
echo "Preparing the benchmark results:"
tar -cvf benchmark-results-${{ github.event.inputs.jdk }}.tar `git diff --name-only` `git ls-files --others --exclude-standard`
- name: Upload benchmark results
uses: actions/upload-artifact@v2
with:
name: benchmark-results-${{ github.event.inputs.jdk }}-${{ matrix.split }}
path: benchmark-results-${{ github.event.inputs.jdk }}.tar