[SPARK-33282] Migrate from deprecated probot autolabeler to GitHub labeler action

### What changes were proposed in this pull request?

This PR removes the old Probot Autolabeler labeling configuration, since the probot autolabeler has been deprecated. I've already updated the configs in Iceberg and in Avro, and we need to do the same here. This PR adds a new workflow for labeling PRs and migrates the old probot config to the new format. Unfortunately, because certain features have not been released upstream, we will not get the _exact_ behavior as before. I have documented where that is the case and what changes are needed, and in the associated ticket I've also discussed other options and why I think this is the best way to go. A follow-up ticket is definitely needed to restore the original behavior in these few cases, but PRs have not been labeled for almost a month, so it's better to get it right 95% of the time and occasionally have some UI-related PRs labeled as `CORE` while the issue is resolved upstream and/or investigated further.

### Why are the changes needed?

The probot autolabeler is dead and will not be maintained going forward. This has been confirmed by GitHub user [at]mithro in an issue in their repository.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

To test this PR, I first merged the config into my local fork, then edited it several times and ran tests against those changes.

Unfortunately, I've since overwritten my fork with the apache repo in order to create a proper PR. However, I've also added equivalent configs in the Iceberg and Avro repos.

I have now merged this PR into my local repo and will be running tests there to cover edge cases and validate the behavior in general (a sketch of the glob semantics these checks exercise follows the list):
- [Check that the SQL label is applied for changes directly below repo root's sql directory](https://github.com/kbendick/spark/pull/16) 
- [Check that the structured streaming label is applied](https://github.com/kbendick/spark/pull/20) 
- [Check that a wildcard at the end of a pattern will match nested files](https://github.com/kbendick/spark/pull/19) 
- [Check that the rule `**/*pom.xml` will match the root `pom.xml` file](https://github.com/kbendick/spark/pull/25)
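For reference, here is a minimal sketch of the minimatch-style glob semantics those last two checks exercise. The `EXAMPLE` label and patterns below are hypothetical, for illustration only, and are not part of this PR's config:

```yaml
# Hypothetical label entry, for illustration only.
EXAMPLE:
- "sql/*"        # matches files directly under sql/, but not nested files
- "sql/**/*"     # a trailing `**/*` wildcard also matches nested files
- "**/*pom.xml"  # `**` may match zero path segments, so this matches the
                 # root pom.xml as well as any nested *pom.xml
```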

I've also discovered that we're likely not cancelling GitHub Actions runs (like large test suites) when users push to their PR. In most cases a user has to mark something as "OK to test", but it still seems worth discussing whether we should add a cancellation step in order to save time and capacity on the runners. If desired, we would add an action to each workflow that cancels old runs when a `push` event occurs on a PR. This would likely make waiting for test runners much faster whenever tests are automatically rerun on push (e.g. for PMCs, or for PRs that have been marked OK to test). We could potentially free a large number of resources if a cancellation step were added to all of the workflows in the Apache account (as GitHub Actions API limits are set at the account level).

Admittedly, the fact that the "old" workflow runs weren't cancelled could be because I was working in a fork. But given that there are explicit actions one can add to the start of a workflow to cancel old PR runs, and given that we don't have them configured, it seems likely this is the case in this repo (and in most `apache` repos as well), at least under certain circumstances (e.g. repos that don't have "OK to test"-like webhooks).

This is a separate issue, though, which I can bring up on the mailing list once I'm done with this PR. Unfortunately I've been very busy the past two weeks, but if somebody else wanted to work on it I'd be happy to support with any knowledge I have. A minimal sketch of what such a cancellation mechanism could look like follows.
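To illustrate, here is one way such cancellation can be configured. The workflow name and job below are hypothetical; the `concurrency` key is a built-in GitHub Actions feature (added after this PR was written), and a dedicated cancel action run as an explicit first step would be an alternative approach:

```yaml
# Sketch only: cancel in-progress runs of this (hypothetical) workflow for the
# same PR or branch whenever a new push arrives.
name: "Build and test"
on: [push, pull_request]

concurrency:
  # Group runs by workflow and PR number (falling back to the ref for non-PR
  # events); a new run in a group cancels the in-progress one.
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
```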

Beam is the last Apache repo still running the probot autolabeler; once it migrates, we can have Gavin from ASF Infra remove the probot autolabeler's permissions entirely. See the associated JIRA ticket for links to the other tickets, such as the one asking ASF Infra to remove the dead probot autolabeler's read and write permissions to PRs in the Apache organization.

Closes #30244 from kbendick/begin-migration-to-github-labeler-action.

Authored-by: Kyle Bendickson <kjbendickson@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
parent 551b504cfe
commit 0535b34ad4
3 changed files with 195 additions and 133 deletions


@@ -1,133 +0,0 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Bot page: https://github.com/apps/probot-autolabeler
# The matching patterns follow the .gitignore spec.
# See: https://git-scm.com/docs/gitignore#_pattern_format
# Also, note that the plugin uses 'ignore' package. See also
# https://github.com/kaelzhang/node-ignore
INFRA:
- ".github/"
- "appveyor.yml"
- "/tools/"
- "/dev/create-release/"
- ".asf.yaml"
- ".gitattributes"
- ".gitignore"
- "/dev/github_jira_sync.py"
- "/dev/merge_spark_pr.py"
- "/dev/run-tests-jenkins*"
BUILD:
- "/dev/"
- "!/dev/github_jira_sync.py"
- "!/dev/merge_spark_pr.py"
- "!/dev/run-tests-jenkins*"
- "!/dev/.rat-excludes"
- "/build/"
- "/project/"
- "/assembly/"
- "*pom.xml"
- "/bin/docker-image-tool.sh"
- "/bin/find-spark-home*"
- "scalastyle-config.xml"
DOCS:
- "docs/"
- "/README.md"
- "/CONTRIBUTING.md"
EXAMPLES:
- "examples/"
- "/bin/run-example*"
CORE:
- "/core/"
- "!UI.scala"
- "!ui/"
- "/common/kvstore/"
- "/common/network-common/"
- "/common/network-shuffle/"
- "/python/pyspark/*.py"
- "/python/pyspark/tests/*.py"
SPARK SUBMIT:
- "/bin/spark-submit*"
SPARK SHELL:
- "/repl/"
- "/bin/spark-shell*"
SQL:
- "sql/"
- "/common/unsafe/"
- "!/python/pyspark/sql/avro/"
- "!/python/pyspark/sql/streaming.py"
- "!/python/pyspark/sql/tests/test_streaming.py"
- "/bin/spark-sql*"
- "/bin/beeline*"
- "/sbin/*thriftserver*.sh"
- "*SQL*.R"
- "DataFrame.R"
- "WindowSpec.R"
- "catalog.R"
- "column.R"
- "functions.R"
- "group.R"
- "schema.R"
- "types.R"
AVRO:
- "/external/avro/"
- "/python/pyspark/sql/avro/"
DSTREAM:
- "/streaming/"
- "/data/streaming/"
- "/external/flume*"
- "/external/kinesis*"
- "/external/kafka*"
- "/python/pyspark/streaming/"
GRAPHX:
- "/graphx/"
- "/data/graphx/"
ML:
- "ml/"
- "*mllib_*.R"
MLLIB:
- "spark/mllib/"
- "/mllib-local/"
- "/python/pyspark/mllib/"
STRUCTURED STREAMING:
- "sql/**/streaming/"
- "/external/kafka-0-10-sql/"
- "/python/pyspark/sql/streaming.py"
- "/python/pyspark/sql/tests/test_streaming.py"
- "*streaming.R"
PYTHON:
- "/bin/pyspark*"
- "python/"
R:
- "r/"
- "R/"
- "/bin/sparkR*"
YARN:
- "/resource-managers/yarn/"
MESOS:
- "/resource-managers/mesos/"
- "/sbin/*mesos*.sh"
KUBERNETES:
- "/resource-managers/kubernetes/"
WINDOWS:
- "*.cmd"
- "/R/pkg/tests/fulltests/test_Windows.R"
WEB UI:
- "ui/"
- "UI.scala"
DEPLOY:
- "/sbin/"

.github/labeler.yml (new file, 152 lines)

@@ -0,0 +1,152 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
#
# Pull Request Labeler Github Action Configuration: https://github.com/marketplace/actions/labeler
#
# Note that we currently cannot use the negation operator (i.e. `!`) for miniglob matches, as a
# negated pattern on its own would match any file that doesn't match it. What's needed is the
# concept of `any`, which takes a list of constraints / globs and then matches all of the
# constraints for either `any` of the files or `all` of the files in the change set.
#
# However, `any`/`all` are not supported in a released version, and testing off of the `main`
# branch resulted in some other errors.
#
# An issue has been opened upstream requesting that a release be cut that has support for all/any:
# - https://github.com/actions/labeler/issues/111
#
# While we wait for this issue to be handled upstream, we can remove
# the negated / `!` matches for now and at least have labels again.
#
INFRA:
- ".github/**/*"
- "appveyor.yml"
- "tools/**/*"
- "dev/create-release/**/*"
- ".asf.yaml"
- ".gitattributes"
- ".gitignore"
- "dev/github_jira_sync.py"
- "dev/merge_spark_pr.py"
- "dev/run-tests-jenkins*"
BUILD:
# Can be supported when a stable release with correct all/any is released
#- any: ['dev/**/*', '!dev/github_jira_sync.py', '!dev/merge_spark_pr.py', '!dev/.rat-excludes']
- "dev/**/*"
- "build/**/*"
- "project/**/*"
- "assembly/**/*"
- "**/*pom.xml"
- "bin/docker-image-tool.sh"
- "bin/find-spark-home*"
- "scalastyle-config.xml"
# These can be added in the above `any` clause (and the /dev/**/* glob removed) when
# `any`/`all` support is released
# - "!dev/github_jira_sync.py"
# - "!dev/merge_spark_pr.py"
# - "!dev/run-tests-jenkins*"
# - "!dev/.rat-excludes"
DOCS:
- "docs/**/*"
- "**/README.md"
- "**/CONTRIBUTING.md"
EXAMPLES:
- "examples/**/*"
- "bin/run-example*"
# CORE needs to be updated when all/any are released upstream.
CORE:
# - any: ["core/**/*", "!**/*UI.scala", "!**/ui/**/*"] # If any file matches all of the globs in the `any` list, the label is applied.
- "core/**/*"
- "common/kvstore/**/*"
- "common/network-common/**/*"
- "common/network-shuffle/**/*"
- "python/pyspark/**/*.py"
- "python/pyspark/tests/**/*.py"
SPARK SUBMIT:
- "bin/spark-submit*"
SPARK SHELL:
- "repl/**/*"
- "bin/spark-shell*"
SQL:
#- any: ["**/sql/**/*", "!python/pyspark/sql/avro/**/*", "!python/pyspark/sql/streaming.py", "!python/pyspark/sql/tests/test_streaming.py"]
- "**/sql/**/*"
- "common/unsafe/**/*"
#- "!python/pyspark/sql/avro/**/*"
#- "!python/pyspark/sql/streaming.py"
#- "!python/pyspark/sql/tests/test_streaming.py"
- "bin/spark-sql*"
- "bin/beeline*"
- "sbin/*thriftserver*.sh"
- "**/*SQL*.R"
- "**/DataFrame.R"
- "**/*WindowSpec.R"
- "**/*catalog.R"
- "**/*column.R"
- "**/*functions.R"
- "**/*group.R"
- "**/*schema.R"
- "**/*types.R"
AVRO:
- "external/avro/**/*"
- "python/pyspark/sql/avro/**/*"
DSTREAM:
- "streaming/**/*"
- "data/streaming/**/*"
- "external/kinesis*"
- "external/kafka*"
- "python/pyspark/streaming/**/*"
GRAPHX:
- "graphx/**/*"
- "data/graphx/**/*"
ML:
- "**/ml/**/*"
- "**/*mllib_*.R"
MLLIB:
- "**/spark/mllib/**/*"
- "mllib-local/**/*"
- "python/pyspark/mllib/**/*"
STRUCTURED STREAMING:
- "**/sql/**/streaming/**/*"
- "external/kafka-0-10-sql/**/*"
- "python/pyspark/sql/streaming.py"
- "python/pyspark/sql/tests/test_streaming.py"
- "**/*streaming.R"
PYTHON:
- "bin/pyspark*"
- "**/python/**/*"
R:
- "**/r/**/*"
- "**/R/**/*"
- "bin/sparkR*"
YARN:
- "resource-managers/yarn/**/*"
MESOS:
- "resource-managers/mesos/**/*"
- "sbin/*mesos*.sh"
KUBERNETES:
- "resource-managers/kubernetes/**/*"
WINDOWS:
- "**/*.cmd"
- "R/pkg/tests/fulltests/test_Windows.R"
WEB UI:
- "**/ui/**/*"
- "**/*UI.scala"
DEPLOY:
- "sbin/**/*"

.github/workflows/labeler.yml (new file, 43 lines)

@@ -0,0 +1,43 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
name: "Pull Request Labeler"
on: pull_request_target
jobs:
  label:
    runs-on: ubuntu-latest
    steps:
    # In order to get back the negated matches like in the old config,
    # we need the actions/labeler concept of `all` and `any`, which matches
    # all of the given constraints / glob patterns for either `all`
    # files or `any` file in the change set.
    #
    # GitHub issue which requests a timeline for a release with any/all support:
    # - https://github.com/actions/labeler/issues/111
    # This issue also references the issue that mentioned that any/all are only
    # supported on the main branch (previously called master):
    # - https://github.com/actions/labeler/issues/73#issuecomment-639034278
    #
    # However, these are not in a published release and the current `main` branch
    # has some issues upon testing.
    - uses: actions/labeler@2.2.0
      with:
        repo-token: "${{ secrets.GITHUB_TOKEN }}"
        sync-labels: true
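
For reference, once an actions/labeler release supports `any`/`all`, the commented-out entries in `.github/labeler.yml` could be restored along these lines. This is a sketch based on the syntax discussed in the upstream issue, not verified against a released version:

```yaml
# Sketch of the intended future config, per the upstream `any`/`all` proposal.
BUILD:
- any: ['dev/**/*', '!dev/github_jira_sync.py', '!dev/merge_spark_pr.py',
        '!dev/run-tests-jenkins*', '!dev/.rat-excludes']
CORE:
- any: ['core/**/*', '!**/*UI.scala', '!**/ui/**/*']
SQL:
- any: ['**/sql/**/*', '!python/pyspark/sql/avro/**/*',
        '!python/pyspark/sql/streaming.py', '!python/pyspark/sql/tests/test_streaming.py']
```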