### What changes were proposed in this pull request?
This PR aims to add a fallback Maven repository for when a mirror of `central` fails.
### Why are the changes needed?
We use `Google Maven Central` in GitHub Action as a mirror of `central`.
However, `Google Maven Central` sometimes doesn't have newly published artifacts,
and there is no guarantee of when they will become available.
By duplicating `Maven Central` with a new ID, we can add a fallback Maven repository
which is not mirrored by `Google Maven Central`.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Tested manually with the new `Twitter` chill artifacts by switching `chill.version` from `0.9.3` to `0.9.5`.
```
$ rm -rf ~/.m2/repository/com/twitter/chill*
$ mvn compile | grep chill
Downloading from google-maven-central: https://maven-central.storage-download.googleapis.com/repos/central/data/com/twitter/chill_2.12/0.9.5/chill_2.12-0.9.5.pom
Downloading from central_without_mirror: https://repo.maven.apache.org/maven2/com/twitter/chill_2.12/0.9.5/chill_2.12-0.9.5.pom
Downloaded from central_without_mirror: https://repo.maven.apache.org/maven2/com/twitter/chill_2.12/0.9.5/chill_2.12-0.9.5.pom (2.8 kB at 11 kB/s)
```
Closes #27281 from dongjoon-hyun/SPARK-30572.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Follow-on to #26877.
### What changes were proposed in this pull request?
This PR tweaks the stale PR message to [clarify](https://github.com/apache/spark/pull/24457#issuecomment-571393900) the procedure for reopening a PR after it has been marked as stale.
### Why are the changes needed?
This change should clarify the reopening process for contributors.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
N/A
Closes #27114 from nchammas/SPARK-30173-stale-tweaks.
Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This PR adds [a GitHub workflow to automatically close stale PRs](https://github.com/marketplace/actions/close-stale-issues).
### Why are the changes needed?
This will help cut down the number of open but stale PRs and keep the PR queue manageable.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
I'm not sure how to test this PR without impacting real PRs on the repo.
See: https://github.com/actions/stale/issues/32
Closes #26877 from nchammas/SPARK-30173-stale-prs.
Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This PR is a follow-up of https://github.com/apache/spark/pull/26793 and aims to initialize the `~/.m2` directory.
### Why are the changes needed?
After a cache reset, the `~/.m2` directory doesn't exist, which causes a failure.
- `master` branch has a cache as of now. So, we missed this.
- `branch-2.4` has no cache as of now, and we hit this failure.
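A minimal sketch of the initialization, assuming the failing step is the one that writes `settings.xml` under `~/.m2` (the description does not say which step fails). `M2_DIR` is a hypothetical override added for illustration, and the mirror settings are the ones quoted in the description of #26793.

```shell
# Assumption: the job writes settings.xml into ~/.m2, which fails when the
# directory is missing after a cache reset. M2_DIR is a hypothetical override.
M2_DIR="${M2_DIR:-$HOME/.m2}"

# -p creates the directory if needed and is a no-op when it already exists.
mkdir -p "$M2_DIR"

# Note: this overwrites any existing settings.xml in M2_DIR.
cat > "$M2_DIR/settings.xml" <<'EOF'
<settings>
  <mirrors>
    <mirror>
      <id>google-maven-central</id>
      <name>GCS Maven Central mirror</name>
      <url>https://maven-central.storage-download.googleapis.com/repos/central/data/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
EOF
```

Because `mkdir -p` succeeds whether or not the directory exists, the same step works both with a restored cache and after a reset.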
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
This PR was tested against a personal `branch-2.4`.
- https://github.com/dongjoon-hyun/spark/pull/12
Closes #26794 from dongjoon-hyun/SPARK-30163-2.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to use [Google Maven mirror](https://cloudplatform.googleblog.com/2015/11/faster-builds-for-Java-developers-with-Maven-Central-mirror.html) in `GitHub Action` jobs to improve the stability.
```xml
<settings>
<mirrors>
<mirror>
<id>google-maven-central</id>
<name>GCS Maven Central mirror</name>
<url>https://maven-central.storage-download.googleapis.com/repos/central/data/</url>
<mirrorOf>central</mirrorOf>
</mirror>
</mirrors>
</settings>
```
### Why are the changes needed?
Although we added a Maven cache inside `GitHub Action`, timeouts still happen too frequently while reading artifact descriptors.
```
[ERROR] Failed to execute goal on project spark-mllib_2.12:
...
Failed to read artifact descriptor for ...
...
Connection timed out (Read failed) -> [Help 1]
```
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
This PR does not affect Jenkins.
It was tested on a personal repository first. The `GitHub Action` run on this PR should pass.
- https://github.com/dongjoon-hyun/spark/pull/11
Closes #26793 from dongjoon-hyun/SPARK-30163.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR enables a JDK11 build with the `hadoop-2.7` profile in `GitHub Action`.
**BEFORE (6 jobs including one JDK11 job)**
![before](https://user-images.githubusercontent.com/9700541/70342731-7763f300-180a-11ea-859f-69038b88451f.png)
**AFTER (7 jobs including two JDK11 jobs)**
![after](https://user-images.githubusercontent.com/9700541/70342658-54d1da00-180a-11ea-9fba-507fc087dc62.png)
### Why are the changes needed?
SPARK-29957 makes the JDK11 tests work with the `hadoop-2.7` profile. We need to protect that.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
This is a `GitHub Action`-only PR. See the result of `GitHub Action` on this PR.
Closes #26782 from dongjoon-hyun/SPARK-GHA-HADOOP-2.7.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR adds a `GitHub Action Cache` task for the `build` directory.
### Why are the changes needed?
This replaces the Maven download with a cache restore.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manually check the GitHub Action log of this PR.
Closes #26652 from dongjoon-hyun/SPARK-MAVEN-CACHE.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to do the following.
- Add two profiles, `hive-1.2` and `hive-2.3` (default)
- Validate that we keep at least the existing combinations (Hadoop-2.7 + Hive 1.2 / Hadoop-3.2 + Hive 2.3).
For now, we assume that `hive-1.2` is explicitly used with `hadoop-2.7` and `hive-2.3` with `hadoop-3.2`. The following are beyond the scope of this PR.
- SPARK-29988 Adjust Jenkins jobs for `hive-1.2/2.3` combination
- SPARK-29989 Update release-script for `hive-1.2/2.3` combination
- SPARK-29991 Support `hive-1.2/2.3` in PR Builder
### Why are the changes needed?
This will help to switch our dependencies to update the exposed dependencies.
### Does this PR introduce any user-facing change?
This is a dev-only change: the build profile combinations are changed.
- `-Phadoop-2.7` => `-Phadoop-2.7 -Phive-1.2`
- `-Phadoop-3.2` => `-Phadoop-3.2 -Phive-2.3`
### How was this patch tested?
Pass Jenkins with the dependency check and tests to make sure that nothing changes for now.
- [Jenkins (-Phadoop-2.7 -Phive-1.2)](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114192/consoleFull)
- [Jenkins (-Phadoop-3.2 -Phive-2.3)](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114192/consoleFull)
Also, from now on, GitHub Action validates the following combinations.
![gha](https://user-images.githubusercontent.com/9700541/69355365-822d5e00-0c36-11ea-93f7-e00e5459e1d0.png)
Closes #26619 from dongjoon-hyun/SPARK-29981.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to cache `~/.m2/repository/net` and `~/.m2/repository/io` to reduce the flakiness.
### Why are the changes needed?
This will stabilize GitHub Action more before adding `hive-1.2` and `hive-2.3` combination.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
After the GitHub Action on this PR passes, check the log.
Closes #26621 from dongjoon-hyun/SPARK-GHA-CACHE.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
The Linter (R) GitHub workflow sometimes failed, for example:
https://github.com/apache/spark/pull/26509/checks?check_run_id=310718016
Failure message:
```
Executing: /tmp/apt-key-gpghome.8r74rQNEjj/gpg.1.sh --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
gpg: connecting dirmngr at '/tmp/apt-key-gpghome.8r74rQNEjj/S.dirmngr' failed: IPC connect call failed
gpg: keyserver receive failed: No dirmngr
##[error]Process completed with exit code 2.
```
It is due to a buggy GnuPG. Context:
- https://github.com/sbt/website/pull/825
- https://github.com/sbt/sbt/issues/4261
- https://github.com/microsoft/WSL/issues/3286
### Why are the changes needed?
Make the lint-r GitHub workflow work.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Pass the GitHub workflows.
Closes #26602 from viirya/SPARK-29964.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR fixes SparkR lint errors and adds `lint-r` GitHub Action to protect the branch.
### Why are the changes needed?
It turns out that we were not running `lint-r`. It was restored yesterday, but since then our Jenkins linter jobs (`master`/`branch-2.4`) have been broken on `lint-r` tasks.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Pass the GitHub Action on this PR in addition to Jenkins R and AppVeyor R.
Closes #26564 from dongjoon-hyun/SPARK-29936.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to enable [GitHub Action Cache on Maven local repository](https://github.com/actions/cache/blob/master/examples.md#java---maven) for the following goals.
1. To reduce the chance of failure due to the Maven download flakiness.
2. To speed up the build a little bit.
Unfortunately, because of the GitHub Action Cache size limit, it seems that we cannot put everything into a single cache. An oversized cache is not saved, as shown below.
- **.m2/repository is 680777194 bytes**
```
/bin/tar -cz -f /home/runner/work/_temp/01f162c3-0c78-4772-b3de-b619bb5d7721/cache.tgz -C /home/runner/.m2/repository .
3
##[warning]Cache size of 680777194 bytes is over the 400MB limit, not saving cache.
```
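Since a single archive of the whole repository exceeds the limit, it helps to know how the size is distributed before deciding how to split the cache. The helper below is a hypothetical sketch, not part of this PR: `repo_dir_sizes` is an illustrative name, and the sizes it prints are uncompressed, so they only approximate the size of the compressed archive that the limit applies to.

```shell
# Print the approximate size in bytes of each top-level directory under a
# Maven local repository, largest first. Sizes are uncompressed disk usage.
# repo_dir_sizes is a hypothetical helper name used for illustration.
repo_dir_sizes() {
  repo="${1:-$HOME/.m2/repository}"
  [ -d "$repo" ] || { echo "no such directory: $repo" >&2; return 1; }
  # du -sk reports KiB per directory; convert to bytes and sort descending.
  ( cd "$repo" && du -sk -- */ ) \
    | awk '{ printf "%d\t%s\n", $1 * 1024, $2 }' \
    | sort -rn
}

# Usage: repo_dir_sizes ~/.m2/repository
```

Directories such as `com` and `org` can then be given separate cache keys, which matches the per-directory keys visible in the restored-cache log of this PR.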
### Why are the changes needed?
Caching the local Maven repository is worthwhile not only for speed but also to reduce Maven download flakiness. The following are recent failure examples.
- https://github.com/apache/spark/runs/295869450
```
[ERROR] Failed to execute goal on project spark-streaming-kafka-0-10_2.12:
Could not resolve dependencies for project org.apache.spark:spark-streaming-kafka-0-10_2.12:jar:spark-367716:
Could not transfer artifact com.fasterxml.jackson.datatype:jackson-datatype-jdk8:jar:2.10.0
from/to central (https://repo.maven.apache.org/maven2):
Failed to transfer file https://repo.maven.apache.org/maven2/com/fasterxml/jackson/datatype/
jackson-datatype-jdk8/2.10.0/jackson-datatype-jdk8-2.10.0.jar with status code 503 -> [Help 1]
...
[ERROR] mvn <args> -rf :spark-streaming-kafka-0-10_2.12
```
```
[ERROR] Failed to execute goal on project spark-tools_2.12:
Could not resolve dependencies for project org.apache.spark:spark-tools_2.12:jar:3.0.0-SNAPSHOT:
Failed to collect dependencies at org.clapper:classutil_2.12:jar:1.5.1:
Failed to read artifact descriptor for org.clapper:classutil_2.12:jar:1.5.1:
Could not transfer artifact org.clapper:classutil_2.12:pom:1.5.1 from/to central (https://repo.maven.apache.org/maven2):
Connection timed out (Read failed) -> [Help 1]
```
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manually check the GitHub Action log of this PR.
```
Cache restored from key: 1.8-hadoop-2.7-maven-com-5b4a9fb13c5f5ff78e65a20003a3810796e4d1fde5f24d397dfe6e5153960ce4
Cache restored from key: 1.8-hadoop-2.7-maven-org-5b4a9fb13c5f5ff78e65a20003a3810796e4d1fde5f24d397dfe6e5153960ce4
```
Closes #26456 from dongjoon-hyun/SPARK-29820.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR changes the GitHub Actions build command from `mvn package` to `mvn install` to build Scaladoc jars.
### Why are the changes needed?
The `mvn install` build sometimes fails with the error `not found: type ClassName...`.
More details: https://github.com/apache/spark/pull/24628#issuecomment-495655747
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
N/A
Closes #26414 from wangyum/github-action-install.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to add linters and license/dependency checkers to GitHub Action. This excludes `lint-r` intentionally because https://github.com/actions/setup-r is not ready. We can add that later when it becomes available.
### Why are the changes needed?
This will help the PR reviews.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
See the GitHub Action result on this PR.
Closes #25879 from dongjoon-hyun/SPARK-29199.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR uses `java-version` instead of `version` for GitHub Action. More details:
204b974cf4ac25aeee3a
### Why are the changes needed?
The `version` property will not be supported after October 1, 2019.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
N/A
Closes #25866 from wangyum/java-version.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to increase the JVM CodeCacheSize from 0.5G to 1G.
### Why are the changes needed?
After upgrading to `Scala 2.12.10`, the following warning is observed during the build.
```
2019-09-18T20:49:23.5030586Z OpenJDK 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
2019-09-18T20:49:23.5032920Z OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
2019-09-18T20:49:23.5034959Z CodeCache: size=524288Kb used=521399Kb max_used=521423Kb free=2888Kb
2019-09-18T20:49:23.5035472Z bounds [0x00007fa62c000000, 0x00007fa64c000000, 0x00007fa64c000000]
2019-09-18T20:49:23.5035781Z total_blobs=156549 nmethods=155863 adapters=592
2019-09-18T20:49:23.5036090Z compilation: disabled (not enough contiguous free space left)
```
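A minimal sketch of raising the limit, assuming the build picks up JVM flags from `MAVEN_OPTS` (the description does not say where the option is set). The flag itself is the one named in the warning above.

```shell
# Assumption: the build reads JVM flags from MAVEN_OPTS.
# Append the flag so that any options already set are preserved.
export MAVEN_OPTS="${MAVEN_OPTS:+$MAVEN_OPTS }-XX:ReservedCodeCacheSize=1g"

# Show the effective value for the build log.
echo "MAVEN_OPTS=$MAVEN_OPTS"
```

The `${VAR:+...}` expansion adds the separating space only when `MAVEN_OPTS` was already non-empty.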
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manually check the Jenkins or GitHub Action build log (which should not have the above).
Closes #25836 from dongjoon-hyun/SPARK-CODE-CACHE-1G.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Until now, we have been testing JDK8/11 with Hadoop-3.2. This PR aims to extend the test coverage to JDK8/Hadoop-2.7.
### Why are the changes needed?
This will prevent Hadoop 2.7 compile/package issues at PR stage.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
GitHub Action on this PR shows all three combinations now. This does not affect the Jenkins tests.
Closes #25824 from dongjoon-hyun/SPARK-29125.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR enables GitHub Action on PRs.
### Why are the changes needed?
So far, we have detected JDK11 compilation errors only after merging.
This PR aims to prevent JDK11 compilation errors at the PR stage.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manual. See the GitHub Action on this PR.
Closes #25786 from dongjoon-hyun/SPARK-29079.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
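### What changes were proposed in this pull request?
This patch adds a Java version parameter to the GitHub workflow configuration for JDK8/11.
### Why are the changes needed?
Because we want to build with JDK8/11 in the GitHub workflow, we need to add the Java version according to the current matrix.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
See the GitHub workflow run result.
Closes #25625 from viirya/github-workflow-java.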
Authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to add `-Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-3.2 -Phadoop-cloud` profiles to GitHub workflow conf.
### Why are the changes needed?
Currently, we build with JDK8 and test with JDK8/11 in Jenkins.
We use the GitHub workflow for JDK8/JDK11 build tests.
To test JDK11 fully, we need to enable `hive` and `hadoop-3.2` profiles for `Hive 2.3.6` and `Hadoop 3.2`. Also, this PR adds all resource manager modules, too.
### Does this PR introduce any user-facing change?
No. In addition, Jenkins workload will be the same because this is specific to GitHub workflow.
### How was this patch tested?
See the GitHub workflow run result.
Closes #25624 from dongjoon-hyun/SPARK-JDK11-HIVE.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Make the GitHub Actions log quieter.
Closes #25468 from dbtsai/actions2.
Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
## What changes were proposed in this pull request?
Add JDK11 to GitHub Actions.
Closes #25444 from dbtsai/jdk11.
Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
## What changes were proposed in this pull request?
GitHub now provides free CI/CD for build, test, and deploy. This PR enables a simple GitHub Actions workflow to build master with JDK8 on the latest Ubuntu. We can extend it to different versions of JDK, and even build Spark with Docker images in the future.
Closes #25440 from dbtsai/actions.
Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>