### What changes were proposed in this pull request?
This is a follow-up to https://github.com/apache/spark/pull/31104, tracked separately under its own JIRA.
Currently, jobs are canceled in main repo branches as well. If a commit is merged to, for example, the master branch before the tests for the previous commit finish, the previous builds are canceled. This is a problem because we then cannot, for example, properly detect logical conflicts. We should only cancel the jobs in PRs:
![Screen Shot 2021-01-11 at 3 22 24 PM](https://user-images.githubusercontent.com/6477701/104152015-c7f04b80-5421-11eb-9e40-6b0a0e5b8442.png)
This PR proposes to stop canceling jobs for commits on the main repo branches and to cancel them only in PRs.
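The cancellation policy described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the configuration used in the workflow: the `runs_to_cancel` function, the `id`/`event`/`head` fields, and the assumption that a larger `id` means a newer run are all hypothetical.

```python
from collections import defaultdict

# Assumption: builds triggered by these events are never cancelled, so every
# commit merged to a main repo branch is tested to completion.
PROTECTED_EVENTS = {"push"}


def runs_to_cancel(runs):
    """Return the ids of superseded runs that are safe to cancel.

    `runs` is a list of dicts with hypothetical keys:
      id    - integer, assumed to grow with submission time
      event - triggering event, e.g. "push" or "pull_request"
      head  - identifier of the branch or PR being built
    """
    by_head = defaultdict(list)
    for run in runs:
        if run["event"] in PROTECTED_EVENTS:
            continue  # main repo branch commit: keep every build
        by_head[run["head"]].append(run["id"])

    superseded = []
    for ids in by_head.values():
        newest = max(ids)
        # Everything older than the newest run for the same head is superseded.
        superseded.extend(i for i in ids if i != newest)
    return sorted(superseded)
```

With two `push` runs on `master` and two `pull_request` runs for the same PR, only the older PR run is selected; both `master` builds are kept.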
### Why are the changes needed?
- To keep the test coverage
- To run the test in the synced master branch instead of relying on the builds made in each PR with an outdated master branch
- To detect test failures from logical conflicts from merging two conflicting PRs at the same time.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
I manually tested in
- https://github.com/HyukjinKwon/spark/pull/27
- https://github.com/HyukjinKwon/spark/pull/28
I added Yi Wu as a co-author since he helped verify the current fix in the PRs above.
I checked that it does not cancel in the main repo branch:
![Screen Shot 2021-01-11 at 3 58 52 PM](https://user-images.githubusercontent.com/6477701/104153656-3afbc100-5426-11eb-9309-85f6f4fd9ff3.png)
I checked it cancels in PRs:
![Screen Shot 2021-01-11 at 3 58 45 PM](https://user-images.githubusercontent.com/6477701/104153658-3d5e1b00-5426-11eb-89f7-786c3ae6849a.png)
Closes #31121 from HyukjinKwon/SPARK-34065.
Lead-authored-by: hyukjinkwon <gurwls223@apache.org>
Co-authored-by: yi.wu <yi.wu@databricks.com>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Similar to: https://github.com/apache/spark/pull/31098, https://github.com/apache/calcite/pull/2318 (solution suggested by vlsi - https://github.com/apache/pulsar/issues/9154#issuecomment-756984731)
I used the action maintained by potiuk rather than the one from the original author, for two reasons:
- the original action was abandoned and is not supported (Proof: https://github.com/n1hility/cancel-previous-runs/issues/7)
- this action works with forks. The original action only worked when the contribution was run in the same repository and the action had a token with full access.
> If you use forks, you should create a separate "Cancelling" workflow_run triggered workflow. The workflow_run should be responsible for all canceling actions. The examples below show the possible ways the action can be utilized.
### What changes were proposed in this pull request?
This PR aims to reduce the GitHub Action usage by cancelling the previous build.
### Why are the changes needed?
In most cases, only the latest commit is meaningful.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Due to the nature of the change, testing of this change is difficult.
> Note: This event will only trigger a workflow run if the workflow file is on the default branch.
https://docs.github.com/en/free-pro-team@latest/actions/reference/events-that-trigger-workflows#workflow_run
However, you can see on my fork that this action is triggered.
https://github.com/mik-laj/spark/actions?query=workflow%3A%22Cancelling+Duplicates%22
I also asked the author of this action, potiuk (PMC of Apache Airflow), to review this change, and received a positive review.
Closes #31104 from mik-laj/patch-1.
Lead-authored-by: Kamil Breguła <kamil.bregula@polidea.com>
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to recover GitHub Action `build_and_test` job.
### Why are the changes needed?
Currently, `build_and_test` job fails to start because of the following in master/branch-3.1 at least.
```
r-lib/actions/setup-r@v1 is not allowed to be used in apache/spark.
Actions in this workflow must be: created by GitHub, verified in the GitHub Marketplace,
within a repository owned by apache or match the following:
adoptopenjdk/*, apache/*, gradle/wrapper-validation-action.
```
- https://github.com/apache/spark/actions/runs/449826457
![Screen Shot 2020-12-28 at 10 06 11 PM](https://user-images.githubusercontent.com/9700541/103262174-f1f13a80-4958-11eb-8ceb-631527155775.png)
### Does this PR introduce _any_ user-facing change?
No. This is a test infra.
### How was this patch tested?
Check that the GitHub Action `build_and_test` job starts and passes on this PR.
Closes #30959 from dongjoon-hyun/SPARK-33931.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR adds `-Pspark-ganglia-lgpl` to the build definition with Scala 2.13 on GitHub Actions.
### Why are the changes needed?
To keep the code buildable with Scala 2.13.
With this change, all the sub-modules seem to be buildable with Scala 2.13.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
I confirmed that the Scala 2.13 build passes with the following commands.
```
$ ./dev/change-scala-version.sh 2.13
$ build/sbt -Pspark-ganglia-lgpl -Pscala-2.13 compile test:compile
```
Closes #30834 from sarutak/ganglia-scala-2.13.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR proposes a better solution for the R build failure on GitHub Actions.
The issue is solved in #30737 but I noticed the following two things.
* We can use the latest `usethis` if we install additional libraries on the GitHub Actions environment.
* For tests on AppVeyor, `usethis` is not necessary, so I partially revert the previous change.
### Why are the changes needed?
For a simpler solution.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Confirmed on GitHub Actions and AppVeyor on my account.
Closes #30753 from sarutak/followup-SPARK-33757.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR fixes the R dependencies build error on GitHub Actions and AppVeyor.
The cause seems to be that the `usethis` package was updated on 2020/12/10.
https://cran.r-project.org/web/packages/usethis/index.html
### Why are the changes needed?
To keep the build clean.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Should be done by GitHub Actions.
Closes #30737 from sarutak/fix-r-dependencies-build-error.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR adds `kubernetes-integration-tests` to GitHub Actions for Scala 2.13 build.
### Why are the changes needed?
Now that the build passes with `kubernetes-integration-tests` and Scala 2.13, it's better to keep it buildable.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Should be done by GitHub Actions.
I also confirmed that the build passes with the following command.
```
$ build/sbt -Pscala-2.13 -Pkubernetes -Pkubernetes-integration-tests compile test:compile
```
Closes #30731 from sarutak/github-actions-k8s.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to add `-Pdocker-integration-tests` to the GitHub Action job for Scala 2.13 compilation.
### Why are the changes needed?
We fixed Scala 2.13 compilation of this module in https://github.com/apache/spark/pull/30660. This PR will prevent accidental regressions in that module.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass GitHub Action Scala 2.13 job.
Closes #30661 from dongjoon-hyun/SPARK-DOCKER-IT.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
### What changes were proposed in this pull request?
This PR removes `-Djava.version=11` from the build command for Scala 2.13 in the GitHub Actions' job.
In the GitHub Actions' job, the build command for Scala 2.13 is defined as follows.
```
./build/sbt -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Djava.version=11 -Pscala-2.13 compile test:compile
```
However, the Scala 2.13 build uses Java 8 rather than 11, so let's remove `-Djava.version=11`.
### Why are the changes needed?
To build with consistent configuration.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Should be done by GitHub Actions' workflow.
Closes #30633 from sarutak/scala-213-java11.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
Currently, GitHub Action is using two docker images.
```
$ git grep dongjoon/apache-spark-github-action-image
.github/workflows/build_and_test.yml: image: dongjoon/apache-spark-github-action-image:20201015
.github/workflows/build_and_test.yml: image: dongjoon/apache-spark-github-action-image:20201025
```
This PR aims to make it consistent by using the latest one.
```
- image: dongjoon/apache-spark-github-action-image:20201015
+ image: dongjoon/apache-spark-github-action-image:20201025
```
### Why are the changes needed?
This is for better maintainability. The image size is almost the same.
```
$ docker images | grep 202010
dongjoon/apache-spark-github-action-image 20201025 37adfa3d226a 5 weeks ago 2.18GB
dongjoon/apache-spark-github-action-image 20201015 ff6fee8dc36d 6 weeks ago 2.16GB
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the GitHub Action.
Closes #30578 from dongjoon-hyun/SPARK-MINOR.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR proposes:
- Add `~/.sbt` directory into the build cache, see also https://github.com/sbt/sbt/issues/3681
- Move `hadoop-2` down so that it sits together with `java-11` and `scala-213`, see https://github.com/apache/spark/pull/30391#discussion_r524881430
- Remove unnecessary `.m2` cache if you run SBT tests only.
- Remove `rm ~/.m2/repository/org/apache/spark`. If you don't `sbt publishLocal` or `mvn install`, we don't need to care about it.
- Use Java 8 in Scala 2.13 build. We can switch the Java version to 11 used for release later.
- Add caches into linters. The linter scripts use `sbt` in, for example, `./dev/lint-scala`, and use `mvn` in, for example, `./dev/lint-java`. Also, the Jekyll build requires `sbt package`, see: https://github.com/apache/spark/blob/master/docs/_plugins/copy_api_dirs.rb#L160-L161. We need full caches here for SBT, Maven and build tools.
- Use the same syntax for the Java version, 1.8 -> 8.
### Why are the changes needed?
- Remove unnecessary stuff
- Cache what we can in the build
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
It will be tested in the GitHub Actions build of the current PR.
Closes #30391 from HyukjinKwon/SPARK-33464.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to protect `Hadoop 2.x` profile compilation in Apache Spark 3.1+.
### Why are the changes needed?
Since Apache Spark 3.1+ switches the default profile to Hadoop 3, we should at least prevent compilation errors with the `Hadoop 2.x` profile at the PR review phase. Although this is an additional workload, it will finish quickly because it's compilation only.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the GitHub Action.
- This should be merged after https://github.com/apache/spark/pull/30375 .
Closes #30378 from dongjoon-hyun/SPARK-33454.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to decrease the parallelism of the `SQL` module, as is already done for the `Hive` module.
### Why are the changes needed?
The GitHub Action `sql - slow tests` job has become flaky.
- https://github.com/apache/spark/runs/1393670291
- https://github.com/apache/spark/runs/1393088031
### Does this PR introduce _any_ user-facing change?
No. This is a dev-only feature.
Although this will increase the running time, it's better than flakiness.
### How was this patch tested?
Pass the GitHub Action stably.
Closes #30365 from dongjoon-hyun/SPARK-33439.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR changes the behavior of the GitHub Actions job that caches dependencies.
SPARK-33226 upgraded sbt to 1.4.1.
As of 1.3.0, sbt uses Coursier as the dependency resolver / fetcher.
So let's change the dependency cache configuration for the GitHub Actions job.
### Why are the changes needed?
To make the build faster with Coursier for the GitHub Actions job.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Should be done by GitHub Actions itself.
Closes #30259 from sarutak/coursier-cache.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR removes the old Probot Autolabeler labeling configuration, as the Probot Autolabeler has been deprecated. I've updated the configs in Iceberg and in Avro, and we also need to update them here. This PR adds an additional workflow for labeling PRs and migrates the old Probot config to the new format. Unfortunately, because certain features have not been released upstream, we will not get the _exact_ behavior as before. I have documented where that is and what changes are needed, and in the associated ticket I've also discussed other options and why I think this is the best way to go. A follow-up ticket is definitely needed to get the original behavior back in these few cases, but PRs have not been labeled for almost a month, so it's probably best to get it right 95% of the time and occasionally have some UI-related PRs labeled as `CORE` while the issue is resolved upstream and/or further investigated.
### Why are the changes needed?
The probot autolabeler is dead and will not be maintained going forward. This has been confirmed with github user [at]mithro in an issue in their repository.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
To test this PR, I first merged the config into my local fork. I then edited it several times and ran tests on that.
Unfortunately, I've overwritten my fork with the apache repo in order to create a proper PR. However, I've also added the config for the same thing in the Iceberg repo as well as the Avro repo.
I have now merged this PR into my local repo and will be running some tests on edge cases there and for validating in general:
- [Check that the SQL label is applied for changes directly below repo root's sql directory](https://github.com/kbendick/spark/pull/16) ✅
- [Check that the structured streaming label is applied](https://github.com/kbendick/spark/pull/20) ✅
- [Check that a wildcard at the end of a pattern will match nested files](https://github.com/kbendick/spark/pull/19) ✅
- [Check that the rule **/*pom.xml will match the root pom.xml file](https://github.com/kbendick/spark/pull/25) ✅
I've also discovered that we're likely not killing GitHub Actions runs (such as large tests) when users push to their PR. In most cases, I see that a user has to mark something as "OK to test", but it still seems like we might want to discuss whether or not we should add a cancellation step in order to save time / capacity on the runners. If so desired, we would add an action in each workflow that cancels old runs when a `push` action occurs on a PR. This will likely make waiting for test runners much faster iff tests are automatically rerun on push by anybody (such as PMCs, PRs that have been marked OK to test, etc). We could potentially free a large number of resources if a cancellation step was added to all of the workflows in the Apache account (as GitHub Actions API limits are set at the account level).
Admittedly, the fact that the "old" workflow runs weren't cancelled could be because I was working in a fork. However, given that there are explicit actions that can be added to the start of workflows to cancel old PR workflows, and that we don't have them configured, I think this is likely the case in this repo (and in most `apache` repos as well), at least under certain circumstances (e.g. repos that don't have "Ok to test"-like webhooks).
This is a separate issue though, which I can bring up on the mailing list once I'm done with this PR. Unfortunately I've been very busy the past two weeks, but if somebody else wanted to work on that I would be happy to support with any knowledge I have.
The last Apache repo to still have the probot autolabeler in it is Beam, at which point we can have Gavin from ASF Infra remove the permissions for the probot autolabeler entirely. See the associated JIRA ticket for the links to other tickets, like the one for ASF Infra to remove the dead probot autolabeler's read and write permissions to our PRs in the Apache organization.
Closes #30244 from kbendick/begin-migration-to-github-labeler-action.
Authored-by: Kyle Bendickson <kjbendickson@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR proposes to initiate the migration to NumPy documentation style (from reST style) in PySpark docstrings.
This PR also adds one migration example of `SparkContext`.
- **Before:**
...
![Screen Shot 2020-10-26 at 7 02 05 PM](https://user-images.githubusercontent.com/6477701/97161090-a8ea0200-17c0-11eb-8204-0e70d18fc571.png)
...
![Screen Shot 2020-10-26 at 7 02 09 PM](https://user-images.githubusercontent.com/6477701/97161100-aab3c580-17c0-11eb-92ad-f5ad4441ce16.png)
...
- **After:**
...
![Screen Shot 2020-10-26 at 7 24 08 PM](https://user-images.githubusercontent.com/6477701/97161219-d636b000-17c0-11eb-80ab-d17a570ecb4b.png)
...
See also https://numpydoc.readthedocs.io/en/latest/format.html
### Why are the changes needed?
There are many reasons for switching to NumPy documentation style.
1. Arguably reST style doesn't fit well when the docstring grows large because it provides (arguably) less structure and syntax.
2. NumPy documentation style provides a better human readable docstring format. For example, notebook users often just do `help(...)` by `pydoc`.
3. NumPy documentation style is pretty commonly used in data science libraries, for example, pandas, numpy, Dask, Koalas, matplotlib, ... Using NumPy documentation style can give users a consistent documentation style.
### Does this PR introduce _any_ user-facing change?
The dependency itself doesn't change anything user-facing.
The documentation change in `SparkContext` does, as shown above.
### How was this patch tested?
Manually tested via running `cd python` and `make clean html`.
Closes #30149 from HyukjinKwon/SPARK-33243.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to use a pre-built image for Github Action SparkR job.
### Why are the changes needed?
This will reduce the execution time and the flakiness.
**BEFORE (21 minutes 39 seconds)**
![Screen Shot 2020-10-16 at 1 24 43 PM](https://user-images.githubusercontent.com/9700541/96305593-fbeada80-0fb2-11eb-9b8e-86d8abaad9ef.png)
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the GitHub Action `sparkr` job in this PR.
Closes #30066 from dongjoon-hyun/SPARKR.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Add an environment variable `PYARROW_IGNORE_TIMEZONE` to pyspark tests in run-tests.py to use legacy nested timestamp behavior. This means that when converting arrow to pandas, nested timestamps with timezones will have the timezone localized during conversion.
### Why are the changes needed?
The default behavior was changed in PyArrow 2.0.0 to propagate timezone information. Using the environment variable enables testing with newer versions of pyarrow until the issue can be fixed in SPARK-32285.
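As a rough illustration of the approach, the snippet below passes the variable to a child test process through its environment. It is a minimal sketch, not the code in `run-tests.py`: the helper names `build_test_env` and `read_flag_in_child` are hypothetical, and the value `"1"` is assumed here as the setting that turns the legacy behavior on.

```python
import os
import subprocess
import sys


def build_test_env(base_env=None):
    """Copy the environment and enable the legacy nested timestamp behavior."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: "1" is the value that enables the legacy behavior.
    env["PYARROW_IGNORE_TIMEZONE"] = "1"
    return env


def read_flag_in_child():
    """Start a child Python process and report the flag value it observes."""
    code = "import os; print(os.environ.get('PYARROW_IGNORE_TIMEZONE'))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=build_test_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Setting the variable on the child's environment, rather than in the parent process, keeps the override scoped to the test workers.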
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing tests
Closes #30111 from BryanCutler/arrow-enable-legacy-nested-timestamps-SPARK-33189.
Authored-by: Bryan Cutler <cutlerb@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
PyArrow 2.0.0 was uploaded to PyPI today (https://pypi.org/project/pyarrow/), and some tests fail with PyArrow 2.0.0+:
```
======================================================================
ERROR [0.774s]: test_grouped_over_window_with_key (pyspark.sql.tests.test_pandas_grouped_map.GroupedMapInPandasTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/__w/spark/spark/python/pyspark/sql/tests/test_pandas_grouped_map.py", line 595, in test_grouped_over_window_with_key
.select('id', 'result').collect()
File "/__w/spark/spark/python/pyspark/sql/dataframe.py", line 588, in collect
sock_info = self._jdf.collectToPython()
File "/__w/spark/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/__w/spark/spark/python/pyspark/sql/utils.py", line 117, in deco
raise converted from None
pyspark.sql.utils.PythonException:
An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 601, in main
process()
File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 593, in process
serializer.dump_stream(out_iter, outfile)
File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 255, in dump_stream
return ArrowStreamSerializer.dump_stream(self, init_stream_yield_batches(), stream)
File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 81, in dump_stream
for batch in iterator:
File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 248, in init_stream_yield_batches
for series in iterator:
File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 426, in mapper
return f(keys, vals)
File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 170, in <lambda>
return lambda k, v: [(wrapped(k, v), to_arrow_type(return_type))]
File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 158, in wrapped
result = f(key, pd.concat(value_series, axis=1))
File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 68, in wrapper
return f(*args, **kwargs)
File "/__w/spark/spark/python/pyspark/sql/tests/test_pandas_grouped_map.py", line 590, in f
"{} != {}".format(expected_key[i][1], window_range)
AssertionError: {'start': datetime.datetime(2018, 3, 15, 0, 0), 'end': datetime.datetime(2018, 3, 20, 0, 0)} != {'start': datetime.datetime(2018, 3, 15, 0, 0, tzinfo=<StaticTzInfo 'Etc/UTC'>), 'end': datetime.datetime(2018, 3, 20, 0, 0, tzinfo=<StaticTzInfo 'Etc/UTC'>)}
```
https://github.com/apache/spark/runs/1278917457
This PR proposes to set the upper bound of PyArrow in GitHub Actions build. This should be removed when we properly support PyArrow 2.0.0+ (SPARK-33189).
### Why are the changes needed?
To make the build pass.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
GitHub Actions in this build will test it out.
Closes #30098 from HyukjinKwon/hot-fix-test.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Add MyPy to the CI. Once this is installed on the CI (https://issues.apache.org/jira/browse/SPARK-32797?jql=project%20%3D%20SPARK%20AND%20text%20~%20mypy), this will automatically check the types.
### Why are the changes needed?
We should check if the types are still correct on the CI.
```
MacBook-Pro-van-Fokko:spark fokkodriesprong$ ./dev/lint-python
starting python compilation test...
python compilation succeeded.
starting pycodestyle test...
pycodestyle checks passed.
starting flake8 test...
flake8 checks passed.
starting mypy test...
mypy checks passed.
The sphinx-build command was not found. Skipping Sphinx build for now.
all lint-python tests passed!
```
### Does this PR introduce _any_ user-facing change?
No :)
### How was this patch tested?
By running `./dev/lint-python` locally.
Closes #30088 from Fokko/SPARK-17333.
Authored-by: Fokko Driesprong <fokko@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR proposes to install `scipy` as well in PyPy. It will test several ML specific test cases in PyPy as well. For example, 31a16fbb40/python/pyspark/mllib/tests/test_linalg.py (L487)
It was not installed when GitHub Actions build was added because it failed to install for an unknown reason. Seems like it's fixed in the latest scipy.
### Why are the changes needed?
To improve test coverage.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
GitHub Actions build in this PR will test it out.
Closes #30054 from HyukjinKwon/SPARK-32247.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
SPARK-32926 added a build test to GitHub Action for Scala 2.13 but it's only with Maven.
As SPARK-32873 reported, some compilation errors happen only with SBT, so I think we need to add another build test to GitHub Action for SBT.
Unfortunately, we don't have abundant resources for GitHub Actions so instead of just adding the new SBT job, let's replace the existing Maven job with the new SBT job for Scala 2.13.
### Why are the changes needed?
To ensure the build test also passes with SBT for Scala 2.13.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
GitHub Actions' job.
Closes #29958 from sarutak/add-sbt-job-for-scala-2.13.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade `Github Action` runner image from `Ubuntu 18.04 (LTS)` to `Ubuntu 20.04 (LTS)`.
### Why are the changes needed?
`ubuntu-latest` in `GitHub Action` is still `Ubuntu 18.04 (LTS)`.
- https://github.com/actions/virtual-environments#available-environments
This upgrade will help Apache Spark 3.1+ preparation for vote and release on the latest OS.
This is tested here.
- https://github.com/dongjoon-hyun/spark/pull/36
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the `Github Action` in this PR.
Closes #30050 from dongjoon-hyun/ubuntu_20.04.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR proposes to skip test reporting ("Report test results") if no JUnit XML files are found.
Currently, we're running and skipping the tests dynamically. For example,
- if there are only changes in SparkR at the underlying commit, it only runs the SparkR tests, skips the other tests, and generates JUnit XML files for SparkR test cases.
- if there are only changes in `docs` at the underlying commit, the build skips all tests except linters and does not generate any JUnit XML files.
When the test reporting ("Report test results") job is triggered after the main build ("Build and test") is finished, and no JUnit XML files are found, it reports the case as a failure. See https://github.com/apache/spark/runs/1196184007 as an example.
This PR works around it by simply skipping the test report when no JUnit XML files are found.
Please see https://github.com/apache/spark/pull/29906#issuecomment-702525542 for more details.
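The guard described above amounts to checking for report files before running the reporting step. The following is a minimal sketch of that check, not the workflow's actual implementation: the helper names and the `**/target/test-reports/*.xml` pattern are assumptions made for illustration.

```python
from pathlib import Path

# Assumption: JUnit XML files are written under a "target/test-reports"
# directory somewhere below the checkout root.
DEFAULT_PATTERN = "**/target/test-reports/*.xml"


def find_junit_reports(root, pattern=DEFAULT_PATTERN):
    """Return every JUnit XML file below `root` that matches `pattern`."""
    return sorted(Path(root).glob(pattern))


def should_report(root):
    """Report only when at least one JUnit XML file exists."""
    return bool(find_junit_reports(root))
```

A docs-only build that produces no XML files makes `should_report` return `False`, so the reporting step is skipped instead of failing.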
### Why are the changes needed?
To avoid false alarm for test results.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
Manually tested in my fork.
Positive case:
https://github.com/HyukjinKwon/spark/runs/1208624679?check_suite_focus=true
https://github.com/HyukjinKwon/spark/actions/runs/288996327
Negative case:
https://github.com/HyukjinKwon/spark/runs/1208229838?check_suite_focus=true
https://github.com/HyukjinKwon/spark/actions/runs/289000058
Closes #29946 from HyukjinKwon/test-junit-files.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
In SPARK-32493, the R installation was switched to manual installation because setup-r was broken. This seems to be fixed upstream, so we should switch back.
### Why are the changes needed?
To avoid maintaining the installation steps ourselves.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
GitHub Actions build in this PR should test it.
Closes #29931 from HyukjinKwon/recover-r-build.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
The PR aims to add Scala 2.13 build test coverage into GitHub Action for Apache Spark 3.1.0.
### Why are the changes needed?
The branch is ready for Scala 2.13 and this will prevent any regression.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass the GitHub Action.
Closes #29793 from dongjoon-hyun/SPARK-32926.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR proposes to explicitly cache and hash the files/directories under `build` for SBT and Zinc in GitHub Actions. Otherwise, restoring the cache can end up overwriting the `build` directory. See also https://github.com/apache/spark/pull/29286#issuecomment-679368436
Previously, other files like `build/mvn` and `build/sbt` were also cached and overwritten, so any changes made to them were ignored.
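To illustrate the idea of keying a cache on an explicit list of files, here is a minimal sketch. It is not how the workflow computes its keys (GitHub Actions offers a built-in `hashFiles` expression for that purpose); the `cache_key` helper and its arguments are hypothetical.

```python
import hashlib
from pathlib import Path


def cache_key(prefix, files):
    """Build a cache key from the names and contents of the listed files.

    Only the files passed in contribute to the key, so anything else in the
    same directory is neither part of the key nor implied to be cached.
    """
    digest = hashlib.sha256()
    for path in sorted(Path(f) for f in files):
        digest.update(path.name.encode("utf-8"))
        digest.update(path.read_bytes())
    return f"{prefix}-{digest.hexdigest()}"
```

The key stays stable while the listed files are unchanged and differs as soon as one of them is edited, which is what invalidates a stale cache entry.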
### Why are the changes needed?
To make GitHub Actions build stable.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
The builds in this PR test it out.
Closes #29536 from HyukjinKwon/SPARK-32695.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR proposes to add a `workflow_dispatch` entry in the GitHub Action script (`build_and_test.yml`). This update enables developers to run the Spark tests for a specific branch on their own local repository, so I think it might help to check whether all the tests pass before opening a new PR.
<img width="944" alt="Screen Shot 2020-08-21 at 16 28 41" src="https://user-images.githubusercontent.com/692303/90866249-96250c80-e3ce-11ea-8496-3dd6683e92ea.png">
### Why are the changes needed?
To reduce the pressure of GitHub Actions on the Spark repository.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manually checked.
Closes #29504 from maropu/DispatchTest.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
### What changes were proposed in this pull request?
This PR renames `master.yml` to `build_and_test.yml` to indicate this is the workflow that builds and runs the tests.
### Why are the changes needed?
Just for readability. `master.yml` looks like the name of the branch (to me).
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
GitHub Actions build in this PR will test it out.
Closes #29459 from HyukjinKwon/minor-rename.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
### What changes were proposed in this pull request?
This PR proposes to remove the usage of my own forks and use the original plugins in GitHub Actions testing report.
SPARK-32357 introduced the GitHub Actions test reporting by leveraging two plugins:
- [ScaCap/action-surefire-report](https://github.com/ScaCap/action-surefire-report)
- [dawidd6/action-download-artifact](https://github.com/dawidd6/action-download-artifact)
To make it work, two repositories had to be forked with custom fixes:
- HyukjinKwon/action-surefire-report, commit c96094c
- f86c565d52
The two custom fixes are thankfully merged at https://github.com/ScaCap/action-surefire-report/pull/14 and https://github.com/dawidd6/action-download-artifact/pull/24, and they released new ones to use at [ScaCap/action-surefire-report/commits/v1](https://github.com/ScaCap/action-surefire-report/commits/v1) and [dawidd6/action-download-artifact/commits/v2](https://github.com/dawidd6/action-download-artifact/commits/v2) - thanks jmisur and dawidd6 again.
### Why are the changes needed?
To avoid relying on forks and code duplications.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
Logically there is no diff. I tested it at https://github.com/HyukjinKwon/spark/runs/992824229 to be doubly sure.
NOTE that this PR cannot be tested here within the workflow triggered by this PR without merging the changes in `test_report.yml` into the master.
Closes #29449 from HyukjinKwon/SPARK-32606-SPARK-32605.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR proposes to report the failed and succeeded tests in GitHub Actions in order to improve the development velocity by leveraging [ScaCap/action-surefire-report](https://github.com/ScaCap/action-surefire-report). See the example below:
![Screen Shot 2020-08-13 at 8 17 52 PM](https://user-images.githubusercontent.com/6477701/90128649-28f7f280-dda2-11ea-9211-e98e34332f6b.png)
Note that we cannot just use [ScaCap/action-surefire-report](https://github.com/ScaCap/action-surefire-report) in Apache Spark because PRs come from forked repositories, where GitHub secrets are unavailable for security reasons. This plugin and all similar plugins require a GitHub token with write access in order to post test results, but such a token is unavailable in PRs.
To work around this limitation, I took this approach:
1. In workflow A, run the tests and upload the JUnit XML test results. GitHub Actions supports uploading and downloading files as artifacts.
2. GitHub introduced a new event type, [`workflow_run`](https://github.blog/2020-08-03-github-actions-improvements-for-fork-and-pull-request-workflows/), 10 days ago. By leveraging this, workflow A triggers another workflow, B.
3. Workflow B runs in the main repo instead of the forked repo, and has the write access the plugin needs. Workflow B downloads the artifact uploaded from workflow A (from the forked repository).
4. Workflow B generates the test reports from the JUnit XML files.
5. Workflow B looks up the PR and posts the test reports.
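As a rough illustration of step 4, the sketch below summarizes one JUnit XML report using only the Python standard library. It is not how the reporting plugin is implemented; the function `summarize` and the sample report are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample report in the common JUnit XML shape.
SAMPLE_REPORT = """\
<testsuite name="ExampleSuite" tests="3" failures="1" errors="0">
  <testcase classname="ExampleSuite" name="passes"/>
  <testcase classname="ExampleSuite" name="fails">
    <failure message="1 did not equal 2"/>
  </testcase>
  <testcase classname="ExampleSuite" name="skipped"><skipped/></testcase>
</testsuite>
"""


def summarize(junit_xml: str) -> dict:
    """Count passed, failed, and skipped test cases in one JUnit XML report."""
    root = ET.fromstring(junit_xml)
    summary = {"passed": 0, "failed": 0, "skipped": 0, "failed_names": []}
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            summary["failed"] += 1
            summary["failed_names"].append(
                f"{case.get('classname')}.{case.get('name')}")
        elif case.find("skipped") is not None:
            summary["skipped"] += 1
        else:
            summary["passed"] += 1
    return summary


if __name__ == "__main__":
    print(summarize(SAMPLE_REPORT))
```

A report posted to a PR would typically list the entries in `failed_names` first, which is what saves scrolling through the raw logs.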
The `workflow_run` event is a very new feature, and it looks like not many GitHub Actions plugins support it yet. In order to make this work with [ScaCap/action-surefire-report](https://github.com/ScaCap/action-surefire-report), I had to fork two GitHub Actions plugins:
- [ScaCap/action-surefire-report](https://github.com/ScaCap/action-surefire-report) to have this custom fix: c96094cc35
It added a `commit` argument to specify the commit to post the test reports to. With `workflow_run`, workflow B can access the commit from workflow A.
- [dawidd6/action-download-artifact](https://github.com/dawidd6/action-download-artifact) to have this custom fix: 750b71af35
It added support for downloading all artifacts from workflow A in workflow B. By default, it only supports specifying the name of a single artifact.
Note that I was not able to use the official [actions/download-artifact](https://github.com/actions/download-artifact) because:
- It does not support downloading artifacts between different workflows, see also https://github.com/actions/download-artifact/issues/3. Once this issue is resolved, we can switch back to [actions/download-artifact](https://github.com/actions/download-artifact).
I plan to make a pull request for both repositories so we don't have to rely on forks.
### Why are the changes needed?
Currently, it's difficult to check the failed tests. You have to scroll through long GitHub Actions logs.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
Manually tested at: https://github.com/HyukjinKwon/spark/pull/17, https://github.com/HyukjinKwon/spark/pull/18, https://github.com/HyukjinKwon/spark/pull/19, https://github.com/HyukjinKwon/spark/pull/20, and master branch of my forked repository.
Closes #29333 from HyukjinKwon/SPARK-32357-fix.
Lead-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
CRAN check fails due to the size of the generated PDF docs as below:
```
...
WARNING
‘qpdf’ is needed for checks on size reduction of PDFs
...
Status: 1 WARNING, 1 NOTE
See
‘/home/runner/work/spark/spark/R/SparkR.Rcheck/00check.log’
for details.
```
This PR proposes to install `qpdf` in GitHub Actions.
Note that I cannot reproduce this locally with the same R version, so I am not documenting it for now.
Also, while I am here, I piggyback installing SparkR when the module includes `sparkr`. It is rather a followup of SPARK-32491.
### Why are the changes needed?
To fix SparkR CRAN check failure.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
GitHub Actions will test it out.
Closes #29306 from HyukjinKwon/SPARK-32497.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR proposes to manually install R instead of using `setup-r`, which seems broken. Currently, GitHub Actions uses its default installed R 3.4.4, support for which we dropped as of SPARK-32073.
While I am here, I am also upgrading R version to 4.0. Jenkins will test the old version and GitHub Actions tests the new version. AppVeyor uses R 4.0 but it does not check CRAN which is important when we make a release.
### Why are the changes needed?
To recover GitHub Actions build.
### Does this PR introduce _any_ user-facing change?
No, dev-only
### How was this patch tested?
Manually tested at https://github.com/HyukjinKwon/spark/pull/15
Closes #29302 from HyukjinKwon/SPARK-32493.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to recover Java 11 build in `GitHub Action`.
### Why are the changes needed?
This test coverage was removed before. Now, it's time to recover it.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the GitHub Action.
Closes #29295 from dongjoon-hyun/SPARK-32248.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR proposes to redesign the PySpark documentation.
I made a demo site to make it easier to review: https://hyukjin-spark.readthedocs.io/en/stable/reference/index.html.
Here is the initial draft for the final PySpark docs shape: https://hyukjin-spark.readthedocs.io/en/latest/index.html.
In more detail, this PR proposes:
1. Use [pydata_sphinx_theme](https://github.com/pandas-dev/pydata-sphinx-theme) theme - [pandas](https://pandas.pydata.org/docs/) and [Koalas](https://koalas.readthedocs.io/en/latest/) use this theme. The CSS overwrite is ported from Koalas. The colours in the CSS were actually chosen by designers to use in Spark.
2. Use the Sphinx option to separate `source` and `build` directories as the documentation pages will likely grow.
3. Port current API documentation into the new style. It mimics Koalas and pandas to use the theme most effectively.
One disadvantage of this approach is that you have to list the APIs or classes explicitly; however, I think this isn't a big issue in PySpark since we're being conservative about adding APIs. I also intentionally listed only classes instead of functions in ML and MLlib to make it relatively easier to manage.
### Why are the changes needed?
I often hear complaints from users that the current PySpark documentation is pretty messy to read - https://spark.apache.org/docs/latest/api/python/index.html - compared to other projects such as [pandas](https://pandas.pydata.org/docs/) and [Koalas](https://koalas.readthedocs.io/en/latest/).
It would be nicer if we can make it more organised instead of just listing all classes, methods and attributes to make it easier to navigate.
Also, the documentation has been there from almost the very first version of PySpark. Maybe it's time to update it.
### Does this PR introduce _any_ user-facing change?
Yes, PySpark API documentation will be redesigned.
### How was this patch tested?
Manually tested, and the demo site was made to show.
Closes #29188 from HyukjinKwon/SPARK-32179.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to test PySpark with Python 3.8 in Github Actions. On the script side, it is already supported:
4ad9bfd53b/python/run-tests.py (L161)
This PR includes small related fixes together:
1. Install Python 3.8
2. Only install one Python implementation instead of installing many for SQL and Yarn test cases, because they only need one Python executable newer than Python 2 in their test cases.
3. Do not install Python 2, which is not needed anymore after we dropped Python 2 in SPARK-32138.
4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed.
### Why are the changes needed?
Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can only be enabled with Python 3.8+. See also https://github.com/apache/spark/pull/29114
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
Was not tested. Github Actions build in this PR will test it out.
Closes #29116 from HyukjinKwon/test-python3.8-togehter.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to drop Python 2.7, 3.4 and 3.5.
Roughly speaking, it removes all the widely known Python 2 compatibility workarounds such as `sys.version` comparisons and `__future__` imports. Also, it removes the Python 2 dedicated code such as `ArrayConstructor` in Spark.
### Why are the changes needed?
1. Drop support for EOL Python versions.
2. Reduce maintenance overhead and remove a bit of legacy code and hacks for Python 2.
3. PyPy2 has a critical bug that causes a flaky test (SPARK-28358), according to my testing and investigation.
4. Users can use Python type hints with Pandas UDFs without thinking about the Python version.
5. Users can leverage a single, latest cloudpickle, https://github.com/apache/spark/pull/28950. With Python 3.8+ it can also leverage C pickle.
### Does this PR introduce _any_ user-facing change?
Yes, users cannot use Python 2.7, 3.4 and 3.5 in the upcoming Spark version.
### How was this patch tested?
Manually tested and also tested in Jenkins.
Closes #28957 from HyukjinKwon/SPARK-32138.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR mainly proposes to run only the relevant tests, just like the Jenkins PR builder does. Currently, GitHub Actions always runs the full tests, which wastes resources.
In addition, this PR also fixes three more closely related issues while I am here.
1. The main idea here is: it reuses the existing logic embedded in `dev/run-tests.py`, which the Jenkins PR builder uses in order to run only the related test cases.
2. While I am here, I fixed SPARK-32292 too to run the doc tests. They failed because other references were not available when the repository is cloned via `checkout@v2`. With `fetch-depth: 0`, the history is available.
3. In addition, it fixes `dev/run-tests.py` to match `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were turned into proper options. For example,
```bash
dev/run-tests.py --modules sql,core
```
which is consistent with `python/run-tests.py`, for example,
```bash
python/run-tests.py --modules pyspark-core,pyspark-ml
```
4. Lastly, it also fixes the formatting issue in the module specification in the matrix:
```diff
- network_common, network_shuffle, repl, launcher
+ network-common, network-shuffle, repl, launcher,
```
The previous specification built and tested the modules incorrectly.
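To make the option style concrete, here is a minimal sketch of parsing a comma-separated `--modules` value with `argparse`. It is not the code in `dev/run-tests.py`; the function `parse_modules`, the `KNOWN_MODULES` subset, and the decision to reject unknown names are assumptions made for illustration.

```python
import argparse

# Hypothetical subset of module names; the real definitions live in
# dev/sparktestsupport/modules.py.
KNOWN_MODULES = {"sql", "core", "repl", "launcher",
                 "network-common", "network-shuffle"}


def parse_modules(argv):
    """Parse a comma-separated --modules option and reject unknown names."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--modules", default="",
                        help="comma-separated list of modules to test")
    args = parser.parse_args(argv)
    # Tolerate spaces and a trailing comma, as in the matrix line above.
    names = [m.strip() for m in args.modules.split(",") if m.strip()]
    unknown = [m for m in names if m not in KNOWN_MODULES]
    if unknown:
        raise ValueError(f"unknown modules: {unknown}")
    return names


if __name__ == "__main__":
    print(parse_modules(["--modules", "sql,core"]))
```

With a check like this, a misspelled name such as `network_common` fails loudly instead of silently selecting the wrong set of modules.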
### Why are the changes needed?
By running only the related tests, we can save a lot of resources and avoid unrelated flaky tests, etc.
Also, it now runs the doctests of `dev/run-tests.py` properly, the usage is similar between `dev/run-tests.py` and `python/run-tests.py`, and the `network-common`, `network-shuffle`, `launcher` and `examples` modules run too.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
Manually tested in my own forked Spark:
- https://github.com/HyukjinKwon/spark/pull/7
- https://github.com/HyukjinKwon/spark/pull/8
- https://github.com/HyukjinKwon/spark/pull/9
- https://github.com/HyukjinKwon/spark/pull/10
- https://github.com/HyukjinKwon/spark/pull/11
- https://github.com/HyukjinKwon/spark/pull/12
Closes #29086 from HyukjinKwon/SPARK-32292.
Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR reenables GitHub Action on every commit as a next step.
### Why are the changes needed?
We carefully enabled GitHub Action on every PR, and it looks good so far.
As we saw at https://github.com/apache/spark/pull/29072, GitHub Action is already triggered on every commit of every PR. Enabling GitHub Action on `master` branch commits doesn't make a big difference. And, we need to start testing every commit as a next step.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual.
Closes #29076 from dongjoon-hyun/reenable_gha_commit.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to run the Spark tests in Github Actions.
To briefly explain the main idea:
- Reuse `dev/run-tests.py` with SBT build
- Reuse the modules in `dev/sparktestsupport/modules.py` to test each module
- Pass the modules to test into `dev/run-tests.py` directly via `TEST_ONLY_MODULES` environment variable. For example, `pyspark-sql,core,sql,hive`.
- `dev/run-tests.py` _does not_ take the dependent modules into account but solely the specified modules to test.
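The difference between testing solely the specified modules and also testing their dependents can be sketched as follows. This is only an illustration: the `DEPENDENTS` map and both function names are hypothetical, and the real module graph is defined in `dev/sparktestsupport/modules.py`.

```python
import os

# Hypothetical map: module -> modules that depend on it.
DEPENDENTS = {
    "core": ["sql"],
    "sql": ["hive", "pyspark-sql"],
    "hive": [],
    "pyspark-sql": [],
}


def modules_to_test(environ=os.environ):
    """Return exactly the modules named in TEST_ONLY_MODULES, nothing more."""
    raw = environ.get("TEST_ONLY_MODULES", "")
    return [name.strip() for name in raw.split(",") if name.strip()]


def with_dependents(modules):
    """For contrast: expand the selection to every transitive dependent."""
    seen, queue = [], list(modules)
    while queue:
        name = queue.pop(0)
        if name not in seen:
            seen.append(name)
            queue.extend(DEPENDENTS.get(name, []))
    return seen


if __name__ == "__main__":
    print(modules_to_test({"TEST_ONLY_MODULES": "pyspark-sql,core,sql,hive"}))
    print(with_dependents(["core"]))
```

Skipping the expansion lets each job in the build matrix own a fixed set of modules, so the same module is not tested again in several parallel jobs.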
Another thing to note might be the `SlowHiveTest` annotation. Running the tests in the Hive modules takes too long, so the slow tests are extracted and run as a separate job. They were selected based on the actual elapsed time in Jenkins:
![Screen Shot 2020-07-09 at 7 48 13 PM](https://user-images.githubusercontent.com/6477701/87050238-f6098e80-c238-11ea-9c4a-ab505af61381.png)
So, Hive tests are separated into two jobs. One runs the slow test cases, and the other runs the remaining test cases.
_Note that_ the current GitHub Actions build virtually copies what the default PR builder on Jenkins does (without other profiles such as JDK 11, Hadoop 2, etc.). The only exception is Kinesis https://github.com/apache/spark/pull/29057/files#diff-04eb107ee163a50b61281ca08f4e4c7bR23
### Why are the changes needed?
Last week and onwards, the Jenkins machines became very unstable for many reasons:
- Apparently, the machines became extremely slow. Almost all tests can't pass.
- One machine (worker 4) started to have a corrupt `.m2`, which fails the build.
- The documentation build fails from time to time for an unknown reason on the Jenkins machines specifically. This is disabled for now at https://github.com/apache/spark/pull/29017.
- Almost all PRs are basically blocked by this instability currently.
The advantages of using Github Actions:
- To avoid depending on the few people who can access the cluster.
- To reduce the elapsed time in the build - we could split the tests (e.g., SQL, ML, CORE), and run them in parallel so the total build time will significantly reduce.
- To control the environment more flexibly.
- Other contributors can test and propose to fix Github Actions configurations so we can distribute this build management cost.
Note that:
- The current build in Jenkins takes _more than 7 hours_. With Github Actions it takes _less than 2 hours_.
- We can now control the environments especially for Python easily.
- The tests and build look more stable than on Jenkins.
### Does this PR introduce _any_ user-facing change?
No, dev-only change.
### How was this patch tested?
Tested at https://github.com/HyukjinKwon/spark/pull/4
Closes #29057 from HyukjinKwon/migrate-to-github-actions.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR proposes to add a guide to clarify the Spark version when describing "Does this PR introduce any user-facing change?".
### Why are the changes needed?
It seems confusing to write when the user facing changes happen within unreleased branches.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
Manually tested in Github and it renders fine as intended.
Closes #28403 from HyukjinKwon/minor-more-guide.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to use `r-lib/actions/setup-r` because it's more stable and maintained by a third party.
### Why are the changes needed?
This will recover the current outage. In addition, this will be more robust in the future.
As of now, this is tested via https://github.com/dongjoon-hyun/spark/pull/17 .
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Pass the GitHub Actions, especially `Linter R` and `Generate Documents`.
Closes #28382 from dongjoon-hyun/SPARK-31589.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR is a followup of 1b87015044. Now, we automatically label PRs, and it seems to be working fine.
This PR proposes to correct some minor lists and categories.
**1.** Move `sbin` from `CORE` into `DEPLOY` components.
```
$ ls sbin
decommission-slave.sh start-all.sh start-slave.sh stop-master.sh stop-thriftserver.sh
slaves.sh start-history-server.sh start-slaves.sh stop-mesos-dispatcher.sh
spark-config.sh start-master.sh start-thriftserver.sh stop-mesos-shuffle-service.sh
spark-daemon.sh start-mesos-dispatcher.sh stop-all.sh stop-slave.sh
spark-daemons.sh start-mesos-shuffle-service.sh stop-history-server.sh stop-slaves.sh
```
**2.**
`/sbin/*mesos*.sh` -> `MESOS`
`/bin/spark-shell*` -> `SPARK SHELL`.
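The path-to-label mapping above can be sketched in a few lines of Python. The `RULES` table and `labels_for` are hypothetical, and the glob semantics of `fnmatchcase` are an assumption; the real rules are YAML read by the auto-labeler bot, whose matching behavior may differ (for example in whether one path can receive several labels, as it does here).

```python
from fnmatch import fnmatchcase

# Hypothetical rules mirroring the categories described above.
RULES = [
    ("MESOS", ["sbin/*mesos*.sh"]),
    ("SPARK SHELL", ["bin/spark-shell*"]),
    ("DEPLOY", ["sbin/*"]),
]


def labels_for(paths):
    """Return every label whose patterns match at least one changed path."""
    return [label for label, patterns in RULES
            if any(fnmatchcase(path, pattern)
                   for path in paths for pattern in patterns)]


if __name__ == "__main__":
    print(labels_for(["sbin/start-mesos-dispatcher.sh"]))
    print(labels_for(["bin/spark-shell.cmd"]))
```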
### Why are the changes needed?
To label PRs correctly so that developers can take advantage of it, for example by checking the PRs of a specific component.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
It was not tested yet. It can be tested after it is merged.
Closes #28201 from HyukjinKwon/SPARK-31330.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR adds some rules that will be used by Probot Auto Labeler to label PRs based on what paths they modify.
### Why are the changes needed?
This should make it easier for committers to organize PRs, and it could also help drive downstream tooling like the PR dashboard.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
We'll only be able to test it, I believe, after merging it in. Given that [the Avro project is using this same bot already](https://github.com/apache/avro/blob/master/.github/autolabeler.yml), I expect it will be straightforward to get this working.
Closes #28114 from nchammas/SPARK-31330-auto-label-prs.
Lead-authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Co-authored-by: Nicholas Chammas <nicholas.chammas@liveramp.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to add a new `GitHub Action` job for document generation.
### Why are the changes needed?
We had better test the document generation in PR Builder.
- https://lists.apache.org/thread.html/rd06a2154e853812652b8f7fa3c003746ed531b213c531517f055e1dc%40%3Cdev.spark.apache.org%3E
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Pass the GitHub Action in this PR.
Closes #27715 from dongjoon-hyun/SPARK-30963.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This is a follow-up of #27577. This PR intends to add a link to the configuration naming guideline in `.github/PULL_REQUEST_TEMPLATE`.
### Why are the changes needed?
For reminding developers to follow the naming rules.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
N/A
Closes #27602 from maropu/pr27577-FOLLOWUP.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to add a fallback Maven repository for when a mirror of `central` fails.
### Why are the changes needed?
We use `Google Maven Central` in GitHub Action as a mirror of `central`.
However, `Google Maven Central` sometimes doesn't have newly published artifacts
and there is no guarantee when we get the newly published artifacts.
By duplicating `Maven Central` with a new ID, we can add a fallback Maven repository
which is not mirrored by `Google Maven Central`.
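The fallback order can be pictured with a small Python sketch. Maven performs the real resolution; the `resolve` function and the in-memory lookups below are stand-ins invented for illustration, and only the repository IDs and the `chill` versions come from this PR.

```python
def resolve(artifact, repositories):
    """Try each repository in order and return (repository id, location).

    `repositories` is a list of (id, lookup) pairs, where lookup is any
    callable returning a location string, or None if the artifact is missing.
    """
    for repo_id, lookup in repositories:
        location = lookup(artifact)
        if location is not None:
            return repo_id, location
    raise LookupError(f"{artifact} not found in any repository")


# Stand-in lookups (no network access): the mirror lags behind, while the
# fallback already has the newly published version.
mirror = {"chill_2.12:0.9.3": "mirror/chill_2.12-0.9.3.pom"}
origin = {"chill_2.12:0.9.3": "origin/chill_2.12-0.9.3.pom",
          "chill_2.12:0.9.5": "origin/chill_2.12-0.9.5.pom"}
REPOSITORIES = [("google-maven-central", mirror.get),
                ("central_without_mirror", origin.get)]

if __name__ == "__main__":
    print(resolve("chill_2.12:0.9.5", REPOSITORIES))
```

Artifacts already present in the mirror are still served from it; only misses fall through to the second repository.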
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manually testing with the new `Twitter` chill artifacts by switching `chill.version` from `0.9.3` to `0.9.5`.
```
$ rm -rf ~/.m2/repository/com/twitter/chill*
$ mvn compile | grep chill
Downloading from google-maven-central: https://maven-central.storage-download.googleapis.com/repos/central/data/com/twitter/chill_2.12/0.9.5/chill_2.12-0.9.5.pom
Downloading from central_without_mirror: https://repo.maven.apache.org/maven2/com/twitter/chill_2.12/0.9.5/chill_2.12-0.9.5.pom
Downloaded from central_without_mirror: https://repo.maven.apache.org/maven2/com/twitter/chill_2.12/0.9.5/chill_2.12-0.9.5.pom (2.8 kB at 11 kB/s)
```
Closes #27281 from dongjoon-hyun/SPARK-30572.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Follow-on to #26877.
### What changes were proposed in this pull request?
This PR tweaks the stale PR message to [clarify](https://github.com/apache/spark/pull/24457#issuecomment-571393900) the procedure for reopening a PR after it has been marked as stale.
### Why are the changes needed?
This change should clarify the reopening process for contributors.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
N/A
Closes #27114 from nchammas/SPARK-30173-stale-tweaks.
Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This PR adds [a GitHub workflow to automatically close stale PRs](https://github.com/marketplace/actions/close-stale-issues).
### Why are the changes needed?
This will help cut down the number of open but stale PRs and keep the PR queue manageable.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
I'm not sure how to test this PR without impacting real PRs on the repo.
See: https://github.com/actions/stale/issues/32
Closes #26877 from nchammas/SPARK-30173-stale-prs.
Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This PR is a follow-up of https://github.com/apache/spark/pull/26793 and aims to initialize `~/.m2` directory.
### Why are the changes needed?
In case of cache reset, `~/.m2` directory doesn't exist. It causes a failure.
- `master` branch has a cache as of now. So, we missed this.
- `branch-2.4` has no cache as of now, and we hit this failure.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
This PR is tested against personal `branch-2.4`.
- https://github.com/dongjoon-hyun/spark/pull/12
Closes #26794 from dongjoon-hyun/SPARK-30163-2.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to use [Google Maven mirror](https://cloudplatform.googleblog.com/2015/11/faster-builds-for-Java-developers-with-Maven-Central-mirror.html) in `GitHub Action` jobs to improve the stability.
```xml
<settings>
  <mirrors>
    <mirror>
      <id>google-maven-central</id>
      <name>GCS Maven Central mirror</name>
      <url>https://maven-central.storage-download.googleapis.com/repos/central/data/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
```
### Why are the changes needed?
Although we added a Maven cache inside `GitHub Action`, timeouts happen too frequently while accessing the `artifact descriptor`.
```
[ERROR] Failed to execute goal on project spark-mllib_2.12:
...
Failed to read artifact descriptor for ...
...
Connection timed out (Read failed) -> [Help 1]
```
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
This PR is irrelevant to Jenkins.
This is tested on the personal repository first. `GitHub Action` of this PR should pass.
- https://github.com/dongjoon-hyun/spark/pull/11
Closes #26793 from dongjoon-hyun/SPARK-30163.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR enables JDK11 build with `hadoop-2.7` profile at `GitHub Action`.
**BEFORE (6 jobs including one JDK11 job)**
![before](https://user-images.githubusercontent.com/9700541/70342731-7763f300-180a-11ea-859f-69038b88451f.png)
**AFTER (7 jobs including two JDK11 jobs)**
![after](https://user-images.githubusercontent.com/9700541/70342658-54d1da00-180a-11ea-9fba-507fc087dc62.png)
### Why are the changes needed?
SPARK-29957 makes JDK11 test work with `hadoop-2.7` profile. We need to protect it.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
This is a `GitHub Action`-only PR. See the result of `GitHub Action` on this PR.
Closes #26782 from dongjoon-hyun/SPARK-GHA-HADOOP-2.7.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR adds `GitHub Action Cache` task on `build` directory.
### Why are the changes needed?
This will replace the Maven downloading with the cache.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manually check the GitHub Action log of this PR.
Closes #26652 from dongjoon-hyun/SPARK-MAVEN-CACHE.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to do the following.
- Add two profiles, `hive-1.2` and `hive-2.3` (default)
- Validate if we keep the existing combination at least. (Hadoop-2.7 + Hive 1.2 / Hadoop-3.2 + Hive 2.3).
For now, we assume that `hive-1.2` is explicitly used with `hadoop-2.7` and `hive-2.3` with `hadoop-3.2`. The following are beyond the scope of this PR.
- SPARK-29988 Adjust Jenkins jobs for `hive-1.2/2.3` combination
- SPARK-29989 Update release-script for `hive-1.2/2.3` combination
- SPARK-29991 Support `hive-1.2/2.3` in PR Builder
### Why are the changes needed?
This will help to switch our dependencies to update the exposed dependencies.
### Does this PR introduce any user-facing change?
This is a dev-only change: the build profile combinations are changed.
- `-Phadoop-2.7` => `-Phadoop-2.7 -Phive-1.2`
- `-Phadoop-3.2` => `-Phadoop-3.2 -Phive-2.3`
### How was this patch tested?
Pass Jenkins with the dependency check and tests to make sure we don't change anything for now.
- [Jenkins (-Phadoop-2.7 -Phive-1.2)](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114192/consoleFull)
- [Jenkins (-Phadoop-3.2 -Phive-2.3)](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114192/consoleFull)
Also, from now on, GitHub Action validates the following combinations.
![gha](https://user-images.githubusercontent.com/9700541/69355365-822d5e00-0c36-11ea-93f7-e00e5459e1d0.png)
Closes #26619 from dongjoon-hyun/SPARK-29981.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to cache `~/.m2/repository/net` and `~/.m2/repository/io` to reduce the flakiness.
### Why are the changes needed?
This will stabilize GitHub Action more before adding `hive-1.2` and `hive-2.3` combination.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
After the GitHub Action on this PR passes, check the log.
Closes #26621 from dongjoon-hyun/SPARK-GHA-CACHE.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
The Linter (R) GitHub workflow sometimes fails, for example:
https://github.com/apache/spark/pull/26509/checks?check_run_id=310718016
Failure message:
```
Executing: /tmp/apt-key-gpghome.8r74rQNEjj/gpg.1.sh --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
gpg: connecting dirmngr at '/tmp/apt-key-gpghome.8r74rQNEjj/S.dirmngr' failed: IPC connect call failed
gpg: keyserver receive failed: No dirmngr
##[error]Process completed with exit code 2.
```
It is due to a buggy GnuPG. Context:
- https://github.com/sbt/website/pull/825
- https://github.com/sbt/sbt/issues/4261
- https://github.com/microsoft/WSL/issues/3286
### Why are the changes needed?
Make lint-r github workflows work.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Pass github workflows.
Closes #26602 from viirya/SPARK-29964.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR fixes SparkR lint errors and adds `lint-r` GitHub Action to protect the branch.
### Why are the changes needed?
It turns out that we currently don't run it. It was recovered yesterday. However, since then, our Jenkins linter jobs (`master`/`branch-2.4`) have been broken on `lint-r` tasks.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Pass the GitHub Action on this PR in addition to Jenkins R and AppVeyor R.
Closes #26564 from dongjoon-hyun/SPARK-29936.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to enable [GitHub Action Cache on Maven local repository](https://github.com/actions/cache/blob/master/examples.md#java---maven) for the following goals.
1. To reduce the chance of failure due to the Maven download flakiness.
2. To speed up the build a little bit.
Unfortunately, due to the GitHub Action Cache limitation, it seems that we cannot put all into a single cache. It's ignored like the following.
- **.m2/repository is 680777194 bytes**
```
/bin/tar -cz -f /home/runner/work/_temp/01f162c3-0c78-4772-b3de-b619bb5d7721/cache.tgz -C /home/runner/.m2/repository .
3
##[warning]Cache size of 680777194 bytes is over the 400MB limit, not saving cache.
```
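Since the whole repository does not fit in one cache entry, one way around the limit is to cache each top-level directory separately. A minimal sketch, assuming `actions/cache@v1` and a build matrix with `java` and `hadoop` entries; the key layout is inferred from the restored keys in the test log of this PR, and only the `com` and `org` directories are shown:

```yaml
# Sketch only: these entries belong under a job's `steps:` list.
# One cache step per top-level directory keeps each archive below the size limit.
- uses: actions/cache@v1
  with:
    path: ~/.m2/repository/com
    # Assumed key layout: <java>-<hadoop>-maven-<dir>-<hash of all pom.xml files>
    key: ${{ matrix.java }}-${{ matrix.hadoop }}-maven-com-${{ hashFiles('**/pom.xml') }}
    restore-keys: |
      ${{ matrix.java }}-${{ matrix.hadoop }}-maven-com-
- uses: actions/cache@v1
  with:
    path: ~/.m2/repository/org
    key: ${{ matrix.java }}-${{ matrix.hadoop }}-maven-org-${{ hashFiles('**/pom.xml') }}
    restore-keys: |
      ${{ matrix.java }}-${{ matrix.hadoop }}-maven-org-
```

Hashing the `pom.xml` files means a dependency change produces a new key, while `restore-keys` still allows an older cache to be reused as a starting point.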
### Why are the changes needed?
Caching the local Maven repository not only speeds up the build but also reduces the Maven download flakiness. The following are recent failure examples.
- https://github.com/apache/spark/runs/295869450
```
[ERROR] Failed to execute goal on project spark-streaming-kafka-0-10_2.12:
Could not resolve dependencies for project org.apache.spark:spark-streaming-kafka-0-10_2.12:jar:spark-367716:
Could not transfer artifact com.fasterxml.jackson.datatype:jackson-datatype-jdk8:jar:2.10.0
from/to central (https://repo.maven.apache.org/maven2):
Failed to transfer file https://repo.maven.apache.org/maven2/com/fasterxml/jackson/datatype/
jackson-datatype-jdk8/2.10.0/jackson-datatype-jdk8-2.10.0.jar with status code 503 -> [Help 1]
...
[ERROR] mvn <args> -rf :spark-streaming-kafka-0-10_2.12
```
```
[ERROR] Failed to execute goal on project spark-tools_2.12:
Could not resolve dependencies for project org.apache.spark:spark-tools_2.12:jar:3.0.0-SNAPSHOT:
Failed to collect dependencies at org.clapper:classutil_2.12:jar:1.5.1:
Failed to read artifact descriptor for org.clapper:classutil_2.12:jar:1.5.1:
Could not transfer artifact org.clapper:classutil_2.12:pom:1.5.1 from/to central (https://repo.maven.apache.org/maven2):
Connection timed out (Read failed) -> [Help 1]
```
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manually check the GitHub Action log of this PR.
```
Cache restored from key: 1.8-hadoop-2.7-maven-com-5b4a9fb13c5f5ff78e65a20003a3810796e4d1fde5f24d397dfe6e5153960ce4
Cache restored from key: 1.8-hadoop-2.7-maven-org-5b4a9fb13c5f5ff78e65a20003a3810796e4d1fde5f24d397dfe6e5153960ce4
```
Closes #26456 from dongjoon-hyun/SPARK-29820.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR changes the GitHub Actions build command from `mvn package` to `mvn install` to build Scaladoc jars.
### Why are the changes needed?
Sometimes the `mvn install` build fails with the error: `not found: type ClassName...`.
More details: https://github.com/apache/spark/pull/24628#issuecomment-495655747
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
N/A
Closes #26414 from wangyum/github-action-install.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to add linters and license/dependency checkers to GitHub Action. This excludes `lint-r` intentionally because https://github.com/actions/setup-r is not ready. We can add that later when it becomes available.
### Why are the changes needed?
This will help the PR reviews.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
See the GitHub Action result on this PR.
Closes #25879 from dongjoon-hyun/SPARK-29199.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR uses `java-version` instead of `version` for GitHub Action. More details:
204b974cf4ac25aeee3a
### Why are the changes needed?
The `version` property will not be supported after October 1, 2019.
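A minimal sketch of a step that uses the newer input name, assuming the `actions/setup-java@v1` action; the step name and the JDK value are illustrative and not taken from this PR:

```yaml
# Sketch only: this entry belongs under a job's `steps:` list.
- name: Set up JDK
  uses: actions/setup-java@v1
  with:
    # `java-version` replaces the older `version` input.
    java-version: '1.8'
```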
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
N/A
Closes #25866 from wangyum/java-version.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to increase the JVM CodeCacheSize from 0.5G to 1G.
### Why are the changes needed?
After upgrading to `Scala 2.12.10`, the following is observed during building.
```
2019-09-18T20:49:23.5030586Z OpenJDK 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
2019-09-18T20:49:23.5032920Z OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
2019-09-18T20:49:23.5034959Z CodeCache: size=524288Kb used=521399Kb max_used=521423Kb free=2888Kb
2019-09-18T20:49:23.5035472Z bounds [0x00007fa62c000000, 0x00007fa64c000000, 0x00007fa64c000000]
2019-09-18T20:49:23.5035781Z total_blobs=156549 nmethods=155863 adapters=592
2019-09-18T20:49:23.5036090Z compilation: disabled (not enough contiguous free space left)
```
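The warning above names the JVM option that controls this limit. A minimal sketch of passing it to a Maven build step in a workflow, assuming the option is supplied through `MAVEN_OPTS`; the step name and Maven goal are illustrative, and the places where this PR actually sets the value are not shown here:

```yaml
# Sketch only: this entry belongs under a job's `steps:` list.
- name: Build with Maven
  env:
    # 1g doubles the 524288Kb (0.5G) code cache reported in the warning.
    MAVEN_OPTS: -XX:ReservedCodeCacheSize=1g
  run: ./build/mvn -DskipTests package
```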
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manually check the Jenkins or GitHub Action build log (which should not have the above).
Closes #25836 from dongjoon-hyun/SPARK-CODE-CACHE-1G.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Until now, we have been testing JDK8/11 with Hadoop-3.2. This PR aims to extend the test coverage to JDK8/Hadoop-2.7.
### Why are the changes needed?
This will prevent Hadoop 2.7 compile/package issues at PR stage.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
GitHub Action on this PR now shows all three combinations. This change does not affect the Jenkins tests.
Closes #25824 from dongjoon-hyun/SPARK-29125.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR enables GitHub Action on PRs.
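A minimal sketch of the trigger section such a change might use, assuming the workflow should run for pushes to `master` and for pull requests targeting it; the branch filter is an assumption, not taken from this PR:

```yaml
# Sketch only: top-level trigger section of a workflow file.
on:
  push:
    branches:
    - master
  # Adding `pull_request` makes the workflow run at PR stage as well.
  pull_request:
    branches:
    - master
```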
### Why are the changes needed?
So far, we have detected JDK11 compilation errors only after merging.
This PR aims to catch JDK11 compilation errors at PR stage.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manual. See the GitHub Action on this PR.
Closes #25786 from dongjoon-hyun/SPARK-29079.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
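### What changes were proposed in this pull request?
This patch adds a Java version parameter to the GitHub workflow conf for JDK8/11.
### Why are the changes needed?
As we want to build with JDK8/11 on GitHub workflow, we need to set the Java version according to the current matrix.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
See the GitHub workflow run result.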
Closes #25625 from viirya/github-workflow-java.
Authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to add `-Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-3.2 -Phadoop-cloud` profiles to GitHub workflow conf.
### Why are the changes needed?
Currently, we build with JDK8 and test with JDK8/11 in Jenkins.
And, we use GitHub Workflow for JDK8/JDK11 building test.
To test JDK11 fully, we need to enable `hive` and `hadoop-3.2` profiles for `Hive 2.3.6` and `Hadoop 3.2`. Also, this PR adds all resource manager modules, too.
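A minimal sketch of a job that builds with these profiles on both JDKs, assuming a `java` matrix, the `actions/checkout@v2` and `actions/setup-java@v1` actions, and Spark's `./build/mvn` wrapper; the job name, step names, and the `-DskipTests package` goal are illustrative:

```yaml
# Sketch only: a build job with a JDK matrix and the profiles listed above.
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        java: [ '1.8', '11' ]
    steps:
    - uses: actions/checkout@v2
    - name: Set up JDK ${{ matrix.java }}
      uses: actions/setup-java@v1
      with:
        java-version: ${{ matrix.java }}
    - name: Build with Maven
      run: ./build/mvn -DskipTests -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-3.2 -Phadoop-cloud package
```

Each matrix entry runs as a separate job, so a failure that only affects JDK11 shows up independently of the JDK8 build.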
### Does this PR introduce any user-facing change?
No. In addition, Jenkins workload will be the same because this is specific to GitHub workflow.
### How was this patch tested?
See the GitHub workflow run result.
Closes #25624 from dongjoon-hyun/SPARK-JDK11-HIVE.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
<!--
Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
4. Be sure to keep the PR description updated to reflect all changes.
5. Please write your PR title to summarize what this PR proposes.
6. If possible, provide a concise example to reproduce the issue for a faster review.
-->
### What changes were proposed in this pull request?
<!--
Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue.
If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
2. If you fix some SQL features, you can provide some references of other DBMSes.
3. If there is design documentation, please add the link.
4. If there is a discussion in the mailing list, please add the link.
-->
This PR proposes to improve the Github template for better and faster review iterations and better interactions between PR authors and reviewers.
As suggested in the [dev mailing list](http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-New-sections-in-Github-Pull-Request-description-template-td27527.html), this PR refers to [Kubernetes' PR template](https://raw.githubusercontent.com/kubernetes/kubernetes/master/.github/PULL_REQUEST_TEMPLATE.md).
Therefore, those fields are newly added:
```
### Why are the changes needed?
### Does this PR introduce any user-facing change?
```
and some comments were added.
### Why are the changes needed?
<!--
Please clarify why the changes are needed. For instance,
1. If you propose a new API, clarify the use case for a new API.
2. If you fix a bug, you can clarify why it is a bug.
-->
Currently, many PR descriptions are poorly formatted, which causes overhead for both PR authors and reviewers.
Poorly formatted PR descriptions cause multiple problems:
- Some PRs still have a single-line description despite 500+/- lines of code changes in a critical path.
- Some PRs do not describe behaviour changes, so reviewers need to find and document them.
- Some PRs are hard to review without an outline, yet the outline is sometimes missing.
- Spark is getting old and sometimes we need to dig deep into the history. Because of poorly formatted PR descriptions, finding the root cause of a bug sometimes requires reading the entire code across the whole commit history.
- Reviews take a while but the number of PRs still grows.
This PR aims to alleviate these problems.
### Does this PR introduce any user-facing change?
<!--
If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
If no, write 'No'.
-->
Yes, it changes the PR template shown when a PR is opened. This PR uses the template this PR proposes.
### How was this patch tested?
<!--
If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
If tests were not added, please describe why they were not added and/or why it was difficult to add.
-->
Manually tested via Github preview feature.
Closes #25310 from HyukjinKwon/SPARK-28578.
Lead-authored-by: HyukjinKwon <gurwls223@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
## What changes were proposed in this pull request?
Make Github Actions log quieter
Closes #25468 from dbtsai/actions2.
Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
## What changes were proposed in this pull request?
Add JDK11 for Github Actions
Closes #25444 from dbtsai/jdk11.
Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
## What changes were proposed in this pull request?
GitHub now provides free CI/CD for build, test, and deploy. This PR enables a simple GitHub Actions workflow that builds master with JDK8 on the latest Ubuntu. We can extend it to different versions of JDK, and even build Spark with Docker images, in the future.
Closes #25440 from dbtsai/actions.
Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
## What changes were proposed in this pull request?
Tighten up some key links to the project and download pages to use HTTPS
## How was this patch tested?
N/A
Closes #24665 from srowen/HTTPSURLs.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?
Updates links to the wiki to links to the new location of content on spark.apache.org.
## How was this patch tested?
Doc builds
Author: Sean Owen <sowen@cloudera.com>
Closes #15967 from srowen/SPARK-18073.1.
## What changes were proposed in this pull request?
Link to contributing wiki in PR template, README.md
## How was this patch tested?
Doc-only change, tested by Jekyll
Author: Sean Owen <sowen@cloudera.com>
Closes #15429 from srowen/SPARK-17840.
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
Author: Rahul Tanwani <tanwanirahul@gmail.com>
Closes #11343 from tanwanirahul/pull_request_template.