ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
hyukjinkwon	ff493173ab	[SPARK-34065][INFRA] Cancel the duplicated jobs only in PRs at GitHub Actions ### What changes were proposed in this pull request? This is kind of a followup of https://github.com/apache/spark/pull/31104 but I decided to track it separately with a separate JIRA. Currently the jobs are being canceled in main repo branches. If a commit is merged, for example, to master branch before the test finishes, it cancels the previous builds. This is a problem because we cannot, for example, detect logical conflicts properly. We should only cancel the jobs in PRs: ![Screen Shot 2021-01-11 at 3 22 24 PM](https://user-images.githubusercontent.com/6477701/104152015-c7f04b80-5421-11eb-9e40-6b0a0e5b8442.png) This PR proposes not to cancel jobs for commits in the main repo branches, and to cancel them only in PRs. ### Why are the changes needed? - To keep the test coverage - To run the test in the synced master branch instead of relying on the builds made in each PR with an outdated master branch - To detect test failures from logical conflicts from merging two conflicting PRs at the same time. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? I manually tested in - https://github.com/HyukjinKwon/spark/pull/27 - https://github.com/HyukjinKwon/spark/pull/28 I added Yi Wu as a co-author since he helped verify the current fix in the PR above. I checked that it does not cancel in the main repo branch: ![Screen Shot 2021-01-11 at 3 58 52 PM](https://user-images.githubusercontent.com/6477701/104153656-3afbc100-5426-11eb-9309-85f6f4fd9ff3.png) I checked it cancels in PRs: ![Screen Shot 2021-01-11 at 3 58 45 PM](https://user-images.githubusercontent.com/6477701/104153658-3d5e1b00-5426-11eb-89f7-786c3ae6849a.png) Closes #31121 from HyukjinKwon/SPARK-34065.
Lead-authored-by: hyukjinkwon <gurwls223@apache.org> Co-authored-by: yi.wu <yi.wu@databricks.com> Co-authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2021-01-11 16:37:16 +09:00
Kamil Breguła	3e5e08640e	[SPARK-34053][INFRA] Cancel the previous build Similar to: https://github.com/apache/spark/pull/31098 https://github.com/apache/calcite/pull/2318 (solution suggested by vlsi - https://github.com/apache/pulsar/issues/9154#issuecomment-756984731) I used the action maintained by potiuk instead of the one from the original author, for two reasons: - the original action was abandoned and is not supported (Proof: https://github.com/n1hility/cancel-previous-runs/issues/7) - this action works with forks. The original action only worked when the contribution was run in the same repository and the action had a token with full access. > If you use forks, you should create a separate "Cancelling" workflow_run triggered workflow. The workflow_run should be responsible for all canceling actions. The examples below show the possible ways the action can be utilized. ### What changes were proposed in this pull request? This PR aims to reduce the GitHub Action usage by cancelling the previous build. ### Why are the changes needed? In most cases, the last commit is the meaningful one. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Due to the nature of the change, testing of this change is difficult. > Note: This event will only trigger a workflow run if the workflow file is on the default branch. https://docs.github.com/en/free-pro-teamlatest/actions/reference/events-that-trigger-workflows#workflow_run However, you can see on my fork that this action is triggered. https://github.com/mik-laj/spark/actions?query=workflow%3A%22Cancelling+Duplicates%22 I also asked the author of this action, potiuk (PMC of Apache Airflow), to review this change, and I received a positive review. Closes #31104 from mik-laj/patch-1. Lead-authored-by: Kamil Breguła <kamil.bregula@polidea.com> Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2021-01-10 16:19:44 -08:00
Dongjoon Hyun	2627825647	[SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job ### What changes were proposed in this pull request? This PR aims to recover GitHub Action `build_and_test` job. ### Why are the changes needed? Currently, `build_and_test` job fails to start because of the following in master/branch-3.1 at least. ``` r-lib/actions/setup-r@v1 is not allowed to be used in apache/spark. Actions in this workflow must be: created by GitHub, verified in the GitHub Marketplace, within a repository owned by apache or match the following: adoptopenjdk/*, apache/*, gradle/wrapper-validation-action. ``` - https://github.com/apache/spark/actions/runs/449826457 ![Screen Shot 2020-12-28 at 10 06 11 PM](https://user-images.githubusercontent.com/9700541/103262174-f1f13a80-4958-11eb-8ceb-631527155775.png) ### Does this PR introduce _any_ user-facing change? No. This is a test infra. ### How was this patch tested? To check GitHub Action `build_and_test` job on this PR. Closes #30959 from dongjoon-hyun/SPARK-33931. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-12-29 20:51:57 +09:00
Enrico Minack	1d450250eb	[BUILD][MINOR] Do not publish snapshots from forks ### What changes were proposed in this pull request? The GitHub workflow `Publish Snapshot` publishes master and 3.1 branch via Nexus. For this, the workflow uses `secrets.NEXUS_USER` and `secrets.NEXUS_PW` secrets. These are not available in forks where this workflow fails every day: - https://github.com/G-Research/spark/actions/runs/431626797 - https://github.com/G-Research/spark/actions/runs/433153049 - https://github.com/G-Research/spark/actions/runs/434680048 - https://github.com/G-Research/spark/actions/runs/436958780 ### Why are the changes needed? Avoid attempting to publish snapshots from forked repositories. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Code review only. Closes #30884 from EnricoMi/branch-do-not-publish-snapshots-from-forks. Authored-by: Enrico Minack <github@enrico.minack.dev> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-12-23 00:22:42 +09:00
Kousuke Saruta	b0da2bcd46	[MINOR][INFRA] Add -Pspark-ganglia-lgpl to the build definition with Scala 2.13 on GitHub Actions ### What changes were proposed in this pull request? This PR adds `-Pspark-ganglia-lgpl` to the build definition with Scala 2.13 on GitHub Actions. ### Why are the changes needed? To keep the code buildable with Scala 2.13. With this change, all the sub-modules seem to be buildable with Scala 2.13. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? I confirmed that the Scala 2.13 build passes with the following command. ``` $ ./dev/change-scala-version.sh 2.13 $ build/sbt -Pspark-ganglia-lgpl -Pscala-2.13 compile test:compile ``` Closes #30834 from sarutak/ganglia-scala-2.13. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-12-18 15:10:13 +09:00
Kousuke Saruta	b135db3b1a	[SPARK-33757][INFRA][R][FOLLOWUP] Provide more simple solution ### What changes were proposed in this pull request? This PR proposes a better solution for the R build failure on GitHub Actions. The issue was solved in #30737 but I noticed the following two things. * We can use the latest `usethis` if we install additional libraries on the GitHub Actions environment. * For tests on AppVeyor, `usethis` is not necessary, so I partially revert the previous change. ### Why are the changes needed? For a simpler solution. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Confirmed on GitHub Actions and AppVeyor on my account. Closes #30753 from sarutak/followup-SPARK-33757. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-12-13 17:27:39 -08:00
Kousuke Saruta	fb2e3af4b5	[SPARK-33757][INFRA][R] Fix the R dependencies build error on GitHub Actions and AppVeyor ### What changes were proposed in this pull request? This PR fixes the R dependencies build error on GitHub Actions and AppVeyor. The cause seems to be that the `usethis` package was updated on 2020/12/10. https://cran.r-project.org/web/packages/usethis/index.html ### Why are the changes needed? To keep the build clean. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Should be done by GitHub Actions. Closes #30737 from sarutak/fix-r-dependencies-build-error. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-12-12 00:54:40 +09:00
Kousuke Saruta	29cc5b3f23	[MINOR][INFRA] Add kubernetes-integration-tests to GitHub Actions for Scala 2.13 build ### What changes were proposed in this pull request? This PR adds `kubernetes-integration-tests` to GitHub Actions for Scala 2.13 build. ### Why are the changes needed? Now that the build passes with `kubernetes-integration-tests` and Scala 2.13, it's better to keep it buildable. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Should be done by GitHub Actions. I also confirmed that the build passes with the following command. ``` $ build/sbt -Pscala-2.13 -Pkubernetes -Pkubernetes-integration-tests compile test:compile ``` Closes #30731 from sarutak/github-actions-k8s. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-12-12 00:53:31 +09:00
Dongjoon Hyun	c001dd49e4	[SPARK-33675][INFRA][FOLLOWUP] Schedule branch-3.1 snapshot at master branch ### What changes were proposed in this pull request? Currently, `master`/`branch-3.0`/`branch-2.4` snapshot publishing is successfully migrated from Jenkins to `GitHub Action`. - https://github.com/apache/spark/actions?query=workflow%3A%22Publish+Snapshot%22 This PR aims to schedule `branch-3.1` snapshot at `master` branch. ### Why are the changes needed? This is because it turns out that `GitHub Action Schedule` works only at `master` branch. (the default branch). - https://docs.github.com/en/free-pro-teamlatest/actions/reference/events-that-trigger-workflows#scheduled-events ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? The matrix triggering is tested at the forked branch. - https://github.com/dongjoon-hyun/spark/runs/1519015974 Closes #30674 from dongjoon-hyun/SPARK-SCHEDULE-3.1. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-12-08 10:43:41 -08:00
Dongjoon Hyun	3a6546d385	[MINOR][INFRA] Add -Pdocker-integration-tests to GitHub Action Scala 2.13 build job ### What changes were proposed in this pull request? This aims to add `-Pdocker-integration-tests` at GitHub Action job for Scala 2.13 compilation. ### Why are the changes needed? We fixed Scala 2.13 compilation of this module at https://github.com/apache/spark/pull/30660 . This PR will prevent accidental regression at that module. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GitHub Action Scala 2.13 job. Closes #30661 from dongjoon-hyun/SPARK-DOCKER-IT. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>	2020-12-08 14:11:39 +09:00
Kousuke Saruta	e88f0d4a24	[SPARK-33683][INFRA] Remove -Djava.version=11 from Scala 2.13 build in GitHub Actions ### What changes were proposed in this pull request? This PR removes `-Djava.version=11` from the build command for Scala 2.13 in the GitHub Actions' job. In the GitHub Actions' job, the build command for Scala 2.13 is defined as follows. ``` ./build/sbt -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Djava.version=11 -Pscala-2.13 compile test:compile ``` Though, Scala 2.13 build uses Java 8 rather than 11 so let's remove `-Djava.version=11`. ### Why are the changes needed? To build with consistent configuration. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Should be done by GitHub Actions' workflow. Closes #30633 from sarutak/scala-213-java11. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-12-06 17:57:19 -08:00
Dongjoon Hyun	e32de29bce	[SPARK-33675][INFRA] Add GitHub Action job to publish snapshot ### What changes were proposed in this pull request? This PR aims to add `GitHub Action` job to publish daily snapshot for master branch. - https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-core_2.12/3.2.0-SNAPSHOT/ For the other branches, I'll make adjusted backports. - For `branch-3.1`, we can specify the checkout `ref` to `branch-3.1`. - For `branch-2.4` and `branch-3.0`, we can publish at every commit since the traffic is low. - https://github.com/apache/spark/pull/30630 (branch-3.0) - https://github.com/apache/spark/pull/30629 (branch-2.4 LTS) ### Why are the changes needed? After this series of jobs, this will reduce our maintenance burden permanently from AmpLab Jenkins by removing the following completely. https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/ For now, AmpLab Jenkins doesn't have a job for `branch-3.1`. We can do it by ourselves by `GitHub Action`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? The snapshot publishing is tested here at PR trigger. Since this PR adds a scheduled job, we cannot test in this PR. - https://github.com/dongjoon-hyun/spark/runs/1505792859 Apache Infra team finished the setup here. - https://issues.apache.org/jira/browse/INFRA-21167 Closes #30623 from dongjoon-hyun/SPARK-33675. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-12-07 10:05:28 +09:00
Dongjoon Hyun	f94cb53a90	[MINOR][INFRA] Use the latest image for GitHub Action jobs ### What changes were proposed in this pull request? Currently, GitHub Action is using two docker images. ``` $ git grep dongjoon/apache-spark-github-action-image .github/workflows/build_and_test.yml: image: dongjoon/apache-spark-github-action-image:20201015 .github/workflows/build_and_test.yml: image: dongjoon/apache-spark-github-action-image:20201025 ``` This PR aims to make it consistent by using the latest one. ``` - image: dongjoon/apache-spark-github-action-image:20201015 + image: dongjoon/apache-spark-github-action-image:20201025 ``` ### Why are the changes needed? This is for better maintainability. The image size is almost the same. ``` $ docker images | grep 202010 dongjoon/apache-spark-github-action-image 20201025 37adfa3d226a 5 weeks ago 2.18GB dongjoon/apache-spark-github-action-image 20201015 ff6fee8dc36d 6 weeks ago 2.16GB ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the GitHub Action. Closes #30578 from dongjoon-hyun/SPARK-MINOR. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-12-03 09:34:42 +09:00
HyukjinKwon	fbfc0bf628	[SPARK-33464][INFRA] Add/remove (un)necessary cache and restructure GitHub Actions yaml ### What changes were proposed in this pull request? This PR proposes: - Add `~/.sbt` directory into the build cache, see also https://github.com/sbt/sbt/issues/3681 - Move `hadoop-2` below to put up together with `java-11` and `scala-213`, see https://github.com/apache/spark/pull/30391#discussion_r524881430 - Remove unnecessary `.m2` cache if you run SBT tests only. - Remove `rm ~/.m2/repository/org/apache/spark`. If you don't `sbt publishLocal` or `mvn install`, we don't need to care about it. - Use Java 8 in Scala 2.13 build. We can switch the Java version to 11 used for release later. - Add caches into linters. The linter scripts uses `sbt` in, for example, `./dev/lint-scala`, and uses `mvn` in, for example, `./dev/lint-java`. Also, it requires to `sbt package` in Jekyll build, see: https://github.com/apache/spark/blob/master/docs/_plugins/copy_api_dirs.rb#L160-L161. We need full caches here for SBT, Maven and build tools. - Use the same syntax of Java version, 1.8 -> 8. ### Why are the changes needed? - Remove unnecessary stuff - Cache what we can in the build ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? It will be tested in GitHub Actions build at the current PR Closes #30391 from HyukjinKwon/SPARK-33464. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-11-18 15:13:43 -08:00
Dongjoon Hyun	10105b555d	[SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2 ### What changes were proposed in this pull request? This PR aims to protect `Hadoop 2.x` profile compilation in Apache Spark 3.1+. ### Why are the changes needed? Since Apache Spark 3.1+ switches its default profile to Hadoop 3, we had better prevent at least compilation errors with the `Hadoop 2.x` profile at the PR review phase. Although this is an additional workload, it will finish quickly because it's compilation only. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the GitHub Action. - This should be merged after https://github.com/apache/spark/pull/30375 . Closes #30378 from dongjoon-hyun/SPARK-33454. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-11-16 15:06:51 +09:00
Dongjoon Hyun	a70a2b02ce	[SPARK-33439][INFRA] Use SERIAL_SBT_TESTS=1 for SQL modules ### What changes were proposed in this pull request? This PR aims to decrease the parallelism of the `SQL` module, as is already done for the `Hive` module. ### Why are the changes needed? GitHub Action `sql - slow tests` has become flaky. - https://github.com/apache/spark/runs/1393670291 - https://github.com/apache/spark/runs/1393088031 ### Does this PR introduce _any_ user-facing change? No. This is a dev-only feature. Although this will increase the running time, that is better than flakiness. ### How was this patch tested? Pass the GitHub Action stably. Closes #30365 from dongjoon-hyun/SPARK-33439. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-11-12 21:19:51 -08:00
Kousuke Saruta	208b94e4c1	[SPARK-33353][BUILD] Cache dependencies for Coursier with new sbt in GitHub Actions ### What changes were proposed in this pull request? This PR changes the behavior of the GitHub Actions job that caches dependencies. SPARK-33226 upgraded sbt to 1.4.1. As of 1.3.0, sbt uses Coursier as the dependency resolver / fetcher. So let's change the dependency cache configuration for the GitHub Actions job. ### Why are the changes needed? To make the build faster with Coursier for the GitHub Actions job. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Should be done by GitHub Actions itself. Closes #30259 from sarutak/coursier-cache. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-11-05 09:29:53 -08:00
Kyle Bendickson	0535b34ad4	[SPARK-33282] Migrate from deprecated probot autolabeler to GitHub labeler action ### What changes were proposed in this pull request? This PR removes the old Probot Autolabeler labeling configuration, as the probot autolabeler has been deprecated. I've updated the configs in Iceberg and in Avro, and we also need to update here. This PR adds in an additional workflow for labeling PRs and migrates the old probot config to the new format. Unfortunately, because certain features have not been released upstream, we will not get the _exact_ behavior as before. I have documented where that is and what changes are needed, and in the associated ticket I've also discussed other options and why I think this is the best way to go. Definitely a follow up ticket is needed to get the original behavior back in these few cases, but PRs have not been labeled for almost a month and so it's probably best to get it right 95% of the time and occasionally have some UI related PRs labeled as `CORE` while the issue is resolved upstream and/or further investigated. ### Why are the changes needed? The probot autolabeler is dead and will not be maintained going forward. This has been confirmed with github user [at]mithro in an issue in their repository. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? To test this PR, I first merged the config into my local fork. I then edited it several times and ran tests on that. Unfortunately, I've overwritten my fork with the apache repo in order to create a proper PR. However, I've also added the config for the same thing in the Iceberg repo as well as the Avro repo.
I have now merged this PR into my local repo and will be running some tests on edge cases there and for validating in general: - [Check that the SQL label is applied for changes directly below repo root's sql directory](https://github.com/kbendick/spark/pull/16) ✅ - [Check that the structured streaming label is applied](https://github.com/kbendick/spark/pull/20) ✅ - [Check that a wildcard at the end of a pattern will match nested files](https://github.com/kbendick/spark/pull/19) ✅ - [Check that the rule */pom.xml will match the root pom.xml file](https://github.com/kbendick/spark/pull/25) ✅ I've also discovered that we're likely not killing github actions that run (like large tests etc) when users push to their PR. In most cases, I see that a user has to mark something as "OK to test", but it still seems like we might want to discuss whether or not we should add a cancellation step in order to save time / capacity on the runners. If so desired, we would add an action in each workflow that cancels old runs when a `push` action occurs on a PR. This will likely make waiting for test runners much faster iff tests are automatically rerun on push by anybody (such as PMCs, PRs that have been marked OK to test, etc). We could free a large number of resources potentially if a cancellation step was added to all of the workflows in the Apache account (as github action API limits are set at the account level). Admittedly, the fact that the "old" workflow runs weren't cancelled could be because I was working in a fork, but given that there are explicit actions that can be added to the start of workflows to cancel old PR workflows, and given that we don't have them configured, it seems likely to me that this is the case in this repo (and in most `apache` repos as well), at least under certain circumstances (e.g. repos that don't have "Ok to test"-like webhooks as one example).
This is a separate issue though, which I can bring up on the mailing list once I'm done with this PR. Unfortunately I've been very busy the past two weeks, but if somebody else wanted to work on that I would be happy to support with any knowledge I have. The last Apache repo to still have the probot autolabeler in it is Beam, at which point we can have Gavin from ASF Infra remove the permissions for the probot autolabeler entirely. See the associated JIRA ticket for the links to other tickets, like the one for ASF Infra to remove the dead probot autolabeler's read and write permissions to our PRs in the Apache organization. Closes #30244 from kbendick/begin-migration-to-github-labeler-action. Authored-by: Kyle Bendickson <kjbendickson@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-11-05 16:10:52 +09:00
HyukjinKwon	9818f079aa	[SPARK-33243][PYTHON][BUILD] Add numpydoc into documentation dependency ### What changes were proposed in this pull request? This PR proposes to initiate the migration to NumPy documentation style (from reST style) in PySpark docstrings. This PR also adds one migration example of `SparkContext`. - Before: ... ![Screen Shot 2020-10-26 at 7 02 05 PM](https://user-images.githubusercontent.com/6477701/97161090-a8ea0200-17c0-11eb-8204-0e70d18fc571.png) ... ![Screen Shot 2020-10-26 at 7 02 09 PM](https://user-images.githubusercontent.com/6477701/97161100-aab3c580-17c0-11eb-92ad-f5ad4441ce16.png) ... - After: ... ![Screen Shot 2020-10-26 at 7 24 08 PM](https://user-images.githubusercontent.com/6477701/97161219-d636b000-17c0-11eb-80ab-d17a570ecb4b.png) ... See also https://numpydoc.readthedocs.io/en/latest/format.html ### Why are the changes needed? There are many reasons for switching to NumPy documentation style. 1. Arguably reST style doesn't fit well when the docstring grows large because it provides (arguably) less structures and syntax. 2. NumPy documentation style provides a better human readable docstring format. For example, notebook users often just do `help(...)` by `pydoc`. 3. NumPy documentation style is pretty commonly used in data science libraries, for example, pandas, numpy, Dask, Koalas, matplotlib, ... Using NumPy documentation style can give users a consistent documentation style. ### Does this PR introduce _any_ user-facing change? The dependency itself doesn't change anything user-facing. The documentation change in `SparkContext` does, as shown above. ### How was this patch tested? Manually tested via running `cd python` and `make clean html`. Closes #30149 from HyukjinKwon/SPARK-33243. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-10-27 14:03:57 +09:00
Dongjoon Hyun	850adeb0fd	[SPARK-33239][INFRA] Use pre-built image at GitHub Action SparkR job ### What changes were proposed in this pull request? This PR aims to use a pre-built image for Github Action SparkR job. ### Why are the changes needed? This will reduce the execution time and the flakiness. BEFORE (21 minutes 39 seconds) ![Screen Shot 2020-10-16 at 1 24 43 PM](https://user-images.githubusercontent.com/9700541/96305593-fbeada80-0fb2-11eb-9b8e-86d8abaad9ef.png) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the GitHub Action `sparkr` job in this PR. Closes #30066 from dongjoon-hyun/SPARKR. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-10-26 01:50:23 -07:00
Bryan Cutler	47a6568265	[SPARK-33189][PYTHON][TESTS] Add env var to tests for legacy nested timestamps in pyarrow ### What changes were proposed in this pull request? Add an environment variable `PYARROW_IGNORE_TIMEZONE` to pyspark tests in run-tests.py to use legacy nested timestamp behavior. This means that when converting arrow to pandas, nested timestamps with timezones will have the timezone localized during conversion. ### Why are the changes needed? The default behavior was changed in PyArrow 2.0.0 to propagate timezone information. Using the environment variable enables testing with newer versions of pyarrow until the issue can be fixed in SPARK-32285. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests Closes #30111 from BryanCutler/arrow-enable-legacy-nested-timestamps-SPARK-33189. Authored-by: Bryan Cutler <cutlerb@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-10-21 09:13:33 +09:00
HyukjinKwon	eb9966b700	[SPARK-33190][INFRA][TESTS] Set upper bound of PyArrow version in GitHub Actions ### What changes were proposed in this pull request? PyArrow is uploaded into PyPI today (https://pypi.org/project/pyarrow/), and some tests fail with PyArrow 2.0.0+: ``` ====================================================================== ERROR [0.774s]: test_grouped_over_window_with_key (pyspark.sql.tests.test_pandas_grouped_map.GroupedMapInPandasTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/sql/tests/test_pandas_grouped_map.py", line 595, in test_grouped_over_window_with_key .select('id', 'result').collect() File "/__w/spark/spark/python/pyspark/sql/dataframe.py", line 588, in collect sock_info = self._jdf.collectToPython() File "/__w/spark/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__ answer, self.gateway_client, self.target_id, self.name) File "/__w/spark/spark/python/pyspark/sql/utils.py", line 117, in deco raise converted from None pyspark.sql.utils.PythonException: An exception was thrown from the Python worker. Please see the stack trace below. 
Traceback (most recent call last): File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 601, in main process() File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 593, in process serializer.dump_stream(out_iter, outfile) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 255, in dump_stream return ArrowStreamSerializer.dump_stream(self, init_stream_yield_batches(), stream) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 81, in dump_stream for batch in iterator: File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 248, in init_stream_yield_batches for series in iterator: File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 426, in mapper return f(keys, vals) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 170, in <lambda> return lambda k, v: [(wrapped(k, v), to_arrow_type(return_type))] File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 158, in wrapped result = f(key, pd.concat(value_series, axis=1)) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 68, in wrapper return f(*args, **kwargs) File "/__w/spark/spark/python/pyspark/sql/tests/test_pandas_grouped_map.py", line 590, in f "{} != {}".format(expected_key[i][1], window_range) AssertionError: {'start': datetime.datetime(2018, 3, 15, 0, 0), 'end': datetime.datetime(2018, 3, 20, 0, 0)} != {'start': datetime.datetime(2018, 3, 15, 0, 0, tzinfo=<StaticTzInfo 'Etc/UTC'>), 'end': datetime.datetime(2018, 3, 20, 0, 0, tzinfo=<StaticTzInfo 'Etc/UTC'>)} ``` https://github.com/apache/spark/runs/1278917457 This PR proposes to set the upper bound of PyArrow in GitHub Actions build. This should be removed when we properly support PyArrow 2.0.0+ (SPARK-33189). ### Why are the changes needed? To make build pass. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested?
GitHub Actions in this build will test it out. Closes #30098 from HyukjinKwon/hot-fix-test. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-10-20 17:35:09 +09:00
Fokko Driesprong	6ad75cda1e	[SPARK-17333][PYSPARK] Enable mypy ### What changes were proposed in this pull request? Add MyPy to the CI. Once this is installed on the CI: https://issues.apache.org/jira/browse/SPARK-32797?jql=project%20%3D%20SPARK%20AND%20text%20~%20mypy this will automatically check the types. ### Why are the changes needed? We should check if the types are still correct on the CI. ``` MacBook-Pro-van-Fokko:spark fokkodriesprong$ ./dev/lint-python starting python compilation test... python compilation succeeded. starting pycodestyle test... pycodestyle checks passed. starting flake8 test... flake8 checks passed. starting mypy test... mypy checks passed. The sphinx-build command was not found. Skipping Sphinx build for now. all lint-python tests passed! ``` ### Does this PR introduce _any_ user-facing change? No :) ### How was this patch tested? By running `./dev/lint-python` locally. Closes #30088 from Fokko/SPARK-17333. Authored-by: Fokko Driesprong <fokko@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-10-19 12:50:01 -07:00
HyukjinKwon	a7a8dae483	Revert "[SPARK-33069][INFRA] Skip test result report if no JUnit XML files are found" This reverts commit `a0aa8f33a9`.	2020-10-19 17:13:47 +09:00
Dongjoon Hyun	9f5eff0ae1	[SPARK-33162][INFRA] Use pre-built image at GitHub Action PySpark jobs ### What changes were proposed in this pull request? This PR aims to use `pre-built image` at Github Action PySpark jobs. To isolate the changes, `pyspark` jobs are split from the main job. The docker image is built by the following. | Item | URL | | --------------- | ------------- | | Dockerfile | https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage/blob/main/Dockerfile | | Builder | https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage/blob/main/.github/workflows/build.yml | | Image Location | https://hub.docker.com/r/dongjoon/apache-spark-github-action-image | Please note the following. 1. The community will still use `build_and_test.yml` to add new features as we have done until now. The `Dockerfile` will be updated regularly. 2. When Apache Spark gets an official docker repository location, we will use it. 3. Also, it would be best to keep this docker file and builder script in a new Apache Spark dev branch instead of an outside GitHub repository. ### Why are the changes needed? Currently, the two `pyspark` test jobs always take over an hour and a half each. In total, 3 hours 14 minutes. - https://github.com/apache/spark/runs/1240470628 (1 hour 35 mins) - https://github.com/apache/spark/runs/1240470634 (1 hour 39 mins) This PR will remove the package installation steps, which take 16 minutes and cause flakiness. Note that `Python 3.6 package installation` is not included in the pre-built image and it only takes `20s`. BEFORE ![Screen Shot 2020-10-15 at 10 32 17 AM](https://user-images.githubusercontent.com/9700541/96165634-be625080-0ed1-11eb-974b-940c112152e9.png) AFTER ![Screen Shot 2020-10-15 at 10 58 17 AM](https://user-images.githubusercontent.com/9700541/96168262-5d3c7c00-0ed5-11eb-83c5-e9dc189a156b.png) In short, `pyspark` GitHub jobs take a shorter time. In total, 2 hours 23 minutes (<- 3 hours 14 minutes, previously).
- https://github.com/apache/spark/pull/30059/checks?check_run_id=1260512568 (1 hour 18 mins) - https://github.com/apache/spark/pull/30059/checks?check_run_id=1260512582 (1 hour 5 mins) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the GitHub Action on this PR without `package installation steps`. Closes #30059 from dongjoon-hyun/SPARK-33162. Lead-authored-by: Dongjoon Hyun <dongjoon@apache.org> Co-authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-10-15 17:58:58 -07:00
HyukjinKwon	b089fe5376	[SPARK-32247][INFRA] Install and test scipy with PyPy in GitHub Actions ### What changes were proposed in this pull request? This PR proposes to install `scipy` as well in PyPy. It will test several ML specific test cases in PyPy as well. For example, `31a16fbb40/python/pyspark/mllib/tests/test_linalg.py (L487)` It was not installed when GitHub Actions build was added because it failed to install for an unknown reason. Seems like it's fixed in the latest scipy. ### Why are the changes needed? To improve test coverage. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? GitHub Actions build in this PR will test it out. Closes #30054 from HyukjinKwon/SPARK-32247. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-10-15 09:08:14 -07:00
Kousuke Saruta	513b6f5af2	[SPARK-33079][TESTS] Replace the existing Maven job for Scala 2.13 in Github Actions with SBT job ### What changes were proposed in this pull request? SPARK-32926 added a build test to GitHub Action for Scala 2.13 but it's only with Maven. As SPARK-32873 reported, some compilation errors happen only with SBT so I think we need to add another build test to GitHub Action for SBT. Unfortunately, we don't have abundant resources for GitHub Actions so instead of just adding the new SBT job, let's replace the existing Maven job with the new SBT job for Scala 2.13. ### Why are the changes needed? To ensure the build test passes even with SBT for Scala 2.13. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? GitHub Actions' job. Closes #29958 from sarutak/add-sbt-job-for-scala-2.13. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-10-15 20:51:20 +09:00
Dongjoon Hyun	e85ed8a14c	[SPARK-33156][INFRA] Upgrade GithubAction image from 18.04 to 20.04 ### What changes were proposed in this pull request? This PR aims to upgrade `Github Action` runner image from `Ubuntu 18.04 (LTS)` to `Ubuntu 20.04 (LTS)`. ### Why are the changes needed? `ubuntu-latest` in `GitHub Action` is still `Ubuntu 18.04 (LTS)`. - https://github.com/actions/virtual-environments#available-environments This upgrade will help Apache Spark 3.1+ preparation for vote and release on the latest OS. This is tested here. - https://github.com/dongjoon-hyun/spark/pull/36 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the `Github Action` in this PR. Closes #30050 from dongjoon-hyun/ubuntu_20.04. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-10-15 02:24:49 -07:00
HyukjinKwon	a0aa8f33a9	[SPARK-33069][INFRA] Skip test result report if no JUnit XML files are found ### What changes were proposed in this pull request? This PR proposes to skip test reporting ("Report test results") if no JUnit XML files are found. Currently, we're running and skipping the tests dynamically. For example, - if there are only changes in SparkR at the underlying commit, it only runs the SparkR tests, skips the other tests, and generates JUnit XML files for SparkR test cases. - if there are only changes in `docs` at the underlying commit, the build skips all tests except linters and does not generate any JUnit XML files. When the test reporting ("Report test results") job is triggered after the main build ("Build and test ") is finished, and no JUnit XML files are found, it reports the case as a failure. See https://github.com/apache/spark/runs/1196184007 as an example. This PR works around it by simply skipping the test report when no JUnit XML files are found. Please see https://github.com/apache/spark/pull/29906#issuecomment-702525542 for more details. ### Why are the changes needed? To avoid false alarms for test results. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? Manually tested in my fork. Positive case: https://github.com/HyukjinKwon/spark/runs/1208624679?check_suite_focus=true https://github.com/HyukjinKwon/spark/actions/runs/288996327 Negative case: https://github.com/HyukjinKwon/spark/runs/1208229838?check_suite_focus=true https://github.com/HyukjinKwon/spark/actions/runs/289000058 Closes #29946 from HyukjinKwon/test-junit-files. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-10-06 09:09:58 +09:00
HyukjinKwon	b205be5ff6	[SPARK-33051][INFRA][R] Uses setup-r to install R in GitHub Actions build ### What changes were proposed in this pull request? At SPARK-32493, the R installation was switched to manual installation because setup-r was broken. This seems to be fixed upstream, so we should switch it back. ### Why are the changes needed? To avoid maintaining the installation steps ourselves. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? GitHub Actions build in this PR should test it. Closes #29931 from HyukjinKwon/recover-r-build. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-10-02 15:12:33 +09:00
Dongjoon Hyun	a8442c2826	[SPARK-32926][TESTS] Add Scala 2.13 build test in GitHub Action ### What changes were proposed in this pull request? The PR aims to add Scala 2.13 build test coverage into GitHub Action for Apache Spark 3.1.0. ### Why are the changes needed? The branch is ready for Scala 2.13 and this will prevent any regression. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass the GitHub Action. Closes #29793 from dongjoon-hyun/SPARK-32926. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-09-17 14:01:52 -07:00
HyukjinKwon	b07e7429a6	[SPARK-32695][INFRA] Explicitly cache and hash 'build' directly in GitHub Actions ### What changes were proposed in this pull request? This PR proposes to explicitly cache and hash the files/directories under 'build' for SBT and Zinc at GitHub Actions. Otherwise, it can end up with overwriting `build` directory. See also https://github.com/apache/spark/pull/29286#issuecomment-679368436 Previously, other files like `build/mvn` and `build/sbt` are also cached and overwritten. So, when you have some changes there, they are ignored. ### Why are the changes needed? To make GitHub Actions build stable. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? The builds in this PR test it out. Closes #29536 from HyukjinKwon/SPARK-32695. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-08-26 12:25:59 +09:00
HyukjinKwon	b54103016a	[SPARK-32204][SPARK-32182][DOCS] Add a quickstart page with Binder integration in PySpark documentation ### What changes were proposed in this pull request? This PR proposes to: - add a notebook with a Binder integration which allows users to try PySpark in a live notebook. Please [try this here](https://mybinder.org/v2/gh/HyukjinKwon/spark/SPARK-32204?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart.ipynb). - reuse this notebook as a quickstart guide in PySpark documentation. Note that Binder turns a Git repo into a collection of interactive notebooks. It works based on a Docker image. Once somebody builds it, other people can reuse the image against a specific commit. Therefore, if we run Binder with the images based on released tags in Spark, virtually all users can instantly launch the Jupyter notebooks. <br/> I made a simple demo to make it easier to review. Please see: - [Main page](https://hyukjin-spark.readthedocs.io/en/stable/). Note that the link ("Live Notebook") in the main page wouldn't work since this PR is not merged yet. - [Quickstart page](https://hyukjin-spark.readthedocs.io/en/stable/getting_started/quickstart.html) <br/> When reviewing the notebook file itself, please give me direct feedback, which I will appreciate and address. Another way might be: - open [here](https://mybinder.org/v2/gh/HyukjinKwon/spark/SPARK-32204?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart.ipynb). - edit / change / update the notebook. Please feel free to change whatever you want. I can apply it as is or update it slightly when I apply it to this PR. - download it as a `.ipynb` file: ![Screen Shot 2020-08-20 at 10 12 19 PM](https://user-images.githubusercontent.com/6477701/90774311-3e38c800-e332-11ea-8476-699a653984db.png) - upload the `.ipynb` file here in a GitHub comment. Then, I will push a commit with that file with correct credit, of course. 
- alternatively, push a commit into this PR right away if that's easier for you (if you're a committer). References: - https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html - https://databricks.com/jp/blog/2020/03/31/10-minutes-from-pandas-to-koalas-on-apache-spark.html - my own blog post .. :-) and https://koalas.readthedocs.io/en/latest/getting_started/10min.html ### Why are the changes needed? To improve PySpark's usability. The current quickstart for Python users are very friendly. ### Does this PR introduce _any_ user-facing change? Yes, it will add a documentation page, and expose a live notebook to PySpark users. ### How was this patch tested? Manually tested, and GitHub Actions builds will test. Closes #29491 from HyukjinKwon/SPARK-32204. Lead-authored-by: HyukjinKwon <gurwls223@apache.org> Co-authored-by: Fokko Driesprong <fokko@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-08-26 12:23:24 +09:00
Takeshi Yamamuro	6dd37cbaac	[SPARK-32682][INFRA] Use workflow_dispatch to enable manual test triggers ### What changes were proposed in this pull request? This PR proposes to add a `workflow_dispatch` entry in the GitHub Action script (`build_and_test.yml`). This update enables developers to run the Spark tests for a specific branch on their own local repository, so I think it might help to check whether all the tests pass before opening a new PR. <img width="944" alt="Screen Shot 2020-08-21 at 16 28 41" src="https://user-images.githubusercontent.com/692303/90866249-96250c80-e3ce-11ea-8496-3dd6683e92ea.png"> ### Why are the changes needed? To reduce the pressure of GitHub Actions on the Spark repository. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually checked. Closes #29504 from maropu/DispatchTest. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	2020-08-21 21:23:41 +09:00
HyukjinKwon	bfd8c34154	[SPARK-32645][INFRA] Upload unit-tests.log as an artifact ### What changes were proposed in this pull request? This PR proposes to upload `target/unit-tests.log` as an artifact so that it can be downloaded here: ![Screen Shot 2020-08-18 at 2 23 18 PM](https://user-images.githubusercontent.com/6477701/90474095-789e3b80-e15f-11ea-87f8-e7da3df3c03e.png) ### Why are the changes needed? Jenkins has this feature. It would be best to have the same dev functionality as Jenkins. Also, note that this was pointed out at https://github.com/apache/spark/pull/29225#discussion_r471485011. ### Does this PR introduce _any_ user-facing change? No, dev-only ### How was this patch tested? https://github.com/apache/spark/actions/runs/213000777 should demonstrate it Closes #29454 from HyukjinKwon/SPARK-32645. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-08-19 12:28:36 +09:00
HyukjinKwon	d0dfe4986b	[MINOR][INFRA] Rename master.yml to build_and_test.yml ### What changes were proposed in this pull request? This PR renames `master.yml` to `build_and_test.yml` to indicate this is the workflow that builds and runs the tests. ### Why are the changes needed? Just for readability. `master.yml` looks like the name of the branch (to me). ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? GitHub Actions build in this PR will test it out. Closes #29459 from HyukjinKwon/minor-rename. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>	2020-08-18 18:18:47 +08:00
HyukjinKwon	86852c57af	[SPARK-32606][SPARK-32605][INFRA] Remove the forks of action-surefire-report and action-download-artifact in test_report.yml ### What changes were proposed in this pull request? This PR proposes to remove the usage of my own forks and use the original plugins in GitHub Actions testing report. SPARK-32357 introduced the GitHub Actions test reporting by leveraging two plugins: - [ScaCap/action-surefire-report](https://github.com/ScaCap/action-surefire-report) - [dawidd6/action-download-artifact](https://github.com/dawidd6/action-download-artifact) To make it work, two repositories had to be forked with custom fixes: - HyukjinKwon/action-surefire-reportc96094c - `f86c565d52` The two custom fixes are thankfully merged at https://github.com/ScaCap/action-surefire-report/pull/14 and https://github.com/dawidd6/action-download-artifact/pull/24, and they released new ones to use at [ScaCap/action-surefire-report/commits/v1](https://github.com/ScaCap/action-surefire-report/commits/v1) and [dawidd6/action-download-artifact/commits/v2](https://github.com/dawidd6/action-download-artifact/commits/v2) - thanks jmisur and dawidd6 again. ### Why are the changes needed? To avoid relying on forks and code duplication. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? Logically there is no diff. I tested it at https://github.com/HyukjinKwon/spark/runs/992824229 to be doubly sure. NOTE that this PR cannot be tested here within the workflow triggered by this PR without merging the changes in `test_report.yml` into the master. Closes #29449 from HyukjinKwon/SPARK-32606-SPARK-32605. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-08-17 11:17:50 -07:00
Hyukjin Kwon	5debde9401	[SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions ### What changes were proposed in this pull request? This PR proposes to report the failed and succeeded tests in GitHub Actions in order to improve the development velocity by leveraging [ScaCap/action-surefire-report](https://github.com/ScaCap/action-surefire-report). See the example below: ![Screen Shot 2020-08-13 at 8 17 52 PM](https://user-images.githubusercontent.com/6477701/90128649-28f7f280-dda2-11ea-9211-e98e34332f6b.png) Note that we cannot just use [ScaCap/action-surefire-report](https://github.com/ScaCap/action-surefire-report) in Apache Spark because PRs are from the forked repository, and GitHub secrets are unavailable for security reasons. This plugin and all similar plugins require a GitHub token with write access in order to post test results, but it is unavailable in PRs. To work around this limitation, I took this approach: 1. In workflow A, run the tests and upload the JUnit XML test results. GitHub provides a way to upload and download files. 2. GitHub introduced a new event type [`workflow_run`](https://github.blog/2020-08-03-github-actions-improvements-for-fork-and-pull-request-workflows/) 10 days ago. By leveraging this, it triggers another workflow B. 3. Workflow B is in the main repo instead of the fork repo, and has the write access the plugin needs. In workflow B, it downloads the artifact uploaded from workflow A (from the forked repository). 4. Workflow B generates the test reports from the JUnit XML files. 5. Workflow B looks up the PR and posts the test reports. The `workflow_run` event is a very new feature, and it looks like not many GitHub Actions plugins support it. 
In order to make this work with [ScaCap/action-surefire-report](https://github.com/ScaCap/action-surefire-report), I had to fork two GitHub Actions plugins to use: - [ScaCap/action-surefire-report](https://github.com/ScaCap/action-surefire-report) to have this custom fix: `c96094cc35` It added a `commit` argument to specify the commit to post the test reports to. With `workflow_run`, workflow B can access the commit from workflow A. - [dawidd6/action-download-artifact](https://github.com/dawidd6/action-download-artifact) to have this custom fix: `750b71af35` It added support for downloading all artifacts from workflow A in workflow B. By default, it only supports specifying the name of an artifact. Note that I was not able to use the official [actions/download-artifact](https://github.com/actions/download-artifact) because: - It does not support downloading artifacts across different workflows, see also https://github.com/actions/download-artifact/issues/3. Once this issue is resolved, we can switch it back to [actions/download-artifact](https://github.com/actions/download-artifact). I plan to make a pull request for both repositories so we don't have to rely on forks. ### Why are the changes needed? Currently, it's difficult to check the failed tests. You have to scroll through long GitHub Actions logs. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? Manually tested at: https://github.com/HyukjinKwon/spark/pull/17, https://github.com/HyukjinKwon/spark/pull/18, https://github.com/HyukjinKwon/spark/pull/19, https://github.com/HyukjinKwon/spark/pull/20, and master branch of my forked repository. Closes #29333 from HyukjinKwon/SPARK-32357-fix. Lead-authored-by: Hyukjin Kwon <gurwls223@apache.org> Co-authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-08-13 20:50:47 -07:00
HyukjinKwon	32f4ef005f	[SPARK-32497][INFRA] Installs qpdf package for CRAN check in GitHub Actions ### What changes were proposed in this pull request? CRAN check fails due to the size of the generated PDF docs as below: ``` ... WARNING ‘qpdf’ is needed for checks on size reduction of PDFs ... Status: 1 WARNING, 1 NOTE See ‘/home/runner/work/spark/spark/R/SparkR.Rcheck/00check.log’ for details. ``` This PR proposes to install `qpdf` in GitHub Actions. Note that I cannot reproduce this locally with the same R version so I am not documenting it for now. Also, while I am here, I piggyback to install SparkR when the module includes `sparkr`. It is rather a followup of SPARK-32491. ### Why are the changes needed? To fix SparkR CRAN check failure. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? GitHub Actions will test it out. Closes #29306 from HyukjinKwon/SPARK-32497. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-31 00:57:24 +09:00
HyukjinKwon	e0c8bd07af	[SPARK-32493][INFRA] Manually install R instead of using setup-r in GitHub Actions ### What changes were proposed in this pull request? This PR proposes to manually install R instead of using `setup-r` which seems broken. Currently, GitHub Actions uses its default R 3.4.4 installed, which we dropped as of SPARK-32073. While I am here, I am also upgrading R version to 4.0. Jenkins will test the old version and GitHub Actions tests the new version. AppVeyor uses R 4.0 but it does not check CRAN which is important when we make a release. ### Why are the changes needed? To recover GitHub Actions build. ### Does this PR introduce _any_ user-facing change? No, dev-only ### How was this patch tested? Manually tested at https://github.com/HyukjinKwon/spark/pull/15 Closes #29302 from HyukjinKwon/SPARK-32493. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-30 20:06:35 +09:00
Dongjoon Hyun	08a66f8fd0	[SPARK-32248][BUILD] Recover Java 11 build in Github Actions ### What changes were proposed in this pull request? This PR aims to recover Java 11 build in `GitHub Action`. ### Why are the changes needed? This test coverage was removed before. Now, it's time to recover it. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the GitHub Action. Closes #29295 from dongjoon-hyun/SPARK-32248. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-29 18:05:53 -07:00
HyukjinKwon	6ab29b37cf	[SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the documentation base ### What changes were proposed in this pull request? This PR proposes to redesign the PySpark documentation. I made a demo site to make it easier to review: https://hyukjin-spark.readthedocs.io/en/stable/reference/index.html. Here is the initial draft for the final PySpark docs shape: https://hyukjin-spark.readthedocs.io/en/latest/index.html. In more detail, this PR proposes: 1. Use [pydata_sphinx_theme](https://github.com/pandas-dev/pydata-sphinx-theme) theme - [pandas](https://pandas.pydata.org/docs/) and [Koalas](https://koalas.readthedocs.io/en/latest/) use this theme. The CSS overwrite is ported from Koalas. The colours in the CSS were actually chosen by designers to use in Spark. 2. Use the Sphinx option to separate `source` and `build` directories as the documentation pages will likely grow. 3. Port current API documentation into the new style. It mimics Koalas and pandas to use the theme most effectively. One disadvantage of this approach is that you have to list APIs or classes; however, I think this isn't a big issue in PySpark since we're being conservative on adding APIs. I also intentionally listed classes only instead of functions in ML and MLlib to make it relatively easier to manage. ### Why are the changes needed? I often hear complaints from users that the current PySpark documentation is pretty messy to read - https://spark.apache.org/docs/latest/api/python/index.html compared to other projects such as [pandas](https://pandas.pydata.org/docs/) and [Koalas](https://koalas.readthedocs.io/en/latest/). It would be nicer if we can make it more organised instead of just listing all classes, methods and attributes to make it easier to navigate. Also, the documentation has been there from almost the very first version of PySpark. Maybe it's time to update it. ### Does this PR introduce _any_ user-facing change? 
Yes, PySpark API documentation will be redesigned. ### How was this patch tested? Manually tested, and the demo site was made to show. Closes #29188 from HyukjinKwon/SPARK-32179. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-27 17:49:21 +09:00
HyukjinKwon	6bdd710c4d	[SPARK-32316][TESTS][INFRA] Test PySpark with Python 3.8 in Github Actions ### What changes were proposed in this pull request? This PR aims to test PySpark with Python 3.8 in Github Actions. In the script side, it is already ready: `4ad9bfd53b/python/run-tests.py (L161)` This PR includes small related fixes together: 1. Install Python 3.8 2. Only install one Python implementation instead of installing many for SQL and Yarn test cases because they need one Python executable in their test cases that is higher than Python 2. 3. Do not install Python 2 which is not needed anymore after we dropped Python 2 at SPARK-32138 4. Remove a comment about installing PyPy3 on Jenkins - SPARK-32278. It is already installed. ### Why are the changes needed? Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well because some optimizations can be only enabled with Python 3.8+. See also https://github.com/apache/spark/pull/29114 ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? Was not tested. Github Actions build in this PR will test it out. Closes #29116 from HyukjinKwon/test-python3.8-togehter. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-14 20:44:09 -07:00
HyukjinKwon	4ad9bfd53b	[SPARK-32138] Drop Python 2.7, 3.4 and 3.5 ### What changes were proposed in this pull request? This PR aims to drop Python 2.7, 3.4 and 3.5. Roughly speaking, it removes all the widely known Python 2 compatibility workarounds such as `sys.version` comparison, `__future__`. Also, it removes the Python 2 dedicated codes such as `ArrayConstructor` in Spark. ### Why are the changes needed? 1. Unsupport EOL Python versions 2. Reduce maintenance overhead and remove a bit of legacy codes and hacks for Python 2. 3. PyPy2 has a critical bug that causes a flaky test, SPARK-28358 given my testing and investigation. 4. Users can use Python type hints with Pandas UDFs without thinking about Python version 5. Users can leverage one latest cloudpickle, https://github.com/apache/spark/pull/28950. With Python 3.8+ it can also leverage C pickle. ### Does this PR introduce _any_ user-facing change? Yes, users cannot use Python 2.7, 3.4 and 3.5 in the upcoming Spark version. ### How was this patch tested? Manually tested and also tested in Jenkins. Closes #28957 from HyukjinKwon/SPARK-32138. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-14 11:22:44 +09:00
Hyukjin Kwon	27ef3629dd	[SPARK-32292][SPARK-32252][INFRA] Run the relevant tests only in GitHub Actions ### What changes were proposed in this pull request? This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources. In addition, this PR also fixes 3 more issues very closely related together while I am here. 1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases. 2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available. 3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example, ```bash dev/run-tests.py --modules sql,core ``` which is consistent with `python/run-tests.py`, for example, ```bash python/run-tests.py --modules pyspark-core,pyspark-ml ``` 4. Lastly, also fixed the formatting issue in module specification in the matrix: ```diff - network_common, network_shuffle, repl, launcher + network-common, network-shuffle, repl, launcher, ``` which incorrectly runs build/test the modules. ### Why are the changes needed? By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc. Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? 
Manually tested in my own forked Spark: https://github.com/HyukjinKwon/spark/pull/7 https://github.com/HyukjinKwon/spark/pull/8 https://github.com/HyukjinKwon/spark/pull/9 https://github.com/HyukjinKwon/spark/pull/10 https://github.com/HyukjinKwon/spark/pull/11 https://github.com/HyukjinKwon/spark/pull/12 Closes #29086 from HyukjinKwon/SPARK-32292. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-13 08:31:39 -07:00
Dongjoon Hyun	bc3d4bacb5	[SPARK-32245][INFRA][FOLLOWUP] Reenable Github Actions on commit ### What changes were proposed in this pull request? This PR reenables GitHub Action on every commit as a next step. ### Why are the changes needed? We carefully enabled GitHub Action on every PR, and it looks good so far. As we saw at https://github.com/apache/spark/pull/29072, GitHub Action is already triggered on every commit of every PR. Enabling GitHub Action on `master` branch commits doesn't make a big difference. And, we need to start testing at every commit as a next step. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual. Closes #29076 from dongjoon-hyun/reenable_gha_commit. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-12 14:50:47 -07:00
HyukjinKwon	b84ed4146d	[SPARK-32245][INFRA] Run Spark tests in Github Actions ### What changes were proposed in this pull request? This PR aims to run the Spark tests in Github Actions. To briefly explain the main idea: - Reuse `dev/run-tests.py` with SBT build - Reuse the modules in `dev/sparktestsupport/modules.py` to test each module - Pass the modules to test into `dev/run-tests.py` directly via `TEST_ONLY_MODULES` environment variable. For example, `pyspark-sql,core,sql,hive`. - `dev/run-tests.py` _does not_ take the dependent modules into account but solely the specified modules to test. Another thing to note might be the `SlowHiveTest` annotation. Running the tests in Hive modules takes too long, so the slow tests are extracted and run as a separate job. They were extracted based on the actual elapsed time in Jenkins: ![Screen Shot 2020-07-09 at 7 48 13 PM](https://user-images.githubusercontent.com/6477701/87050238-f6098e80-c238-11ea-9c4a-ab505af61381.png) So, Hive tests are separated into two jobs. One is the slow test cases, and the other one is the remaining test cases. _Note that_ the current GitHub Actions build virtually copies what the default PR builder on Jenkins does (without other profiles such as JDK 11, Hadoop 2, etc.). The only exception is Kinesis https://github.com/apache/spark/pull/29057/files#diff-04eb107ee163a50b61281ca08f4e4c7bR23 ### Why are the changes needed? Since last week, the Jenkins machines have been very unstable for many reasons: - Apparently, the machines became extremely slow. Almost all tests can't pass. - One machine (worker 4) started to have a corrupt `.m2`, which fails the build. - Documentation build fails from time to time for an unknown reason on the Jenkins machines specifically. This is disabled for now at https://github.com/apache/spark/pull/29017. - Almost all PRs are basically blocked by this instability currently. The advantages of using Github Actions: - To avoid depending on the few people who have access to the cluster. 
- To reduce the elapsed time in the build - we could split the tests (e.g., SQL, ML, CORE), and run them in parallel so the total build time will significantly reduce. - To control the environment more flexibly. - Other contributors can test and propose to fix Github Actions configurations so we can distribute this build management cost. Note that: - The current build in Jenkins takes _more than 7 hours_. With Github actions it takes _less than 2 hours_ - We can now control the environments especially for Python easily. - The test and build look more stable than the Jenkins'. ### Does this PR introduce _any_ user-facing change? No, dev-only change. ### How was this patch tested? Tested at https://github.com/HyukjinKwon/spark/pull/4 Closes #29057 from HyukjinKwon/migrate-to-github-actions. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-11 13:09:06 -07:00
HyukjinKwon	f0c79ad88a	[MINOR][INFRA] Add a guide to clarify release/unreleased Spark versions of user-facing change in the Github PR template ### What changes were proposed in this pull request? This PR proposes to add a guide to clarify the Spark version when describing "Does this PR introduce any user-facing change?". ### Why are the changes needed? It seems confusing to write when the user-facing changes happen within unreleased branches. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? Manually tested in Github and it renders fine as intended. Closes #28403 from HyukjinKwon/minor-more-guide. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-30 09:22:07 +09:00
Dongjoon Hyun	2d3e9601b5	[SPARK-31589][INFRA] Use `r-lib/actions/setup-r` in GitHub Action ### What changes were proposed in this pull request? This PR aims to use `r-lib/actions/setup-r` because it's more stable and maintained by 3rd party. ### Why are the changes needed? This will recover the current outage. In addition, this will be more robust in the future. As of now, this is tested via https://github.com/dongjoon-hyun/spark/pull/17 . ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the GitHub Actions, especially `Linter R` and `Generate Documents`. Closes #28382 from dongjoon-hyun/SPARK-31589. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-28 13:22:43 +09:00
HyukjinKwon	98ec4a8ced	[SPARK-31330][INFRA][FOLLOW-UP] Exclude 'ui' and 'UI.scala' in CORE and 'dev/.rat-excludes' in BUILD autolabeller ### What changes were proposed in this pull request? This PR excludes the `ui` directory and the `UI.scala` configuration file from the `CORE` label, and excludes `dev/.rat-excludes` from the `BUILD` label in the autolabeller. See https://github.com/apache/spark/pull/28218, https://github.com/apache/spark/pull/28217, https://github.com/apache/spark/pull/28214 and https://github.com/apache/spark/pull/28213 There is some context about this at https://github.com/apache/spark/pull/28114. The syntax is from https://git-scm.com/docs/gitignore#_pattern_format (see also https://github.com/kaelzhang/node-ignore) ### Why are the changes needed? To label UI component properly. ### Does this PR introduce any user-facing change? No, dev-only. ### How was this patch tested? It uses the same syntax used for other places. I expect to see the actual results after it gets merged as it's difficult to test it out. Closes #28228 from HyukjinKwon/SPARK-31330-followup. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-16 10:16:58 +09:00
HyukjinKwon	c519fe1358	[SPARK-31330][INFRA][FOLLOW-UP] Move sbin and some files into appropriate categories in autolabeller ### What changes were proposed in this pull request? This PR is a followup of `1b87015044`. Now, we automatically label PRs, and seems working fine. This PR proposes to correct some minor list and categories. 1. Move `sbin` from `CORE` into `DEPLOY` components. ``` $ ls sbin decommission-slave.sh start-all.sh start-slave.sh stop-master.sh stop-thriftserver.sh slaves.sh start-history-server.sh start-slaves.sh stop-mesos-dispatcher.sh spark-config.sh start-master.sh start-thriftserver.sh stop-mesos-shuffle-service.sh spark-daemon.sh start-mesos-dispatcher.sh stop-all.sh stop-slave.sh spark-daemons.sh start-mesos-shuffle-service.sh stop-history-server.sh stop-slaves.sh ``` 2. `/sbin/mesos.sh` -> `MESOS` `/bin/spark-shell*` -> `SPARK SHELL`. ### Why are the changes needed? To label correctly and dev can take an advantage of it such as checking the PRs of a specific component. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? It was not tested yet. It can be tested after it was merged. Closes #28201 from HyukjinKwon/SPARK-31330. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-13 18:48:41 +09:00
Nicholas Chammas	1b87015044	[SPARK-31330] Automatically label PRs based on the paths they touch ### What changes were proposed in this pull request? This PR adds some rules that will be used by Probot Auto Labeler to label PRs based on what paths they modify. ### Why are the changes needed? This should make it easier for committers to organize PRs, and it could also help drive downstream tooling like the PR dashboard. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? We'll only be able to test it, I believe, after merging it in. Given that [the Avro project is using this same bot already](https://github.com/apache/avro/blob/master/.github/autolabeler.yml), I expect it will be straightforward to get this working. Closes #28114 from nchammas/SPARK-31330-auto-label-prs. Lead-authored-by: Nicholas Chammas <nicholas.chammas@gmail.com> Co-authored-by: HyukjinKwon <gurwls223@apache.org> Co-authored-by: Nicholas Chammas <nicholas.chammas@liveramp.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-13 10:01:31 +09:00
Dongjoon Hyun	2b744fe885	[SPARK-30963][INFRA] Add GitHub Action job for document generation ### What changes were proposed in this pull request? This PR aims to add a new `GitHub Action` job for document generation. ### Why are the changes needed? We had better test the document generation in PR Builder. - https://lists.apache.org/thread.html/rd06a2154e853812652b8f7fa3c003746ed531b213c531517f055e1dc%40%3Cdev.spark.apache.org%3E ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the GitHub Action in this PR. Closes #27715 from dongjoon-hyun/SPARK-30963. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-02-26 19:24:41 -08:00
Takeshi Yamamuro	29b3e42779	[MINOR] Update the PR template for adding a link to the configuration naming guideline ### What changes were proposed in this pull request? This is a follow-up of #27577. This pr intends to add a link to the configuration naming guideline in `.github/PULL_REQUEST_TEMPLATE`. ### Why are the changes needed? For reminding developers to follow the naming rules. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? N/A Closes #27602 from maropu/pr27577-FOLLOWUP. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-02-17 16:05:08 +09:00
HyukjinKwon	cd9ccdc0ac	[SPARK-30601][BUILD] Add a Google Maven Central as a primary repository ### What changes were proposed in this pull request? This PR proposes to address four things. Three issues and fixes were a bit mixed so this PR sorts it out. See also http://apache-spark-developers-list.1001551.n3.nabble.com/Adding-Maven-Central-mirror-from-Google-to-the-build-td28728.html for the discussion in the mailing list. 1. Add the Google Maven Central mirror (GCS) as a primary repository. This will not only make development more stable but also make the Github Actions build (where it is always required to download jars) stable. In case of Jenkins PR builder, it wouldn't be affected too much as it uses the pre-downloaded jars under `.m2`. - Google Maven Central seems stable for heavy workload but not synced very quickly (e.g., new release is missing) - Maven Central (default) seems less stable but synced quickly. We already added this GCS mirror as a default additional remote repository at SPARK-29175. So I don't see an issue to add it as a repo. `abf759a91e/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (L2111-L2118)` 2. Currently, we have the hard-coded repository in [`sbt-pom-reader`](https://github.com/JoshRosen/sbt-pom-reader/blob/v1.0.0-spark/src/main/scala/com/typesafe/sbt/pom/MavenPomResolver.scala#L32) and this seems to overwrite Maven's existing resolver by the same ID `central` with `http://` when initially the pom file is ported into SBT instance. This uses `http://` which Maven Central recently disallowed (see https://github.com/apache/spark/pull/27242) My speculation is that we just need to be able to load plugin and let it convert POM to SBT instance with another fallback repo. After that, it _seems_ to use `central` with `https` properly. See also https://github.com/apache/spark/pull/27307#issuecomment-576720395. 
I double checked that we use `https` properly from the SBT build as well: ``` [debug] downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom ... [debug] public: downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom [debug] public: downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom.sha1 ``` This was fixed by adding the same repo (https://github.com/apache/spark/pull/27281), `central_without_mirror`, which is a bit awkward. Instead, this PR adds GCS as a main repo, and community Maven central as a fallback repo. So, presumably the community Maven central repo is used when the plugin is loaded as a fallback. 3. While I am here, I fix another issue. Github Action at https://github.com/apache/spark/pull/27279 is failing. The reason seems to be scalafmt 1.0.3 is in Maven central but not in GCS. ``` org.apache.maven.plugin.PluginResolutionException: Plugin org.antipathy:mvn-scalafmt_2.12:1.0.3 or one of its dependencies could not be resolved: Could not find artifact org.antipathy:mvn-scalafmt_2.12:jar:1.0.3 in google-maven-central (https://maven-central.storage-download.googleapis.com/repos/central/data/) at org.apache.maven.plugin.internal.DefaultPluginDependenciesResolver.resolve (DefaultPluginDependenciesResolver.java:131) ``` `mvn-scalafmt` exists in Maven central: ```bash $ curl https://repo.maven.apache.org/maven2/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom ``` ```xml <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> <modelVersion>4.0.0</modelVersion> ... 
``` whereas not in GCS mirror: ```bash $ curl https://maven-central.storage-download.googleapis.com/repos/central/data/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom ``` ```xml <?xml version='1.0' encoding='UTF-8'?><Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Details>No such object: maven-central/repos/central/data/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom</Details></Error>% ``` In this PR, simply make both repos accessible by adding to `pluginRepositories`. 4. Remove the workarounds in Github Actions to switch mirrors because now we have same repos in the same order (Google Maven Central first, and Maven Central second) ### Why are the changes needed? To make the build and Github Action more stable. ### Does this PR introduce any user-facing change? No, dev only change. ### How was this patch tested? I roughly checked local and PR against my fork (https://github.com/HyukjinKwon/spark/pull/2 and https://github.com/HyukjinKwon/spark/pull/3). Closes #27307 from HyukjinKwon/SPARK-30572. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-01-23 16:00:21 +09:00
Dongjoon Hyun	c992716a33	[SPARK-30572][BUILD] Add a fallback Maven repository ### What changes were proposed in this pull request? This PR aims to add a fallback Maven repository when a mirror to `central` fails. ### Why are the changes needed? We use `Google Maven Central` in GitHub Action as a mirror of `central`. However, `Google Maven Central` sometimes doesn't have newly published artifacts and there is no guarantee when we get the newly published artifacts. By duplicating `Maven Central` with a new ID, we can add a fallback Maven repository which is not mirrored by `Google Maven Central`. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually testing with the new `Twitter` chill artifacts by switching `chill.version` from `0.9.3` to `0.9.5`. ``` $ rm -rf ~/.m2/repository/com/twitter/chill* $ mvn compile | grep chill Downloading from google-maven-central: https://maven-central.storage-download.googleapis.com/repos/central/data/com/twitter/chill_2.12/0.9.5/chill_2.12-0.9.5.pom Downloading from central_without_mirror: https://repo.maven.apache.org/maven2/com/twitter/chill_2.12/0.9.5/chill_2.12-0.9.5.pom Downloaded from central_without_mirror: https://repo.maven.apache.org/maven2/com/twitter/chill_2.12/0.9.5/chill_2.12-0.9.5.pom (2.8 kB at 11 kB/s) ``` Closes #27281 from dongjoon-hyun/SPARK-30572. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-19 17:42:34 -08:00
Nicholas Chammas	f399d655c4	[SPARK-30173] Tweak stale PR message Follow-on to #26877. ### What changes were proposed in this pull request? This PR tweaks the stale PR message to [clarify](https://github.com/apache/spark/pull/24457#issuecomment-571393900) the procedure for reopening a PR after it has been marked as stale. ### Why are the changes needed? This change should clarify the reopening process for contributors. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? N/A Closes #27114 from nchammas/SPARK-30173-stale-tweaks. Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-01-07 08:34:59 -06:00
Nicholas Chammas	58b29392f8	[SPARK-30173][INFRA] Automatically close stale PRs ### What changes were proposed in this pull request? This PR adds [a GitHub workflow to automatically close stale PRs](https://github.com/marketplace/actions/close-stale-issues). ### Why are the changes needed? This will help cut down the number of open but stale PRs and keep the PR queue manageable. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? I'm not sure how to test this PR without impacting real PRs on the repo. See: https://github.com/actions/stale/issues/32 Closes #26877 from nchammas/SPARK-30173-stale-prs. Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2019-12-15 08:42:16 -06:00
Dongjoon Hyun	16f1b23d75	[SPARK-30163][INFRA][FOLLOWUP] Make `.m2` directory for cold start without cache ### What changes were proposed in this pull request? This PR is a follow-up of https://github.com/apache/spark/pull/26793 and aims to initialize `~/.m2` directory. ### Why are the changes needed? In case of cache reset, `~/.m2` directory doesn't exist. It causes a failure. - `master` branch has a cache as of now. So, we missed this. - `branch-2.4` has no cache as of now, and we hit this failure. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This PR is tested against personal `branch-2.4`. - https://github.com/dongjoon-hyun/spark/pull/12 Closes #26794 from dongjoon-hyun/SPARK-30163-2. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-12-07 12:58:00 -08:00
Dongjoon Hyun	1068b8b249	[SPARK-30163][INFRA] Use Google Maven mirror in GitHub Action ### What changes were proposed in this pull request? This PR aims to use [Google Maven mirror](https://cloudplatform.googleblog.com/2015/11/faster-builds-for-Java-developers-with-Maven-Central-mirror.html) in `GitHub Action` jobs to improve the stability. ```xml <settings> <mirrors> <mirror> <id>google-maven-central</id> <name>GCS Maven Central mirror</name> <url>https://maven-central.storage-download.googleapis.com/repos/central/data/</url> <mirrorOf>central</mirrorOf> </mirror> </mirrors> </settings> ``` ### Why are the changes needed? Although we added Maven cache inside `GitHub Action`, the timeouts happen too frequently during access `artifact descriptor`. ``` [ERROR] Failed to execute goal on project spark-mllib_2.12: ... Failed to read artifact descriptor for ... ... Connection timed out (Read failed) -> [Help 1] ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This PR is irrelevant to Jenkins. This is tested on the personal repository first. `GitHub Action` of this PR should pass. - https://github.com/dongjoon-hyun/spark/pull/11 Closes #26793 from dongjoon-hyun/SPARK-30163. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-12-07 12:04:10 -08:00
Dongjoon Hyun	81996f9e4d	[SPARK-30152][INFRA] Enable Hadoop-2.7/JDK11 build at GitHub Action ### What changes were proposed in this pull request? This PR enables JDK11 build with `hadoop-2.7` profile at `GitHub Action`. BEFORE (6 jobs including one JDK11 job) ![before](https://user-images.githubusercontent.com/9700541/70342731-7763f300-180a-11ea-859f-69038b88451f.png) AFTER (7 jobs including two JDK11 jobs) ![after](https://user-images.githubusercontent.com/9700541/70342658-54d1da00-180a-11ea-9fba-507fc087dc62.png) ### Why are the changes needed? SPARK-29957 makes JDK11 test work with `hadoop-2.7` profile. We need to protect it. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is `GitHub Action` only PR. See the result of `GitHub Action` on this PR. Closes #26782 from dongjoon-hyun/SPARK-GHA-HADOOP-2.7. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-12-06 12:01:36 -08:00
Dongjoon Hyun	cb68e58f88	[MINOR][INFRA] Use GitHub Action Cache for `build` ### What changes were proposed in this pull request? This PR adds `GitHub Action Cache` task on `build` directory. ### Why are the changes needed? This will replace the Maven downloading with the cache. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually check the GitHub Action log of this PR. Closes #26652 from dongjoon-hyun/SPARK-MAVEN-CACHE. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-24 12:35:57 -08:00
Dongjoon Hyun	c98e5eb339	[SPARK-29981][BUILD] Add hive-1.2/2.3 profiles ### What changes were proposed in this pull request? This PR aims to do the following. - Add two profiles, `hive-1.2` and `hive-2.3` (default) - Validate if we keep the existing combination at least. (Hadoop-2.7 + Hive 1.2 / Hadoop-3.2 + Hive 2.3). For now, we assume that `hive-1.2` is explicitly used with `hadoop-2.7` and `hive-2.3` with `hadoop-3.2`. The following are beyond the scope of this PR. - SPARK-29988 Adjust Jenkins jobs for `hive-1.2/2.3` combination - SPARK-29989 Update release-script for `hive-1.2/2.3` combination - SPARK-29991 Support `hive-1.2/2.3` in PR Builder ### Why are the changes needed? This will help to switch our dependencies to update the exposed dependencies. ### Does this PR introduce any user-facing change? This is a dev-only change that the build profile combinations are changed. - `-Phadoop-2.7` => `-Phadoop-2.7 -Phive-1.2` - `-Phadoop-3.2` => `-Phadoop-3.2 -Phive-2.3` ### How was this patch tested? Pass the Jenkins with the dependency check and tests to make sure we don't change anything for now. - [Jenkins (-Phadoop-2.7 -Phive-1.2)](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114192/consoleFull) - [Jenkins (-Phadoop-3.2 -Phive-2.3)](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114192/consoleFull) Also, from now, GitHub Action validates the following combinations. ![gha](https://user-images.githubusercontent.com/9700541/69355365-822d5e00-0c36-11ea-93f7-e00e5459e1d0.png) Closes #26619 from dongjoon-hyun/SPARK-29981. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-23 10:02:22 -08:00
Dongjoon Hyun	affaefe1f3	[MINOR][INFRA] Add `io` and `net` to GitHub Action Cache ### What changes were proposed in this pull request? This PR aims to cache `~/.m2/repository/net` and `~/.m2/repository/io` to reduce the flakiness. ### Why are the changes needed? This will stabilize GitHub Action more before adding `hive-1.2` and `hive-2.3` combination. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? After the GitHub Action on this PR passes, check the log. Closes #26621 from dongjoon-hyun/SPARK-GHA-CACHE. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-11-21 15:43:57 +09:00
Liang-Chi Hsieh	e753aa30e6	[SPARK-29964][BUILD] lintr github workflows failed due to buggy GnuPG ### What changes were proposed in this pull request? Linter (R) github workflows failed sometimes like: https://github.com/apache/spark/pull/26509/checks?check_run_id=310718016 Failed message: ``` Executing: /tmp/apt-key-gpghome.8r74rQNEjj/gpg.1.sh --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 gpg: connecting dirmngr at '/tmp/apt-key-gpghome.8r74rQNEjj/S.dirmngr' failed: IPC connect call failed gpg: keyserver receive failed: No dirmngr ##[error]Process completed with exit code 2. ``` It is due to a buggy GnuPG. Context: https://github.com/sbt/website/pull/825 https://github.com/sbt/sbt/issues/4261 https://github.com/microsoft/WSL/issues/3286 ### Why are the changes needed? Make lint-r github workflows work. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Pass github workflows. Closes #26602 from viirya/SPARK-29964. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-19 15:56:50 -08:00
Dongjoon Hyun	42f8f79ff0	[SPARK-29936][R] Fix SparkR lint errors and add lint-r GitHub Action ### What changes were proposed in this pull request? This PR fixes SparkR lint errors and adds `lint-r` GitHub Action to protect the branch. ### Why are the changes needed? It turns out that we currently don't run it. It was recovered yesterday. However, after that, our Jenkins linter jobs (`master`/`branch-2.4`) have been broken on `lint-r` tasks. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the GitHub Action on this PR in addition to Jenkins R and AppVeyor R. Closes #26564 from dongjoon-hyun/SPARK-29936. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-17 21:01:01 -08:00
Dongjoon Hyun	4b71ad6ffb	[SPARK-29820][INFRA] Use GitHub Action Cache for `./.m2/repository/[com|org]` ### What changes were proposed in this pull request? This PR aims to enable [GitHub Action Cache on Maven local repository](https://github.com/actions/cache/blob/master/examples.md#java---maven) for the following goals. 1. To reduce the chance of failure due to the Maven download flakiness. 2. To speed up the build a little bit. Unfortunately, due to the GitHub Action Cache limitation, it seems that we cannot put all into a single cache. It's ignored like the following. - .m2/repository is 680777194 bytes ``` /bin/tar -cz -f /home/runner/work/_temp/01f162c3-0c78-4772-b3de-b619bb5d7721/cache.tgz -C /home/runner/.m2/repository . 3 ##[warning]Cache size of 680777194 bytes is over the 400MB limit, not saving cache. ``` ### Why are the changes needed? Not only for the speed up, but also for reducing the Maven download flakiness, we had better enable caching on local maven repository. The following are recent failure examples. - https://github.com/apache/spark/runs/295869450 ``` [ERROR] Failed to execute goal on project spark-streaming-kafka-0-10_2.12: Could not resolve dependencies for project org.apache.spark:spark-streaming-kafka-0-10_2.12:jar:spark-367716: Could not transfer artifact com.fasterxml.jackson.datatype:jackson-datatype-jdk8:jar:2.10.0 from/to central (https://repo.maven.apache.org/maven2): Failed to transfer file https://repo.maven.apache.org/maven2/com/fasterxml/jackson/datatype/ jackson-datatype-jdk8/2.10.0/jackson-datatype-jdk8-2.10.0.jar with status code 503 -> [Help 1] ... 
[ERROR] mvn <args> -rf :spark-streaming-kafka-0-10_2.12 ``` ``` [ERROR] Failed to execute goal on project spark-tools_2.12: Could not resolve dependencies for project org.apache.spark:spark-tools_2.12:jar:3.0.0-SNAPSHOT: Failed to collect dependencies at org.clapper:classutil_2.12:jar:1.5.1: Failed to read artifact descriptor for org.clapper:classutil_2.12:jar:1.5.1: Could not transfer artifact org.clapper:classutil_2.12:pom:1.5.1 from/to central (https://repo.maven.apache.org/maven2): Connection timed out (Read failed) -> [Help 1] ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually check the GitHub Action log of this PR. ``` Cache restored from key: 1.8-hadoop-2.7-maven-com-5b4a9fb13c5f5ff78e65a20003a3810796e4d1fde5f24d397dfe6e5153960ce4 Cache restored from key: 1.8-hadoop-2.7-maven-org-5b4a9fb13c5f5ff78e65a20003a3810796e4d1fde5f24d397dfe6e5153960ce4 ``` Closes #26456 from dongjoon-hyun/SPARK-29820. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-10 11:02:54 -08:00
Yuming Wang	e5c176a243	[MINOR][INFRA] Change the Github Actions build command to `mvn install` ### What changes were proposed in this pull request? This PR changes the Github Actions build command from `mvn package` to `mvn install` to build Scaladoc jars. ### Why are the changes needed? Sometimes the `mvn install` build fails with the error: `not found: type ClassName...`. More details: https://github.com/apache/spark/pull/24628#issuecomment-495655747 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? N/A Closes #26414 from wangyum/github-action-install. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-06 09:16:50 -08:00
Dongjoon Hyun	3e2649287d	[SPARK-29199][INFRA] Add linters and license/dependency checkers to GitHub Action ### What changes were proposed in this pull request? This PR aims to add linters and license/dependency checkers to GitHub Action. This excludes `lint-r` intentionally because https://github.com/actions/setup-r is not ready. We can add that later when it becomes available. ### Why are the changes needed? This will help the PR reviews. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? See the GitHub Action result on this PR. Closes #25879 from dongjoon-hyun/SPARK-29199. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-21 08:13:00 -07:00
Yuming Wang	9e234a5434	[MINOR][INFRA] Use java-version instead of version for GitHub Action ### What changes were proposed in this pull request? This PR uses `java-version` instead of `version` for GitHub Action. More details: `204b974cf4` `ac25aeee3a` ### Why are the changes needed? The `version` property will not be supported after October 1, 2019. ### Does this PR introduce any user-facing change? No ### How was this patch tested? N/A Closes #25866 from wangyum/java-version. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-20 08:54:34 -07:00
Dongjoon Hyun	3bf43fb60d	[SPARK-29159][BUILD] Increase ReservedCodeCacheSize to 1G ### What changes were proposed in this pull request? This PR aims to increase the JVM CodeCacheSize from 0.5G to 1G. ### Why are the changes needed? After upgrading to `Scala 2.12.10`, the following is observed during building. ``` 2019-09-18T20:49:23.5030586Z OpenJDK 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled. 2019-09-18T20:49:23.5032920Z OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize= 2019-09-18T20:49:23.5034959Z CodeCache: size=524288Kb used=521399Kb max_used=521423Kb free=2888Kb 2019-09-18T20:49:23.5035472Z bounds [0x00007fa62c000000, 0x00007fa64c000000, 0x00007fa64c000000] 2019-09-18T20:49:23.5035781Z total_blobs=156549 nmethods=155863 adapters=592 2019-09-18T20:49:23.5036090Z compilation: disabled (not enough contiguous free space left) ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually check the Jenkins or GitHub Action build log (which should not have the above). Closes #25836 from dongjoon-hyun/SPARK-CODE-CACHE-1G. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-19 00:24:15 -07:00
Dongjoon Hyun	197732e1f4	[SPARK-29125][INFRA] Add Hadoop 2.7 combination to GitHub Action ### What changes were proposed in this pull request? Until now, we are testing JDK8/11 with Hadoop-3.2. This PR aims to extend the test coverage for JDK8/Hadoop-2.7. ### Why are the changes needed? This will prevent Hadoop 2.7 compile/package issues at PR stage. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? GitHub Action on this PR shows all three combinations now. And, this is irrelevant to Jenkins test. Closes #25824 from dongjoon-hyun/SPARK-29125. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-09-17 16:53:21 -07:00
Dongjoon Hyun	703fb2b054	[SPARK-29079][INFRA] Enable GitHub Action on PR ### What changes were proposed in this pull request? This PR enables GitHub Action on PRs. ### Why are the changes needed? So far, we detect JDK11 compilation error after merging. This PR aims to prevent JDK11 compilation error at PR stage. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manual. See the GitHub Action on this PR. Closes #25786 from dongjoon-hyun/SPARK-29079. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: DB Tsai <d_tsai@apple.com>	2019-09-13 21:50:06 +00:00
Liang-Chi Hsieh	2db45cbd5a	[SPARK-28920][INFRA] Set up java version for github workflow This patch adds java version parameter to GitHub workflow conf for JDK8/11. As we want to build JDK8/11 on GitHub workflow, we might need to add java version according current matrix. No See the GitHub workflow run result. Closes #25625 from viirya/github-workflow-java. Authored-by: Liang-Chi Hsieh <liangchi@uber.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-08-29 20:55:14 -07:00
Dongjoon Hyun	780aa71749	[SPARK-28919][INFRA] Add more profiles for JDK8/11 build test for Github workflow ### What changes were proposed in this pull request? This PR aims to add `-Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-3.2 -Phadoop-cloud` profiles to GitHub workflow conf. ### Why are the changes needed? Currently, we build with JDK8 and test with JDK8/11 in Jenkins. And, we use GitHub Workflow for JDK8/JDK11 building test. To test JDK11 fully, we need to enable `hive` and `hadoop-3.2` profiles for `Hive 2.3.6` and `Hadoop 3.2`. Also, this PR adds all resource manager modules, too. ### Does this PR introduce any user-facing change? No. In addition, Jenkins workload will be the same because this is specific to GitHub workflow. ### How was this patch tested? See the GitHub workflow run result. Closes #25624 from dongjoon-hyun/SPARK-JDK11-HIVE. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-08-29 19:46:21 -07:00
HyukjinKwon	0ea8db9fd3	[SPARK-28578][INFRA] Improve Github pull request template <!-- Thanks for sending a pull request! Here are some tips for you: 1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html 2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html 3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'. 4. Be sure to keep the PR description updated to reflect all changes. 5. Please write your PR title to summarize what this PR proposes. 6. If possible, provide a concise example to reproduce the issue for a faster review. --> ### What changes were proposed in this pull request? <!-- Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below. 1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers. 2. If you fix some SQL features, you can provide some references of other DBMSes. 3. If there is design documentation, please add the link. 4. If there is a discussion in the mailing list, please add the link. --> This PR proposes to improve the Github template for better and faster review iterations and better interactions between PR authors and reviewers. As suggested in the [dev mailing list](http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-New-sections-in-Github-Pull-Request-description-template-td27527.html), this PR refers to [Kubernetes' PR template](https://raw.githubusercontent.com/kubernetes/kubernetes/master/.github/PULL_REQUEST_TEMPLATE.md). Therefore, those fields are newly added: ``` ### Why are the changes needed? ### Does this PR introduce any user-facing change? ``` and some comments were added. ### Why are the changes needed? 
<!-- Please clarify why the changes are needed. For instance, 1. If you propose a new API, clarify the use case for a new API. 2. If you fix a bug, you can clarify why it is a bug. --> Currently, many PR descriptions are poorly formatted, which causes overhead for PR authors and reviewers. Poorly formatted PR descriptions cause multiple problems: - Some PRs still have a single-line PR description for 500+- code changes in a critical path. - Some PRs do not describe behaviour changes, so reviewers need to find and document them. - Some PRs are hard to review without outlines, but outlines are sometimes not provided. - Spark is getting old, and sometimes we need to dig deep into its history. Because of poorly formatted PR descriptions, finding the root cause of a bug sometimes requires reading the whole code across entire commit histories. - Reviews take a while, but the number of PRs still grows. This PR aims to alleviate these problems. ### Does this PR introduce any user-facing change? <!-- If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible. If no, write 'No'. --> Yes, it changes the PR template shown when PRs are opened. This PR uses the template it proposes. ### How was this patch tested? <!-- If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible. If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future. If tests were not added, please describe why they were not added and/or why it was difficult to add. --> Manually tested via Github preview feature. Closes #25310 from HyukjinKwon/SPARK-28578. 
Lead-authored-by: HyukjinKwon <gurwls223@apache.org> Co-authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-08-16 09:45:15 +09:00
DB Tsai	c2b40a76bb	[SPARK-28719][BUILD][FOLLOWUP] Make Github Actions log quieter ## What changes were proposed in this pull request? Make Github Actions log quieter Closes #25468 from dbtsai/actions2. Authored-by: DB Tsai <d_tsai@apple.com> Signed-off-by: DB Tsai <d_tsai@apple.com>	2019-08-15 22:22:44 +00:00
DB Tsai	3302042ec4	[SPARK-28719][BUILD] [FOLLOWUP] Add JDK11 for Github Actions ## What changes were proposed in this pull request? Add JDK11 for Github Actions Closes #25444 from dbtsai/jdk11. Authored-by: DB Tsai <d_tsai@apple.com> Signed-off-by: DB Tsai <d_tsai@apple.com>	2019-08-14 03:14:07 +00:00
DB Tsai	601fd45814	[SPARK-28719][BUILD] Enable Github Actions for master ## What changes were proposed in this pull request? Github now provides free CI/CD for build, test, and deploy. This PR enables a simple Github Actions to build master with JDK8 with latest Ubuntu. We can extend it with different versions of JDK, and even build Spark with docker images in the future. Closes #25440 from dbtsai/actions. Authored-by: DB Tsai <d_tsai@apple.com> Signed-off-by: DB Tsai <d_tsai@apple.com>	2019-08-13 22:55:02 +00:00
Sean Owen	eed6de1a65	[MINOR][DOCS] Tighten up some key links to the project and download pages to use HTTPS ## What changes were proposed in this pull request? Tighten up some key links to the project and download pages to use HTTPS ## How was this patch tested? N/A Closes #24665 from srowen/HTTPSURLs. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-05-21 10:56:42 -07:00
Sean Owen	7e0cd1d9b1	[SPARK-18073][DOCS][WIP] Migrate wiki to spark.apache.org web site ## What changes were proposed in this pull request? Updates links to the wiki to links to the new location of content on spark.apache.org. ## How was this patch tested? Doc builds Author: Sean Owen <sowen@cloudera.com> Closes #15967 from srowen/SPARK-18073.1.	2016-11-23 11:25:47 +00:00
Sean Owen	f8062b63fc	[SPARK-17840][DOCS] Add some pointers for wiki/CONTRIBUTING.md in README.md and some warnings in PULL_REQUEST_TEMPLATE ## What changes were proposed in this pull request? Link to contributing wiki in PR template, README.md ## How was this patch tested? Doc-only change, tested by Jekyll Author: Sean Owen <sowen@cloudera.com> Closes #15429 from srowen/SPARK-17840.	2016-10-12 11:14:03 -07:00
Rahul Tanwani	831429170f	[MINOR][MAINTENANCE] Fix typo for the pull request template. ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Author: Rahul Tanwani <tanwanirahul@gmail.com> Closes #11343 from tanwanirahul/pull_request_template.	2016-02-24 00:45:31 -08:00
Reynold Xin	892b2dd6dd	Add github pull request template	2016-02-17 22:14:45 -05:00

134 commits