### What changes were proposed in this pull request?
This patch upgrades pycodestyle from v2.4.0 to v2.6.0. The changes at each release:
2.5.0: https://pycodestyle.pycqa.org/en/latest/developer.html#id3
2.6.0a1: https://pycodestyle.pycqa.org/en/latest/developer.html#a1-2020-04-23
2.6.0: https://pycodestyle.pycqa.org/en/latest/developer.html#id2
Changes: Dropped Python 2.6 and 3.3 support, added Python 3.7 and 3.8 support...
### Why are the changes needed?
Including bug fixes and newer Python version support.
### Does this PR introduce _any_ user-facing change?
No, dev only.
### How was this patch tested?
Ran `dev/lint-python` locally.
Closes#29249 from viirya/upgrade-pycodestyle.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade `json4s` to from 3.6.6 to 3.7.0-M5 for Scala 2.13 support at Apache Spark 3.1.0 on December. We will upgrade to the latest `json4s` around November.
### Why are the changes needed?
`json4s` starts to support Scala 2.13 since v3.7.0-M4.
- https://github.com/json4s/json4s/issues/660
- b013af8e75
Old `json4s` causes many UT failures with `NoSuchMethodException`.
```scala
Cause: java.lang.NoSuchMethodException: scala.collection.immutable.Seq$.apply(scala.collection.Seq)
at java.lang.Class.getMethod(Class.java:1786)
```
The following is one example.
```scala
$ dev/change-scala-version.sh 2.13
$ build/mvn test -pl core --am -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.executor.CoarseGrainedExecutorBackendSuite
...
Tests: succeeded 4, failed 9, canceled 0, ignored 0, pending 0
*** 9 TESTS FAILED ***
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
1. **Scala 2.12**: Pass the Jenkins or GitHub Action with the existing tests.
2. **Scala 2.13**: Do the following manually at least.
```scala
$ dev/change-scala-version.sh 2.13
$ build/mvn test -pl core --am -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.executor.CoarseGrainedExecutorBackendSuite
...
Tests: succeeded 13, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
Closes#29239 from dongjoon-hyun/SPARK-32441.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR proposes to avoid using subshell when it activates Conda environment. Looks like it ends up with activating the env within the subshell even if you use `conda` command.
### Why are the changes needed?
If you take a close look for GitHub Actions log:
```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9
Downloading py4j-0.10.9-py2.py3-none-any.whl (198 kB)
Using legacy setup.py install for pyspark, since package 'wheel' is not installed.
Installing collected packages: py4j, pyspark
Running setup.py install for pyspark: started
Running setup.py install for pyspark: finished with status 'done'
Successfully installed py4j-0.10.9 pyspark-3.1.0.dev0
...
Installing dist into virtual env
Obtaining file:///home/runner/work/spark/spark/python
Collecting py4j==0.10.9
Downloading py4j-0.10.9-py2.py3-none-any.whl (198 kB)
Installing collected packages: py4j, pyspark
Attempting uninstall: py4j
Found existing installation: py4j 0.10.9
Uninstalling py4j-0.10.9:
Successfully uninstalled py4j-0.10.9
Attempting uninstall: pyspark
Found existing installation: pyspark 3.1.0.dev0
Uninstalling pyspark-3.1.0.dev0:
Successfully uninstalled pyspark-3.1.0.dev0
Running setup.py develop for pyspark
Successfully installed py4j-0.10.9 pyspark
```
It looks not properly using Conda as it removes the previously installed one when it reinstalls again.
We should ideally test it with Conda environment as it's intended.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
GitHub Actions will test. I also manually tested in my local.
Closes#29212 from HyukjinKwon/SPARK-32419.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This reverts commit 026b0b926d.
### Why are the changes needed?
As HyukjinKwon pointed out in https://github.com/apache/spark/pull/29133#issuecomment-663339240, there is no JUnit test report after https://github.com/apache/spark/pull/29133. Let's revert https://github.com/apache/spark/pull/29133 for now and find a better solution to improve the log output later.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
GitHub Actions build
Closes#29219 from gengliangwang/revertErrorOnly.
Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
### What changes were proposed in this pull request?
Updates to scalatest 3.2.0. Though it looks large, it is 99% changes to the new location of scalatest classes.
### Why are the changes needed?
3.2.0+ has a fix that is required for Scala 2.13.3+ compatibility.
### Does this PR introduce _any_ user-facing change?
No, only affects tests.
### How was this patch tested?
Existing tests.
Closes#29196 from srowen/SPARK-32398.
Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR proposes:
- Don't use `--user` in pip packaging test
- Pull `source` out of the subshell, and place it first.
- Exclude user sitepackages in Python path during pip installation test
to address the flakiness of the pip packaging test in Jenkins.
(I think) #29116 caused this flakiness given my observation in the Jenkins log. I had to work around by specifying `--user` but it turned out that it does not properly work in old Conda on Jenkins for some reasons. Therefore, reverting this change back.
(I think) the installation at user site-packages affects other environments created by Conda in the old Conda version that Jenkins has. Seems it fails to isolate the environments for some reasons. So, it excludes user sitepackages in the Python path during the test.
In addition, #29116 also added some fallback logics of `conda (de)activate` and `source (de)activate` because Conda prefers to use `conda (de)activate` now per the official documentation and `source (de)activate` doesn't work for some reasons in certain environments (see also https://github.com/conda/conda/issues/7980). The problem was that `source` loads things to the current shell so does not affect the current shell. Therefore, this PR pulls `source` out of the subshell.
Disclaimer: I made the analysis purely based on Jenkins machine's log in this PR. It may have a different reason I missed during my observation.
### Why are the changes needed?
To make the build and tests pass in Jenkins.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
Jenkins tests should test it out.
Closes#29117 from HyukjinKwon/debug-conda.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Use `/usr/bin/env python3` consistently instead of `/usr/bin/env python` in build scripts, to reliably select Python 3.
### Why are the changes needed?
Scripts no longer work with Python 2.
### Does this PR introduce _any_ user-facing change?
No, should be all build system changes.
### How was this patch tested?
Existing tests / NA
Closes#29151 from srowen/SPARK-29909.2.
Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to rename `HADOOP2_MODULE_PROFILES` to `HADOOP_MODULE_PROFILES` because Hadoop 3 is now the default.
### Why are the changes needed?
Hadoop 3 is now the default.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass GitHub Action dependency test.
Closes#29128 from williamhyun/williamhyun-patch-3.
Authored-by: williamhyun <62487364+williamhyun@users.noreply.github.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This PR aims to upgrade PySpark's embedded cloudpickle to the latest cloudpickle v1.5.0 (See https://github.com/cloudpipe/cloudpickle/blob/v1.5.0/cloudpickle/cloudpickle.py)
### Why are the changes needed?
There are many bug fixes. For example, the bug described in the JIRA:
dill unpickling fails because they define `types.ClassType`, which is undefined in dill. This results in the following error:
```
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", line 279, in loads
return dill.loads(s)
File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 317, in loads
return load(file, ignore)
File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 305, in load
obj = pik.load()
File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 577, in _load_type
return _reverse_typemap[name]
KeyError: 'ClassType'
```
See also https://github.com/cloudpipe/cloudpickle/issues/82. This was fixed for cloudpickle 1.3.0+ (https://github.com/cloudpipe/cloudpickle/pull/337), but PySpark's cloudpickle.py doesn't have this change yet.
More notably, now it supports C pickle implementation with Python 3.8 which hugely improve performance. This is already adopted in another project such as Ray.
### Does this PR introduce _any_ user-facing change?
Yes, as described above, the bug fixes. Internally, users also could leverage the fast cloudpickle backed by C pickle.
### How was this patch tested?
Jenkins will test it out.
Closes#29114 from HyukjinKwon/SPARK-32094.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR will remove references to these "blacklist" and "whitelist" terms besides the blacklisting feature as a whole, which can be handled in a separate JIRA/PR.
This touches quite a few files, but the changes are straightforward (variable/method/etc. name changes) and most quite self-contained.
### Why are the changes needed?
As per discussion on the Spark dev list, it will be beneficial to remove references to problematic language that can alienate potential community members. One such reference is "blacklist" and "whitelist". While it seems to me that there is some valid debate as to whether these terms have racist origins, the cultural connotations are inescapable in today's world.
### Does this PR introduce _any_ user-facing change?
In the test file `HiveQueryFileTest`, a developer has the ability to specify the system property `spark.hive.whitelist` to specify a list of Hive query files that should be tested. This system property has been renamed to `spark.hive.includelist`. The old property has been kept for compatibility, but will log a warning if used. I am open to feedback from others on whether keeping a deprecated property here is unnecessary given that this is just for developers running tests.
### How was this patch tested?
Existing tests should be suitable since no behavior changes are expected as a result of this PR.
Closes#28874 from xkrogen/xkrogen-SPARK-32036-rename-blacklists.
Authored-by: Erik Krogen <ekrogen@linkedin.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
### What changes were proposed in this pull request?
Currently the Jenkins PIP packaging test fails as below intermediately:
```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
Downloading 6a4fb90cd2/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
Found existing installation: py4j 0.10.9
Uninstalling py4j-0.10.9:
Successfully uninstalled py4j-0.10.9
Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
status = self.run(options, args)
File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
use_user_site=options.use_user_site,
File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
auto_confirm=True
File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
uninstalled_pathset = UninstallPathSet.from_dist(dist)
File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
'(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```
- https://github.com/apache/spark/pull/29099#issuecomment-658073453 (amp-jenkins-worker-04)
- https://github.com/apache/spark/pull/29090#issuecomment-657819973 (amp-jenkins-worker-03)
Seems like the previous installation of editable mode affects other PRs.
This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge.
### Why are the changes needed?
To recover the Jenkins build.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
Jenkins build will test it out.
Closes#29102 from HyukjinKwon/SPARK-32303.
Lead-authored-by: HyukjinKwon <gurwls223@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to drop Python 2.7, 3.4 and 3.5.
Roughly speaking, it removes all the widely known Python 2 compatibility workarounds such as `sys.version` comparison, `__future__`. Also, it removes the Python 2 dedicated codes such as `ArrayConstructor` in Spark.
### Why are the changes needed?
1. Unsupport EOL Python versions
2. Reduce maintenance overhead and remove a bit of legacy codes and hacks for Python 2.
3. PyPy2 has a critical bug that causes a flaky test, SPARK-28358 given my testing and investigation.
4. Users can use Python type hints with Pandas UDFs without thinking about Python version
5. Users can leverage one latest cloudpickle, https://github.com/apache/spark/pull/28950. With Python 3.8+ it can also leverage C pickle.
### Does this PR introduce _any_ user-facing change?
Yes, users cannot use Python 2.7, 3.4 and 3.5 in the upcoming Spark version.
### How was this patch tested?
Manually tested and also tested in Jenkins.
Closes#28957 from HyukjinKwon/SPARK-32138.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources.
In addition, this PR also fixes 3 more issues very closely related together while I am here.
1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases.
2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available.
3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example,
```bash
dev/run-tests.py --modules sql,core
```
which is consistent with `python/run-tests.py`, for example,
```bash
python/run-tests.py --modules pyspark-core,pyspark-ml
```
4. Lastly, also fixed the formatting issue in module specification in the matrix:
```diff
- network_common, network_shuffle, repl, launcher
+ network-common, network-shuffle, repl, launcher,
```
which incorrectly runs build/test the modules.
### Why are the changes needed?
By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc.
Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
Manually tested in my own forked Spark:
https://github.com/HyukjinKwon/spark/pull/7https://github.com/HyukjinKwon/spark/pull/8https://github.com/HyukjinKwon/spark/pull/9https://github.com/HyukjinKwon/spark/pull/10https://github.com/HyukjinKwon/spark/pull/11https://github.com/HyukjinKwon/spark/pull/12Closes#29086 from HyukjinKwon/SPARK-32292.
Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to run the Spark tests in Github Actions.
To briefly explain the main idea:
- Reuse `dev/run-tests.py` with SBT build
- Reuse the modules in `dev/sparktestsupport/modules.py` to test each module
- Pass the modules to test into `dev/run-tests.py` directly via `TEST_ONLY_MODULES` environment variable. For example, `pyspark-sql,core,sql,hive`.
- `dev/run-tests.py` _does not_ take the dependent modules into account but solely the specified modules to test.
Another thing to note might be `SlowHiveTest` annotation. Running the tests in Hive modules takes too much so the slow tests are extracted and it runs as a separate job. It was extracted from the actual elapsed time in Jenkins:
![Screen Shot 2020-07-09 at 7 48 13 PM](https://user-images.githubusercontent.com/6477701/87050238-f6098e80-c238-11ea-9c4a-ab505af61381.png)
So, Hive tests are separated into to jobs. One is slow test cases, and the other one is the other test cases.
_Note that_ the current GitHub Actions build virtually copies what the default PR builder on Jenkins does (without other profiles such as JDK 11, Hadoop 2, etc.). The only exception is Kinesis https://github.com/apache/spark/pull/29057/files#diff-04eb107ee163a50b61281ca08f4e4c7bR23
### Why are the changes needed?
Last week and onwards, the Jenkins machines became very unstable for many reasons:
- Apparently, the machines became extremely slow. Almost all tests can't pass.
- One machine (worker 4) started to have the corrupt `.m2` which fails the build.
- Documentation build fails time to time for an unknown reason in Jenkins machine specifically. This is disabled for now at https://github.com/apache/spark/pull/29017.
- Almost all PRs are basically blocked by this instability currently.
The advantages of using Github Actions:
- To avoid depending on few persons who can access to the cluster.
- To reduce the elapsed time in the build - we could split the tests (e.g., SQL, ML, CORE), and run them in parallel so the total build time will significantly reduce.
- To control the environment more flexibly.
- Other contributors can test and propose to fix Github Actions configurations so we can distribute this build management cost.
Note that:
- The current build in Jenkins takes _more than 7 hours_. With Github actions it takes _less than 2 hours_
- We can now control the environments especially for Python easily.
- The test and build look more stable than the Jenkins'.
### Does this PR introduce _any_ user-facing change?
No, dev-only change.
### How was this patch tested?
Tested at https://github.com/HyukjinKwon/spark/pull/4Closes#29057 from HyukjinKwon/migrate-to-github-actions.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade to ZStd 1.4.5-4.
### Why are the changes needed?
ZStd 1.4.5-4 fixes the following.
- 3d16e51525
- 3d51bdcb82
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the Jenkins.
Closes#28969 from williamhyun/zstd2.
Authored-by: William Hyun <williamhyun3@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR proposes to use Hadoop 3 winutils to make AppVeyor builds pass. Currently it's being failed as below
https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/33976604
### Why are the changes needed?
To recover the build in AppVeyor.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
AppVeyor build will test it out.
Closes#29042 from HyukjinKwon/SPARK-32231.
Lead-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to disable dependency tests(test-dependencies.sh) from Jenkins.
### Why are the changes needed?
- First of all, GitHub Action provides the same test capability already and stabler.
- Second, currently, `test-dependencies.sh` fails very frequently in AmpLab Jenkins environment. For example, in the following irrelevant PR, it fails 5 times during 6 hours.
- https://github.com/apache/spark/pull/29001
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the Jenkins without `test-dependencies.sh` invocation.
Closes#29004 from dongjoon-hyun/SPARK-32178.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
According to the dev mailing list discussion, this PR aims to switch the default Apache Hadoop dependency from 2.7.4 to 3.2.0 for Apache Spark 3.1.0 on December 2020.
| Item | Default Hadoop Dependency |
|------|-----------------------------|
| Apache Spark Website | 3.2.0 |
| Apache Download Site | 3.2.0 |
| Apache Snapshot | 3.2.0 |
| Maven Central | 3.2.0 |
| PyPI | 2.7.4 (We will switch later) |
| CRAN | 2.7.4 (We will switch later) |
| Homebrew | 3.2.0 (already) |
In Apache Spark 3.0.0 release, we focused on the other features. This PR targets for [Apache Spark 3.1.0 scheduled on December 2020](https://spark.apache.org/versioning-policy.html).
### Why are the changes needed?
Apache Hadoop 3.2 has many fixes and new cloud-friendly features.
**Reference**
- 2017-08-04: https://hadoop.apache.org/release/2.7.4.html
- 2019-01-16: https://hadoop.apache.org/release/3.2.0.html
### Does this PR introduce _any_ user-facing change?
Since the default Hadoop dependency changes, the users will get a better support in a cloud environment.
### How was this patch tested?
Pass the Jenkins.
Closes#28897 from dongjoon-hyun/SPARK-32058.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR proposes to upgrade R version to 4.0.2 in the release docker image. As of SPARK-31918, we should make a release with R 4.0.0+ which works with R 3.5+ too.
### Why are the changes needed?
To unblock releases on CRAN.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
Manually tested via scripts under `dev/create-release`, manually attaching to the container and checking the R version.
Closes#28922 from HyukjinKwon/SPARK-32089.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
R version 4.0.2 was released, see https://cran.r-project.org/doc/manuals/r-release/NEWS.html. This PR targets to upgrade R version in AppVeyor CI environment.
### Why are the changes needed?
To test the latest R versions before the release, and see if there are any regressions.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
AppVeyor will test.
Closes#28909 from HyukjinKwon/SPARK-32074.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade to Apache Commons Lang 3.10.
### Why are the changes needed?
This will bring the latest bug fixes like [LANG-1453](https://issues.apache.org/jira/browse/LANG-1453).
https://commons.apache.org/proper/commons-lang/release-notes/RELEASE-NOTES-3.10.txt
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the Jenkins.
Closes#28889 from williamhyun/commons-lang-3.10.
Authored-by: William Hyun <williamhyun3@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
This is auto-updated by running script `translate-contributors.py`
Closes#28861 from cloud-fan/update.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
If a Spark distribution has built-in hadoop runtime, Spark will not populate the hadoop classpath from `yarn.application.classpath` and `mapreduce.application.classpath` when a job is submitted to Yarn. Users can override this behavior by setting `spark.yarn.populateHadoopClasspath` to `true`.
### Why are the changes needed?
Without this, Spark will populate the hadoop classpath from `yarn.application.classpath` and `mapreduce.application.classpath` even Spark distribution has built-in hadoop. This results jar conflict and many unexpected behaviors in runtime.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manually test with two builds, with-hadoop and no-hadoop builds.
Closes#28788 from dbtsai/yarn-classpath.
Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
### What changes were proposed in this pull request?
After #28192, the job list page becomes very slow.
For example, after the following operation, the UI loading can take >40 sec.
```
(1 to 1000).foreach(_ => sc.parallelize(1 to 10).collect)
```
This is caused by a [performance issue of `vis-timeline`](https://github.com/visjs/vis-timeline/issues/379). The serious issue affects both branch-3.0 and branch-2.4
I tried a different version 4.21.0 from https://cdnjs.com/libraries/vis
The infinite drawing issue seems also fixed if the zoom is disabled as default.
### Why are the changes needed?
Fix the serious perf issue in web UI by falling back vis-timeline-graph2d to an ealier version.
### Does this PR introduce _any_ user-facing change?
Yes, fix the UI perf regression
### How was this patch tested?
Manual test
Closes#28806 from gengliangwang/downgradeVis.
Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
### What changes were proposed in this pull request?
When removing non-existing files in the release script, do not fail.
### Why are the changes needed?
This is to make the release script more robust, as we don't care if the files exist before we remove them.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
tested when cutting 3.0.0 RC
Closes#28815 from cloud-fan/release.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade to Zstd 1.4.5.
### Why are the changes needed?
Zstd 1.4.5 improves performance.
https://github.com/facebook/zstd/releases/tag/v1.4.5
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Passed the Jenkins.
Closes#28682 from williamhyun/zstd.
Authored-by: William Hyun <williamhyun3@gmail.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
### What changes were proposed in this pull request?
Only push the release tag after the build has finished.
### Why are the changes needed?
If the build fails we don't need a release tag.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Running locally with a fake user in https://github.com/apache/spark/pull/28667Closes#28700 from holdenk/SPARK-31860-build-master-only-push-tags-on-success.
Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
### What changes were proposed in this pull request?
Allow overriding the zinc options in the docker release and set a higher so the publish step can succeed consistently.
### Why are the changes needed?
The publish step experiences memory pressure.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Running test locally with fake user to see if publish step (besides svn part) succeeds
Closes#28698 from holdenk/SPARK-31889-docker-release-script-does-not-allocate-enough-memory-to-reliably-publish.
Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
### What changes were proposed in this pull request?
This PR aims to upgrade `commons-io` from 2.4 to 2.5 for Apache Spark 3.1.
### Why are the changes needed?
Since Hadoop 3.1, `commons-io` 2.5 is used.
- https://issues.apache.org/jira/browse/HADOOP-15261
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the Jenkins with Hadoop-3.2 profile.
Maven dependency is verified via `test-dependencies.sh` automatically. SBT dependency can be verified like the following manually.
```
build/sbt -Phadoop-3.2 "core/dependencyTree" | grep commons-io:commons-io | head -n1
[info] | | +-commons-io:commons-io:2.5
```
Closes#28665 from dongjoon-hyun/SPARK-31858.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR proposes to upgrade Janino to 3.1.2 which is released recently.
Major changes were done for refactoring, as well as there're lots of "no commit message". Belows are the pairs of (commit title, commit) which seem to deal with some bugs or specific improvements (not purposed to refactor) after 3.0.15.
* Issue #119: Guarantee executing popOperand() in popUninitializedVariableOperand() via moving popOperand() out of "assert"
* Issue #116: replace operand to final target type if boxing conversion and widening reference conversion happen together
* Merged pull request `#114` "Grow the code for relocatables, and do fixup, and relocate".
* 367c58e73e
* issue `#107`: Janino requires "org.codehaus.commons.compiler.io", but commons-compiler does not export this package
* f7d99596d4
* Throw an NYI CompileException when a static interface method is invoked.
* efd3884983
* Fixed the promotion of the array access index expression (see JLS7 15.13 Array Access Expressions)
* 32fdb5f5f1
* Issue `#104`: ClassLoaderIClassLoader 's ClassNotFoundException handle mechanism enhancement
* 6e8a97d609
You can see the changelog from the link: http://janino-compiler.github.io/janino/changelog.html
### Why are the changes needed?
We got some report on failure on user's query which Janino throws error on compiling generated code. The issue is here: https://github.com/janino-compiler/janino/issues/113 It contains the information of generated code, symptom (error), and analysis of the bug, so please refer the link for more details.
Janino 3.1.1 contains the PR https://github.com/janino-compiler/janino/pull/114 which would enable Janino to succeed to compile user's query properly. I've also fixed a couple of more bugs as 3.1.1 made Spark UTs fail - hence we need to upgrade to 3.1.2.
Furthermore, from my testing, https://github.com/janino-compiler/janino/issues/90 (which Josh Rosen filed before) seems to be also resolved in 3.1.2 as well.
Looks like Janino is maintained by one person and there's no even version branches and releases/tags so we can't expect Janino maintainer to release a new bugfix version - hence have to try out new minor version.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing UTs.
Closes#27860 from HeartSaVioR/SPARK-31101.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
bump the timeout to match what's set in jenkins
### Why are the changes needed?
tests be timing out!
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
via jenkins
Closes#28666 from shaneknapp/increase-jenkins-timeout.
Authored-by: shane knapp <incomplete@gmail.com>
Signed-off-by: shane knapp <incomplete@gmail.com>
### What changes were proposed in this pull request?
There is an unnecessary dependency for `mssql-jdbc`. In this PR I've removed it.
### Why are the changes needed?
Unnecessary dependency.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the Jenkins with the following configuration.
- [x] Pass the dependency test.
- [x] SBT with Hadoop-3.2 (https://github.com/apache/spark/pull/28640#issuecomment-634192512)
- [ ] Maven with Hadoop-3.2
Closes#28640 from gaborgsomogyi/SPARK-31821.
Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to use the style that is compatible with both python 2 and 3.
### Why are the changes needed?
This will help python 3 migration.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual.
Closes#28632 from williamhyun/use_python3_style.
Authored-by: William Hyun <williamhyun3@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade `kubernetes-client` library to bring the JDK8 related fixes. Please note that JDK11 works fine without any problem.
- https://github.com/fabric8io/kubernetes-client/releases/tag/v4.9.2
- JDK8 always uses http/1.1 protocol (Prevent OkHttp from wrongly enabling http/2)
### Why are the changes needed?
OkHttp "wrongly" detects the Platform as Jdk9Platform on JDK 8u251.
- https://github.com/fabric8io/kubernetes-client/issues/2212
- https://stackoverflow.com/questions/61565751/why-am-i-not-able-to-run-sparkpi-example-on-a-kubernetes-k8s-cluster
Although there is a workaround `export HTTP2_DISABLE=true` and `Downgrade JDK or K8s`, we had better avoid this problematic situation.
### Does this PR introduce _any_ user-facing change?
No. This will recover the failures on JDK 8u252.
### How was this patch tested?
- [x] Pass the Jenkins UT (https://github.com/apache/spark/pull/28601#issuecomment-632474270)
- [x] Pass the Jenkins K8S IT with the K8s 1.13 (https://github.com/apache/spark/pull/28601#issuecomment-632438452)
- [x] Manual testing with K8s 1.17.3. (Below)
**v1.17.6 result (on Minikube)**
```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- Test basic decommissioning
Run completed in 8 minutes, 27 seconds.
Total number of tests run: 19
Suites: completed 2, aborted 0
Tests: succeeded 19, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
Closes#28601 from dongjoon-hyun/SPARK-K8S-CLIENT.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR mainly adds two things.
1. Real headless browser support for UI test
2. A test suite using headless Chrome as one instance of those browsers.
Also, for environment where Chrome and Chrome driver is not installed, `ChromeUITest` tag is added to filter out the test suite.
### Why are the changes needed?
In the current master, there are two problems for UI test.
1. Lots of tests especially JavaScript related ones are done manually.
Appearance is better to be confirmed by our eyes but logic should be tested by test cases ideally.
2. Compared to the real web browsers, HtmlUnit doesn't seem to support JavaScript enough.
I added a JavaScript related test before for SPARK-31534 using HtmlUnit which is simple library based headless browser for test.
The test I added works somehow but some JavaScript related error is shown in unit-tests.log.
```
======= EXCEPTION START ========
Exception class=[net.sourceforge.htmlunit.corejs.javascript.JavaScriptException]
com.gargoylesoftware.htmlunit.ScriptException: Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null|function)". (http://192.168.1.209:60724/static/jquery-3.4.1.min.js#2)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:904)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:515)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:835)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:807)
at com.gargoylesoftware.htmlunit.InteractivePage.executeJavaScriptFunctionIfPossible(InteractivePage.java:216)
at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptFunctionJob.runJavaScript(JavaScriptFunctionJob.java:52)
at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptExecutionJob.run(JavaScriptExecutionJob.java:102)
at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl.runSingleJob(JavaScriptJobManagerImpl.java:426)
at com.gargoylesoftware.htmlunit.javascript.background.DefaultJavaScriptExecutor.run(DefaultJavaScriptExecutor.java:157)
at java.lang.Thread.run(Thread.java:748)
Caused by: net.sourceforge.htmlunit.corejs.javascript.JavaScriptException: Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null|function)". (http://192.168.1.209:60724/static/jquery-3.4.1.min.js#2)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1009)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:800)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:413)
at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:252)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3264)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$4.doRun(JavaScriptEngine.java:828)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:889)
... 10 more
JavaScriptException value = Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null|function)".
== CALLING JAVASCRIPT ==
function () {
throw e;
}
======= EXCEPTION END ========
```
I tried to upgrade HtmlUnit to 2.40.0 but what is worse, the test become not working even though it works on real browsers like Chrome, Safari and Firefox without error.
```
[info] UISeleniumSuite:
[info] - SPARK-31534: text for tooltip should be escaped *** FAILED *** (17 seconds, 745 milliseconds)
[info] The code passed to eventually never returned normally. Attempted 2 times over 12.910785232 seconds. Last failure message: com.gargoylesoftware.htmlunit.ScriptException: ReferenceError: Assignment to undefined "regeneratorRuntime" in strict mode (http://192.168.1.209:62132/static/vis-timeline-graph2d.min.js#52(Function)#1)
```
To resolve those problems, it's better to support headless browser for UI test.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
I tested with following patterns. Both Chrome and Chrome driver should be installed to test.
1. sbt / with chromedriver / include tag (expect to succeed)
`build/sbt -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly org.apache.spark.ui.ChromeUISeleniumSuite"`
2. sbt / with chromedriver / exclude tag (expect to be ignored)
`build/sbt -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly org.apache.spark.ui.ChromeUISeleniumSuite -l org.apache.spark.tags.ChromeUITest"`
3. sbt / without chromedriver / include tag (expect to be failed)
`build/sbt "testOnly org.apache.spark.ui.ChromeUISeleniumSuite"`
4. sbt / without chromedriver / exclude tag (expect to be skipped)
`build/sbt -Dtest.exclude.tags=org.apache.spark.tags.ChromeUITest "testOnly org.apache.spark.ui.ChromeUISeleniumSuite"`
5. Maven / wth chromedriver / include tag (expect to succeed)
`build/mvn -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite test`
6. Maven / with chromedriver / exclude tag (expect to be skipped)
`build/mvn -Dtest.exclude.tags="org.apache.spark.tags.ChromeUITest" -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite test`
7. Maven / without chromedriver / include tag (expect to be failed)
`build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite test`
8. Maven / without chromedriver / exclude tag (expect to be skipped)
`build/mvn -Dtest.exclude.tags=org.apache.spark.tags.ChromeUITest -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite test`
Closes#28578 from sarutak/real-headless-browser-support.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This adds UI updates to support stage level scheduling and ResourceProfiles. 3 main things have been added. ResourceProfile id added to the Stage page, the Executors page now has an optional selectable column to show the ResourceProfile Id of each executor, and the Environment page now has a section with the ResourceProfile ids. Along with this the rest api for environment page was updated to include the Resource profile information.
I debating on splitting the resource profile information into its own page but I wasn't sure it called for a completely separate page. Open to peoples thoughts on this.
Screen shots:
![Screen Shot 2020-04-01 at 3 07 46 PM](https://user-images.githubusercontent.com/4563792/78185169-469a7000-7430-11ea-8b0c-d9ede2d41df8.png)
![Screen Shot 2020-04-01 at 3 08 14 PM](https://user-images.githubusercontent.com/4563792/78185175-48fcca00-7430-11ea-8d1d-6b9333700f32.png)
![Screen Shot 2020-04-01 at 3 09 03 PM](https://user-images.githubusercontent.com/4563792/78185176-4a2df700-7430-11ea-92d9-73c382bb0f32.png)
![Screen Shot 2020-04-01 at 11 05 48 AM](https://user-images.githubusercontent.com/4563792/78185186-4dc17e00-7430-11ea-8962-f749dd47ea60.png)
### Why are the changes needed?
For user to be able to know what resource profile was used with which stage and executors. The resource profile information is also available so user debugging can see exactly what resources were requested with that profile.
### Does this PR introduce any user-facing change?
Yes, UI updates.
### How was this patch tested?
Unit tests and tested on yarn both active applications and with the history server.
Closes#28094 from tgravescs/SPARK-29303-pr.
Lead-authored-by: Thomas Graves <tgraves@nvidia.com>
Co-authored-by: Thomas Graves <tgraves@apache.org>
Signed-off-by: Thomas Graves <tgraves@apache.org>
### What changes were proposed in this pull request?
After changes in SPARK-20628, CoarseGrainedSchedulerBackend can decommission an executor and stop assigning new tasks on it. We should also decommission the corresponding blockmanagers in the same way. i.e. Move the cached RDD blocks from those executors to other active executors.
### Why are the changes needed?
We need to gracefully decommission the block managers so that the underlying RDD cache blocks are not lost in case the executors are taken away forcefully after some timeout (because of spotloss/pre-emptible VM etc). Its good to save as much cache data as possible.
Also In future once the decommissioning signal comes from Cluster Manager (say YARN/Mesos etc), dynamic allocation + this change gives us opportunity to downscale the executors faster by making the executors free of cache data.
Note that this is a best effort approach. We try to move cache blocks from decommissioning executors to active executors. If the active executors don't have free resources available on them for caching, then the decommissioning executors will keep the cache block which it was not able to move and it will still be able to serve them.
Current overall Flow:
1. CoarseGrainedSchedulerBackend receives a signal to decommissionExecutor. On receiving the signal, it do 2 things - Stop assigning new tasks (SPARK-20628), Send another message to BlockManagerMasterEndpoint (via BlockManagerMaster) to decommission the corresponding BlockManager.
2. BlockManagerMasterEndpoint receives "DecommissionBlockManagers" message. On receiving this, it moves the corresponding block managers to "decommissioning" state. All decommissioning BMs are excluded from the getPeers RPC call which is used for replication. All these decommissioning BMs are also sent message from BlockManagerMasterEndpoint to start decommissioning process on themselves.
3. BlockManager on worker (say BM-x) receives the "DecommissionBlockManager" message. Now it will start BlockManagerDecommissionManager thread to offload all the RDD cached blocks. This thread can make multiple reattempts to decommission the existing cache blocks (multiple reattempts might be needed as there might not be sufficient space in other active BMs initially).
### Does this PR introduce any user-facing change?
NO
### How was this patch tested?
Added UTs.
Closes#28370 from prakharjain09/SPARK-20732-rddcache-1.
Authored-by: Prakhar Jain <prakharjain09@gmail.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
### What changes were proposed in this pull request?
This PR makes `test-dependencies.sh` detect the version string correctly by ignoring all the other lines.
### Why are the changes needed?
Currently, all SBT jobs are broken like the following.
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/476/console
```
[error] running /home/jenkins/workspace/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/dev/test-dependencies.sh ; received return code 1
Build step 'Execute shell' marked build as failure
```
The reason is that the script detects the old version like `Falling back to archive.apache.org to download Maven 3.1.0-SNAPSHOT` when `build/mvn` did fallback.
Specifically, in the script, `OLD_VERSION` became `Falling back to archive.apache.org to download Maven 3.1.0-SNAPSHOT` instead of `3.1.0-SNAPSHOT` if build/mvn did fallback. Then, `pom.xml` file is corrupted like the following at the end and the exit code become `1` instead of `0`. It causes Jenkins jobs fails
```
- <version>3.1.0-SNAPSHOT</version>
+ <version>Falling</version>
```
**NO FALLBACK**
```
$ build/mvn -q -Dexec.executable="echo" -Dexec.args='${project.version}' --non-recursive org.codehaus.mojo:exec-maven-plugin:1.6.0:exec
Using `mvn` from path: /Users/dongjoon/APACHE/spark-merge/build/apache-maven-3.6.3/bin/mvn
3.1.0-SNAPSHOT
```
**FALLBACK**
```
$ build/mvn -q -Dexec.executable="echo" -Dexec.args='${project.version}' --non-recursive org.codehaus.mojo:exec-maven-plugin:1.6.0:exec
Falling back to archive.apache.org to download Maven
Using `mvn` from path: /Users/dongjoon/APACHE/spark-merge/build/apache-maven-3.6.3/bin/mvn
3.1.0-SNAPSHOT
```
**In the script**
```
$ echo $(build/mvn -q -Dexec.executable="echo" -Dexec.args='${project.version}' --non-recursive org.codehaus.mojo:exec-maven-plugin:1.6.0:exec)
Using `mvn` from path: /Users/dongjoon/APACHE/spark-merge/build/apache-maven-3.6.3/bin/mvn
Falling back to archive.apache.org to download Maven 3.1.0-SNAPSHOT
```
This PR will prevent irrelevant logs like `Falling back to archive.apache.org to download Maven`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the PR Builder.
Closes#28532 from dongjoon-hyun/SPARK-31713.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR removes `bootstrap-tooltip.js` which is no longer used.
That script is replaced with `bootstrap.bundle.min.js` in SPARK-30654 ( #27370 ).
### Why are the changes needed?
For cleaning up repository..
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manually checked whether tooltips are shown in the UI and no error message shown in the debug console.
Closes#28515 from sarutak/remove-tooltipjs.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR adds `i` option to ignore additional `build/mvn` output which is irrelevant to version string.
### Why are the changes needed?
SPARK-28963 added additional output message, `Falling back to archive.apache.org to download Maven` in build/mvn. This breaks `dev/create-release/release-build.sh` and currently Spark Packaging Jenkins job is hitting this issue consistently and broken.
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/2912/console
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
This happens only when the mirror fails. So, this is verified manually hiject the script. It works like the following.
```
$ echo 'Falling back to archive.apache.org to download Maven' > out
$ build/mvn help:evaluate -Dexpression=project.version >> out
Using `mvn` from path: /Users/dongjoon/PRS/SPARK_RELEASE_2/build/apache-maven-3.6.3/bin/mvn
$ cat out | grep -v INFO | grep -v WARNING | grep -v Download
Falling back to archive.apache.org to download Maven
3.1.0-SNAPSHOT
$ cat out | grep -v INFO | grep -v WARNING | grep -vi Download
3.1.0-SNAPSHOT
```
Closes#28514 from dongjoon-hyun/SPARK_RELEASE_2.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
snappy-java have release v1.1.7.5, upgrade to latest version.
Fixed in v1.1.7.4
- Caching internal buffers for SnappyFramed streams #234
- Fixed the native lib for ppc64le to work with glibc 2.17 (Previously it depended on 2.22)
Fixed in v1.1.7.5
- Fixes java.lang.NoClassDefFoundError: org/xerial/snappy/pool/DefaultPoolFactory in 1.1.7.4
https://github.com/xerial/snappy-java/compare/1.1.7.3...1.1.7.5
v 1.1.7.5 release note:
edc4ec28bd
### Why are the changes needed?
Fix bug
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
No need
Closes#28472 from AngersZhuuuu/spark-31655.
Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade SLF4J from 1.7.16 to 1.7.30.
### Why are the changes needed?
SLF4J 1.7.23+ is required to enable `slf4j-log4j12` with MDC feature to run under Java 9. Also, this will bring all latest bug fixes.
- http://www.slf4j.org/news.html
> When running under Java 9, log4j version 1.2.x is unable to correctly parse the "java.version" system property. Assuming an inccorect Java version, it proceeded to disable its MDC functionality. The slf4j-log4j12 module shipping in this release fixes the issue by tweaking MDC internals by reflection, allowing log4j to run under Java 9. See also SLF4J-393.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the Jenkins with the existing tests.
Closes#28446 from dongjoon-hyun/SPARK-31633.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
Some tidying of `sh` scripts in `R/`
### Why are the changes needed?
Not strictly needed, but the `'devtools' %in% installed.packages()` line in particular is "improper" / proabbly slow
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Not
Closes#28419 from MichaelChirico/r-scripts-cleanup.
Authored-by: Michael Chirico <michael.chirico@grabtaxi.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>