ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Sean Owen	40ef01283d	[SPARK-29802][BUILD] Use python3 in build scripts ### What changes were proposed in this pull request? Use `/usr/bin/env python3` consistently instead of `/usr/bin/env python` in build scripts, to reliably select Python 3. ### Why are the changes needed? Scripts no longer work with Python 2. ### Does this PR introduce _any_ user-facing change? No, should be all build system changes. ### How was this patch tested? Existing tests / NA Closes #29151 from srowen/SPARK-29909.2. Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-19 11:02:37 +09:00
williamhyun	5daf244d0f	[SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES ### What changes were proposed in this pull request? This PR aims to rename `HADOOP2_MODULE_PROFILES` to `HADOOP_MODULE_PROFILES` because Hadoop 3 is now the default. ### Why are the changes needed? Hadoop 3 is now the default. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GitHub Action dependency test. Closes #29128 from williamhyun/williamhyun-patch-3. Authored-by: williamhyun <62487364+williamhyun@users.noreply.github.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-07-17 11:59:19 -05:00
HyukjinKwon	ea9e8f365a	[SPARK-32094][PYTHON] Update cloudpickle to v1.5.0 ### What changes were proposed in this pull request? This PR aims to upgrade PySpark's embedded cloudpickle to the latest cloudpickle v1.5.0 (See https://github.com/cloudpipe/cloudpickle/blob/v1.5.0/cloudpickle/cloudpickle.py) ### Why are the changes needed? There are many bug fixes. For example, the bug described in the JIRA: dill unpickling fails because they define `types.ClassType`, which is undefined in dill. This results in the following error: ``` Traceback (most recent call last): File "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", line 279, in loads return dill.loads(s) File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 317, in loads return load(file, ignore) File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 305, in load obj = pik.load() File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 577, in _load_type return _reverse_typemap[name] KeyError: 'ClassType' ``` See also https://github.com/cloudpipe/cloudpickle/issues/82. This was fixed for cloudpickle 1.3.0+ (https://github.com/cloudpipe/cloudpickle/pull/337), but PySpark's cloudpickle.py doesn't have this change yet. More notably, now it supports C pickle implementation with Python 3.8 which hugely improve performance. This is already adopted in another project such as Ray. ### Does this PR introduce _any_ user-facing change? Yes, as described above, the bug fixes. Internally, users also could leverage the fast cloudpickle backed by C pickle. ### How was this patch tested? Jenkins will test it out. Closes #29114 from HyukjinKwon/SPARK-32094. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-17 11:49:18 +09:00
Erik Krogen	cf22d947fb	[SPARK-32036] Replace references to blacklist/whitelist language with more appropriate terminology, excluding the blacklisting feature ### What changes were proposed in this pull request? This PR will remove references to these "blacklist" and "whitelist" terms besides the blacklisting feature as a whole, which can be handled in a separate JIRA/PR. This touches quite a few files, but the changes are straightforward (variable/method/etc. name changes) and most quite self-contained. ### Why are the changes needed? As per discussion on the Spark dev list, it will be beneficial to remove references to problematic language that can alienate potential community members. One such reference is "blacklist" and "whitelist". While it seems to me that there is some valid debate as to whether these terms have racist origins, the cultural connotations are inescapable in today's world. ### Does this PR introduce _any_ user-facing change? In the test file `HiveQueryFileTest`, a developer has the ability to specify the system property `spark.hive.whitelist` to specify a list of Hive query files that should be tested. This system property has been renamed to `spark.hive.includelist`. The old property has been kept for compatibility, but will log a warning if used. I am open to feedback from others on whether keeping a deprecated property here is unnecessary given that this is just for developers running tests. ### How was this patch tested? Existing tests should be suitable since no behavior changes are expected as a result of this PR. Closes #28874 from xkrogen/xkrogen-SPARK-32036-rename-blacklists. Authored-by: Erik Krogen <ekrogen@linkedin.com> Signed-off-by: Thomas Graves <tgraves@apache.org>	2020-07-15 11:40:55 -05:00
HyukjinKwon	902e1342a3	[SPARK-32303][PYTHON][TESTS] Remove leftover from editable mode installation in PIP test ### What changes were proposed in this pull request? Currently the Jenkins PIP packaging test fails as below intermediately: ``` Installing dist into virtual env Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0) Downloading `6a4fb90cd2/py4j-0.10.9-py2.py3-none-any.whl` (198kB) Installing collected packages: py4j, pyspark Found existing installation: py4j 0.10.9 Uninstalling py4j-0.10.9: Successfully uninstalled py4j-0.10.9 Found existing installation: pyspark 3.1.0.dev0 Exception: Traceback (most recent call last): File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main status = self.run(options, args) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run use_user_site=options.use_user_site, File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs auto_confirm=True File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall uninstalled_pathset = UninstallPathSet.from_dist(dist) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist '(at %s)' % (link_pointer, dist.project_name, dist.location) AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed ``` - https://github.com/apache/spark/pull/29099#issuecomment-658073453 (amp-jenkins-worker-04) - https://github.com/apache/spark/pull/29090#issuecomment-657819973 (amp-jenkins-worker-03) Seems like the previous installation of editable mode affects other PRs. This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge. ### Why are the changes needed? To recover the Jenkins build. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? Jenkins build will test it out. Closes #29102 from HyukjinKwon/SPARK-32303. Lead-authored-by: HyukjinKwon <gurwls223@apache.org> Co-authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-14 16:43:16 -07:00
HyukjinKwon	4ad9bfd53b	[SPARK-32138] Drop Python 2.7, 3.4 and 3.5 ### What changes were proposed in this pull request? This PR aims to drop Python 2.7, 3.4 and 3.5. Roughly speaking, it removes all the widely known Python 2 compatibility workarounds such as `sys.version` comparison, `__future__`. Also, it removes the Python 2 dedicated codes such as `ArrayConstructor` in Spark. ### Why are the changes needed? 1. Unsupport EOL Python versions 2. Reduce maintenance overhead and remove a bit of legacy codes and hacks for Python 2. 3. PyPy2 has a critical bug that causes a flaky test, SPARK-28358 given my testing and investigation. 4. Users can use Python type hints with Pandas UDFs without thinking about Python version 5. Users can leverage one latest cloudpickle, https://github.com/apache/spark/pull/28950. With Python 3.8+ it can also leverage C pickle. ### Does this PR introduce _any_ user-facing change? Yes, users cannot use Python 2.7, 3.4 and 3.5 in the upcoming Spark version. ### How was this patch tested? Manually tested and also tested in Jenkins. Closes #28957 from HyukjinKwon/SPARK-32138. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-14 11:22:44 +09:00
Hyukjin Kwon	27ef3629dd	[SPARK-32292][SPARK-32252][INFRA] Run the relevant tests only in GitHub Actions ### What changes were proposed in this pull request? This PR mainly proposes to run only relevant tests just like Jenkins PR builder does. Currently, GitHub Actions always run full tests which wastes the resources. In addition, this PR also fixes 3 more issues very closely related together while I am here. 1. The main idea here is: It reuses the existing logic embedded in `dev/run-tests.py` which Jenkins PR builder use in order to run only the related test cases. 2. While I am here, I fixed SPARK-32292 too to run the doc tests. It was because other references were not available when it is cloned via `checkoutv2`. With `fetch-depth: 0`, the history is available. 3. In addition, it fixes the `dev/run-tests.py` to match with `python/run-tests.py` in terms of its options. Environment variables such as `TEST_ONLY_XXX` were moved as proper options. For example, ```bash dev/run-tests.py --modules sql,core ``` which is consistent with `python/run-tests.py`, for example, ```bash python/run-tests.py --modules pyspark-core,pyspark-ml ``` 4. Lastly, also fixed the formatting issue in module specification in the matrix: ```diff - network_common, network_shuffle, repl, launcher + network-common, network-shuffle, repl, launcher, ``` which incorrectly runs build/test the modules. ### Why are the changes needed? By running only related tests, we can hugely save the resources and avoid unrelated flaky tests, etc. Also, now it runs the doctest of `dev/run-tests.py` properly, the usages are similar between `dev/run-tests.py` and `python/run-tests.py`, and run `network-common`, `network-shuffle`, `launcher` and `examples` modules too. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? Manually tested in my own forked Spark: https://github.com/HyukjinKwon/spark/pull/7 https://github.com/HyukjinKwon/spark/pull/8 https://github.com/HyukjinKwon/spark/pull/9 https://github.com/HyukjinKwon/spark/pull/10 https://github.com/HyukjinKwon/spark/pull/11 https://github.com/HyukjinKwon/spark/pull/12 Closes #29086 from HyukjinKwon/SPARK-32292. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-13 08:31:39 -07:00
HyukjinKwon	b84ed4146d	[SPARK-32245][INFRA] Run Spark tests in Github Actions ### What changes were proposed in this pull request? This PR aims to run the Spark tests in Github Actions. To briefly explain the main idea: - Reuse `dev/run-tests.py` with SBT build - Reuse the modules in `dev/sparktestsupport/modules.py` to test each module - Pass the modules to test into `dev/run-tests.py` directly via `TEST_ONLY_MODULES` environment variable. For example, `pyspark-sql,core,sql,hive`. - `dev/run-tests.py` _does not_ take the dependent modules into account but solely the specified modules to test. Another thing to note might be `SlowHiveTest` annotation. Running the tests in Hive modules takes too much so the slow tests are extracted and it runs as a separate job. It was extracted from the actual elapsed time in Jenkins: ![Screen Shot 2020-07-09 at 7 48 13 PM](https://user-images.githubusercontent.com/6477701/87050238-f6098e80-c238-11ea-9c4a-ab505af61381.png) So, Hive tests are separated into to jobs. One is slow test cases, and the other one is the other test cases. _Note that_ the current GitHub Actions build virtually copies what the default PR builder on Jenkins does (without other profiles such as JDK 11, Hadoop 2, etc.). The only exception is Kinesis https://github.com/apache/spark/pull/29057/files#diff-04eb107ee163a50b61281ca08f4e4c7bR23 ### Why are the changes needed? Last week and onwards, the Jenkins machines became very unstable for many reasons: - Apparently, the machines became extremely slow. Almost all tests can't pass. - One machine (worker 4) started to have the corrupt `.m2` which fails the build. - Documentation build fails time to time for an unknown reason in Jenkins machine specifically. This is disabled for now at https://github.com/apache/spark/pull/29017. - Almost all PRs are basically blocked by this instability currently. The advantages of using Github Actions: - To avoid depending on few persons who can access to the cluster. - To reduce the elapsed time in the build - we could split the tests (e.g., SQL, ML, CORE), and run them in parallel so the total build time will significantly reduce. - To control the environment more flexibly. - Other contributors can test and propose to fix Github Actions configurations so we can distribute this build management cost. Note that: - The current build in Jenkins takes _more than 7 hours_. With Github actions it takes _less than 2 hours_ - We can now control the environments especially for Python easily. - The test and build look more stable than the Jenkins'. ### Does this PR introduce _any_ user-facing change? No, dev-only change. ### How was this patch tested? Tested at https://github.com/HyukjinKwon/spark/pull/4 Closes #29057 from HyukjinKwon/migrate-to-github-actions. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-11 13:09:06 -07:00
William Hyun	10a65ee9b4	[SPARK-32150][BUILD] Upgrade to ZStd 1.4.5-4 ### What changes were proposed in this pull request? This PR aims to upgrade to ZStd 1.4.5-4. ### Why are the changes needed? ZStd 1.4.5-4 fixes the following. - `3d16e51525` - `3d51bdcb82` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Jenkins. Closes #28969 from williamhyun/zstd2. Authored-by: William Hyun <williamhyun3@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-12 00:45:48 +09:00
Dongjoon Hyun	c5bd0730a2	[SPARK-32231][R][INFRA] Use Hadoop 3.2 winutils in AppVeyor build ### What changes were proposed in this pull request? This PR proposes to use Hadoop 3 winutils to make AppVeyor builds pass. Currently it's being failed as below https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/33976604 ### Why are the changes needed? To recover the build in AppVeyor. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? AppVeyor build will test it out. Closes #29042 from HyukjinKwon/SPARK-32231. Lead-authored-by: Dongjoon Hyun <dongjoon@apache.org> Co-authored-by: Hyukjin Kwon <gurwls223@apache.org> Co-authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-09 17:18:39 +09:00
Dongjoon Hyun	17997a5796	[SPARK-32233][TESTS] Disable SBT unidoc generation testing in Jenkins ### What changes were proposed in this pull request? This PR aims to disable SBT `unidoc` generation testing in Jenkins environment because it's flaky in Jenkins environment and not used for the official documentation generation. Also, GitHub Action has the correct test coverage for the official documentation generation. - https://github.com/apache/spark/pull/28848#issuecomment-654577911 (amp-jenkins-worker-06) - https://github.com/apache/spark/pull/28926#issuecomment-654316537 (amp-jenkins-worker-06) - https://github.com/apache/spark/pull/28969#issuecomment-654918636 (amp-jenkins-worker-06) - https://github.com/apache/spark/pull/28975#issuecomment-654447955 (amp-jenkins-worker-05) - https://github.com/apache/spark/pull/28986#issuecomment-654416543 (amp-jenkins-worker-05) - https://github.com/apache/spark/pull/28992#issuecomment-654371469 (amp-jenkins-worker-06) - https://github.com/apache/spark/pull/28993#issuecomment-655289237 (amp-jenkins-worker-05) - https://github.com/apache/spark/pull/28999#issuecomment-653976760 (amp-jenkins-worker-04) - https://github.com/apache/spark/pull/29010#issuecomment-655246083 (amp-jenkins-worker-03) - https://github.com/apache/spark/pull/29013#issuecomment-654292483 (amp-jenkins-worker-04) - https://github.com/apache/spark/pull/29016#issuecomment-654495070 (amp-jenkins-worker-05) - https://github.com/apache/spark/pull/29025#issuecomment-654889938 (amp-jenkins-worker-04) - https://github.com/apache/spark/pull/29042#issuecomment-655587989 (amp-jenkins-worker-03) ### Why are the changes needed? Apache Spark `release-build.sh` generates the official document by using the following command. - https://github.com/apache/spark/blob/master/dev/create-release/release-build.sh#L341 ```bash PRODUCTION=1 RELEASE_VERSION="$SPARK_VERSION" jekyll build ``` And, this is executed by the following `unidoc` command for Scala/Java API doc. - https://github.com/apache/spark/blob/master/docs/_plugins/copy_api_dirs.rb#L30 ```ruby system("build/sbt -Pkinesis-asl clean compile unidoc") \|\| raise("Unidoc generation failed") ``` However, the PR builder disabled `Jekyll build` and instead has a different test coverage. ```python # determine if docs were changed and if we're inside the amplab environment # note - the below commented out until all Jenkins workers can get `jekyll` installed # if "DOCS" in changed_modules and test_env == "amplab_jenkins": # build_spark_documentation() ``` ``` Building Unidoc API Documentation ======================================================================== [info] Building Spark unidoc using SBT with these arguments: -Phadoop-3.2 -Phive-2.3 -Pspark-ganglia-lgpl -Pkubernetes -Pmesos -Phadoop-cloud -Phive -Phive-thriftserver -Pkinesis-asl -Pyarn unidoc ``` ### Does this PR introduce _any_ user-facing change? No. (This is used only for testing and not used in the official doc generation.) ### How was this patch tested? Pass the Jenkins without doc generation invocation. Closes #29017 from dongjoon-hyun/SPARK-DOC-GEN. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-07-08 14:11:18 -07:00
Dongjoon Hyun	0e33b5ecde	[SPARK-32178][TESTS] Disable test-dependencies.sh from Jenkins jobs ### What changes were proposed in this pull request? This PR aims to disable dependency tests(test-dependencies.sh) from Jenkins. ### Why are the changes needed? - First of all, GitHub Action provides the same test capability already and stabler. - Second, currently, `test-dependencies.sh` fails very frequently in AmpLab Jenkins environment. For example, in the following irrelevant PR, it fails 5 times during 6 hours. - https://github.com/apache/spark/pull/29001 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Jenkins without `test-dependencies.sh` invocation. Closes #29004 from dongjoon-hyun/SPARK-32178. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-07-06 12:03:08 +09:00
Wenchen Fan	a6b6a1fd61	[MINOR] update dev/create-release/known_translations This is a followup of https://github.com/apache/spark/pull/28861 : 1. sort the names by Github ID, suggested by https://github.com/apache/spark/pull/28861#pullrequestreview-433369190 2. add more full names collected in https://github.com/apache/spark/pull/28861 3. The duplicated entry of `linbojin` is removed. 4. Name format is normalized to First Last name style. Closes #28891 from cloud-fan/update. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-06-29 11:53:56 +00:00
Dongjoon Hyun	9c134b57bf	[SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default ### What changes were proposed in this pull request? According to the dev mailing list discussion, this PR aims to switch the default Apache Hadoop dependency from 2.7.4 to 3.2.0 for Apache Spark 3.1.0 on December 2020. \| Item \| Default Hadoop Dependency \| \|------\|-----------------------------\| \| Apache Spark Website \| 3.2.0 \| \| Apache Download Site \| 3.2.0 \| \| Apache Snapshot \| 3.2.0 \| \| Maven Central \| 3.2.0 \| \| PyPI \| 2.7.4 (We will switch later) \| \| CRAN \| 2.7.4 (We will switch later) \| \| Homebrew \| 3.2.0 (already) \| In Apache Spark 3.0.0 release, we focused on the other features. This PR targets for [Apache Spark 3.1.0 scheduled on December 2020](https://spark.apache.org/versioning-policy.html). ### Why are the changes needed? Apache Hadoop 3.2 has many fixes and new cloud-friendly features. Reference - 2017-08-04: https://hadoop.apache.org/release/2.7.4.html - 2019-01-16: https://hadoop.apache.org/release/3.2.0.html ### Does this PR introduce _any_ user-facing change? Since the default Hadoop dependency changes, the users will get a better support in a cloud environment. ### How was this patch tested? Pass the Jenkins. Closes #28897 from dongjoon-hyun/SPARK-32058. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-06-26 19:43:29 -07:00
HyukjinKwon	71b6d462fb	[SPARK-32089][R][BUILD] Upgrade R version to 4.0.2 in the release DockerFiile ### What changes were proposed in this pull request? This PR proposes to upgrade R version to 4.0.2 in the release docker image. As of SPARK-31918, we should make a release with R 4.0.0+ which works with R 3.5+ too. ### Why are the changes needed? To unblock releases on CRAN. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? Manually tested via scripts under `dev/create-release`, manually attaching to the container and checking the R version. Closes #28922 from HyukjinKwon/SPARK-32089. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-06-24 14:56:55 +00:00
HyukjinKwon	e29ec42879	[SPARK-32074][BUILD][R] Update AppVeyor R version to 4.0.2 ### What changes were proposed in this pull request? R version 4.0.2 was released, see https://cran.r-project.org/doc/manuals/r-release/NEWS.html. This PR targets to upgrade R version in AppVeyor CI environment. ### Why are the changes needed? To test the latest R versions before the release, and see if there are any regressions. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? AppVeyor will test. Closes #28909 from HyukjinKwon/SPARK-32074. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-06-24 15:37:41 +09:00
William Hyun	2bcbe3dd9a	[SPARK-32045][BUILD] Upgrade to Apache Commons Lang 3.10 ### What changes were proposed in this pull request? This PR aims to upgrade to Apache Commons Lang 3.10. ### Why are the changes needed? This will bring the latest bug fixes like [LANG-1453](https://issues.apache.org/jira/browse/LANG-1453). https://commons.apache.org/proper/commons-lang/release-notes/RELEASE-NOTES-3.10.txt ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Jenkins. Closes #28889 from williamhyun/commons-lang-3.10. Authored-by: William Hyun <williamhyun3@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-06-23 16:59:55 +09:00
Wenchen Fan	8a9ae01e74	[MINOR] update dev/create-release/known_translations This is auto-updated by running script `translate-contributors.py` Closes #28861 from cloud-fan/update. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-06-18 15:32:44 +00:00
DB Tsai	9b792518b2	[SPARK-31960][YARN][BUILD] Only populate Hadoop classpath for no-hadoop build ### What changes were proposed in this pull request? If a Spark distribution has built-in hadoop runtime, Spark will not populate the hadoop classpath from `yarn.application.classpath` and `mapreduce.application.classpath` when a job is submitted to Yarn. Users can override this behavior by setting `spark.yarn.populateHadoopClasspath` to `true`. ### Why are the changes needed? Without this, Spark will populate the hadoop classpath from `yarn.application.classpath` and `mapreduce.application.classpath` even Spark distribution has built-in hadoop. This results jar conflict and many unexpected behaviors in runtime. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manually test with two builds, with-hadoop and no-hadoop builds. Closes #28788 from dbtsai/yarn-classpath. Authored-by: DB Tsai <d_tsai@apple.com> Signed-off-by: DB Tsai <d_tsai@apple.com>	2020-06-18 06:08:40 +00:00
Gengliang Wang	f535004e14	[SPARK-31967][UI] Downgrade to vis.js 4.21.0 to fix Jobs UI loading time regression ### What changes were proposed in this pull request? After #28192, the job list page becomes very slow. For example, after the following operation, the UI loading can take >40 sec. ``` (1 to 1000).foreach(_ => sc.parallelize(1 to 10).collect) ``` This is caused by a [performance issue of `vis-timeline`](https://github.com/visjs/vis-timeline/issues/379). The serious issue affects both branch-3.0 and branch-2.4 I tried a different version 4.21.0 from https://cdnjs.com/libraries/vis The infinite drawing issue seems also fixed if the zoom is disabled as default. ### Why are the changes needed? Fix the serious perf issue in web UI by falling back vis-timeline-graph2d to an ealier version. ### Does this PR introduce _any_ user-facing change? Yes, fix the UI perf regression ### How was this patch tested? Manual test Closes #28806 from gengliangwang/downgradeVis. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>	2020-06-12 17:22:41 -07:00
Wenchen Fan	28f131fc8a	[SPARK-31979] Release script should not fail when remove non-existing files ### What changes were proposed in this pull request? When removing non-existing files in the release script, do not fail. ### Why are the changes needed? This is to make the release script more robust, as we don't care if the files exist before we remove them. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? tested when cutting 3.0.0 RC Closes #28815 from cloud-fan/release. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-06-12 11:06:52 -07:00
Wenchen Fan	d3a5e2963c	Revert "[SPARK-31860][BUILD] only push release tags on succes" This reverts commit `69ba9b662e`.	2020-06-12 17:50:43 +08:00
William Hyun	367d94a30d	[SPARK-31876][BUILD] Upgrade to Zstd 1.4.5 ### What changes were proposed in this pull request? This PR aims to upgrade to Zstd 1.4.5. ### Why are the changes needed? Zstd 1.4.5 improves performance. https://github.com/facebook/zstd/releases/tag/v1.4.5 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Passed the Jenkins. Closes #28682 from williamhyun/zstd. Authored-by: William Hyun <williamhyun3@gmail.com> Signed-off-by: DB Tsai <d_tsai@apple.com>	2020-06-02 21:58:33 +00:00
Holden Karau	69ba9b662e	[SPARK-31860][BUILD] only push release tags on succes ### What changes were proposed in this pull request? Only push the release tag after the build has finished. ### Why are the changes needed? If the build fails we don't need a release tag. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Running locally with a fake user in https://github.com/apache/spark/pull/28667 Closes #28700 from holdenk/SPARK-31860-build-master-only-push-tags-on-success. Authored-by: Holden Karau <hkarau@apple.com> Signed-off-by: Holden Karau <hkarau@apple.com>	2020-06-02 11:04:07 -07:00
Holden Karau	ab9e5a2fe9	[SPARK-31889][BUILD] Docker release script does not allocate enough memory to reliably publish ### What changes were proposed in this pull request? Allow overriding the zinc options in the docker release and set a higher so the publish step can succeed consistently. ### Why are the changes needed? The publish step experiences memory pressure. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Running test locally with fake user to see if publish step (besides svn part) succeeds Closes #28698 from holdenk/SPARK-31889-docker-release-script-does-not-allocate-enough-memory-to-reliably-publish. Authored-by: Holden Karau <hkarau@apple.com> Signed-off-by: Holden Karau <hkarau@apple.com>	2020-06-01 15:49:17 -07:00
Dongjoon Hyun	625abca9db	[SPARK-31858][BUILD] Upgrade commons-io to 2.5 in Hadoop 3.2 profile ### What changes were proposed in this pull request? This PR aims to upgrade `commons-io` from 2.4 to 2.5 for Apache Spark 3.1. ### Why are the changes needed? Since Hadoop 3.1, `commons-io` 2.5 is used. - https://issues.apache.org/jira/browse/HADOOP-15261 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Jenkins with Hadoop-3.2 profile. Maven dependency is verified via `test-dependencies.sh` automatically. SBT dependency can be verified like the following manually. ``` build/sbt -Phadoop-3.2 "core/dependencyTree" \| grep commons-io:commons-io \| head -n1 [info] \| \| +-commons-io:commons-io:2.5 ``` Closes #28665 from dongjoon-hyun/SPARK-31858. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-29 07:46:53 -07:00
Jungtaek Lim (HeartSaVioR)	fe1d1e24bc	[SPARK-31214][BUILD] Upgrade Janino to 3.1.2 ### What changes were proposed in this pull request? This PR proposes to upgrade Janino to 3.1.2 which is released recently. Major changes were done for refactoring, as well as there're lots of "no commit message". Belows are the pairs of (commit title, commit) which seem to deal with some bugs or specific improvements (not purposed to refactor) after 3.0.15. * Issue #119: Guarantee executing popOperand() in popUninitializedVariableOperand() via moving popOperand() out of "assert" * Issue #116: replace operand to final target type if boxing conversion and widening reference conversion happen together * Merged pull request `#114` "Grow the code for relocatables, and do fixup, and relocate". * `367c58e73e` * issue `#107`: Janino requires "org.codehaus.commons.compiler.io", but commons-compiler does not export this package * `f7d99596d4` * Throw an NYI CompileException when a static interface method is invoked. * `efd3884983` * Fixed the promotion of the array access index expression (see JLS7 15.13 Array Access Expressions) * `32fdb5f5f1` * Issue `#104`: ClassLoaderIClassLoader 's ClassNotFoundException handle mechanism enhancement * `6e8a97d609` You can see the changelog from the link: http://janino-compiler.github.io/janino/changelog.html ### Why are the changes needed? We got some report on failure on user's query which Janino throws error on compiling generated code. The issue is here: https://github.com/janino-compiler/janino/issues/113 It contains the information of generated code, symptom (error), and analysis of the bug, so please refer the link for more details. Janino 3.1.1 contains the PR https://github.com/janino-compiler/janino/pull/114 which would enable Janino to succeed to compile user's query properly. I've also fixed a couple of more bugs as 3.1.1 made Spark UTs fail - hence we need to upgrade to 3.1.2. Furthermore, from my testing, https://github.com/janino-compiler/janino/issues/90 (which Josh Rosen filed before) seems to be also resolved in 3.1.2 as well. Looks like Janino is maintained by one person and there's no even version branches and releases/tags so we can't expect Janino maintainer to release a new bugfix version - hence have to try out new minor version. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing UTs. Closes #27860 from HeartSaVioR/SPARK-31101. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-29 07:42:57 -07:00
shane knapp	9e68affd13	[BUILD][INFRA] bump the timeout to match the jenkins PRB ### What changes were proposed in this pull request? bump the timeout to match what's set in jenkins ### Why are the changes needed? tests be timing out! ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? via jenkins Closes #28666 from shaneknapp/increase-jenkins-timeout. Authored-by: shane knapp <incomplete@gmail.com> Signed-off-by: shane knapp <incomplete@gmail.com>	2020-05-28 14:25:49 -07:00
Gabor Somogyi	8f1d77488c	[SPARK-31821][BUILD] Remove mssql-jdbc dependencies from Hadoop 3.2 profile ### What changes were proposed in this pull request? There is an unnecessary dependency for `mssql-jdbc`. In this PR I've removed it. ### Why are the changes needed? Unnecessary dependency. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Jenkins with the following configuration. - [x] Pass the dependency test. - [x] SBT with Hadoop-3.2 (https://github.com/apache/spark/pull/28640#issuecomment-634192512) - [ ] Maven with Hadoop-3.2 Closes #28640 from gaborgsomogyi/SPARK-31821. Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-26 18:21:22 -07:00
William Hyun	753636e86b	[SPARK-31807][INFRA] Use python 3 style in release-build.sh ### What changes were proposed in this pull request? This PR aims to use the style that is compatible with both python 2 and 3. ### Why are the changes needed? This will help python 3 migration. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual. Closes #28632 from williamhyun/use_python3_style. Authored-by: William Hyun <williamhyun3@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-05-25 10:25:43 +09:00
Kousuke Saruta	8441e936fc	Revert "[SPARK-31756][WEBUI] Add real headless browser support for UI test This reverts commit `d95570864a`. Closes #28624 from sarutak/revert-real-headless-browser-support. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>	2020-05-24 09:13:38 +09:00
Dongjoon Hyun	64ffc66496	[SPARK-31786][K8S][BUILD] Upgrade kubernetes-client to 4.9.2 ### What changes were proposed in this pull request? This PR aims to upgrade `kubernetes-client` library to bring the JDK8 related fixes. Please note that JDK11 works fine without any problem. - https://github.com/fabric8io/kubernetes-client/releases/tag/v4.9.2 - JDK8 always uses http/1.1 protocol (Prevent OkHttp from wrongly enabling http/2) ### Why are the changes needed? OkHttp "wrongly" detects the Platform as Jdk9Platform on JDK 8u251. - https://github.com/fabric8io/kubernetes-client/issues/2212 - https://stackoverflow.com/questions/61565751/why-am-i-not-able-to-run-sparkpi-example-on-a-kubernetes-k8s-cluster Although there is a workaround `export HTTP2_DISABLE=true` and `Downgrade JDK or K8s`, we had better avoid this problematic situation. ### Does this PR introduce _any_ user-facing change? No. This will recover the failures on JDK 8u252. ### How was this patch tested? - [x] Pass the Jenkins UT (https://github.com/apache/spark/pull/28601#issuecomment-632474270) - [x] Pass the Jenkins K8S IT with the K8s 1.13 (https://github.com/apache/spark/pull/28601#issuecomment-632438452) - [x] Manual testing with K8s 1.17.3. (Below) v1.17.6 result (on Minikube) ``` KubernetesSuite: - Run SparkPi with no resources - Run SparkPi with a very long application name. - Use SparkLauncher.NO_RESOURCE - Run SparkPi with a master URL without a scheme. - Run SparkPi with an argument. - Run SparkPi with custom labels, annotations, and environment variables. - All pods have the same service account by default - Run extraJVMOptions check on driver - Run SparkRemoteFileTest using a remote data file - Run SparkPi with env and mount secrets. - Run PySpark on simple pi.py example - Run PySpark with Python2 to test a pyfiles example - Run PySpark with Python3 to test a pyfiles example - Run PySpark with memory customization - Run in client mode. - Start pod creation from template - PVs with local storage - Launcher client dependencies - Test basic decommissioning Run completed in 8 minutes, 27 seconds. Total number of tests run: 19 Suites: completed 2, aborted 0 Tests: succeeded 19, failed 0, canceled 0, ignored 0, pending 0 All tests passed. ``` Closes #28601 from dongjoon-hyun/SPARK-K8S-CLIENT. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-23 11:07:45 -07:00
Kousuke Saruta	d95570864a	[SPARK-31756][WEBUI] Add real headless browser support for UI test ### What changes were proposed in this pull request? This PR mainly adds two things. 1. Real headless browser support for UI test 2. A test suite using headless Chrome as one instance of those browsers. Also, for environment where Chrome and Chrome driver is not installed, `ChromeUITest` tag is added to filter out the test suite. ### Why are the changes needed? In the current master, there are two problems for UI test. 1. Lots of tests especially JavaScript related ones are done manually. Appearance is better to be confirmed by our eyes but logic should be tested by test cases ideally. 2. Compared to the real web browsers, HtmlUnit doesn't seem to support JavaScript enough. I added a JavaScript related test before for SPARK-31534 using HtmlUnit which is simple library based headless browser for test. The test I added works somehow but some JavaScript related error is shown in unit-tests.log. ``` ======= EXCEPTION START ======== Exception class=[net.sourceforge.htmlunit.corejs.javascript.JavaScriptException] com.gargoylesoftware.htmlunit.ScriptException: Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null\|function)". (http://192.168.1.209:60724/static/jquery-3.4.1.min.js#2) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:904) at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628) at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:515) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:835) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:807) at com.gargoylesoftware.htmlunit.InteractivePage.executeJavaScriptFunctionIfPossible(InteractivePage.java:216) at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptFunctionJob.runJavaScript(JavaScriptFunctionJob.java:52) at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptExecutionJob.run(JavaScriptExecutionJob.java:102) at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl.runSingleJob(JavaScriptJobManagerImpl.java:426) at com.gargoylesoftware.htmlunit.javascript.background.DefaultJavaScriptExecutor.run(DefaultJavaScriptExecutor.java:157) at java.lang.Thread.run(Thread.java:748) Caused by: net.sourceforge.htmlunit.corejs.javascript.JavaScriptException: Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null\|function)". (http://192.168.1.209:60724/static/jquery-3.4.1.min.js#2) at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1009) at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:800) at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105) at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:413) at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:252) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3264) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$4.doRun(JavaScriptEngine.java:828) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:889) ... 10 more JavaScriptException value = Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null\|function)". == CALLING JAVASCRIPT == function () { throw e; } ======= EXCEPTION END ======== ``` I tried to upgrade HtmlUnit to 2.40.0 but what is worse, the test become not working even though it works on real browsers like Chrome, Safari and Firefox without error. ``` [info] UISeleniumSuite: [info] - SPARK-31534: text for tooltip should be escaped * FAILED * (17 seconds, 745 milliseconds) [info] The code passed to eventually never returned normally. Attempted 2 times over 12.910785232 seconds. Last failure message: com.gargoylesoftware.htmlunit.ScriptException: ReferenceError: Assignment to undefined "regeneratorRuntime" in strict mode (http://192.168.1.209:62132/static/vis-timeline-graph2d.min.js#52(Function)#1) ``` To resolve those problems, it's better to support headless browser for UI test. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? I tested with following patterns. Both Chrome and Chrome driver should be installed to test. 1. sbt / with chromedriver / include tag (expect to succeed) `build/sbt -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly org.apache.spark.ui.ChromeUISeleniumSuite"` 2. sbt / with chromedriver / exclude tag (expect to be ignored) `build/sbt -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly org.apache.spark.ui.ChromeUISeleniumSuite -l org.apache.spark.tags.ChromeUITest"` 3. sbt / without chromedriver / include tag (expect to be failed) `build/sbt "testOnly org.apache.spark.ui.ChromeUISeleniumSuite"` 4. sbt / without chromedriver / exclude tag (expect to be skipped) `build/sbt -Dtest.exclude.tags=org.apache.spark.tags.ChromeUITest "testOnly org.apache.spark.ui.ChromeUISeleniumSuite"` 5. Maven / wth chromedriver / include tag (expect to succeed) `build/mvn -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite test` 6. Maven / with chromedriver / exclude tag (expect to be skipped) `build/mvn -Dtest.exclude.tags="org.apache.spark.tags.ChromeUITest" -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite test` 7. Maven / without chromedriver / include tag (expect to be failed) `build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite test` 8. Maven / without chromedriver / exclude tag (expect to be skipped) `build/mvn -Dtest.exclude.tags=org.apache.spark.tags.ChromeUITest -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite test` Closes #28578 from sarutak/real-headless-browser-support. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-05-22 08:24:31 -05:00
Thomas Graves	b64688ebba	[SPARK-29303][WEB UI] Add UI support for stage level scheduling ### What changes were proposed in this pull request? This adds UI updates to support stage level scheduling and ResourceProfiles. 3 main things have been added. ResourceProfile id added to the Stage page, the Executors page now has an optional selectable column to show the ResourceProfile Id of each executor, and the Environment page now has a section with the ResourceProfile ids. Along with this the rest api for environment page was updated to include the Resource profile information. I debating on splitting the resource profile information into its own page but I wasn't sure it called for a completely separate page. Open to peoples thoughts on this. Screen shots: ![Screen Shot 2020-04-01 at 3 07 46 PM](https://user-images.githubusercontent.com/4563792/78185169-469a7000-7430-11ea-8b0c-d9ede2d41df8.png) ![Screen Shot 2020-04-01 at 3 08 14 PM](https://user-images.githubusercontent.com/4563792/78185175-48fcca00-7430-11ea-8d1d-6b9333700f32.png) ![Screen Shot 2020-04-01 at 3 09 03 PM](https://user-images.githubusercontent.com/4563792/78185176-4a2df700-7430-11ea-92d9-73c382bb0f32.png) ![Screen Shot 2020-04-01 at 11 05 48 AM](https://user-images.githubusercontent.com/4563792/78185186-4dc17e00-7430-11ea-8962-f749dd47ea60.png) ### Why are the changes needed? For user to be able to know what resource profile was used with which stage and executors. The resource profile information is also available so user debugging can see exactly what resources were requested with that profile. ### Does this PR introduce any user-facing change? Yes, UI updates. ### How was this patch tested? Unit tests and tested on yarn both active applications and with the history server. Closes #28094 from tgravescs/SPARK-29303-pr. Lead-authored-by: Thomas Graves <tgraves@nvidia.com> Co-authored-by: Thomas Graves <tgraves@apache.org> Signed-off-by: Thomas Graves <tgraves@apache.org>	2020-05-21 13:11:35 -05:00
Prakhar Jain	c560428fe0	[SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned ### What changes were proposed in this pull request? After changes in SPARK-20628, CoarseGrainedSchedulerBackend can decommission an executor and stop assigning new tasks on it. We should also decommission the corresponding blockmanagers in the same way. i.e. Move the cached RDD blocks from those executors to other active executors. ### Why are the changes needed? We need to gracefully decommission the block managers so that the underlying RDD cache blocks are not lost in case the executors are taken away forcefully after some timeout (because of spotloss/pre-emptible VM etc). Its good to save as much cache data as possible. Also In future once the decommissioning signal comes from Cluster Manager (say YARN/Mesos etc), dynamic allocation + this change gives us opportunity to downscale the executors faster by making the executors free of cache data. Note that this is a best effort approach. We try to move cache blocks from decommissioning executors to active executors. If the active executors don't have free resources available on them for caching, then the decommissioning executors will keep the cache block which it was not able to move and it will still be able to serve them. Current overall Flow: 1. CoarseGrainedSchedulerBackend receives a signal to decommissionExecutor. On receiving the signal, it do 2 things - Stop assigning new tasks (SPARK-20628), Send another message to BlockManagerMasterEndpoint (via BlockManagerMaster) to decommission the corresponding BlockManager. 2. BlockManagerMasterEndpoint receives "DecommissionBlockManagers" message. On receiving this, it moves the corresponding block managers to "decommissioning" state. All decommissioning BMs are excluded from the getPeers RPC call which is used for replication. All these decommissioning BMs are also sent message from BlockManagerMasterEndpoint to start decommissioning process on themselves. 3. BlockManager on worker (say BM-x) receives the "DecommissionBlockManager" message. Now it will start BlockManagerDecommissionManager thread to offload all the RDD cached blocks. This thread can make multiple reattempts to decommission the existing cache blocks (multiple reattempts might be needed as there might not be sufficient space in other active BMs initially). ### Does this PR introduce any user-facing change? NO ### How was this patch tested? Added UTs. Closes #28370 from prakharjain09/SPARK-20732-rddcache-1. Authored-by: Prakhar Jain <prakharjain09@gmail.com> Signed-off-by: Holden Karau <hkarau@apple.com>	2020-05-18 11:37:53 -07:00
Dongjoon Hyun	cd5fbcf9a0	[SPARK-31713][INFRA] Make test-dependencies.sh detect version string correctly ### What changes were proposed in this pull request? This PR makes `test-dependencies.sh` detect the version string correctly by ignoring all the other lines. ### Why are the changes needed? Currently, all SBT jobs are broken like the following. - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/476/console ``` [error] running /home/jenkins/workspace/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/dev/test-dependencies.sh ; received return code 1 Build step 'Execute shell' marked build as failure ``` The reason is that the script detects the old version like `Falling back to archive.apache.org to download Maven 3.1.0-SNAPSHOT` when `build/mvn` did fallback. Specifically, in the script, `OLD_VERSION` became `Falling back to archive.apache.org to download Maven 3.1.0-SNAPSHOT` instead of `3.1.0-SNAPSHOT` if build/mvn did fallback. Then, `pom.xml` file is corrupted like the following at the end and the exit code become `1` instead of `0`. It causes Jenkins jobs fails ``` - <version>3.1.0-SNAPSHOT</version> + <version>Falling</version> ``` NO FALLBACK ``` $ build/mvn -q -Dexec.executable="echo" -Dexec.args='${project.version}' --non-recursive org.codehaus.mojo:exec-maven-plugin:1.6.0:exec Using `mvn` from path: /Users/dongjoon/APACHE/spark-merge/build/apache-maven-3.6.3/bin/mvn 3.1.0-SNAPSHOT ``` FALLBACK ``` $ build/mvn -q -Dexec.executable="echo" -Dexec.args='${project.version}' --non-recursive org.codehaus.mojo:exec-maven-plugin:1.6.0:exec Falling back to archive.apache.org to download Maven Using `mvn` from path: /Users/dongjoon/APACHE/spark-merge/build/apache-maven-3.6.3/bin/mvn 3.1.0-SNAPSHOT ``` In the script ``` $ echo $(build/mvn -q -Dexec.executable="echo" -Dexec.args='${project.version}' --non-recursive org.codehaus.mojo:exec-maven-plugin:1.6.0:exec) Using `mvn` from path: /Users/dongjoon/APACHE/spark-merge/build/apache-maven-3.6.3/bin/mvn Falling back to archive.apache.org to download Maven 3.1.0-SNAPSHOT ``` This PR will prevent irrelevant logs like `Falling back to archive.apache.org to download Maven`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the PR Builder. Closes #28532 from dongjoon-hyun/SPARK-31713. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-14 19:28:25 -07:00
Kousuke Saruta	38bc45b0b5	[SPARK-30654][WEBUI][FOLLOWUP] Remove bootstrap-tooltip.js which is no longer used ### What changes were proposed in this pull request? This PR removes `bootstrap-tooltip.js` which is no longer used. That script is replaced with `bootstrap.bundle.min.js` in SPARK-30654 ( #27370 ). ### Why are the changes needed? For cleaning up repository.. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually checked whether tooltips are shown in the UI and no error message shown in the debug console. Closes #28515 from sarutak/remove-tooltipjs. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-13 01:04:57 -07:00
Dongjoon Hyun	3772154442	[SPARK-31691][INFRA] release-build.sh should ignore a fallback output from `build/mvn` ### What changes were proposed in this pull request? This PR adds `i` option to ignore additional `build/mvn` output which is irrelevant to version string. ### Why are the changes needed? SPARK-28963 added additional output message, `Falling back to archive.apache.org to download Maven` in build/mvn. This breaks `dev/create-release/release-build.sh` and currently Spark Packaging Jenkins job is hitting this issue consistently and broken. - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/2912/console ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? This happens only when the mirror fails. So, this is verified manually hiject the script. It works like the following. ``` $ echo 'Falling back to archive.apache.org to download Maven' > out $ build/mvn help:evaluate -Dexpression=project.version >> out Using `mvn` from path: /Users/dongjoon/PRS/SPARK_RELEASE_2/build/apache-maven-3.6.3/bin/mvn $ cat out \| grep -v INFO \| grep -v WARNING \| grep -v Download Falling back to archive.apache.org to download Maven 3.1.0-SNAPSHOT $ cat out \| grep -v INFO \| grep -v WARNING \| grep -vi Download 3.1.0-SNAPSHOT ``` Closes #28514 from dongjoon-hyun/SPARK_RELEASE_2. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-12 14:24:56 -07:00
Dongjoon Hyun	0feb3cbe77	[SPARK-31687][INFRA] Use GitHub instead of GitBox in release script ### What changes were proposed in this pull request? This PR aims to use GitHub urls instead of GitHub in the release scripts. ### Why are the changes needed? Currently, Spark Packaing jobs are broken due to GitBox issue. ``` fatal: unable to access 'https://gitbox.apache.org/repos/asf/spark.git/': Failed to connect to gitbox.apache.org port 443: Connection timed out ``` - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/2906/console - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-3.0-maven-snapshots/105/console - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.4-maven-snapshots/439/console ### Does this PR introduce _any_ user-facing change? No. (This is a dev-only script.) ### How was this patch tested? Manual. ``` $ cat ./test.sh . dev/create-release/release-util.sh get_release_info git clone "$ASF_REPO" $ sh test.sh Branch [branch-3.0]: Current branch version is 3.0.1-SNAPSHOT. Release [3.0.0]: RC # [2]: Full name [Dongjoon Hyun]: GPG key [dongjoonapache.org]: ================ Release details: BRANCH: branch-3.0 VERSION: 3.0.0 TAG: v3.0.0-rc2 NEXT: 3.0.1-SNAPSHOT ASF USER: dongjoon GPG KEY: dongjoonapache.org FULL NAME: Dongjoon Hyun E-MAIL: dongjoonapache.org ================ Is this info correct [y/n]? y ASF password: GPG passphrase: Cloning into 'spark'... remote: Enumerating objects: 223, done. remote: Counting objects: 100% (223/223), done. remote: Compressing objects: 100% (117/117), done. remote: Total 708324 (delta 70), reused 138 (delta 32), pack-reused 708101 Receiving objects: 100% (708324/708324), 322.08 MiB \| 2.94 MiB/s, done. Resolving deltas: 100% (289268/289268), done. Updating files: 100% (16287/16287), done. $ sh test.sh ... ``` Closes #28513 from dongjoon-hyun/SPARK-31687. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-12 13:07:00 -07:00
angerszhu	0d9faf602e	[SPARK-31655][BUILD] Upgrade snappy-java to 1.1.7.5 ### What changes were proposed in this pull request? snappy-java have release v1.1.7.5, upgrade to latest version. Fixed in v1.1.7.4 - Caching internal buffers for SnappyFramed streams #234 - Fixed the native lib for ppc64le to work with glibc 2.17 (Previously it depended on 2.22) Fixed in v1.1.7.5 - Fixes java.lang.NoClassDefFoundError: org/xerial/snappy/pool/DefaultPoolFactory in 1.1.7.4 https://github.com/xerial/snappy-java/compare/1.1.7.3...1.1.7.5 v 1.1.7.5 release note: `edc4ec28bd` ### Why are the changes needed? Fix bug ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No need Closes #28472 from AngersZhuuuu/spark-31655. Authored-by: angerszhu <angers.zhu@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-07 12:01:43 -07:00
Dongjoon Hyun	e7995c2ddc	[SPARK-31633][BUILD] Upgrade SLF4J from 1.7.16 to 1.7.30 ### What changes were proposed in this pull request? This PR aims to upgrade SLF4J from 1.7.16 to 1.7.30. ### Why are the changes needed? SLF4J 1.7.23+ is required to enable `slf4j-log4j12` with MDC feature to run under Java 9. Also, this will bring all latest bug fixes. - http://www.slf4j.org/news.html > When running under Java 9, log4j version 1.2.x is unable to correctly parse the "java.version" system property. Assuming an inccorect Java version, it proceeded to disable its MDC functionality. The slf4j-log4j12 module shipping in this release fixes the issue by tweaking MDC internals by reflection, allowing log4j to run under Java 9. See also SLF4J-393. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #28446 from dongjoon-hyun/SPARK-31633. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-04 08:14:12 -07:00
Michael Chirico	c00fe5ef3e	[MINOR][R] small tidying of sh scripts for R ### What changes were proposed in this pull request? Some tidying of `sh` scripts in `R/` ### Why are the changes needed? Not strictly needed, but the `'devtools' %in% installed.packages()` line in particular is "improper" / proabbly slow ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Not Closes #28419 from MichaelChirico/r-scripts-cleanup. Authored-by: Michael Chirico <michael.chirico@grabtaxi.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-04-30 16:58:05 -07:00
Dongjoon Hyun	62be65efe4	[SPARK-31567][R][TESTS] Update AppVeyor Rtools to 4.0.0 ### What changes were proposed in this pull request? This aims to upgrade Rtools first to prepare R 4.0.0 in AppVeyor for Apache Spark 3.1.0. ### Why are the changes needed? R 4.0.0 is released on April 24th, 2020. It uses Rtools 4.0.0 officially. - https://cran.r-project.org/doc/manuals/r-release/NEWS.html - https://stat.ethz.ch/pipermail/r-announce/2020/000653.html ### Does this PR introduce any user-facing change? No. (This PR aims to test Rtools 4.0.0 in AppVeyor environment.) ### How was this patch tested? See the AppVeyor result. Closes #28358 from dongjoon-hyun/SPARK-31567. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-29 13:10:43 +09:00
Dongjoon Hyun	79eaaaf6da	[SPARK-31580][BUILD] Upgrade Apache ORC to 1.5.10 ### What changes were proposed in this pull request? This PR aims to upgrade Apache ORC to 1.5.10. ### Why are the changes needed? Apache ORC 1.5.10 is a maintenance release with the following patches. - [ORC-621](https://issues.apache.org/jira/browse/ORC-621) Need reader fix for ORC-569 - [ORC-616](https://issues.apache.org/jira/browse/ORC-616) In Patched Base encoding, the value of headerThirdByte goes beyond the range of byte - [ORC-613](https://issues.apache.org/jira/browse/ORC-613) OrcMapredRecordReader mis-reuse struct object when actual children schema differs - [ORC-610](https://issues.apache.org/jira/browse/ORC-610) Updated Copyright year in the NOTICE file The following is release note. - https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12318320&version=12346912 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing ORC tests and a newly added test case. - The first commit is already tested in `hive-2.3` profile with both native ORC implementation and Hive 2.3 ORC implementation. (https://github.com/apache/spark/pull/28373#issuecomment-620265114) - The latest run is about to make the test case disable in `hive-1.2` profile which doesn't use Apache ORC. - `hive-1.2`: https://github.com/apache/spark/pull/28373#issuecomment-620325906 Closes #28373 from dongjoon-hyun/SPARK-ORC-1.5.10. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-04-27 18:56:30 -07:00
Thomas Graves	95aec091e4	[SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests ### What changes were proposed in this pull request? As part of the Stage level scheduling features, add the Python api's to set resource profiles. This also adds the functionality to properly apply the pyspark memory configuration when specified in the ResourceProfile. The pyspark memory configuration is being passed in the task local properties. This was an easy way to get it to the PythonRunner that needs it. I modeled this off how the barrier task scheduling is passing the addresses. As part of this I added in the JavaRDD api's because those are needed by python. ### Why are the changes needed? python api for this feature ### Does this PR introduce any user-facing change? Yes adds the java and python apis for user to specify a ResourceProfile to use stage level scheduling. ### How was this patch tested? unit tests and manually tested on yarn. Tests also run to verify it errors properly on standalone and local mode where its not yet supported. Closes #28085 from tgravescs/SPARK-29641-pr-base. Lead-authored-by: Thomas Graves <tgraves@nvidia.com> Co-authored-by: Thomas Graves <tgraves@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-23 10:20:39 +09:00
Yuming Wang	b11e42663b	[SPARK-31381][SPARK-29245][SQL] Upgrade built-in Hive 2.3.6 to 2.3.7 ### What changes were proposed in this pull request? Hive 2.3.7 fixed these issues: - HIVE-21508: ClassCastException when initializing HiveMetaStoreClient on JDK10 or newer - HIVE-21980:Parsing time can be high in case of deeply nested subqueries - HIVE-22249: Support Parquet through HCatalog ### Why are the changes needed? Fix CCE during creating HiveMetaStoreClient in JDK11 environment: [SPARK-29245](https://issues.apache.org/jira/browse/SPARK-29245). ### Does this PR introduce any user-facing change? No. ### How was this patch tested? - [x] Test Jenkins with Hadoop 2.7 (https://github.com/apache/spark/pull/28148#issuecomment-616757840) - [x] Test Jenkins with Hadoop 3.2 on JDK11 (https://github.com/apache/spark/pull/28148#issuecomment-616294353) - [x] Manual test with remote hive metastore. Hive side: ``` export JAVA_HOME=/usr/lib/jdk1.8.0_221 export PATH=$JAVA_HOME/bin:$PATH cd /usr/lib/hive-2.3.6 # Start Hive metastore with Hive 2.3.6 bin/schematool -dbType derby -initSchema --verbose bin/hive --service metastore ``` Spark side: ``` export JAVA_HOME=/usr/lib/jdk-11.0.3 export PATH=$JAVA_HOME/bin:$PATH build/sbt clean package -Phive -Phadoop-3.2 -Phive-thriftserver export SPARK_PREPEND_CLASSES=true bin/spark-sql --conf spark.hadoop.hive.metastore.uris=thrift://localhost:9083 ``` Closes #28148 from wangyum/SPARK-31381. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-04-20 13:38:24 -07:00
Kousuke Saruta	2921be7a4e	[SPARK-31462][INFRA] The usage of getopts and case statement is wrong in do-release.sh ### What changes were proposed in this pull request? This PR (SPARK-31462) fixes the usage of getopts and case statement in `do-release.sh` and `do-release-docker.sh`. ### Why are the changes needed? In the current master, do-release.sh contains the following code. ``` while getopts "bn" opt; do case $opt in b) GIT_BRANCH=$OPTARG ;; n) DRY_RUN=1 ;; ?) error "Invalid option: $OPTARG" ;; esac done ``` There are 3 wrong usage in getopts and case statement. 1. To set $OPTARG to an argument passed for the option "b", the parameter for getopts should be "b:". 2. To set $OPTARG to the invalid option name passed, the parameter for getopts starts with ":". 3. It's minor but to match the character "?", it's better to escape like "\\?". ### Does this PR introduce any user-facing change? No. ### How was this patch tested? I checked that $GIT_BRANCH is set when do-release.sh is launched with -b option. I also checked that the error message contains invalid option name when do-release.sh is launched with an invalid option. Closes #28234 from sarutak/fix-do-release. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-04-16 12:54:10 -07:00
Kousuke Saruta	04f04e0ea7	[SPARK-31420][WEBUI] Infinite timeline redraw in job details page ### What changes were proposed in this pull request? Upgrade vis.js to fix an infinite re-drawing issue. As reported here, old releases of vis.js have that issue. Fortunately, the latest version seems to resolve the issue. With the latest release of vis.js, there are some performance issues with the original `timeline-view.js` and `timeline-view.css` so I also changed them. ### Why are the changes needed? For better UX. ### Does this PR introduce any user-facing change? No. Appearance and functionalities are not changed. ### How was this patch tested? I confirmed infinite redrawing doesn't happen with a JobPage which I had reproduced the issue. With the original version of vis.js, I reproduced the issue with the following conditions. * Use history server and load core/src/test/resources/spark-events. * Visit the JobPage for job2 in application_1553914137147_0018. * Zoom out to 80% on Safari / Chrome / Firefox. Maybe, it depends on OS and the version of browsers. Closes #28192 from sarutak/upgrade-visjs. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>	2020-04-13 23:23:00 -07:00
HyukjinKwon	9f58f03857	[SPARK-31231][BUILD] Unset setuptools version in pip packaging test ### What changes were proposed in this pull request? This PR unsets `setuptools` version in CI. This was fixed in the 0.46.1.2+ `setuptools` - pypa/setuptools#2046. `setuptools` 0.46.1.0 and 0.46.1.1 still have this problem. ### Why are the changes needed? To test the latest setuptools out to see if users can actually install and use it. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Jenkins will test. Closes #28111 from HyukjinKwon/SPARK-31231-revert. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-04 08:09:15 +09:00
Wenchen Fan	783852cc2e	[SPARK-31320] Fix release script for 3.0.0 ### What changes were proposed in this pull request? The release script stops working after `d5865493ae` , as we require `mkdocs 1.0.0`. This PR upgrades `mkdocs` from 0.1.6.3 to 1.0.4. To do that ruby is also upgraded to 2.5. This PR also fixes some small issues. ### Why are the changes needed? to make RC ### Does this PR introduce any user-facing change? no ### How was this patch tested? tested by 3.0.0-rc1 Closes #28088 from cloud-fan/rc. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-01 16:45:28 +09:00
HyukjinKwon	4d4c3e76f6	Revert "[SPARK-30879][DOCS] Refine workflow for building docs" This reverts commit `7892f88f84`.	2020-03-31 16:11:59 +09:00
HyukjinKwon	178d472e1d	[SPARK-31231][BUILD][FOLLOW-UP] Set the upper bound (before 46.1.0) for setuptools in pip package test ## What changes were proposed in this pull request? This PR is a followup of apache/spark#27995. Rather then pining setuptools version, it sets upper bound so Python 3.5 with branch-2.4 tests can pass too. ## Why are the changes needed? To make the CI build stable ## Does this PR introduce any user-facing change? No, dev-only change. ## How was this patch tested? Jenkins will test. Closes #28005 from HyukjinKwon/investigate-pip-packaging-followup. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-26 12:33:17 +09:00
HyukjinKwon	c181c45f86	[SPARK-31231][BUILD] Explicitly setuptools version as 46.0.0 in pip package test ### What changes were proposed in this pull request? For a bit of background, PIP packaging test started to fail (see [this logs](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120218/testReport/)) as of setuptools 46.1.0 release. In https://github.com/pypa/setuptools/issues/1424, they decided to don't keep the modes in `package_data`. In PySpark pip installation, we keep the executable scripts in `package_data` `fc4e56a54c/python/setup.py (L199-L200)`, and expose their symbolic links as executable scripts. So, the symbolic links (or copied scripts) executes the scripts copied from `package_data`, which doesn't have the executable permission in its mode: ``` /tmp/tmp.UmkEGNFdKF/3.6/bin/spark-submit: line 27: /tmp/tmp.UmkEGNFdKF/3.6/lib/python3.6/site-packages/pyspark/bin/spark-class: Permission denied /tmp/tmp.UmkEGNFdKF/3.6/bin/spark-submit: line 27: exec: /tmp/tmp.UmkEGNFdKF/3.6/lib/python3.6/site-packages/pyspark/bin/spark-class: cannot execute: Permission denied ``` The current issue is being tracked at https://github.com/pypa/setuptools/issues/2041 </br> For what this PR proposes: It sets the upper bound in PR builder for now to unblock other PRs. _This PR does not solve the issue yet. I will make a fix after monitoring https://github.com/pypa/setuptools/issues/2041_ ### Why are the changes needed? It currently affects users who uses the latest setuptools. So, _users seem unable to use PySpark with the latest setuptools._ See also https://github.com/pypa/setuptools/issues/2041#issuecomment-602566667 ### Does this PR introduce any user-facing change? It makes CI pass for now. No user-facing change yet. ### How was this patch tested? Jenkins will test. Closes #27995 from HyukjinKwon/investigate-pip-packaging. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-24 17:59:43 +09:00
Jungtaek Lim (HeartSaVioR)	f55f6b569b	[SPARK-31101][BUILD] Upgrade Janino to 3.0.16 ### What changes were proposed in this pull request? This PR(SPARK-31101) proposes to upgrade Janino to 3.0.16 which is released recently. * Merged pull request janino-compiler/janino#114 "Grow the code for relocatables, and do fixup, and relocate". Please see the commit log. - https://github.com/janino-compiler/janino/commits/3.0.16 You can see the changelog from the link: http://janino-compiler.github.io/janino/changelog.html / though release note for Janino 3.0.16 is actually incorrect. ### Why are the changes needed? We got some report on failure on user's query which Janino throws error on compiling generated code. The issue is here: janino-compiler/janino#113 It contains the information of generated code, symptom (error), and analysis of the bug, so please refer the link for more details. Janino 3.0.16 contains the PR janino-compiler/janino#114 which would enable Janino to succeed to compile user's query properly. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing UTs. Closes #27932 from HeartSaVioR/SPARK-31101-janino-3.0.16. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-03-21 19:10:23 -07:00
Nicholas Chammas	b4748ca0ab	[SPARK-31155] Remove pydocstyle tests ### What changes were proposed in this pull request? As discovered here https://github.com/apache/spark/pull/27910#issuecomment-599027190, pydocstyle tests were not running anywhere (not on Jenkins; not on GitHub). ~This PR enables those tests.~ It also seems like a [large hill to climb](https://github.com/apache/spark/pull/27912#issuecomment-599167117) to enable any meaningful checks, so we're going to just rip pydocstyle out for now. ### Why are the changes needed? Presumably, we defined those doc style tests because we care about whatever it is they enforce. Since we're not actually testing anything, though, it's better to clear the cruft. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Will check the GitHub workflow logs on this PR. Closes #27912 from nchammas/SPARK-31155-pydocstyle. Authored-by: Nicholas Chammas <nicholas.chammas@liveramp.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-17 10:41:41 +09:00
Nicholas Chammas	0ce5519f17	[SPARK-31153][BUILD] Cleanup several failures in lint-python ### What changes were proposed in this pull request? This PR cleans up several failures -- most of them silent -- in `dev/lint-python`. I don't understand how we haven't been bitten by these yet. Perhaps we've been lucky? Fixes include: * Fix how we compare versions. All the version checks currently in `master` silently fail with: ``` File "<string>", line 2 print(LooseVersion("""2.3.1""") >= LooseVersion("""2.4.0""")) ^ IndentationError: unexpected indent ``` Another problem is that `distutils.version` is undocumented and unsupported. * Fix some basic bugs. e.g. We have an incorrect reference to `$PYDOCSTYLEBUILD`, which doesn't exist, which was causing the doc style test to silently fail with: ``` ./dev/lint-python: line 193: --version: command not found ``` * Stop suppressing error output! It's hiding problems and serves no purpose here. ### Why are the changes needed? `lint-python` is part of our CI build and is currently doing any combination of the following: silently failing; incorrectly skipping tests; incorrectly downloading libraries when a suitable library is already available. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Lots of manual testing with `set -x` enabled. Closes #27910 from nchammas/SPARK-31153-lint-python. Authored-by: Nicholas Chammas <nicholas.chammas@liveramp.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-15 13:09:35 +09:00
Dale Clarke	2a4fed0443	[SPARK-30654][WEBUI] Bootstrap4 WebUI upgrade ### What changes were proposed in this pull request? Spark's Web UI is using an older version of Bootstrap (v. 2.3.2) for the portal pages. Bootstrap 2.x was moved to EOL in Aug 2013 and Bootstrap 3.x was moved to EOL in July 2019 (https://github.com/twbs/release). Older versions of Bootstrap are also getting flagged in security scans for various CVEs: https://snyk.io/vuln/SNYK-JS-BOOTSTRAP-72889 https://snyk.io/vuln/SNYK-JS-BOOTSTRAP-173700 https://snyk.io/vuln/npm:bootstrap:20180529 https://snyk.io/vuln/npm:bootstrap:20160627 I haven't validated each CVE, but it would be nice to resolve any potential issues and get on a supported release. The bad news is that there have been quite a few changes between Bootstrap 2 and Bootstrap 4. I've tried updating the library, refactoring/tweaking the CSS and JS to maintain a similar appearance and functionality, and testing the UI for functionality and appearance. This is a fairly large change so I'm sure additional testing and fixes will be needed. ### How was this patch tested? This has been manually tested, but there is a ton of functionality and there are many pages and detail pages so it is very possible bugs introduced from the upgrade were missed. Additional testing and feedback is welcomed. If it appears a whole page was missed let me know and I'll take a pass at addressing that page/section. Closes #27370 from clarkead/bootstrap4-core-upgrade. Authored-by: Dale Clarke <a.dale.clarke@gmail.com> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>	2020-03-13 15:24:48 -07:00
Nicholas Chammas	3341021add	[SPARK-31041][BUILD] Show Maven errors from within make-distribution.sh ### What changes were proposed in this pull request? This PR makes `dev/make-distribution.sh` a bit easier to use by not hiding errors thrown by Maven. As a supporting change, this PR also suppresses progress bar output from curl and wget that may be output from within `build/mvn`. ### Why are the changes needed? It's surprising for command-line options to be position-dependent. The errors that get thrown when you pass the correct option (like `--pip`) but in the wrong order are confusing. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? I ran a few invocations of `make-distribution.sh` to confirm that, when passed incorrect options, I can actually see the errors thrown directly by Maven. Closes #27800 from nchammas/SPARK-31041-make-distribution. Authored-by: Nicholas Chammas <nicholas.chammas@liveramp.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-03-11 08:22:02 -05:00
Dongjoon Hyun	93def95b08	[SPARK-31095][BUILD] Upgrade netty-all to 4.1.47.Final ### What changes were proposed in this pull request? This PR aims to bring the bug fixes from the latest netty-all. ### Why are the changes needed? - 4.1.47.Final: https://github.com/netty/netty/milestone/222?closed=1 (15 patches or issues) - 4.1.46.Final: https://github.com/netty/netty/milestone/221?closed=1 (80 patches or issues) - 4.1.45.Final: https://github.com/netty/netty/milestone/220?closed=1 (23 patches or issues) - 4.1.44.Final: https://github.com/netty/netty/milestone/218?closed=1 (113 patches or issues) - 4.1.43.Final: https://github.com/netty/netty/milestone/217?closed=1 (63 patches or issues) ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #27869 from dongjoon-hyun/SPARK-31095. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-03-10 17:50:34 -07:00
Nicholas Chammas	7892f88f84	[SPARK-30879][DOCS] Refine workflow for building docs ### What changes were proposed in this pull request? This PR makes the following refinements to the workflow for building docs: * Install Python and Ruby consistently using pyenv and rbenv across both the docs README and the release Dockerfile. * Pin the Python and Ruby versions we use. * Pin all direct Python and Ruby dependency versions. * Eliminate any use of `sudo pip`, which the Python community discourages, or `sudo gem`. ### Why are the changes needed? This PR should increase the consistency and reproducibility of the doc-building process by managing Python and Ruby in a more consistent way, and by eliminating unused or outdated code. Here's a possible example of an issue building the docs that would be addressed by the changes in this PR: https://github.com/apache/spark/pull/27459#discussion_r376135719 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manual tests: * I was able to build the Docker image successfully, minus the final part about `RUN useradd`. * I am unable to run `do-release-docker.sh` because I am not a committer and don't have the required GPG key. * I built the docs locally and viewed them in the browser. I think I need a committer to more fully test out these changes. Closes #27534 from nchammas/SPARK-30731-building-docs. Authored-by: Nicholas Chammas <nicholas.chammas@liveramp.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-03-07 11:43:32 -06:00
HyukjinKwon	5b3277f4fc	[SPARK-30994][BUILD][FOLLOW-UP] Change scope of xml-apis to include it and add xerces in SBT as dependency override ### What changes were proposed in this pull request? This PR propose 1. Explicitly include xml-apis. xml-apis is already the part of xerces 2.12.0 (https://repo1.maven.org/maven2/xerces/xercesImpl/2.12.0/xercesImpl-2.12.0.pom). However, we're excluding it by setting `scope` to `test`. This seems causing `spark-shell`, built from Maven, to fail. Seems like previously xml-apis wasn't reached for some reasons but after we upgrade, it seems requiring. Therefore, this PR proposes to include it. 2. Pins `xerces` version in SBT as well. Seems this dependency is resolved differently from Maven. Note that Hadoop 3 does not looks requiring this as they replaced xerces as of [HDFS-12221](https://issues.apache.org/jira/browse/HDFS-12221). ### Why are the changes needed? To make `spark-shell` working from Maven build, and uses the same xerces version. ### Does this PR introduce any user-facing change? No, it's master only. ### How was this patch tested? 1. ```bash ./build/mvn -DskipTests -Psparkr -Phive clean package ./bin/spark-shell ``` Before: ``` Exception in thread "main" java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:763) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) at java.net.URLClassLoader.access$100(URLClassLoader.java:74) at java.net.URLClassLoader$1.run(URLClassLoader.java:369) at java.net.URLClassLoader$1.run(URLClassLoader.java:363) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:362) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.xerces.parsers.AbstractDOMParser.startDocument(Unknown Source) at org.apache.xerces.xinclude.XIncludeHandler.startDocument(Unknown Source) at org.apache.xerces.impl.dtd.XMLDTDValidator.startDocument(Unknown Source) at org.apache.xerces.impl.XMLDocumentScannerImpl.startEntity(Unknown Source) at org.apache.xerces.impl.XMLVersionDetector.startDocumentParsing(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) at org.apache.xerces.parsers.DOMParser.parse(Unknown Source) at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source) at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150) at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2482) at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2470) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2541) at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2494) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2407) at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143) at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115) at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456) at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427) at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.ClassNotFoundException: org.w3c.dom.ElementTraversal at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 42 more ``` After: ``` ... Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.1.0-SNAPSHOT /_/ Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_202) Type in expressions to have them evaluated. Type :help for more information. scala> ``` 2. ``` ./build/sbt dependencyTree -Phadoop-2.7 -Phive-2.3 -Phive-thriftserver -Phive ./build/sbt dependencyTree -Phadoop-3.2 -Phive-2.3 -Phive-thriftserver -Phive ``` Closes #27808 from HyukjinKwon/SPARK-30994. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-03-06 09:39:02 +09:00
Sean Owen	97d9a22b04	[SPARK-30994][CORE] Update xerces to 2.12.0 ### What changes were proposed in this pull request? Manage up the version of Xerces that Hadoop uses (and potentially user apps) to 2.12.0 to match https://issues.apache.org/jira/browse/HADOOP-16530 ### Why are the changes needed? Picks up bug and security fixes: https://www.xml.com/news/2018-05-apache-xerces-j-2120/ ### Does this PR introduce any user-facing change? Should be no behavior changes. ### How was this patch tested? Existing tests. Closes #27746 from srowen/SPARK-30994. Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2020-03-03 09:27:18 -06:00
Kent Yao	a6026c830a	[MINOR][BUILD] Fix make-distribution.sh to show usage without 'echo' cmd ### What changes were proposed in this pull request? turn off `x` mode and do not print the command while printing usage of `make-distribution.sh` ### Why are the changes needed? improve dev tools ### Does this PR introduce any user-facing change? for only developers, clearer hints #### after ``` ./dev/make-distribution.sh --hel +++ dirname ./dev/make-distribution.sh ++ cd ./dev/.. ++ pwd + SPARK_HOME=/Users/kentyao/spark + DISTDIR=/Users/kentyao/spark/dist + MAKE_TGZ=false + MAKE_PIP=false + MAKE_R=false + NAME=none + MVN=/Users/kentyao/spark/build/mvn + (( 1 )) + case $1 in + echo 'Error: --hel is not supported' Error: --hel is not supported + exit_with_usage + set +x make-distribution.sh - tool for making binary distributions of Spark usage: make-distribution.sh [--name] [--tgz] [--pip] [--r] [--mvn <mvn-command>] <maven build options> See Spark's "Building Spark" doc for correct Maven options. ``` #### before ``` +++ dirname ./dev/make-distribution.sh ++ cd ./dev/.. ++ pwd + SPARK_HOME=/Users/kentyao/spark + DISTDIR=/Users/kentyao/spark/dist + MAKE_TGZ=false + MAKE_PIP=false + MAKE_R=false + NAME=none + MVN=/Users/kentyao/spark/build/mvn + (( 1 )) + case $1 in + echo 'Error: --hel is not supported' Error: --hel is not supported + exit_with_usage + echo 'make-distribution.sh - tool for making binary distributions of Spark' make-distribution.sh - tool for making binary distributions of Spark + echo '' + echo usage: usage: + cl_options='[--name] [--tgz] [--pip] [--r] [--mvn <mvn-command>]' + echo 'make-distribution.sh [--name] [--tgz] [--pip] [--r] [--mvn <mvn-command>] <maven build options>' make-distribution.sh [--name] [--tgz] [--pip] [--r] [--mvn <mvn-command>] <maven build options> + echo 'See Spark'\''s "Building Spark" doc for correct Maven options.' See Spark's "Building Spark" doc for correct Maven options. + echo '' + exit 1 ``` ### How was this patch tested? manually Closes #27706 from yaooqinn/build. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-02-26 14:40:32 -08:00
Dongjoon Hyun	fc4e56a54c	[SPARK-30884][PYSPARK] Upgrade to Py4J 0.10.9 This PR aims to upgrade Py4J to `0.10.9` for better Python 3.7 support in Apache Spark 3.0.0 (master/branch-3.0). This is not for `branch-2.4`. - Apache Spark 3.0.0 is using `Py4J 0.10.8.1` (released on 2018-10-21) because `0.10.8.1` was the first official release to support Python 3.7. - https://www.py4j.org/changelog.html#py4j-0-10-8-and-py4j-0-10-8-1 - `Py4J 0.10.9` was released on January 25th 2020 with better Python 3.7 support and `magic_member` bug fix. - https://github.com/bartdag/py4j/releases/tag/0.10.9 - https://www.py4j.org/changelog.html#py4j-0-10-9 No. Pass the Jenkins with the existing tests. Closes #27641 from dongjoon-hyun/SPARK-30884. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-02-20 09:09:30 -08:00
HyukjinKwon	aa6a60530e	[SPARK-30722][PYTHON][DOCS] Update documentation for Pandas UDF with Python type hints ### What changes were proposed in this pull request? This PR targets to document the Pandas UDF redesign with type hints introduced at SPARK-28264. Mostly self-describing; however, there are few things to note for reviewers. 1. This PR replace the existing documentation of pandas UDFs to the newer redesign to promote the Python type hints. I added some words that Spark 3.0 still keeps the compatibility though. 2. This PR proposes to name non-pandas UDFs as "Pandas Function API" 3. SCALAR_ITER become two separate sections to reduce confusion: - `Iterator[pd.Series]` -> `Iterator[pd.Series]` - `Iterator[Tuple[pd.Series, ...]]` -> `Iterator[pd.Series]` 4. I removed some examples that look overkill to me. 5. I also removed some information in the doc, that seems duplicating or too much. ### Why are the changes needed? To document new redesign in pandas UDF. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing tests should cover. Closes #27466 from HyukjinKwon/SPARK-30722. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-02-12 10:49:46 +09:00
Yin Huai	ea626b6acf	[SPARK-30783] Exclude hive-service-rpc ### What changes were proposed in this pull request? Exclude hive-service-rpc from build. ### Why are the changes needed? hive-service-rpc 2.3.6 and spark sql's thrift server module have duplicate classes. Leaving hive-service-rpc 2.3.6 in the class path means that spark can pick up classes defined in hive instead of its thrift server module, which can cause hard to debug runtime errors due to class loading order and compilation errors for applications depend on spark. If you compare hive-service-rpc 2.3.6's jar (https://search.maven.org/remotecontent?filepath=org/apache/hive/hive-service-rpc/2.3.6/hive-service-rpc-2.3.6.jar) and spark thrift server's jar (e.g. https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-hive-thriftserver_2.12/3.0.0-SNAPSHOT/spark-hive-thriftserver_2.12-3.0.0-20200207.021914-364.jar), you will see that all of classes provided by hive-service-rpc-2.3.6.jar are covered by spark thrift server's jar. https://issues.apache.org/jira/browse/SPARK-30783 has output of jar tf for both jars. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests. Closes #27533 from yhuai/SPARK-30783. Authored-by: Yin Huai <yhuai@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-02-12 00:12:45 +08:00
HyukjinKwon	20c60a43cc	[MINOR][INFRA] Factor Python executable out as a variable in 'lint-python' script ### What changes were proposed in this pull request? This PR proposes to make hardcoded `python3` to a variable `PYTHON_EXECUTABLE` in ' lint-python' script. ### Why are the changes needed? To make changes easier. See `561e9b9688` as an example. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Manually by running `dev/lint-python`. Closes #27470 from HyukjinKwon/minor-PYTHON_EXECUTABLE. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-02-05 17:01:33 -08:00
Onur Satici	86fdb818bf	[SPARK-30715][K8S] Bump fabric8 to 4.7.1 ### What changes were proposed in this pull request? Bump fabric8 kubernetes-client to 4.7.1 ### Why are the changes needed? New fabric8 version brings support for Kubernetes 1.17 clusters. Full release notes: - https://github.com/fabric8io/kubernetes-client/releases/tag/v4.7.0 - https://github.com/fabric8io/kubernetes-client/releases/tag/v4.7.1 ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing unit and integration tests cover creation of K8S objects. Adjusted them to work with the new fabric8 version Closes #27443 from onursatici/os/bump-fabric8. Authored-by: Onur Satici <onursatici@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-02-05 01:17:30 -08:00
Dongjoon Hyun	1adf3520e3	[SPARK-30704][INFRA] Use jekyll-redirect-from 0.15.0 instead of the latest ### What changes were proposed in this pull request? This PR aims to pin the version of `jekyll-redirect-from` to 0.15.0. This is a release blocker for both Apache Spark 3.0.0 and 2.4.5. ### Why are the changes needed? `jekyll-redirect-from` released 0.16.0 a few days ago and that requires Ruby 2.4.0. - https://github.com/jekyll/jekyll-redirect-from/releases/tag/v0.16.0 ``` $ cd dev/create-release/spark-rm/ $ docker build -t spark:test . ... ERROR: Error installing jekyll-redirect-from: jekyll-redirect-from requires Ruby version >= 2.4.0. ... ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually do the above command to build `spark-rm` Docker image. ``` ... Successfully installed jekyll-redirect-from-0.15.0 Parsing documentation for jekyll-redirect-from-0.15.0 Installing ri documentation for jekyll-redirect-from-0.15.0 Done installing documentation for jekyll-redirect-from after 0 seconds 1 gem installed Successfully installed rouge-3.15.0 Parsing documentation for rouge-3.15.0 Installing ri documentation for rouge-3.15.0 Done installing documentation for rouge after 4 seconds 1 gem installed Removing intermediate container e0ec7c77b69f ---> 32dec37291c6 ``` Closes #27434 from dongjoon-hyun/SPARK-30704. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-02-02 00:44:25 -08:00
Dongjoon Hyun	2fd15a26fb	[SPARK-30695][BUILD] Upgrade Apache ORC to 1.5.9 ### What changes were proposed in this pull request? This PR aims to upgrade to Apache ORC 1.5.9. - For `hive-2.3` profile, we need to upgrade `hive-storage-api` from `2.6.0` to `2.7.1`. - For `hive-1.2` profile, ORC library with classifier `nohive` already shaded it. So, there is no change. ### Why are the changes needed? This will bring the latest bug fixes. The following is the full release note. - https://issues.apache.org/jira/projects/ORC/versions/12346546 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Here is the summary. 1. `Hive 1.2 + Hadoop 2.7` passed. ([here](https://github.com/apache/spark/pull/27421#issuecomment-580924552)) 2. `Hive 2.3 + Hadoop 2.7` passed. ([here](https://github.com/apache/spark/pull/27421#issuecomment-580973391)) Closes #27421 from dongjoon-hyun/SPARK-ORC-1.5.9. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-31 17:41:27 -08:00
Dongjoon Hyun	561e9b9688	[SPARK-30674][INFRA] Use python3 in dev/lint-python ### What changes were proposed in this pull request? This PR aims to use `python3` instead of `python` in `dev/lint-python`. ### Why are the changes needed? Currently, `dev/lint-python` fails at Python 2. And, Python 2 is EOL since January 1st 2020. ``` $ python -V Python 2.7.17 $ dev/lint-python starting python compilation test... Python compilation failed with the following errors: Compiling ./python/setup.py ... File "./python/setup.py", line 27 file=sys.stderr) ^ SyntaxError: invalid syntax ``` ### Does this PR introduce any user-facing change? No. This is a dev environment. ### How was this patch tested? Jenkins is running this with Python 3 already. The following is a manual test. ``` $ python -V Python 3.8.0 $ dev/lint-python starting python compilation test... python compilation succeeded. ``` Closes #27394 from dongjoon-hyun/SPARK-30674. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-30 03:17:29 -08:00
Nicholas Chammas	bda0669110	[SPARK-30665][DOCS][BUILD][PYTHON] Eliminate pypandoc dependency ### What changes were proposed in this pull request? This PR removes any dependencies on pypandoc. It also makes related tweaks to the docs README to clarify the dependency on pandoc (not pypandoc). ### Why are the changes needed? We are using pypandoc to convert the Spark README from Markdown to ReST for PyPI. PyPI now natively supports Markdown, so we don't need pypandoc anymore. The dependency on pypandoc also sometimes causes issues when installing Python packages that depend on PySpark, as described in #18981. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually: ```sh python -m venv venv source venv/bin/activate pip install -U pip cd python/ python setup.py sdist pip install dist/pyspark-3.0.0.dev0.tar.gz pyspark --version ``` I also built the PySpark and R API docs with `jekyll` and reviewed them locally. It would be good if a maintainer could also test this by creating a PySpark distribution and uploading it to [Test PyPI](https://test.pypi.org) to confirm the README looks as it should. Closes #27376 from nchammas/SPARK-30665-pypandoc. Authored-by: Nicholas Chammas <nicholas.chammas@liveramp.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-01-30 16:40:38 +09:00
Dongjoon Hyun	862959747e	[SPARK-30639][BUILD] Upgrade Jersey to 2.30 ### What changes were proposed in this pull request? For better JDK11 support, this PR aims to upgrade Jersey and javassist to `2.30` and `3.35.0-GA` respectively. ### Why are the changes needed? Jersey: This will bring the following `Jersey` updates. - https://eclipse-ee4j.github.io/jersey.github.io/release-notes/2.30.html - https://github.com/eclipse-ee4j/jersey/issues/4245 (Java 11 java.desktop module dependency) javassist: This is a transitive dependency from 3.20.0-CR2 to 3.25.0-GA. - `javassist` officially supports JDK11 from [3.24.0-GA release note](https://github.com/jboss-javassist/javassist/blob/master/Readme.html#L308). ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with both JDK8 and JDK11. Closes #27357 from dongjoon-hyun/SPARK-30639. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-25 15:41:55 -08:00
cody koeninger	843224ebd4	[SPARK-30570][BUILD] Update scalafmt plugin to 1.0.3 with onlyChangedFiles feature ### What changes were proposed in this pull request? Update the scalafmt plugin to 1.0.3 and use its new onlyChangedFiles feature rather than --diff ### Why are the changes needed? Older versions of the plugin either didn't work with scala 2.13, or got rid of the --diff argument and didn't allow for formatting only changed files ### Does this PR introduce any user-facing change? The /dev/scalafmt script no longer passes through arbitrary args, instead using the arg to select scala version. The issue here is the plugin name literally contains the scala version, and doesn't appear to have a shorter way to refer to it. If srowen or someone else with better maven-fu has an idea I'm all ears. ### How was this patch tested? Manually, e.g. edited a file and ran dev/scalafmt or dev/scalafmt 2.13 Closes #27279 from koeninger/SPARK-30570. Authored-by: cody koeninger <cody@koeninger.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-23 12:44:43 -08:00
HyukjinKwon	ab0890bdb1	[SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types ### What changes were proposed in this pull request? This PR proposes to redesign pandas UDFs as described in [the proposal](https://docs.google.com/document/d/1-kV0FS_LF2zvaRh_GhkV32Uqksm_Sq8SvnBBmRyxm30/edit?usp=sharing). ```python from pyspark.sql.functions import pandas_udf import pandas as pd pandas_udf("long") def plug_one(s: pd.Series) -> pd.Series: return s + 1 spark.range(10).select(plug_one("id")).show() ``` ``` +------------+ \|plug_one(id)\| +------------+ \| 1\| \| 2\| \| 3\| \| 4\| \| 5\| \| 6\| \| 7\| \| 8\| \| 9\| \| 10\| +------------+ ``` Note that, this PR address one of the future improvements described [here](https://docs.google.com/document/d/1-kV0FS_LF2zvaRh_GhkV32Uqksm_Sq8SvnBBmRyxm30/edit#heading=h.h3ncjpk6ujqu), "A couple of less-intuitive pandas UDF types" (by zero323) together. In short, - Adds new way with type hints as an alternative and experimental way. ```python pandas_udf(schema='...') def func(c1: Series, c2: Series) -> DataFrame: pass ``` - Replace and/or add an alias for three types below from UDF, and make them as separate standalone APIs. So, `pandas_udf` is now consistent with regular `udf`s and other expressions. `df.mapInPandas(udf)` -replace-> `df.mapInPandas(f, schema)` `df.groupby.apply(udf)` -alias-> `df.groupby.applyInPandas(f, schema)` `df.groupby.cogroup.apply(udf)` -replace-> `df.groupby.cogroup.applyInPandas(f, schema)` `df.groupby.apply` was added from 2.3 while the other were added in the master only. - No deprecation for the existing ways for now. ```python pandas_udf(schema='...', functionType=PandasUDFType.SCALAR) def func(c1, c2): pass ``` If users are happy with this, I plan to deprecate the existing way and declare using type hints is not experimental anymore. One design goal in this PR was that, avoid touching the internal (since we didn't deprecate the old ways for now), but supports type hints with a minimised changes only at the interface. - Once we deprecate or remove the old ways, I think it requires another refactoring for the internal in the future. At the very least, we should rename internal pandas evaluation types. - If users find this experimental type hints isn't quite helpful, we should simply revert the changes at the interface level. ### Why are the changes needed? In order to address old design issues. Please see [the proposal](https://docs.google.com/document/d/1-kV0FS_LF2zvaRh_GhkV32Uqksm_Sq8SvnBBmRyxm30/edit?usp=sharing). ### Does this PR introduce any user-facing change? For behaviour changes, No. It adds new ways to use pandas UDFs by using type hints. See below. SCALAR: ```python pandas_udf(schema='...') def func(c1: Series, c2: DataFrame) -> Series: pass # DataFrame represents a struct column ``` SCALAR_ITER: ```python pandas_udf(schema='...') def func(iter: Iterator[Tuple[Series, DataFrame, ...]]) -> Iterator[Series]: pass # Same as SCALAR but wrapped by Iterator ``` GROUPED_AGG: ```python pandas_udf(schema='...') def func(c1: Series, c2: DataFrame) -> int: pass # DataFrame represents a struct column ``` GROUPED_MAP: This was added in Spark 2.3 as of SPARK-20396. As described above, it keeps the existing behaviour. Additionally, we now have a new alias `groupby.applyInPandas` for `groupby.apply`. See the example below: ```python def func(pdf): return pdf df.groupby("...").applyInPandas(func, schema=df.schema) ``` MAP_ITER: this is not a pandas UDF anymore This was added in Spark 3.0 as of SPARK-28198; and this PR replaces the usages. See the example below: ```python def func(iter): for df in iter: yield df df.mapInPandas(func, df.schema) ``` COGROUPED_MAP*: this is not a pandas UDF anymore This was added in Spark 3.0 as of SPARK-27463; and this PR replaces the usages. See the example below: ```python def asof_join(left, right): return pd.merge_asof(left, right, on="...", by="...") df1.groupby("...").cogroup(df2.groupby("...")).applyInPandas(asof_join, schema="...") ``` ### How was this patch tested? Unittests added and tested against Python 2.7, 3.6 and 3.7. Closes #27165 from HyukjinKwon/revisit-pandas. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-01-22 15:32:58 +09:00
HyukjinKwon	e170422f74	Revert "[SPARK-30534][INFRA] Use mvn in `dev/scalastyle`" This reverts commit `384899944b`.	2020-01-21 18:23:03 +09:00
Takeshi Yamamuro	775fae4640	[SPARK-30486][BUILD] Bump lz4-java version to 1.7.1 ### What changes were proposed in this pull request? This pr intends to upgrade lz4-java from 1.7.0 to 1.7.1. ### Why are the changes needed? This release includes a bug fix for older macOS. You can see the link below for the changes; https://github.com/lz4/lz4-java/blob/master/CHANGES.md#171 ### Does this PR introduce any user-facing change? ### How was this patch tested? Existing tests. Closes #27271 from maropu/SPARK-30486. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-19 19:05:30 -08:00
Sean Owen	a2081ae4e1	[SPARK-29290][CORE] Update to chill 0.9.5 ### What changes were proposed in this pull request? Update Twitter Chill to 0.9.5. ### Why are the changes needed? Primarily, Scala 2.13 support for later. Other changes from 0.9.3 are apparently just minor fixes and improvements: https://github.com/twitter/chill/releases ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests Closes #27227 from srowen/SPARK-29290. Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-19 18:39:38 -08:00
Dongjoon Hyun	384899944b	[SPARK-30534][INFRA] Use mvn in `dev/scalastyle` ### What changes were proposed in this pull request? This PR aims to use `mvn` instead of `sbt` in `dev/scalastyle` to recover GitHub Action. ### Why are the changes needed? As of now, Apache Spark sbt build is broken by the Maven Central repository policy. https://stackoverflow.com/questions/59764749/requests-to-http-repo1-maven-org-maven2-return-a-501-https-required-status-an > Effective January 15, 2020, The Central Maven Repository no longer supports insecure > communication over plain HTTP and requires that all requests to the repository are > encrypted over HTTPS. We can reproduce this locally by the following. ``` $ rm -rf ~/.m2/repository/org/apache/apache/18/ $ build/sbt clean ``` And, in GitHub Action, `lint-scala` is the only one which is using `sbt`. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? First of all, GitHub Action should be recovered. Also, manually, do the following. Without Scalastyle violation ``` $ dev/scalastyle OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=384m; support was removed in 8.0 Using `mvn` from path: /usr/local/bin/mvn Scalastyle checks passed. ``` With Scalastyle violation ``` $ dev/scalastyle OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=384m; support was removed in 8.0 Using `mvn` from path: /usr/local/bin/mvn Scalastyle checks failed at following occurrences: error file=/Users/dongjoon/PRS/SPARK-HTTP-501/core/src/main/scala/org/apache/spark/SparkConf.scala message=There should be no empty line separating imports in the same group. line=22 column=0 error file=/Users/dongjoon/PRS/SPARK-HTTP-501/core/src/test/scala/org/apache/spark/resource/ResourceProfileSuite.scala message=There should be no empty line separating imports in the same group. line=22 column=0 ``` Closes #27242 from dongjoon-hyun/SPARK-30534. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-16 16:00:58 -08:00
Xinrong Meng	f88874194a	[SPARK-30491][INFRA] Enable dependency audit files to tell dependency classifier ### What changes were proposed in this pull request? Enable dependency audit files to tell the value of artifact id, version, and classifier of a dependency. For example, `avro-mapred-1.8.2-hadoop2.jar` should be expanded to `avro-mapred/1.8.2/hadoop2/avro-mapred-1.8.2-hadoop2.jar` where `avro-mapred` is the artifact id, `1.8.2` is the version, and `haddop2` is the classifier. ### Why are the changes needed? Dependency audit files are expected to be consumed by automated tests or downstream tools. However, current dependency audit files under `dev/deps` only show jar names. And there isn't a simple rule on how to parse the jar name to get the values of different fields. For example, `hadoop2` is the classifier of `avro-mapred-1.8.2-hadoop2.jar`, in contrast, `incubating` is the version of `htrace-core-3.1.0-incubating.jar`. Reference: There is a good example of the downstream tool that would be enabled as yhuai suggested, > Say we have a Spark application that depends on a third-party dependency `foo`, which pulls in `jackson` as a transient dependency. Unfortunately, `foo` depends on a different version of `jackson` than Spark. So, in the pom of this Spark application, we use the dependency management section to pin the version of `jackson`. By doing this, we are lifting `jackson` to the top-level dependency of my application and I want to have a way to keep tracking what Spark uses. What we can do is to cross-check my Spark application's classpath with what Spark uses. Then, with a test written in my code base, whenever my application bumps Spark version, this test will check what we define in the application and what Spark has, and then remind us to change our application's pom if needed. In my case, I am fine to directly access git to get these audit files. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Code changes are verified by generated dependency audit files naturally. Thus, there are no tests added. Closes #27177 from mengCareers/depsOptimize. Lead-authored-by: Xinrong Meng <meng.careers@gmail.com> Co-authored-by: mengCareers <meng.careers@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-15 20:19:44 -08:00
HyukjinKwon	dcdc9a8be7	[SPARK-28198][PYTHON][FOLLOW-UP] Run the tests of MAP ITER UDF in Jenkins ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/24997 missed to add `pyspark.sql.tests.test_pandas_udf_iter` to `modules.py`. This PR adds it. ### Why are the changes needed? Currently, Jenkins does not run the test cases. We should run them. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Jenkins should test. Closes #27141 from HyukjinKwon/SPARK-28198-followup. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-01-09 13:45:50 +09:00
Eric Chang	5c71304b43	[SPARK-30450][INFRA][FOLLOWUP] Fix git folder regex for windows file separator ### What changes were proposed in this pull request? The regex is to exclude the .git folder for the python linter, but bash escaping caused only one forward slash to be included. This adds the necessary second slash. ### Why are the changes needed? This is necessary to properly match the file separator character. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually. Added File dev/something.git.py and ran `dev/lint-python` ```dev/lint-python pycodestyle checks failed. *** Error compiling './dev/something.git.py'... File "./dev/something.git.py", line 1 mport asdf2 ^ SyntaxError: invalid syntax``` Closes #27140 from ericfchang/master. Authored-by: Eric Chang <eric.chang@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-08 19:38:20 -08:00
HyukjinKwon	ee8d661058	[SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' sub-package ### What changes were proposed in this pull request? This PR proposes to move pandas related functionalities into pandas package. Namely: ```bash pyspark/sql/pandas ├── __init__.py ├── conversion.py # Conversion between pandas <> PySpark DataFrames ├── functions.py # pandas_udf ├── group_ops.py # Grouped UDF / Cogrouped UDF + groupby.apply, groupby.cogroup.apply ├── map_ops.py # Map Iter UDF + mapInPandas ├── serializers.py # pandas <> PyArrow serializers ├── types.py # Type utils between pandas <> PyArrow └── utils.py # Version requirement checks ``` In order to separately locate `groupby.apply`, `groupby.cogroup.apply`, `mapInPandas`, `toPandas`, and `createDataFrame(pdf)` under `pandas` sub-package, I had to use a mix-in approach which Scala side uses often by `trait`, and also pandas itself uses this approach (see `IndexOpsMixin` as an example) to group related functionalities. Currently, you can think it's like Scala's self typed trait. See the structure below: ```python class PandasMapOpsMixin(object): def mapInPandas(self, ...): ... return ... # other Pandas <> PySpark APIs ``` ```python class DataFrame(PandasMapOpsMixin): # other DataFrame APIs equivalent to Scala side. ``` Yes, This is a big PR but they are mostly just moving around except one case `createDataFrame` which I had to split the methods. ### Why are the changes needed? There are pandas functionalities here and there and I myself gets lost where it was. Also, when you have to make a change commonly for all of pandas related features, it's almost impossible now. Also, after this change, `DataFrame` and `SparkSession` become more consistent with Scala side since pandas is specific to Python, and this change separates pandas-specific APIs away from `DataFrame` or `SparkSession`. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing tests should cover. Also, I manually built the PySpark API documentation and checked. Closes #27109 from HyukjinKwon/pandas-refactoring. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-01-09 10:22:50 +09:00
HyukjinKwon	390e6bd7bc	[SPARK-30453][BUILD][R] Update AppVeyor R version to 3.6.2 ### What changes were proposed in this pull request? R version 3.6.2 (Dark and Stormy Night) was released on 2019-12-12. This PR targets to upgrade R installation for AppVeyor CI environment. ### Why are the changes needed? To test the latest R versions before the release, and see if there are any regressions. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? AppVeyor will test. Closes #27124 from HyukjinKwon/upgrade-r-version-appveyor. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-07 18:43:21 -08:00
Eric Chang	ed8a260749	[SPARK-30450][INFRA] Exclude .git folder for python linter ### What changes were proposed in this pull request? This excludes the .git folder when the python linter runs. We want to exclude because there may be files in .git from other branches that could cause the linter to fail. ### Why are the changes needed? I ran into a case where there was a branch name that ended ".py" suffix so there were git refs files in .git folder in .git/logs/refs and .git/refs/remotes. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manual. ``` $ git branch 3.py $ git checkout 3.py Switched to branch '3.py' $ dev/lint-python starting python compilation test... Python compilation failed with the following errors: * Error compiling './.git/logs/refs/heads/3.py'... File "./.git/logs/refs/heads/3.py", line 1 0000000000000000000000000000000000000000 `895e572b73` Dongjoon Hyun <dhyunapple.com> 1578438255 -0800 branch: Created from master ^ SyntaxError: invalid syntax * Error compiling './.git/refs/heads/3.py'... File "./.git/refs/heads/3.py", line 1 `895e572b73` ^ SyntaxError: invalid syntax ``` Closes #27120 from ericfchang/master. Authored-by: Eric Chang <eric.chang@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2020-01-07 15:14:17 -08:00
WeichenXu	88542bc3d9	[SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays ### What changes were proposed in this pull request? PySpark UDF to convert MLlib vectors to dense arrays. Example: ``` from pyspark.ml.functions import vector_to_array df.select(vector_to_array(col("features")) ``` ### Why are the changes needed? If a PySpark user wants to convert MLlib sparse/dense vectors in a DataFrame into dense arrays, an efficient approach is to do that in JVM. However, it requires PySpark user to write Scala code and register it as a UDF. Often this is infeasible for a pure python project. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? UT. Closes #26910 from WeichenXu123/vector_to_array. Authored-by: WeichenXu <weichen.xu@databricks.com> Signed-off-by: Xiangrui Meng <meng@databricks.com>	2020-01-06 16:18:51 -08:00
Sean Owen	fac6b9bde8	Revert [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies This reverts commit `709387d660`. See https://issues.apache.org/jira/browse/SPARK-27300?focusedCommentId=16990048&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16990048 and previous mailing list discussions. ### What changes were proposed in this pull request? Revert the addition of skeleton graph API modules for Spark 3.0. ### Why are the changes needed? It does not appear that content will be added to the module for Spark 3, so I propose avoiding committing to the modules, which are no-ops now, in the upcoming major 3.0 release. ### Does this PR introduce any user-facing change? No, the modules were not released. ### How was this patch tested? Existing tests, but mostly N/A. Closes #26928 from srowen/Revert27300. Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-12-17 09:06:23 -08:00
Yuming Wang	5de5e46624	[SPARK-30268][INFRA] Fix incorrect pyspark version when releasing preview versions ### What changes were proposed in this pull request? This PR fix incorrect pyspark version when releasing preview versions. ### Why are the changes needed? Failed to make Spark binary distribution: ``` cp: cannot stat 'spark-3.0.0-preview2-bin-hadoop2.7/python/dist/pyspark-3.0.0.dev02.tar.gz': No such file or directory gpg: can't open 'pyspark-3.0.0.dev02.tar.gz': No such file or directory gpg: signing failed: No such file or directory gpg: pyspark-3.0.0.dev02.tar.gz: No such file or directory ``` ``` yumwangubuntu-3513086:~/spark-release/output$ ll spark-3.0.0-preview2-bin-hadoop2.7/python/dist/ total 214140 drwxr-xr-x 2 yumwang stack 4096 Dec 16 06:17 ./ drwxr-xr-x 9 yumwang stack 4096 Dec 16 06:17 ../ -rw-r--r-- 1 yumwang stack 219267173 Dec 16 06:17 pyspark-3.0.0.dev2.tar.gz ``` ``` /usr/local/lib/python3.6/dist-packages/setuptools/dist.py:476: UserWarning: Normalizing '3.0.0.dev02' to '3.0.0.dev2' normalized_version, ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? manual test: ``` LM-SHC-16502798:spark yumwang$ SPARK_VERSION=3.0.0-preview2 LM-SHC-16502798:spark yumwang$ echo "$SPARK_VERSION" \| sed -e "s/-/./" -e "s/SNAPSHOT/dev0/" -e "s/preview/dev/" 3.0.0.dev2 ``` Closes #26909 from wangyum/SPARK-30268. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-12-17 10:22:29 +09:00
Yuming Wang	ba0f59bfaf	[SPARK-30265][INFRA] Do not change R version when releasing preview versions ### What changes were proposed in this pull request? This PR makes it do not change R version when releasing preview versions. ### Why are the changes needed? Failed to make Spark binary distribution: ``` ++ . /opt/spark-rm/output/spark-3.0.0-preview2-bin-hadoop2.7/R/find-r.sh +++ '[' -z /usr/bin ']' ++ /usr/bin/Rscript -e ' if("devtools" %in% rownames(installed.packages())) { library(devtools); devtools::document(pkg="./pkg", roclets=c("rd")) }' Loading required package: usethis Updating SparkR documentation First time using roxygen2. Upgrading automatically... Loading SparkR Invalid DESCRIPTION: Malformed package version. See section 'The DESCRIPTION file' in the 'Writing R Extensions' manual. Error: invalid version specification '3.0.0-preview2' In addition: Warning message: roxygen2 requires Encoding: UTF-8 Execution halted [ERROR] Command execution failed. org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1) at org.apache.commons.exec.DefaultExecutor.executeInternal (DefaultExecutor.java:404) at org.apache.commons.exec.DefaultExecutor.execute (DefaultExecutor.java:166) at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:804) at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:751) at org.codehaus.mojo.exec.ExecMojo.execute (ExecMojo.java:313) at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137) at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:210) at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156) at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81) at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56) at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128) at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305) at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192) at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105) at org.apache.maven.cli.MavenCli.execute (MavenCli.java:957) at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:289) at org.apache.maven.cli.MavenCli.main (MavenCli.java:193) at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke (Method.java:498) at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282) at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225) at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406) at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347) [INFO] ------------------------------------------------------------------------ [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-preview2: [INFO] [INFO] Spark Project Parent POM ........................... SUCCESS [ 18.619 s] [INFO] Spark Project Tags ................................. SUCCESS [ 13.652 s] [INFO] Spark Project Sketch ............................... SUCCESS [ 5.673 s] [INFO] Spark Project Local DB ............................. SUCCESS [ 2.081 s] [INFO] Spark Project Networking ........................... SUCCESS [ 3.509 s] [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 0.993 s] [INFO] Spark Project Unsafe ............................... SUCCESS [ 7.556 s] [INFO] Spark Project Launcher ............................. SUCCESS [ 5.522 s] [INFO] Spark Project Core ................................. FAILURE [01:06 min] [INFO] Spark Project ML Local Library ..................... SKIPPED [INFO] Spark Project GraphX ............................... SKIPPED [INFO] Spark Project Streaming ............................ SKIPPED [INFO] Spark Project Catalyst ............................. SKIPPED [INFO] Spark Project SQL .................................. SKIPPED [INFO] Spark Project ML Library ........................... SKIPPED [INFO] Spark Project Tools ................................ SKIPPED [INFO] Spark Project Hive ................................. SKIPPED [INFO] Spark Project Graph API ............................ SKIPPED [INFO] Spark Project Cypher ............................... SKIPPED [INFO] Spark Project Graph ................................ SKIPPED [INFO] Spark Project REPL ................................. SKIPPED [INFO] Spark Project Assembly ............................. SKIPPED [INFO] Kafka 0.10+ Token Provider for Streaming ........... SKIPPED [INFO] Spark Integration for Kafka 0.10 ................... SKIPPED [INFO] Kafka 0.10+ Source for Structured Streaming ........ SKIPPED [INFO] Spark Project Examples ............................. SKIPPED [INFO] Spark Integration for Kafka 0.10 Assembly .......... SKIPPED [INFO] Spark Avro ......................................... SKIPPED [INFO] ------------------------------------------------------------------------ [INFO] BUILD FAILURE [INFO] ------------------------------------------------------------------------ [INFO] Total time: 02:04 min [INFO] Finished at: 2019-12-16T08:02:45Z [INFO] ------------------------------------------------------------------------ [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.6.0:exec (sparkr-pkg) on project spark-core_2.12: Command execution failed.: Process exited with an error: 1 (Exit value: 1) -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn <args> -rf :spark-core_2.12 ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? manual test: ```diff diff --git a/R/pkg/R/sparkR.R b/R/pkg/R/sparkR.R index cdb59093781..b648c51e010 100644 --- a/R/pkg/R/sparkR.R +++ b/R/pkg/R/sparkR.R -336,8 +336,8 sparkR.session <- function( # Check if version number of SparkSession matches version number of SparkR package jvmVersion <- callJMethod(sparkSession, "version") - # Remove -SNAPSHOT from jvm versions - jvmVersionStrip <- gsub("-SNAPSHOT", "", jvmVersion) + # Remove -preview2 from jvm versions + jvmVersionStrip <- gsub("-preview2", "", jvmVersion) rPackageVersion <- paste0(packageVersion("SparkR")) if (jvmVersionStrip != rPackageVersion) { ``` Closes #26904 from wangyum/SPARK-30265. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Yuming Wang <wgyumg@gmail.com>	2019-12-16 04:54:12 -07:00
Yuming Wang	1fc353d51a	Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh` ### What changes were proposed in this pull request? This reverts commit `7c0ce285`. ### Why are the changes needed? Failed to make distribution: ``` [INFO] -----------------< org.apache.spark:spark-sketch_2.12 >----------------- [INFO] Building Spark Project Sketch 3.0.0-preview2 [3/33] [INFO] --------------------------------[ jar ]--------------------------------- [INFO] Downloading from central: https://repo.maven.apache.org/maven2/org/apache/spark/spark-tags_2.12/3.0.0-preview2/spark-tags_2.12-3.0.0-preview2-tests.jar [INFO] ------------------------------------------------------------------------ [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-preview2: [INFO] [INFO] Spark Project Parent POM ........................... SUCCESS [ 26.513 s] [INFO] Spark Project Tags ................................. SUCCESS [ 48.393 s] [INFO] Spark Project Sketch ............................... FAILURE [ 0.034 s] [INFO] Spark Project Local DB ............................. SKIPPED [INFO] Spark Project Networking ........................... SKIPPED [INFO] Spark Project Shuffle Streaming Service ............ SKIPPED [INFO] Spark Project Unsafe ............................... SKIPPED [INFO] Spark Project Launcher ............................. SKIPPED [INFO] Spark Project Core ................................. SKIPPED [INFO] Spark Project ML Local Library ..................... SKIPPED [INFO] Spark Project GraphX ............................... SKIPPED [INFO] Spark Project Streaming ............................ SKIPPED [INFO] Spark Project Catalyst ............................. SKIPPED [INFO] Spark Project SQL .................................. SKIPPED [INFO] Spark Project ML Library ........................... SKIPPED [INFO] Spark Project Tools ................................ SKIPPED [INFO] Spark Project Hive ................................. SKIPPED [INFO] Spark Project Graph API ............................ SKIPPED [INFO] Spark Project Cypher ............................... SKIPPED [INFO] Spark Project Graph ................................ SKIPPED [INFO] Spark Project REPL ................................. SKIPPED [INFO] Spark Project YARN Shuffle Service ................. SKIPPED [INFO] Spark Project YARN ................................. SKIPPED [INFO] Spark Project Mesos ................................ SKIPPED [INFO] Spark Project Kubernetes ........................... SKIPPED [INFO] Spark Project Hive Thrift Server ................... SKIPPED [INFO] Spark Project Assembly ............................. SKIPPED [INFO] Kafka 0.10+ Token Provider for Streaming ........... SKIPPED [INFO] Spark Integration for Kafka 0.10 ................... SKIPPED [INFO] Kafka 0.10+ Source for Structured Streaming ........ SKIPPED [INFO] Spark Project Examples ............................. SKIPPED [INFO] Spark Integration for Kafka 0.10 Assembly .......... SKIPPED [INFO] Spark Avro ......................................... SKIPPED [INFO] ------------------------------------------------------------------------ [INFO] BUILD FAILURE [INFO] ------------------------------------------------------------------------ [INFO] Total time: 01:15 min [INFO] Finished at: 2019-12-16T05:29:43Z [INFO] ------------------------------------------------------------------------ [ERROR] Failed to execute goal on project spark-sketch_2.12: Could not resolve dependencies for project org.apache.spark:spark-sketch_2.12🫙3.0.0-preview2: Could not find artifact org.apache.spark:spark-tags_2.12🫙tests:3.0.0-preview2 in central (https://repo.maven.apache.org/maven2) -> [Help 1] [ERROR] ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? manual test. Closes #26902 from wangyum/SPARK-30056. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Yuming Wang <wgyumg@gmail.com>	2019-12-15 23:16:17 -07:00
Yuming Wang	26b658f6fb	[SPARK-30253][INFRA] Do not add commits when releasing preview version ### What changes were proposed in this pull request? This PR add support do not add commits to master branch when releasing preview version. ### Why are the changes needed? We need manual revert this change, example: ![image](https://user-images.githubusercontent.com/5399861/70788945-f9d15180-1dcc-11ea-81f5-c0d89c28440a.png) ### Does this PR introduce any user-facing change? No. ### How was this patch tested? manual test Closes #26879 from wangyum/SPARK-30253. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Yuming Wang <wgyumg@gmail.com>	2019-12-15 19:44:29 -07:00
Yuming Wang	e1ee3fb72f	[SPARK-30216][INFRA] Use python3 in Docker release image ### What changes were proposed in this pull request? - Reverts commit `1f94bf4` and `d6be46e` - Switches python to python3 in Docker release image. ### Why are the changes needed? `dev/make-distribution.sh` and `python/setup.py` are use python3. https://github.com/apache/spark/pull/26844/files#diff-ba2c046d92a1d2b5b417788bfb5cb5f8L236 https://github.com/apache/spark/pull/26330/files#diff-8cf6167d58ce775a08acafcfe6f40966 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? manual test: ``` yumwangubuntu-3513086:~/spark$ dev/create-release/do-release-docker.sh -n -d /home/yumwang/spark-release Output directory already exists. Overwrite and continue? [y/n] y Branch [branch-2.4]: master Current branch version is 3.0.0-SNAPSHOT. Release [3.0.0]: 3.0.0-preview2 RC # [1]: This is a dry run. Please confirm the ref that will be built for testing. Ref [master]: ASF user [yumwang]: Full name [Yuming Wang]: GPG key [yumwangapache.org]: DBD447010C1B4F7DAD3F7DFD6E1B4122F6A3A338 ================ Release details: BRANCH: master VERSION: 3.0.0-preview2 TAG: v3.0.0-preview2-rc1 NEXT: 3.0.1-SNAPSHOT ASF USER: yumwang GPG KEY: DBD447010C1B4F7DAD3F7DFD6E1B4122F6A3A338 FULL NAME: Yuming Wang E-MAIL: yumwangapache.org ================ Is this info correct [y/n]? y GPG passphrase: ======================== = Building spark-rm image with tag latest... Command: docker build -t spark-rm:latest --build-arg UID=110302528 /home/yumwang/spark/dev/create-release/spark-rm Log file: docker-build.log Building v3.0.0-preview2-rc1; output will be at /home/yumwang/spark-release/output gpg: directory '/home/spark-rm/.gnupg' created gpg: keybox '/home/spark-rm/.gnupg/pubring.kbx' created gpg: /home/spark-rm/.gnupg/trustdb.gpg: trustdb created gpg: key 6E1B4122F6A3A338: public key "Yuming Wang <yumwangapache.org>" imported gpg: key 6E1B4122F6A3A338: secret key imported gpg: Total number processed: 1 gpg: imported: 1 gpg: secret keys read: 1 gpg: secret keys imported: 1 ======================== = Creating release tag v3.0.0-preview2-rc1... Command: /opt/spark-rm/release-tag.sh Log file: tag.log It may take some time for the tag to be synchronized to github. Press enter when you've verified that the new tag (v3.0.0-preview2-rc1) is available. ======================== = Building Spark... Command: /opt/spark-rm/release-build.sh package Log file: build.log ======================== = Building documentation... Command: /opt/spark-rm/release-build.sh docs Log file: docs.log ======================== = Publishing release Command: /opt/spark-rm/release-build.sh publish-release Log file: publish.log ``` Generated doc: ![image](https://user-images.githubusercontent.com/5399861/70693075-a7723100-1cf7-11ea-9f88-9356a02349a1.png) Closes #26848 from wangyum/SPARK-30216. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-12-13 11:31:31 -08:00
Dongjoon Hyun	cc276f8a6e	[SPARK-30243][BUILD][K8S] Upgrade K8s client dependency to 4.6.4 ### What changes were proposed in this pull request? This PR aims to upgrade K8s client library from 4.6.1 to 4.6.4 for `3.0.0-preview2`. ### Why are the changes needed? This will bring the latest bug fixes. - https://github.com/fabric8io/kubernetes-client/releases/tag/v4.6.4 - https://github.com/fabric8io/kubernetes-client/releases/tag/v4.6.3 - https://github.com/fabric8io/kubernetes-client/releases/tag/v4.6.2 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with K8s integration test. Closes #26874 from dongjoon-hyun/SPARK-30243. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-12-13 08:25:51 -08:00
Yuming Wang	39c0696a39	[MINOR] Fix google style guide address ### What changes were proposed in this pull request? This PR update google style guide address to `https://google.github.io/styleguide/javaguide.html`. ### Why are the changes needed? `https://google-styleguide.googlecode.com/svn-history/r130/trunk/javaguide.html` 404: ![image](https://user-images.githubusercontent.com/5399861/70717915-431c9500-1d2a-11ea-895b-024be953a116.png) ### Does this PR introduce any user-facing change? No ### How was this patch tested? Closes #26865 from wangyum/fix-google-styleguide. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Sean Owen <srowen@gmail.com>	2019-12-12 11:04:01 -06:00
Dongjoon Hyun	b709091b4f	[SPARK-30228][BUILD] Update zstd-jni to 1.4.4-3 ### What changes were proposed in this pull request? This PR aims to update zstd-jni library to 1.4.4-3. ### Why are the changes needed? This will bring the latest bug fixes in zstd itself and some performance improvement. - https://github.com/facebook/zstd/releases/tag/v1.4.4 ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins. Closes #26856 from dongjoon-hyun/SPARK-ZSTD-144. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-12-12 14:16:32 +09:00
Yuming Wang	eb509968a7	[SPARK-30211][INFRA] Use python3 in make-distribution.sh ### What changes were proposed in this pull request? This PR switches python to python3 in `make-distribution.sh`. ### Why are the changes needed? SPARK-29672 changed this - https://github.com/apache/spark/pull/26330/files#diff-8cf6167d58ce775a08acafcfe6f40966 ### Does this PR introduce any user-facing change? No ### How was this patch tested? N/A Closes #26844 from wangyum/SPARK-30211. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-12-10 23:30:12 -08:00
Takeshi Yamamuro	be867e8a9e	[SPARK-30196][BUILD] Bump lz4-java version to 1.7.0 ### What changes were proposed in this pull request? This pr intends to upgrade lz4-java from 1.6.0 to 1.7.0. ### Why are the changes needed? This release includes a performance bug (https://github.com/lz4/lz4-java/pull/143) fixed by JoshRosen and some improvements (e.g., LZ4 binary update). You can see the link below for the changes; https://github.com/lz4/lz4-java/blob/master/CHANGES.md#170 ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests. Closes #26823 from maropu/LZ4_1_7_0. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-12-10 12:22:03 +09:00
Dongjoon Hyun	afc4fa02bd	[SPARK-30156][BUILD] Upgrade Jersey from 2.29 to 2.29.1 ### What changes were proposed in this pull request? This PR aims to upgrade `Jersey` from 2.29 to 2.29.1. ### Why are the changes needed? This will bring several bug fixes and important dependency upgrades. - https://eclipse-ee4j.github.io/jersey.github.io/release-notes/2.29.1.html ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins. Closes #26785 from dongjoon-hyun/SPARK-30156. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-12-06 18:49:43 -08:00
Dongjoon Hyun	1e0037b5e9	[SPARK-30157][BUILD][TEST-HADOOP3.2][TEST-JAVA11] Upgrade Apache HttpCore from 4.4.10 to 4.4.12 ### What changes were proposed in this pull request? This PR aims to upgrade `Apache HttpCore` from 4.4.10 to 4.4.12. ### Why are the changes needed? `Apache HttpCore v4.4.11` is the first official release for JDK11. > This is a maintenance release that corrects a number of defects in non-blocking SSL session code that caused compatibility issues with TLSv1.3 protocol implementation shipped with Java 11. For the full release note, please see the following. - https://www.apache.org/dist/httpcomponents/httpcore/RELEASE_NOTES-4.4.x.txt ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins. Closes #26786 from dongjoon-hyun/SPARK-30157. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-12-07 10:59:10 +09:00
Dongjoon Hyun	1595e46a4e	[SPARK-30142][TEST-MAVEN][BUILD] Upgrade Maven to 3.6.3 ### What changes were proposed in this pull request? This PR aims to upgrade Maven from 3.6.2 to 3.6.3. ### Why are the changes needed? This will bring bug fixes like the following. - MNG-6759 Maven fails to use <repositories> section from dependency when resolving transitive dependencies in some cases - MNG-6760 ExclusionArtifactFilter result invalid when wildcard exclusion is followed by other exclusions The following is the full release note. - https://maven.apache.org/docs/3.6.3/release-notes.html ### Does this PR introduce any user-facing change? No. (This is a dev-environment change.) ### How was this patch tested? Pass the Jenkins with both SBT and Maven. Closes #26770 from dongjoon-hyun/SPARK-30142. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-12-06 23:41:59 +09:00

1 2 3 4 5 ...

928 commits