### What changes were proposed in this pull request?
This PR aims to upgrade Apache ORC to 1.6.11 to bring the latest bug fixes.
### Why are the changes needed?
Apache ORC 1.6.11 has the following fixes.
- https://issues.apache.org/jira/projects/ORC/versions/12350499
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes #33971 from dongjoon-hyun/SPARK-36732.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit c217797297)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade Scala to 2.12.15 to support Java 17/18 better.
### Why are the changes needed?
Scala 2.12.15 improves compatibility with JDK 17 and 18:
https://github.com/scala/scala/releases/tag/v2.12.15
- Avoids IllegalArgumentException in JDK 17+ for lambda deserialization
- Upgrades to ASM 9.2, for JDK 18 support in optimizer
### Does this PR introduce _any_ user-facing change?
Yes, this is a Scala version change.
### How was this patch tested?
Pass the CIs
Closes #33999 from dongjoon-hyun/SPARK-36759.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 16f1f71ba5)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
Upgrade Apache Parquet to 1.12.1
### Why are the changes needed?
Parquet 1.12.1 contains the following bug fixes:
- PARQUET-2064: Make Range public accessible in RowRanges
- PARQUET-2022: ZstdDecompressorStream should close `zstdInputStream`
- PARQUET-2052: Integer overflow when writing huge binary using dictionary encoding
- PARQUET-1633: Fix integer overflow
- PARQUET-2054: fix TCP leaking when calling ParquetFileWriter.appendFile
- PARQUET-2072: Do Not Determine Both Min/Max for Binary Stats
- PARQUET-2073: Fix estimate remaining row count in ColumnWriteStoreBase
- PARQUET-2078: Failed to read parquet file after writing with the same
In particular, PARQUET-2078 is a blocker for the upcoming Apache Spark 3.2.0 release.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing tests + a new test for the issue in SPARK-36696
Closes #33969 from sunchao/upgrade-parquet-12.1.
Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
(cherry picked from commit a927b0836b)
Signed-off-by: DB Tsai <d_tsai@apple.com>
### What changes were proposed in this pull request?
This PR aims to fix the regex to avoid breaking `pom.xml`.
### Why are the changes needed?
**BEFORE**
```
$ dev/change-scala-version.sh 2.12
$ git diff | head -n10
diff --git a/core/pom.xml b/core/pom.xml
index dbde22f2bf..6ed368353b 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -35,7 +35,7 @@
</properties>
<dependencies>
- <!--<!--
```
**AFTER**
Since the default Scala version is `2.12`, the following `no-op` is the correct behavior which is consistent with the previous behavior.
```
$ dev/change-scala-version.sh 2.12
$ git diff
```
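The actual sed expressions in `change-scala-version.sh` are not reproduced here; as a hedged illustration of the failure mode, the Python sketch below (with hypothetical patterns) shows why an unanchored substitution corrupts an already-commented line, while a guarded one stays a no-op:

```python
import re

pom_line = "<artifactId>scala-parallel-collections_2.13</artifactId>"
commented = "<!--" + pom_line + "-->"

# Unanchored substitution: matches even inside an already-commented line,
# so a second run nests the markers ("<!--<!--"), as in the BEFORE diff.
naive = re.sub(r"(<artifactId>.*</artifactId>)", r"<!--\1-->", commented)
assert naive.startswith("<!--<!--")

# Anchored substitution with a guard: it skips lines that already start
# with a comment marker, so re-running the script changes nothing.
safe = re.sub(r"^(?!<!--)(<artifactId>.*</artifactId>)$", r"<!--\1-->", commented)
assert safe == commented
```

The guarded form is what makes the script idempotent, which is the behavior the AFTER section demonstrates.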
### Does this PR introduce _any_ user-facing change?
No. This is a dev only change.
### How was this patch tested?
Manually.
Closes #33996 from dongjoon-hyun/SPARK-36712.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit d730ef24fe)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
As [reported on `devspark.apache.org`](https://lists.apache.org/thread.html/r84cff66217de438f1389899e6d6891b573780159cd45463acf3657aa%40%3Cdev.spark.apache.org%3E), the published POMs when building with Scala 2.13 have the `scala-parallel-collections` dependency only in the `scala-2.13` profile of the pom.
### What changes were proposed in this pull request?
This PR suggests working around this by un-commenting the `scala-parallel-collections` dependency when switching to 2.13 using the `change-scala-version.sh` script.
I included an upgrade to scala-parallel-collections version 1.0.3; the changes compared to 0.2.0 are minor:
- removed OSGi metadata
- renamed some internal inner classes
- added `Automatic-Module-Name`
### Why are the changes needed?
According to the posts, this solves issues for developers that write unit tests for their applications.
Stephen Coy suggested to use the https://www.mojohaus.org/flatten-maven-plugin. While this sounds like a more principled solution, it is possibly too risky to do at this specific point in time?
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Locally
Closes #33948 from lrytz/parCollDep.
Authored-by: Lukas Rytz <lukas.rytz@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(cherry picked from commit 1a62e6a2c1)
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This PR aims to upgrade `aircompressor` dependency from 1.19 to 1.21.
### Why are the changes needed?
This will bring the latest bug fix for an issue present in `aircompressor` 1.17 ~ 1.20.
- 1e364f7133
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes #33883 from dongjoon-hyun/SPARK-36629.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit ff8cc4b800)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
After https://github.com/apache/spark/pull/32726, Python doc build requires `sphinx-plotly-directive`.
This PR is to install it from `spark-rm/Dockerfile` to make sure `do-release-docker.sh` can run successfully.
Also, this PR mentions it in the README of docs.
### Why are the changes needed?
Fix release script and update README of docs
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manual test locally.
Closes #33797 from gengliangwang/fixReleaseDocker.
Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 42eebb84f5)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
### What changes were proposed in this pull request?
This PR aims to bump ORC to 1.6.10
### Why are the changes needed?
This will bring the latest bug fixes.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes #33712 from williamhyun/orc.
Authored-by: William Hyun <william@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit aff1b5594a)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
According to the feedback from GitHub, the change causing the memory issue has been rolled back. We can try to raise the memory again for GA.
### Why are the changes needed?
Trying higher memory settings for GA. It could speed up the testing time.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
GA
Closes #33623 from viirya/increasing-mem-ga.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit 7d13ac177b)
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
### What changes were proposed in this pull request?
This PR proposes adding the Python packages `mlflow` and `sklearn` to enable the MLflow test in pandas API on Spark.
### Why are the changes needed?
To enable the MLflow test in pandas API on Spark.
### Does this PR introduce _any_ user-facing change?
No, it's test-only
### How was this patch tested?
Manually test on local, with `python/run-tests --testnames pyspark.pandas.mlflow`.
Closes #33567 from itholic/SPARK-36254.
Lead-authored-by: itholic <haejoon.lee@databricks.com>
Co-authored-by: Haejoon Lee <44108233+itholic@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit abce61f3fd)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
As discussed in https://github.com/apache/spark/pull/32928/files#r654049392, after confirming compatibility, we can use a newer RocksDB version for the state store implementation.
### Why are the changes needed?
For better ARM support and to leverage the bug fixes in the newer version.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests.
Closes #33578 from xuanyuanking/SPARK-36347.
Authored-by: Yuanjian Li <yuanjian.li@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 4cd5fa96d8)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR aims to skip MiMa in the PySpark/SparkR/Docker GHA jobs.
### Why are the changes needed?
This will save GHA resources because MiMa is irrelevant to Python.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the GHA.
Closes #33532 from williamhyun/mima.
Lead-authored-by: William Hyun <william@apache.org>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 674202e7b6)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Update to the latest Breeze 1.2.
### Why are the changes needed?
Minor bug fixes.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests.
Closes #33449 from srowen/SPARK-35310.
Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
Fix `lint-python` to pick up `PYTHON_EXECUTABLE` from the environment variable first so the Python version can be switched, and explicitly specify `PYTHON_EXECUTABLE` to use `python3.9` in CI.
### Why are the changes needed?
Currently `lint-python` uses `python3`, but it's not the one we expect in CI.
As a result, the `black` check is not working:
```
The python3 -m black command was not found. Skipping black checks for now.
```
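The shape of the fix, preferring an explicitly provided interpreter and falling back to the old default, can be sketched in Python (the function and variable names are illustrative, not the script's actual ones):

```python
def pick_python(env):
    # Use PYTHON_EXECUTABLE when the caller (e.g. CI) sets it;
    # otherwise fall back to plain "python3" as before.
    return env.get("PYTHON_EXECUTABLE") or "python3"

assert pick_python({"PYTHON_EXECUTABLE": "python3.9"}) == "python3.9"
assert pick_python({}) == "python3"
assert pick_python({"PYTHON_EXECUTABLE": ""}) == "python3"  # empty counts as unset
```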
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
The `black` check in `lint-python` should work.
Closes #33507 from ueshin/issues/SPARK-36279/lint-python.
Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 663cbdfbe5)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Trying to adjust build memory settings and serial execution to re-enable GA.
### Why are the changes needed?
GA tests have been failing recently with return code 137. We need to adjust build settings to make GA work.
### Does this PR introduce _any_ user-facing change?
No, dev only.
### How was this patch tested?
GA
Closes #33447 from viirya/test-ga.
Lead-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit fd36ed4550)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR proposes to set the lowerbound of mypy version to use in the testing script.
### Why are the changes needed?
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141519/console
```
python/pyspark/mllib/tree.pyi:29: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/tree.pyi:38: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/feature.pyi:34: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/feature.pyi:42: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/feature.pyi:48: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/feature.pyi:54: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/feature.pyi:76: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/feature.pyi:124: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/feature.pyi:165: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/clustering.pyi:45: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/clustering.pyi:72: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/classification.pyi:39: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/classification.pyi:52: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
Found 13 errors in 4 files (checked 314 source files)
1
```
Jenkins installed mypy at SPARK-32797, but it seems the installed version is not the same as in GitHub Actions.
It seems difficult to make the codebase compatible with multiple mypy versions. Therefore, this PR sets the lowerbound.
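As a sketch of what such a lowerbound check involves (the `mypy --version` output format shown is an assumption, and `mypy_is_new_enough` is an illustrative name, not the script's actual helper):

```python
import re

MINIMUM_MYPY = "0.910"

def mypy_is_new_enough(version_output):
    # version_output is assumed to look like "mypy 0.910".
    m = re.search(r"(\d+)\.(\d+)", version_output)
    if m is None:
        return False
    # Compare as integer tuples, not strings: a string comparison would
    # wrongly rank "0.99" newer than "0.910".
    minimum = tuple(int(p) for p in MINIMUM_MYPY.split("."))
    found = (int(m.group(1)), int(m.group(2)))
    return found >= minimum

assert mypy_is_new_enough("mypy 0.910")
assert not mypy_is_new_enough("mypy 0.812")
```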
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
Jenkins job in this PR should test it out.
Also manually tested:
Without mypy:
```
...
flake8 checks passed.
The mypy command was not found. Skipping for now.
```
With mypy 0.812:
```
...
flake8 checks passed.
The minimum mypy version needs to be 0.910. Your current version is mypy 0.812. Skipping for now.
```
With mypy 0.910:
```
...
flake8 checks passed.
starting mypy test...
mypy checks passed.
all lint-python tests passed!
```
Closes #33487 from HyukjinKwon/SPARK-36268.
Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit d6bc8cd681)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade ZSTD-JNI to 1.5.0-4.
### Why are the changes needed?
ZSTD-JNI 1.5.0-3 has a packaging issue. 1.5.0-4 is recommended to be used instead.
- https://github.com/luben/zstd-jni/issues/181#issuecomment-885138495
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes #33483 from dongjoon-hyun/SPARK-36262.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit a1a197403b)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR partially backports the fix in the script at https://github.com/apache/spark/pull/33410 to make the branch-3.2 build pass at https://github.com/apache/spark/actions/workflows/build_and_test.yml?query=event%3Aschedule
### Why are the changes needed?
To make the Scala 2.13 periodical job pass
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
It is a logically non-conflicting backport.
Closes #33472 from HyukjinKwon/SPARK-36251.
Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR upgrades `zstd-jni` from `1.5.0-2` to `1.5.0-3`.
`1.5.0-3` was released a few days ago.
This release resolves an issue with buffer size calculation, which can affect usage in Spark.
https://github.com/luben/zstd-jni/releases/tag/v1.5.0-3
### Why are the changes needed?
A skipping length greater than `2^31 - 1` might be a corner case, but it can affect Spark.
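To illustrate the class of bug (a generic demonstration, not zstd-jni's actual code): a length one past `2^31 - 1` wraps to a negative value when forced through a signed 32-bit integer, so any size computed from it afterwards is wrong.

```python
def to_int32(n):
    # Interpret the low 32 bits of n as a signed 32-bit integer,
    # the way a narrowing cast in JVM/native code would.
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n

skip_length = 2**31             # one past the signed 32-bit maximum
assert to_int32(skip_length) == -2**31   # wraps negative
assert to_int32(2**31 - 1) == 2**31 - 1  # the maximum itself is fine
```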
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI.
Closes #33464 from sarutak/upgrade-zstd-jni-1.5.0-3.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit dcb7db5370)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This is a followup PR for SPARK-36166 (#33411), which adds `BLOCK_SCALA_VERSION` to `sparktestssupport/__init__.py`.
### Why are the changes needed?
The following command fails because the definition is missing.
```
SCALA_PROFILE=scala2.12 dev/run-tests.py
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
The command shown above works.
Closes #33421 from sarutak/followup-SPARK-36166.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit c7ccc602db)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR is a simple followup from https://github.com/apache/spark/pull/33376:
- It simplifies a bit by removing the default Scala version from the testing script (so we don't have to update it when the default Scala version changes).
- It calls the `change-scala-version.sh` script when `SCALA_PROFILE` is explicitly specified.
### Why are the changes needed?
More refactoring. In addition, this change will be used at https://github.com/apache/spark/pull/33410
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
CI in this PR should test it out.
Closes #33411 from HyukjinKwon/SPARK-36166.
Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 8ee199ef42)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to skip UNIDOC generation in PySpark GHA job.
### Why are the changes needed?
PySpark GHA jobs do not need to generate Java/Scala doc. This will save about 13 minutes in total.
- https://github.com/apache/spark/runs/3098268973?check_suite_focus=true
```
...
========================================================================
Building Unidoc API Documentation
========================================================================
[info] Building Spark unidoc using SBT with these arguments: -Phadoop-3.2 -Phive-2.3 -Pscala-2.12 -Phive-thriftserver -Pmesos -Pdocker-integration-tests -Phive -Pkinesis-asl -Pspark-ganglia-lgpl -Pkubernetes -Phadoop-cloud -Pyarn unidoc
...
[info] Main Java API documentation successful.
[success] Total time: 192 s (03:12), completed Jul 18, 2021 6:08:40 PM
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the GHA.
Closes #33407 from williamhyun/SKIP_UNIDOC.
Authored-by: William Hyun <william@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit c336f73ccd)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
For Apache Spark 3.2, this PR aims to support Scala 2.13 test in `dev/run-tests.py` by adding `SCALA_PROFILE` and in `dev/run-tests-jenkins.py` by adding `AMPLAB_JENKINS_BUILD_SCALA_PROFILE`.
In addition, `test-dependencies.sh` is skipped for Scala 2.13 because we don't maintain the dependency manifests yet. This will be handled after the Apache Spark 3.2.0 release.
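As an illustrative sketch of how such an environment variable can map onto a build profile flag (the function name and exact mapping are assumptions, not the actual `run-tests.py` code):

```python
def scala_profile_flags(env):
    # "scala2.13" adds the corresponding sbt/Maven profile; when unset,
    # the default Scala version's profiles are used as before.
    profile = env.get("SCALA_PROFILE", "")
    if profile == "scala2.13":
        return ["-Pscala-2.13"]
    return []

assert scala_profile_flags({"SCALA_PROFILE": "scala2.13"}) == ["-Pscala-2.13"]
assert scala_profile_flags({}) == []
```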
### Why are the changes needed?
To test Scala 2.13 with `dev/run-tests.py`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual. The following is the result. Note that this PR aims to **run** the Scala 2.13 tests rather than **pass** them. We will have a daily GitHub Actions job via #33358 and will fix UT failures if they exist.
```
$ dev/change-scala-version.sh 2.13
$ SCALA_PROFILE=scala2.13 dev/run-tests.py
...
========================================================================
Running Scala style checks
========================================================================
[info] Checking Scala style using SBT with these profiles: -Phadoop-3.2 -Phive-2.3 -Pscala-2.13 -Pkubernetes -Phadoop-cloud -Phive -Phive-thriftserver -Pyarn -Pmesos -Pdocker-integration-tests -Pkinesis-asl -Pspark-ganglia-lgpl
...
========================================================================
Building Spark
========================================================================
[info] Building Spark using SBT with these arguments: -Phadoop-3.2 -Phive-2.3 -Pscala-2.13 -Pspark-ganglia-lgpl -Pmesos -Pyarn -Phive-thriftserver -Pkinesis-asl -Pkubernetes -Pdocker-integration-tests -Phive -Phadoop-cloud test:package streaming-kinesis-asl-assembly/assembly
...
[info] Building Spark assembly using SBT with these arguments: -Phadoop-3.2 -Phive-2.3 -Pscala-2.13 -Pspark-ganglia-lgpl -Pmesos -Pyarn -Phive-thriftserver -Pkinesis-asl -Pkubernetes -Pdocker-integration-tests -Phive -Phadoop-cloud assembly/package
...
========================================================================
Running Java style checks
========================================================================
[info] Checking Java style using SBT with these profiles: -Phadoop-3.2 -Phive-2.3 -Pscala-2.13 -Pspark-ganglia-lgpl -Pmesos -Pyarn -Phive-thriftserver -Pkinesis-asl -Pkubernetes -Pdocker-integration-tests -Phive -Phadoop-cloud
...
========================================================================
Building Unidoc API Documentation
========================================================================
[info] Building Spark unidoc using SBT with these arguments: -Phadoop-3.2 -Phive-2.3 -Pscala-2.13 -Pspark-ganglia-lgpl -Pmesos -Pyarn -Phive-thriftserver -Pkinesis-asl -Pkubernetes -Pdocker-integration-tests -Phive -Phadoop-cloud unidoc
...
========================================================================
Running Spark unit tests
========================================================================
[info] Running Spark tests using SBT with these arguments: -Phadoop-3.2 -Phive-2.3 -Pscala-2.13 -Pspark-ganglia-lgpl -Pmesos -Pyarn -Phive-thriftserver -Pkinesis-asl -Pkubernetes -Pdocker-integration-tests -Phive -Phadoop-cloud test
...
```
Closes #33376 from dongjoon-hyun/SPARK-36166.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit f66153de78)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This is a follow-up of #33371.
In the GitHub run for a branch commit, the environment variable is set but empty.
This PR adds back the empty string check logic.
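The check boils down to treating a set-but-empty `APACHE_SPARK_REF` the same as an unset one before the value reaches `git diff`; a minimal sketch (the helper name is illustrative):

```python
def resolve_target_ref(env):
    # "" (set but empty) must behave like "not set": passing an empty string
    # to `git diff --name-only HEAD ""` fails with "ambiguous argument ''".
    ref = env.get("APACHE_SPARK_REF")
    return ref if ref else None

assert resolve_target_ref({"APACHE_SPARK_REF": "master"}) == "master"
assert resolve_target_ref({"APACHE_SPARK_REF": ""}) is None
assert resolve_target_ref({}) is None
```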
### Why are the changes needed?
Currently, the failure happens when we use `--modules` in GitHub Actions.
```
$ GITHUB_ACTIONS=1 APACHE_SPARK_REF= dev/run-tests.py --modules core
[info] Using build tool sbt with Hadoop profile hadoop3.2 and Hive profile hive2.3 under environment github_actions
fatal: ambiguous argument '': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
Traceback (most recent call last):
File "/Users/dongjoon/APACHE/spark-merge/dev/run-tests.py", line 785, in <module>
main()
File "/Users/dongjoon/APACHE/spark-merge/dev/run-tests.py", line 663, in main
changed_files = identify_changed_files_from_git_commits(
File "/Users/dongjoon/APACHE/spark-merge/dev/run-tests.py", line 91, in identify_changed_files_from_git_commits
raw_output = subprocess.check_output(['git', 'diff', '--name-only', patch_sha, diff_target],
File "/Users/dongjoon/.pyenv/versions/3.9.5/lib/python3.9/subprocess.py", line 424, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/Users/dongjoon/.pyenv/versions/3.9.5/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['git', 'diff', '--name-only', 'HEAD', '']' returned non-zero exit status 128.
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manually. The following failure is expected in a local environment because the run already got past `identify_changed_files_from_git_commits`.
```
$ GITHUB_ACTIONS=1 APACHE_SPARK_REF= dev/run-tests.py --modules core
[info] Using build tool sbt with Hadoop profile hadoop3.2 and Hive profile hive2.3 under environment github_actions
Traceback (most recent call last):
File "/Users/dongjoon/APACHE/spark-merge/dev/run-tests.py", line 785, in <module>
main()
File "/Users/dongjoon/APACHE/spark-merge/dev/run-tests.py", line 668, in main
os.environ["GITHUB_SHA"], target_ref=os.environ["GITHUB_PREV_SHA"])
File "/Users/dongjoon/.pyenv/versions/3.9.5/lib/python3.9/os.py", line 679, in __getitem__
raise KeyError(key) from None
KeyError: 'GITHUB_SHA'
```
Closes #33374 from dongjoon-hyun/SPARK-36164.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 5f41a2752f)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to change `run-tests.py` so that it does not fail when `os.environ["APACHE_SPARK_REF"]` is not defined.
### Why are the changes needed?
Currently, `run-tests.py` ends with an error.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes #33371 from williamhyun/SPARK-36164.
Authored-by: William Hyun <william@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit c8a3c22628)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR is a followup of https://github.com/apache/spark/pull/26330. There is one last place to fix in `dev/test-dependencies.sh`.
### Why are the changes needed?
To stick to Python 3 instead of mistakenly using Python 2.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
Manually tested.
Closes #33368 from HyukjinKwon/change-python-3.
Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 6bd385f1e3)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to disable MiMa check for Scala 2.13 artifacts.
### Why are the changes needed?
Apache Spark doesn't have Scala 2.13 Maven artifacts yet.
SPARK-36151 will enable this after Apache Spark 3.2.0 release.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual. The following should succeed without real testing.
```
$ dev/mima -Pscala-2.13
```
Closes #33355 from dongjoon-hyun/SPARK-36150.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 5acfecbf97)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR upgrades `commons-compress` from `1.20` to `1.21` to deal with CVEs.
### Why are the changes needed?
Some CVEs which affect `commons-compress 1.20` are reported and fixed in `1.21`.
https://commons.apache.org/proper/commons-compress/security-reports.html
* CVE-2021-35515
* CVE-2021-35516
* CVE-2021-35517
* CVE-2021-36090
The severities are reported as low for all of these CVEs, but it would be better to deal with them just in case.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI.
Closes #33333 from sarutak/upgrade-commons-compress-1.21.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit fd06cc211d)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR reverts https://github.com/apache/spark/pull/32455 and its followup https://github.com/apache/spark/pull/32536 , because the new janino version has a bug that is not fixed yet: https://github.com/janino-compiler/janino/pull/148
### Why are the changes needed?
avoid regressions
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
existing tests
Closes #33302 from cloud-fan/revert.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit ae6199af44)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Merge test_decimal_ops into test_num_ops
- merge test_isnull() into test_num_ops.test_isnull()
- remove test_datatype_ops(), which is already covered in 11fcbc73cb/python/pyspark/pandas/tests/data_type_ops/test_base.py (L58-L59)
### Why are the changes needed?
Tests for data-type-based operations of decimal Series are in two places:
- python/pyspark/pandas/tests/data_type_ops/test_decimal_ops.py
- python/pyspark/pandas/tests/data_type_ops/test_num_ops.py
We'd better merge test_decimal_ops into test_num_ops.
See also [SPARK-36002](https://issues.apache.org/jira/browse/SPARK-36002) .
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
unittests passed
Closes #33206 from Yikun/SPARK-36002.
Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit fdc50f4452)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade Apache ORC to 1.6.9.
### Why are the changes needed?
This is required to bring in ORC-804 in order to fix an ORC encryption masking bug.
### Does this PR introduce _any_ user-facing change?
No. This is not released yet.
### How was this patch tested?
Pass the newly added test case.
Closes #33189 from dongjoon-hyun/SPARK-35992.
Lead-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit c55b9fd1e0)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
this is the skeleton of the ansible used to configure jenkins workers in the riselab/apache spark build system
### Why are the changes needed?
they are not needed, but will help the community understand how to build systems to test multiple versions of spark, as well as propose changes that i can integrate in to the "production" riselab repo. since we're sunsetting jenkins by EOY 2021, this will potentially be useful for migrating the build system.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ansible-lint and much wailing and gnashing of teeth.
Closes #32178 from shaneknapp/initial-ansible-commit.
Lead-authored-by: shane knapp <incomplete@gmail.com>
Co-authored-by: shane <incomplete@gmail.com>
Signed-off-by: shane knapp <incomplete@gmail.com>
### What changes were proposed in this pull request?
This PR aims to clean up Spark 2.4 and Java7 code path from the release scripts.
### Why are the changes needed?
To simplify the logic.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
N/A
Closes #33150 from dongjoon-hyun/SPARK-35948.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Like SPARK-35825, this PR aims to increase JVM stack size via `MAVEN_OPTS` in release-build.sh.
### Why are the changes needed?
This will mitigate the failure in publishing snapshot GitHub Action job and during the release.
- https://github.com/apache/spark/actions/workflows/publish_snapshot.yml (3-day consecutive failures)
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
N/A
Closes #33149 from dongjoon-hyun/SPARK-35947.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
The implementation of the RocksDB instance, which is used in the RocksDB state store. It acts as a handler for the RocksDB instance and RocksDBFileManager.
### Why are the changes needed?
Part of the RocksDB state store implementation.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New UT added.
Closes #32928 from xuanyuanking/SPARK-35784.
Authored-by: Yuanjian Li <yuanjian.li@databricks.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
### What changes were proposed in this pull request?
This PR aims to upgrade ASM to 9.1
### Why are the changes needed?
The latest `xbean-asm9-shaded` is built with ASM 9.1.
- https://mvnrepository.com/artifact/org.apache.xbean/xbean-asm9-shaded/4.20
- 5e0e3c0c64/pom.xml (L67)
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes#33130 from dongjoon-hyun/SPARK-35928.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
Add path level discover for python unittests.
### Why are the changes needed?
Currently we have to register every Python test case manually when adding a new one. Sometimes we forget to add the test case to the module list, and then it is never executed.
Such as:
- pyspark-core pyspark.tests.test_pin_thread
Thus we need an auto-discovery mechanism that finds all test cases rather than listing every case manually.
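The auto-discovery described above can be sketched as a simple path walk that turns every `test_*.py` file into a dotted test goal. This is a hypothetical helper for illustration, not the actual `modules.py` change:

```python
from pathlib import Path


def discover_test_goals(root):
    """Walk `root` and convert every test_*.py file into a dotted test goal.

    Hypothetical sketch of path-based discovery; the real logic in
    dev/sparktestsupport/modules.py may differ.
    """
    root = Path(root)
    return sorted(
        p.relative_to(root).with_suffix("").as_posix().replace("/", ".")
        for p in root.rglob("test_*.py")
    )
```

With such a helper, a newly added `pyspark/tests/test_pin_thread.py` would show up as the goal `pyspark.tests.test_pin_thread` without any manual registration.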
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Add below code in end of `dev/sparktestsupport/modules.py`
```python
for m in sorted(all_modules):
    for g in sorted(m.python_test_goals):
        print(m.name, g)
```
Compare the result before and after:
https://www.diffchecker.com/iO3FvhKL
Closes#32867 from Yikun/SPARK_DISCOVER_TEST.
Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR updates the doctests in `run-tests.py`.
### Why are the changes needed?
This should be consistent with the `modules.py` behavior.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass the GitHub Action.
I checked manually.
```
$ python dev/run-tests.py
Cannot install SparkR as R was not found in PATH
[info] Using build tool sbt with Hadoop profile hadoop3.2 and Hive profile hive2.3 under environment local
[info] Found the following changed modules: root
[info] Setup the following environment variables for tests:
========================================================================
Running Apache RAT checks
========================================================================
RAT checks passed.
```
Closes#33127 from dongjoon-hyun/SPARK-35483-2.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to enable `docker_integration_tests` when `catalyst` and `sql` module changes additionally.
### Why are the changes needed?
Currently, `catalyst` and `sql` module changes do not trigger the JDBC integration test.
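The trigger logic described above can be sketched as a dependency lookup: a test module lists the source modules whose changes should trigger it. The module names come from the PR description; the function itself is hypothetical, not the actual `run-tests` code:

```python
# Hypothetical sketch: map a test module to the source modules whose
# changes should trigger it.
TRIGGERS = {
    "docker_integration_tests": {"catalyst", "sql", "docker_integration_tests"},
}


def should_run(test_module, changed_modules):
    """Return True when any changed module is registered as a trigger."""
    return bool(TRIGGERS.get(test_module, set()) & set(changed_modules))
```

Before this PR, only a change to `docker_integration_tests` itself would fire the JDBC integration test; adding `catalyst` and `sql` to the trigger set closes that gap.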
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
N/A
Closes#33125 from dongjoon-hyun/SPARK-35483.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR aims to upgrade Chill to 0.10.0.
### Why are the changes needed?
This is a maintenance release with cross-compilation for Scala 2.12.14 and 2.13.6.
- https://github.com/twitter/chill/releases/tag/v0.10.0
### Does this PR introduce _any_ user-facing change?
No, this is a dependency change.
### How was this patch tested?
Pass the CIs.
Closes#33119 from dongjoon-hyun/SPARK-35920.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR proposes to support creating a Column out of a numpy literal value in pandas-on-Spark. It mainly consists of three changes:
- Enable the `lit` function defined in `pyspark.pandas.spark.functions` to support numpy literals input.
```py
>>> from pyspark.pandas.spark import functions as SF
>>> SF.lit(np.int64(1))
Column<'CAST(1 AS BIGINT)'>
>>> SF.lit(np.int32(1))
Column<'CAST(1 AS INT)'>
>>> SF.lit(np.int8(1))
Column<'CAST(1 AS TINYINT)'>
>>> SF.lit(np.byte(1))
Column<'CAST(1 AS TINYINT)'>
>>> SF.lit(np.float32(1))
Column<'CAST(1.0 AS FLOAT)'>
```
- Substitute `F.lit` with `SF.lit`, that is, use the `lit` function defined in `pyspark.pandas.spark.functions` rather than the one defined in `pyspark.sql.functions`, to allow creating columns out of numpy literals.
- Enable numpy literals input in `isin` method
Non-goal:
- Some pandas-on-Spark APIs use PySpark column-related APIs internally, and these column-related APIs don't support numpy literals, so numpy literals are disallowed as input (e.g. the `to_replace` parameter in the `replace` API). This PR doesn't aim to adjust all of them; it adjusts `isin` only, because that is what inspired the PR (see https://github.com/databricks/koalas/issues/2161).
- Completing the mappings between all kinds of numpy literals and Spark data types should be a follow-up task.
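The dtype-to-SQL-type casts shown in the doctest above can be sketched as a lookup table. This is a hypothetical helper for illustration only; the real `SF.lit` implementation may resolve the types differently:

```python
import numpy as np

# Hypothetical mapping from numpy scalar types to Spark SQL type names,
# mirroring the CAST targets in the doctest above.
NUMPY_TO_SPARK_SQL = {
    np.int8: "tinyint",    # np.byte is an alias for np.int8
    np.int16: "smallint",
    np.int32: "int",
    np.int64: "bigint",
    np.float32: "float",
    np.float64: "double",
}


def spark_sql_type(value):
    """Return the Spark SQL type name a numpy scalar should be cast to."""
    for np_type, sql_type in NUMPY_TO_SPARK_SQL.items():
        if isinstance(value, np_type):
            return sql_type
    raise TypeError(f"Unsupported numpy literal type: {type(value)}")
```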
### Why are the changes needed?
Spark (`lit` function defined in `pyspark.sql.functions`) doesn't support creating a Column out of numpy literal value.
So `lit` function defined in `pyspark.pandas.spark.functions` is adjusted in order to support that in pandas-on-Spark.
### Does this PR introduce _any_ user-facing change?
Yes.
Before:
```py
>>> a = ps.DataFrame({'source': [1,2,3,4,5]})
>>> a.source.isin([np.int64(1), np.int64(2)])
Traceback (most recent call last):
...
AttributeError: 'numpy.int64' object has no attribute '_get_object_id'
```
After:
```py
>>> a = ps.DataFrame({'source': [1,2,3,4,5]})
>>> a.source.isin([np.int64(1), np.int64(2)])
0 True
1 True
2 False
3 False
4 False
Name: source, dtype: bool
```
### How was this patch tested?
Unit tests.
Closes#32955 from xinrong-databricks/datatypeops_literal.
Authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>
### What changes were proposed in this pull request?
Update Ivy from 2.4.0 to 2.5.0.
- https://ant.apache.org/ivy/history/2.5.0/release-notes.html
### Why are the changes needed?
This brings various improvements and bug fixes. Most notably, the new `ivy.maven.lookup.sources` and `ivy.maven.lookup.javadoc` configs can significantly speed up module resolution time when turned off, especially behind a proxy. These could arguably be turned off by default, because when submitting jobs you probably don't care about the sources or javadoc jars. I didn't include that here, but I'm happy to look into it if desired.
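The two lookup settings named above are Ivy properties introduced in 2.5.0; a sketch of the properties to set (how they are wired into a given build or `ivysettings.xml` is left open here):

```
# Skip resolving sources and javadoc jars to speed up dependency resolution
ivy.maven.lookup.sources=false
ivy.maven.lookup.javadoc=false
```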
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing UT and build passes
Closes#33088 from Kimahriman/feature/ivy-update.
Authored-by: Adam Binford <adamq43@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
After the RC vote passes, the release manager still needs to do a lot of work to finalize the release. This PR updates the script to automate some steps:
1. create the final git tag
2. publish to pypi
3. publish docs to spark-website
4. move the release binaries from the dev directory to the release directory.
5. update the KEYS file
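The five steps above can be sketched as a dry-run command plan. All names, paths, and versions are illustrative assumptions, not the actual `release-build.sh` commands:

```python
def finalize_release_plan(version, rc_tag):
    """Return the commands a release manager would run to finalize a
    release. Hypothetical sketch; paths and tools are assumptions."""
    dist = "https://dist.apache.org/repos/dist"
    return [
        # 1. create the final git tag from the winning RC tag
        f"git tag v{version} {rc_tag}",
        # 2. publish the Python package to PyPI
        f"twine upload dist/pyspark-{version}.tar.gz",
        # 3. publish the generated docs to spark-website
        f"publish docs for {version} to spark-website",
        # 4. move the release binaries from the dev to the release directory
        f"svn mv {dist}/dev/spark/{rc_tag}-bin {dist}/release/spark/spark-{version}",
        # 5. update the KEYS file
        f"svn ci {dist}/release/spark/KEYS",
    ]
```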
### Why are the changes needed?
To ease the work of the release manager.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
tested with the recent 3.0.3.
Closes#33055 from cloud-fan/release.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Add `python-is-python3` to `create-release/spark-rm/Dockerfile`
### Why are the changes needed?
Systems that use python3 by default should explicitly indicate that the Python version is 3.
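The change amounts to adding one Debian/Ubuntu package to the image; a hypothetical fragment of `create-release/spark-rm/Dockerfile` (the surrounding package list is an assumption, only `python-is-python3` comes from this PR):

```dockerfile
# python-is-python3 makes the `python` command resolve to python3,
# so release scripts that invoke `python` pick up Python 3.
RUN apt-get update && \
    apt-get install -y python3 python3-pip python-is-python3
```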
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Tested during Apache 3.0.3 release.
Closes#33048 from Ngone51/fix-release-script.
Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR aims to add `hadoop-cloud` profile to `PUBLISH_PROFILES` in order to publish `hadoop-cloud` module.
Note that this doesn't change `BASE_RELEASE_PROFILES` and there is no change in the binary distributions.
### Why are the changes needed?
This is discussed here.
- https://lists.apache.org/thread.html/rf87d755460d5ed85c7b6ac0edad48f53c929a2cd287f30be24afd2ad%40%3Cuser.spark.apache.org%3E
### Does this PR introduce _any_ user-facing change?
Yes, this will provide `hadoop-cloud` module in Maven Central.
### How was this patch tested?
N/A (After merging this, we can check the daily snapshot result)
Closes#33003 from dongjoon-hyun/SPARK-35844.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>