ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
William Hyun	753636e86b	[SPARK-31807][INFRA] Use python 3 style in release-build.sh ### What changes were proposed in this pull request? This PR aims to use a style that is compatible with both Python 2 and 3. ### Why are the changes needed? This will help the Python 3 migration. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual. Closes #28632 from williamhyun/use_python3_style. Authored-by: William Hyun <williamhyun3@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-05-25 10:25:43 +09:00
Dongjoon Hyun	3772154442	[SPARK-31691][INFRA] release-build.sh should ignore a fallback output from `build/mvn` ### What changes were proposed in this pull request? This PR adds the `i` (ignore-case) option to the `grep -v Download` filter so that additional `build/mvn` output which is irrelevant to the version string is ignored. ### Why are the changes needed? SPARK-28963 added an additional output message, `Falling back to archive.apache.org to download Maven`, in build/mvn. This breaks `dev/create-release/release-build.sh`, and the Spark Packaging Jenkins job currently hits this issue consistently and is broken. - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/2912/console ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? This happens only when the mirror fails, so it was verified manually by hijacking the script. It works like the following. ``` $ echo 'Falling back to archive.apache.org to download Maven' > out $ build/mvn help:evaluate -Dexpression=project.version >> out Using `mvn` from path: /Users/dongjoon/PRS/SPARK_RELEASE_2/build/apache-maven-3.6.3/bin/mvn $ cat out | grep -v INFO | grep -v WARNING | grep -v Download Falling back to archive.apache.org to download Maven 3.1.0-SNAPSHOT $ cat out | grep -v INFO | grep -v WARNING | grep -vi Download 3.1.0-SNAPSHOT ``` Closes #28514 from dongjoon-hyun/SPARK_RELEASE_2. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2020-05-12 14:24:56 -07:00
Yuming Wang	5de5e46624	[SPARK-30268][INFRA] Fix incorrect pyspark version when releasing preview versions ### What changes were proposed in this pull request? This PR fixes the incorrect PySpark version when releasing preview versions. ### Why are the changes needed? Making the Spark binary distribution failed: ``` cp: cannot stat 'spark-3.0.0-preview2-bin-hadoop2.7/python/dist/pyspark-3.0.0.dev02.tar.gz': No such file or directory gpg: can't open 'pyspark-3.0.0.dev02.tar.gz': No such file or directory gpg: signing failed: No such file or directory gpg: pyspark-3.0.0.dev02.tar.gz: No such file or directory ``` The tarball that was actually produced is named `pyspark-3.0.0.dev2.tar.gz`: ``` yumwangubuntu-3513086:~/spark-release/output$ ll spark-3.0.0-preview2-bin-hadoop2.7/python/dist/ total 214140 drwxr-xr-x 2 yumwang stack 4096 Dec 16 06:17 ./ drwxr-xr-x 9 yumwang stack 4096 Dec 16 06:17 ../ -rw-r--r-- 1 yumwang stack 219267173 Dec 16 06:17 pyspark-3.0.0.dev2.tar.gz ``` because setuptools normalizes the version: ``` /usr/local/lib/python3.6/dist-packages/setuptools/dist.py:476: UserWarning: Normalizing '3.0.0.dev02' to '3.0.0.dev2' normalized_version, ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? manual test: ``` LM-SHC-16502798:spark yumwang$ SPARK_VERSION=3.0.0-preview2 LM-SHC-16502798:spark yumwang$ echo "$SPARK_VERSION" | sed -e "s/-/./" -e "s/SNAPSHOT/dev0/" -e "s/preview/dev/" 3.0.0.dev2 ``` Closes #26909 from wangyum/SPARK-30268. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-12-17 10:22:29 +09:00
Dongjoon Hyun	9459833eae	[SPARK-29989][INFRA] Add `hadoop-2.7/hive-2.3` pre-built distribution ### What changes were proposed in this pull request? This PR aims to add another pre-built binary distribution with `-Phadoop-2.7 -Phive-1.2` to `Apache Spark 3.0.0`. PRE-BUILT BINARY DISTRIBUTION ``` spark-3.0.0-SNAPSHOT-bin-hadoop2.7-hive1.2.tgz spark-3.0.0-SNAPSHOT-bin-hadoop2.7-hive1.2.tgz.asc spark-3.0.0-SNAPSHOT-bin-hadoop2.7-hive1.2.tgz.sha512 ``` CONTENTS (snippet) ``` $ ls hadoop- hadoop-annotations-2.7.4.jar hadoop-mapreduce-client-shuffle-2.7.4.jar hadoop-auth-2.7.4.jar hadoop-yarn-api-2.7.4.jar hadoop-client-2.7.4.jar hadoop-yarn-client-2.7.4.jar hadoop-common-2.7.4.jar hadoop-yarn-common-2.7.4.jar hadoop-hdfs-2.7.4.jar hadoop-yarn-server-common-2.7.4.jar hadoop-mapreduce-client-app-2.7.4.jar hadoop-yarn-server-web-proxy-2.7.4.jar hadoop-mapreduce-client-common-2.7.4.jar parquet-hadoop-1.10.1.jar hadoop-mapreduce-client-core-2.7.4.jar parquet-hadoop-bundle-1.6.0.jar hadoop-mapreduce-client-jobclient-2.7.4.jar $ ls hive- hive-beeline-1.2.1.spark2.jar hive-jdbc-1.2.1.spark2.jar hive-cli-1.2.1.spark2.jar hive-metastore-1.2.1.spark2.jar hive-exec-1.2.1.spark2.jar spark-hive-thriftserver_2.12-3.0.0-SNAPSHOT.jar ``` ### Why are the changes needed? Since Apache Spark switched to `-Phive-2.3` by default, all pre-built binary distributions would otherwise use `-Phive-2.3`. This PR adds a `hadoop-2.7/hive-1.2` distribution to provide a combination similar to the `Apache Spark 2.4` line. ### Does this PR introduce any user-facing change? Yes. This is an additional distribution which resembles the `Apache Spark 2.4` line in terms of the `hive` version. ### How was this patch tested? Manual. Please note that this cannot be verified in `dry-run` mode, because the as-is release script does not generate additional combinations, including this one, in `dry-run` mode. Closes #26688 from dongjoon-hyun/SPARK-29989. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Xiao Li <gatorsmile@gmail.com>	2019-11-27 15:55:52 -08:00
Dongjoon Hyun	a60da23d64	[SPARK-30007][INFRA] Publish snapshot/release artifacts with `-Phive-2.3` only ### What changes were proposed in this pull request? This PR aims to add `-Phive-2.3` to the publish profiles. Since Apache Spark 3.0.0, Maven artifacts will be published with the Apache Hive 2.3 profile only. This PR will also recover the `SNAPSHOT` publishing Jenkins job. - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/ We will provide the pre-built distributions (also with Hive 1.2.1) like Apache Spark 2.4. SPARK-29989 will update the release script to generate all combinations. ### Why are the changes needed? This will reduce the explicit dependency on the illegitimate Hive fork in the Maven repository. ### Does this PR introduce any user-facing change? Yes, but this is a dev-only change. ### How was this patch tested? Manual. Closes #26648 from dongjoon-hyun/SPARK-30007. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-11-23 22:34:21 -08:00
Xingbo Jiang	155a67d00c	[SPARK-29666][BUILD] Fix the publish release failure under dry-run mode ### What changes were proposed in this pull request? `release-build.sh` fails to publish a release under dry-run mode with the following error message: ``` /opt/spark-rm/release-build.sh: line 429: pushd: spark-repo-g4MBm/org/apache/spark: No such file or directory ``` We need to run the `mvn clean install` command at least once to create the `$tmp_repo` path, but currently those steps are all skipped under dry-run mode. This PR fixes the issue. ### How was this patch tested? Tested locally. Closes #26329 from jiangxb1987/dryrun. Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-30 14:57:51 -07:00
Xingbo Jiang	fd6cfb1be3	[SPARK-29646][BUILD] Allow pyspark version name format `${versionNumber}-preview` in release script ### What changes were proposed in this pull request? Update `release-build.sh`, to allow pyspark version name format `${versionNumber}-preview`, otherwise the release script won't generate pyspark release tarballs. ### How was this patch tested? Tested locally. Closes #26306 from jiangxb1987/buildPython. Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-30 14:51:50 -07:00
Dongjoon Hyun	2baf7a1d8f	[SPARK-29608][BUILD] Add `hadoop-3.2` profile to release build ### What changes were proposed in this pull request? This PR aims to add the `hadoop-3.2` profile to pre-built binary package releases. ### Why are the changes needed? Since Apache Spark 3.0.0, we provide a Hadoop 3.2 pre-built binary. ### Does this PR introduce any user-facing change? No. (Although the artifacts are available, this change is for release managers.) ### How was this patch tested? Manual. Please note that `DRY_RUN=0` disables these combinations. ``` $ dev/create-release/release-build.sh package ... Packages to build: without-hadoop hadoop3.2 hadoop2.7 make_binary_release without-hadoop -Pscala-2.12 -Phadoop-provided 2.12 make_binary_release hadoop3.2 -Pscala-2.12 -Phadoop-3.2 -Phive -Phive-thriftserver 2.12 make_binary_release hadoop2.7 -Pscala-2.12 -Phadoop-2.7 -Phive -Phive-thriftserver withpip,withr 2.12 ``` Closes #26260 from dongjoon-hyun/SPARK-29608. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-10-25 13:57:26 -07:00
Kazuaki Ishizaki	8d1b5ba766	[SPARK-28906][BUILD] Fix incorrect information in bin/spark-submit --version ### What changes were proposed in this pull request? This PR allows `bin/spark-submit --version` to show the correct build information; previous versions, which were created by `dev/create-release/do-release-docker.sh`, show incorrect information. There are two root causes: 1. The `USER` environment variable was not passed to the docker container 2. The `.git` directory was not kept in the work directory ### Why are the changes needed? Without this fix, the build information (branch, user, revision, URL) is missing, whereas previous versions showed the correct information. ### Does this PR introduce any user-facing change? Yes. With this PR, the console output in branch-2.3 is as follows ``` $ bin/spark-submit --version Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.3.4 /_/ Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_212 Branch HEAD Compiled by user ishizaki on 2019-09-02T02:18:10Z Revision `8c6f8150f3` Url https://gitbox.apache.org/repos/asf/spark.git Type --help for more information. ``` Without this PR, the console output is as follows ``` $ spark-submit --version Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.3.4 /_/ Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_212 Branch Compiled by user on 2019-08-26T08:29:39Z Revision Url Type --help for more information. ``` ### How was this patch tested? After building the package, I manually executed `bin/spark-submit --version` Closes #25655 from kiszk/SPARK-28906. Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-09-11 08:12:44 -05:00
Sean Owen	8bc304f97e	[SPARK-26132][BUILD][CORE] Remove support for Scala 2.11 in Spark 3.0.0 ## What changes were proposed in this pull request? Remove Scala 2.11 support in build files and docs, and in various parts of code that accommodated 2.11. See some targeted comments below. ## How was this patch tested? Existing tests. Closes #23098 from srowen/SPARK-26132. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-03-25 10:46:42 -05:00
DB Tsai	2b9ad2516e	[MINOR][BUILD] Add Scala 2.12 profile back for branch-2.4 build Closes #24074 from dbtsai/scala-2.12. Authored-by: DB Tsai <d_tsai@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-12 20:08:52 -07:00
DB Tsai	b6375097bc	[SPARK-27026][BUILD] Upgrade Docker image for release build to Ubuntu 18.04 LTS ## What changes were proposed in this pull request? Upgrade the Docker image for the release build to Ubuntu 18.04 LTS ## How was this patch tested? Manually tested. Closes #23932 from dbtsai/ubuntu18.04. Authored-by: DB Tsai <d_tsai@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-06 13:58:21 -08:00
Marcelo Vanzin	d00eca75b3	[SPARK-26048][BUILD] Enable flume profile when creating 2.x releases. Closes #23931 from vanzin/SPARK-26048. Authored-by: Marcelo Vanzin <vanzin@cloudera.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	2019-03-02 08:14:06 -08:00
Takeshi Yamamuro	abc937b247	[MINOR][BUILD] Remove binary license/notice files in a source release for branch-2.4+ only ## What changes were proposed in this pull request? To skip the steps that remove binary license/notice files from a source release for branch-2.3 (these files only exist in master/branch-2.4 now), this PR checks the Spark release version in `dev/create-release/release-build.sh`. ## How was this patch tested? Manually checked. Closes #23538 from maropu/FixReleaseScript. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-01-14 19:17:39 -06:00
Wenchen Fan	327456b482	[BUILD][MINOR] release script should not interrupt by svn ## What changes were proposed in this pull request? When running the release script, you are interrupted unexpectedly by an svn prompt: ``` ATTENTION! Your password for authentication realm: <https://dist.apache.org:443> ASF Committers can only be stored to disk unencrypted! You are advised to configure your system so that Subversion can store passwords encrypted, if possible. See the documentation for details. You can avoid future appearances of this warning by setting the value of the 'store-plaintext-passwords' option to either 'yes' or 'no' in '/home/spark-rm/.subversion/servers'. ----------------------------------------------------------------------- Store password unencrypted (yes/no)? ``` We can avoid it by adding `--no-auth-cache` when running svn commands. ## How was this patch tested? manually verified with 2.4.0 RC5 Closes #22885 from cloud-fan/svn. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2018-10-30 21:17:40 +08:00
Wenchen Fan	ac586bbb01	fix security issue of zinc (simpler version)	2018-10-19 23:54:15 +08:00
Sean Owen	703e6da1ec	[SPARK-25705][BUILD][STREAMING][TEST-MAVEN] Remove Kafka 0.8 integration ## What changes were proposed in this pull request? Remove Kafka 0.8 integration ## How was this patch tested? Existing tests, build scripts Closes #22703 from srowen/SPARK-25705. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2018-10-16 09:10:24 -05:00
Sean Owen	a001814189	[SPARK-25598][STREAMING][BUILD][TEST-MAVEN] Remove flume connector in Spark 3 ## What changes were proposed in this pull request? Removes all vestiges of Flume in the build, for Spark 3. I don't think this needs Jenkins config changes. ## How was this patch tested? Existing tests. Closes #22692 from srowen/SPARK-25598. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2018-10-11 14:28:06 -07:00
Sean Owen	80813e1980	[SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6 ## What changes were proposed in this pull request? Remove Hadoop 2.6 references and make 2.7 the default. Obviously, this is for master/3.0.0 only. After this we can also get rid of the separate test jobs for Hadoop 2.6. ## How was this patch tested? Existing tests Closes #22615 from srowen/SPARK-25016. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2018-10-10 12:07:53 -07:00
Gengliang Wang	5534a3a58e	[SPARK-25445][BUILD][FOLLOWUP] Resolve issues in release-build.sh for publishing scala-2.12 build ## What changes were proposed in this pull request? This is a follow up for #22441. 1. Remove the flag "-Pkafka-0-8" for the Scala 2.12 build. 2. Clean up the script with simpler logic. 3. Switch the Scala version back to 2.11 before the script exits. ## How was this patch tested? Manual test. Closes #22454 from gengliangwang/revise_release_build. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2018-09-19 18:30:46 +08:00
Wenchen Fan	1c0423b287	[SPARK-25445][BUILD] the release script should be able to publish a scala-2.12 build ## What changes were proposed in this pull request? update the package and publish steps, to support scala 2.12 ## How was this patch tested? manual test Closes #22441 from cloud-fan/scala. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2018-09-18 22:29:00 +08:00
Wenchen Fan	0f1413e320	[SPARK-25443][BUILD] fix issues when building docs with release scripts in docker ## What changes were proposed in this pull request? These 2 changes are required to build the docs for Spark 2.4.0 RC1: 1. install `mkdocs` in the docker image 2. set locale to C.UTF-8. Otherwise jekyll fails to build the doc. ## How was this patch tested? tested manually when doing the 2.4.0 RC1 Closes #22438 from cloud-fan/infra. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2018-09-18 10:10:20 +08:00
Sean Owen	30aa37fca4	[SPARK-24654][BUILD][FOLLOWUP] Update, fix LICENSE and NOTICE, and specialize for source vs binary ## What changes were proposed in this pull request? Fix location of licenses-binary in binary release, and remove binary items from source release ## How was this patch tested? N/A Closes #22436 from srowen/SPARK-24654.2. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2018-09-17 08:54:44 -05:00
jerryshao	b66e14dc96	[SPARK-24685][BUILD][FOLLOWUP] Fix the nonexist profile name in release script ## What changes were proposed in this pull request? The `without-hadoop` profile doesn't exist in Maven; the name should be `hadoop-provided`. This regression was introduced by SPARK-24685, and this PR fixes it. ## How was this patch tested? Local test. Closes #22434 from jerryshao/SPARK-24685-followup. Authored-by: jerryshao <sshao@hortonworks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2018-09-17 15:21:18 +08:00
Marcelo Vanzin	717f58e9ce	[SPARK-24685][BUILD] Restore support for building old Hadoop versions of 2.1. Update the release scripts to build binary packages for older versions of Hadoop when building Spark 2.1. Also did some minor refactoring of that part of the script so that changing these later is easier. This was used to build the missing packages from 2.1.3-rc2. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #21661 from vanzin/SPARK-24685.	2018-08-15 14:42:48 -07:00
Marcelo Vanzin	4e7d8678a3	[SPARK-24372][BUILD] Add scripts to help with preparing releases. The "do-release.sh" script asks questions about the RC being prepared, trying to find out as much as possible automatically, and then executes the existing scripts with proper arguments to prepare the release. This script was used to prepare the 2.3.1 release candidates, so was tested in that context. The docker version runs that same script inside a docker image specially crafted for building Spark releases. That image is based on the work by Felix C. linked in the bug. At this point it has been only mildly tested. I also added a template for the vote e-mail, with placeholders for things that need to be replaced, although there is no automation around that for the moment. It shouldn't be hard to hook up certain things like version and tags to this, or to figure out certain things like the repo URL from the output of the release scripts. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #21515 from vanzin/SPARK-24372.	2018-06-22 12:38:34 -05:00
Marcelo Vanzin	8e60a16b73	[SPARK-23601][BUILD][FOLLOW-UP] Keep md5 checksums for nexus artifacts. The repository.apache.org server still requires md5 checksums or it won't publish the staging repo. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #21338 from vanzin/SPARK-23601.	2018-05-16 13:34:54 -07:00
Sean Owen	8bceb899dc	[SPARK-23601][BUILD] Remove .md5 files from release ## What changes were proposed in this pull request? Remove .md5 files from release artifacts ## How was this patch tested? N/A Author: Sean Owen <sowen@cloudera.com> Closes #20737 from srowen/SPARK-23601.	2018-03-06 08:52:28 -06:00
foxish	c3548d11c3	[SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses) ## What changes were proposed in this pull request? Including the `-Pkubernetes` flag in a few places it was missed. ## How was this patch tested? checkstyle, mima through manual tests. Author: foxish <ramanathana@google.com> Closes #20256 from foxish/SPARK-23063.	2018-01-13 21:34:28 -08:00
Felix Cheung	ab1b6ee731	[BUILD] update release scripts ## What changes were proposed in this pull request? Change to dist.apache.org instead of the home directory. The SHA-512 checksum should have a .sha512 extension. From the ASF release signing doc: "The checksum SHOULD be generated using SHA-512. A .sha file SHOULD contain a SHA-1 checksum, for historical reasons." NOTE: I think this will require some changes to work with Jenkins' release build ## How was this patch tested? manually Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #19754 from felixcheung/releasescript.	2017-12-09 09:28:46 -06:00
hyukjinkwon	c8b7f97b8a	[SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not exists in release-build.sh ## What changes were proposed in this pull request? This PR proposes to use `/usr/sbin/lsof` if `lsof` is missing from the path, to fix the nightly snapshot Jenkins jobs. Please refer to https://github.com/apache/spark/pull/19359#issuecomment-340139557: > Looks like some of the snapshot builds are having lsof issues: > > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.1-maven-snapshots/182/console > >https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.2-maven-snapshots/134/console > >spark-build/dev/create-release/release-build.sh: line 344: lsof: command not found >usage: kill [ -s signal | -p ] [ -a ] pid ... >kill -l [ signal ] To my knowledge, the full path of `lsof` is required for non-root users on some OSes. ## How was this patch tested? Manually tested as below: ```bash #!/usr/bin/env bash LSOF=lsof if ! hash $LSOF 2>/dev/null; then echo "a" LSOF=/usr/sbin/lsof fi $LSOF -P | grep "a" ``` Author: hyukjinkwon <gurwls223@gmail.com> Closes #19695 from HyukjinKwon/SPARK-22377.	2017-11-14 08:28:13 +09:00
Sean Owen	0c03297bf0	[SPARK-22142][BUILD][STREAMING] Move Flume support behind a profile, take 2 ## What changes were proposed in this pull request? Move flume behind a profile, take 2. See https://github.com/apache/spark/pull/19365 for most of the back-story. This change should fix the problem by removing the examples module dependency and moving Flume examples to the module itself. It also adds deprecation messages, per a discussion on dev about deprecating for 2.3.0. ## How was this patch tested? Existing tests, which still enable flume integration. Author: Sean Owen <sowen@cloudera.com> Closes #19412 from srowen/SPARK-22142.2.	2017-10-06 15:08:28 +01:00
gatorsmile	472864014c	Revert "[SPARK-22142][BUILD][STREAMING] Move Flume support behind a profile" This reverts commit `a2516f41ae`.	2017-09-29 11:45:58 -07:00
Holden Karau	ecbe416ab5	[SPARK-22129][SPARK-22138] Release script improvements ## What changes were proposed in this pull request? Use the GPG_KEY param, fix lsof to non-hardcoded path, remove version swap since it wasn't really needed. Use EXPORT on JAVA_HOME for downstream scripts as well. ## How was this patch tested? Rolled 2.1.2 RC2 Author: Holden Karau <holden@us.ibm.com> Closes #19359 from holdenk/SPARK-22129-fix-signing.	2017-09-29 08:04:14 -07:00
Sean Owen	a2516f41ae	[SPARK-22142][BUILD][STREAMING] Move Flume support behind a profile ## What changes were proposed in this pull request? Add 'flume' profile to enable Flume-related integration modules ## How was this patch tested? Existing tests; no functional change Author: Sean Owen <sowen@cloudera.com> Closes #19365 from srowen/SPARK-22142.	2017-09-29 08:26:53 +01:00
Holden Karau	8f130ad401	[SPARK-22072][SPARK-22071][BUILD] Improve release build scripts ## What changes were proposed in this pull request? Check JDK version (with javac) and use SPARK_VERSION for publish-release ## How was this patch tested? Manually tried local build with wrong JDK / JAVA_HOME & built a local release (LFTP disabled) Author: Holden Karau <holden@us.ibm.com> Closes #19312 from holdenk/improve-release-scripts-r2.	2017-09-22 00:14:57 -07:00
Sean Owen	4fbf748bf8	[SPARK-21893][BUILD][STREAMING][WIP] Put Kafka 0.8 behind a profile ## What changes were proposed in this pull request? Put Kafka 0.8 support behind a kafka-0-8 profile. ## How was this patch tested? Existing tests, but, until PR builder and Jenkins configs are updated the effect here is to not build or test Kafka 0.8 support at all. Author: Sean Owen <sowen@cloudera.com> Closes #19134 from srowen/SPARK-21893.	2017-09-13 10:10:40 +01:00
Sean Owen	12ab7f7e89	[SPARK-14280][BUILD][WIP] Update change-version.sh and pom.xml to add Scala 2.12 profiles and enable 2.12 compilation …build; fix some things that will be warnings or errors in 2.12; restore Scala 2.12 profile infrastructure ## What changes were proposed in this pull request? This change adds back the infrastructure for a Scala 2.12 build, but does not enable it in the release or Python test scripts. In order to make that meaningful, it also resolves compile errors that the code hits in 2.12 only, in a way that still works with 2.11. It also updates dependencies to the earliest minor release of dependencies whose current version does not yet support Scala 2.12. This is in a sense covered by other JIRAs under the main umbrella, but implemented here. The versions below still work with 2.11, and are the _latest_ maintenance release in the _earliest_ viable minor release. - Scalatest 2.x -> 3.0.3 - Chill 0.8.0 -> 0.8.4 - Clapper 1.0.x -> 1.1.2 - json4s 3.2.x -> 3.4.2 - Jackson 2.6.x -> 2.7.9 (required by json4s) This change does _not_ fully enable a Scala 2.12 build: - It will also require dropping support for Kafka before 0.10. Easy enough, just didn't do it yet here - It will require recreating `SparkILoop` and `Main` for REPL 2.12, which is SPARK-14650. Possible to do here too. What it does do is make changes that resolve much of the remaining gap without affecting the current 2.11 build. ## How was this patch tested? Existing tests and build. Manually tested with `./dev/change-scala-version.sh 2.12` to verify it compiles, modulo the exceptions above. Author: Sean Owen <sowen@cloudera.com> Closes #18645 from srowen/SPARK-14280.	2017-09-01 19:21:21 +01:00
Sean Owen	425c4ada4c	[SPARK-19810][BUILD][CORE] Remove support for Scala 2.10 ## What changes were proposed in this pull request? - Remove Scala 2.10 build profiles and support - Replace some 2.10 support in scripts with commented placeholders for 2.12 later - Remove deprecated API calls from 2.10 support - Remove usages of deprecated context bounds where possible - Remove Scala 2.10 workarounds like ScalaReflectionLock - Other minor Scala warning fixes ## How was this patch tested? Existing tests Author: Sean Owen <sowen@cloudera.com> Closes #17150 from srowen/SPARK-19810.	2017-07-13 17:06:24 +08:00
Holden Karau	1b85bcd929	[SPARK-20627][PYSPARK] Drop the hadoop distribution name from the Python version ## What changes were proposed in this pull request? Drop the hadoop distribution name from the Python version (PEP 440 - https://www.python.org/dev/peps/pep-0440/). We've been using the local version string to disambiguate between different hadoop versions packaged with PySpark, but PEP 440 states that local versions should not be used when publishing upstream. Since we no longer make PySpark pip packages for different hadoop versions, we can simply drop the hadoop information. If at a later point we need to start publishing different hadoop versions, we can look at making different packages or similar. ## How was this patch tested? Ran `make-distribution` locally Author: Holden Karau <holden@us.ibm.com> Closes #17885 from holdenk/SPARK-20627-remove-pip-local-version-string.	2017-05-09 11:25:29 -07:00
Josh Rosen	314cf51ded	[SPARK-20102] Fix nightly packaging and RC packaging scripts w/ two minor build fixes ## What changes were proposed in this pull request? The master snapshot publisher builds are currently broken due to two minor build issues: 1. For unknown reasons, the LFTP `mkdir -p` command began throwing errors when the remote directory already exists. This change of behavior might have been caused by configuration changes in the ASF's SFTP server, but I'm not entirely sure of that. To work around this problem, this patch updates the script to ignore errors from the `lftp mkdir -p` commands. 2. The PySpark `setup.py` file references a non-existent `pyspark.ml.stat` module, causing Python packaging to fail by complaining about a missing directory. The fix is to simply drop that line from the setup script. ## How was this patch tested? The LFTP fix was tested by manually running the failing commands on AMPLab Jenkins against the ASF SFTP server. The PySpark fix was tested locally. Author: Josh Rosen <joshrosen@databricks.com> Closes #17437 from JoshRosen/spark-20102.	2017-03-27 10:23:28 -07:00
Sean Owen	0e2405490f	[SPARK-19550][BUILD][CORE][WIP] Remove Java 7 support - Move external/java8-tests tests into core, streaming, sql and remove - Remove MaxPermGen and related options - Fix some reflection / TODOs around Java 8+ methods - Update doc references to 1.7/1.8 differences - Remove Java 7/8 related build profiles - Update some plugins for better Java 8 compatibility - Fix a few Java-related warnings For the future: - Update Java 8 examples to fully use Java 8 - Update Java tests to use lambdas for simplicity - Update Java internal implementations to use lambdas ## How was this patch tested? Existing tests Author: Sean Owen <sowen@cloudera.com> Closes #16871 from srowen/SPARK-19493.	2017-02-16 12:32:45 +00:00
Sean Owen	e8d3fca450	[SPARK-19464][CORE][YARN][TEST-HADOOP2.6] Remove support for Hadoop 2.5 and earlier ## What changes were proposed in this pull request? - Remove support for Hadoop 2.5 and earlier - Remove reflection and code constructs only needed to support multiple versions at once - Update docs to reflect newer versions - Remove older versions' builds and profiles. ## How was this patch tested? Existing tests Author: Sean Owen <sowen@cloudera.com> Closes #16810 from srowen/SPARK-19464.	2017-02-08 12:20:07 +00:00
Shivaram Venkataraman	be5fc6ef72	[MINOR][SPARKR] Fix SparkR regex in copy command Fix SparkR package copy regex. The existing code leads to ``` Copying release tarballs to /home/***/public_html/spark-nightly/spark-branch-2.1-bin/spark-2.1.1-SNAPSHOT-2016_12_08_22_38-e8f351f-bin mput: SparkR-: no files found ``` Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #16231 from shivaram/typo-sparkr-build.	2016-12-09 10:12:56 -08:00
Felix Cheung	c074c96dc5	Copy pyspark and SparkR packages to latest release dir too ## What changes were proposed in this pull request? Copy pyspark and SparkR packages to latest release dir, as per comment [here](https://github.com/apache/spark/pull/16226#discussion_r91664822) Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #16227 from felixcheung/pyrftp.	2016-12-08 22:52:34 -08:00
Shivaram Venkataraman	934035ae7c	Copy the SparkR source package with LFTP This PR adds a line in release-build.sh to copy the SparkR source archive using LFTP Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #16226 from shivaram/fix-sparkr-copy-build.	2016-12-08 22:21:24 -08:00
Shivaram Venkataraman	202fcd21ce	[SPARK-18590][SPARKR] Change the R source build to Hadoop 2.6 This PR changes the SparkR source release tarball to be built using the Hadoop 2.6 profile. Previously it was using the without hadoop profile which leads to an error as discussed in https://github.com/apache/spark/pull/16014#issuecomment-265843991 Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #16218 from shivaram/fix-sparkr-release-build.	2016-12-08 13:01:46 -08:00
Felix Cheung	c3d3a9d0e8	[SPARK-18590][SPARKR] build R source package when making distribution ## What changes were proposed in this pull request? This PR has 2 key changes. One, we are building source package (aka bundle package) for SparkR which could be released on CRAN. Two, we should include in the official Spark binary distributions SparkR installed from this source package instead (which would have help/vignettes rds needed for those to work when the SparkR package is loaded in R, whereas earlier approach with devtools does not). But, because of various differences in how R performs different tasks, this PR is a fair bit more complicated. More details below. This PR also includes a few minor fixes. ### more details These are the additional steps in make-distribution; please see [here](https://github.com/apache/spark/blob/master/R/CRAN_RELEASE.md) on what's going to a CRAN release, which is now run during make-distribution.sh. 1. package needs to be installed because the first code block in vignettes is `library(SparkR)` without lib path 2. `R CMD build` will build vignettes (this process runs Spark/SparkR code and captures outputs into pdf documentation) 3. `R CMD check` on the source package will install package and build vignettes again (this time from the source package) - this is a key step required to release R package on CRAN (will skip tests here but tests will need to pass for CRAN release process to succeed - ideally, during release signoff we should install from the R source package and run tests) 4. `R CMD INSTALL` on the source package (this is the only way to generate doc/vignettes rds files correctly, not in step # 1) (the output of this step is what we package into Spark dist and sparkr.zip) Alternatively, R CMD build should already be installing the package in a temp directory though it might just be finding this location and set it to lib.loc parameter; another approach is perhaps we could try calling `R CMD INSTALL --build pkg` instead. But in any case, despite installing the package multiple times, this is relatively fast. Building vignettes takes a while though. ## How was this patch tested? Manually, CI. Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #16014 from felixcheung/rdist.	2016-12-08 11:29:31 -08:00
Reynold Xin	37e52f8793	[SPARK-18639] Build only a single pip package ## What changes were proposed in this pull request? We currently build 5 separate pip binary tarballs, doubling the release script runtime. It'd be better to build one, especially for use cases that are just using Spark locally. In the long run, it would make more sense to have Hadoop support be pluggable. ## How was this patch tested? N/A - this is a release build script that doesn't have any automated test coverage. We will know if it goes wrong when we prepare releases. Author: Reynold Xin <rxin@databricks.com> Closes #16072 from rxin/SPARK-18639.	2016-12-01 17:58:28 -08:00
Holden Karau	a36a76ac43	[SPARK-1267][SPARK-18129] Allow PySpark to be pip installed ## What changes were proposed in this pull request? This PR aims to provide a pip installable PySpark package. This does a bunch of work to copy the jars over and package them with the Python code (to prevent challenges from trying to use different versions of the Python code with different versions of the JAR). It does not currently publish to PyPI but that is the natural follow up (SPARK-18129). Done: - pip installable on conda [manual tested] - setup.py installed on a non-pip managed system (RHEL) with YARN [manual tested] - Automated testing of this (virtualenv) - packaging and signing with release-build* Possible follow up work: - release-build update to publish to PyPI (SPARK-18128) - figure out who owns the pyspark package name on prod PyPI (is it someone within the project or should we ask PyPI or should we choose a different name to publish with like ApachePySpark?) - Windows support and/or testing ( SPARK-18136 ) - investigate details of wheel caching and see if we can avoid cleaning the wheel cache during our test - consider how we want to number our dev/snapshot versions Explicitly out of scope: - Using pip installed PySpark to start a standalone cluster - Using pip installed PySpark for non-Python Spark programs *I've done some work to test release-build locally but as a non-committer I've just done local testing. ## How was this patch tested? Automated testing with virtualenv, manual testing with conda, a system wide install, and YARN integration. release-build changes tested locally as a non-committer (no testing of upload artifacts to Apache staging websites) Author: Holden Karau <holden@us.ibm.com> Author: Juliet Hougland <juliet@cloudera.com> Author: Juliet Hougland <not@myemail.com> Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.	2016-11-16 14:22:15 -08:00

62 commits