67ed0aa0fd
Updated Hadoop dependencies due to version inconsistencies. The global properties are now the ones used by the hadoop-2.2 profile, and that profile was emptied but kept for backwards-compatibility reasons.
Incorporates the changes proposed by vanzin following the previous pull request, https://github.com/apache/spark/pull/5783, which did not fix the problem correctly.
Please let me know if this is the correct way of doing this; vanzin's comments are in the pull request mentioned above.
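As a sketch of the restructuring described above (assumed shape and illustrative values, not the actual diff): the settings that previously lived in the hadoop-2.2 profile become global `<properties>` in the root POM, and the profile is kept only as an empty no-op:

```xml
<!-- Hypothetical sketch of the described change; values are illustrative -->
<properties>
  <hadoop.version>2.2.0</hadoop.version>
  <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
</properties>

<profiles>
  <!-- Kept empty for backwards compatibility: -Phadoop-2.2 is now a no-op -->
  <profile>
    <id>hadoop-2.2</id>
  </profile>
</profiles>
```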
Author: FavioVazquez <favio.vazquezp@gmail.com>
Closes #5786 from FavioVazquez/update-hadoop-dependencies and squashes the following commits:
11670e5 [FavioVazquez] - Added missing instance of -Phadoop-2.2 in create-release.sh
379f50d [FavioVazquez] - Added instances of -Phadoop-2.2 in create-release.sh, run-tests, scalastyle and building-spark.md - Reconstructed docs to not ask users to rely on default behavior
3f9249d [FavioVazquez] Merge branch 'master' of https://github.com/apache/spark into update-hadoop-dependencies
31bdafa [FavioVazquez] - Added missing instances in -Phadoop-1 in create-release.sh, run-tests and in the building-spark documentation
cbb93e8 [FavioVazquez] - Added comment related to SPARK-3710 about hadoop-yarn-server-tests in Hadoop 2.2 that fails to pull some needed dependencies
83dc332 [FavioVazquez] - Cleaned up the main POM concerning the yarn profile - Erased hadoop-2.2 profile from yarn/pom.xml and its content was integrated into yarn/pom.xml
93f7624 [FavioVazquez] - Deleted unnecessary comments and <activation> tag on the YARN profile in the main POM
668d126 [FavioVazquez] - Moved <dependencies> <activation> and <properties> sections of the hadoop-2.2 profile in the YARN POM to the YARN profile in the root POM - Erased unnecessary hadoop-2.2 profile from the YARN POM
fda6a51 [FavioVazquez] - Updated hadoop1 releases in create-release.sh due to changes in the default hadoop version set - Erased unnecessary instance of -Dyarn.version=2.2.0 in create-release.sh - Prettify comment in yarn/pom.xml
0470587 [FavioVazquez] - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in create-release.sh - Updated how the releases are made in create-release.sh now that the default hadoop version is 2.2.0 - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in scalastyle - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in run-tests - Better example given in hadoop-third-party-distributions.md now that the default hadoop version is 2.2.0
a650779 [FavioVazquez] - Default value of avro.mapred.classifier has been set to hadoop2 in pom.xml - Cleaned up hadoop-2.3 and 2.4 profiles due to change in the default set in avro.mapred.classifier in pom.xml
199f40b [FavioVazquez] - Erased unnecessary CDH5-specific note in docs/building-spark.md - Removed example instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in docs/building-spark.md - Enabled hadoop-2.2 profile when the Hadoop version is 2.2.0, which is now the default - Added comment in yarn/pom.xml to specify that
88a8b88 [FavioVazquez] - Simplified Hadoop profiles due to new setting of global properties in the pom.xml file - Added comment to specify that the hadoop-2.2 profile is now the default hadoop profile in the pom.xml file - Erased hadoop-2.2 from related hadoop profiles now that it is a no-op in the make-distribution.sh file
70b8344 [FavioVazquez] - Fixed typo in the make-distribution.sh file and added hadoop-1 in the Related profiles
287fa2f [FavioVazquez] - Updated documentation about specifying the hadoop version in building-spark. Now is clear that Spark will build against Hadoop 2.2.0 by default. - Added Cloudera CDH 5.3.3 without MapReduce example in the building-spark doc.
1354292 [FavioVazquez] - Fixed hadoop-1 version to match jenkins build profile in hadoop1.0 tests and documentation
6b4bfaf [FavioVazquez] - Cleanup in hadoop-2.x profiles since they contained mostly redundant stuff.
7e9955d [FavioVazquez] - Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons
660decc [FavioVazquez] - Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons
ec91ce3 [FavioVazquez] - Updated protobuf-java version of the com.google.protobuf dependency to fix a blocking error when connecting to HDFS via the Hadoop Cloudera HDFS CDH5 (fix for the 2.5.0-cdh5.3.3 version)
(cherry picked from commit 7fb715de6d)
Signed-off-by: Sean Owen <sowen@cloudera.com>
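The create-release.sh script shown below bumps POM versions with an indentation-anchored sed substitution. A minimal standalone run of that pattern, using illustrative version strings and a throwaway input line instead of real pom.xml files:

```shell
# Replicate the version-substitution pattern used in create-release.sh;
# the version values here are illustrative.
cur_ver="1.2.0-SNAPSHOT"
rel_ver="1.2.0"
# Anchor on 2-4 leading spaces so only the module-level <version> tag matches.
old="^\( \{2,4\}\)<version>${cur_ver}<\/version>$"
new="\1<version>${rel_ver}<\/version>"
printf '  <version>%s</version>\n' "$cur_ver" | sed -e "s/${old}/${new}/"
# prints "  <version>1.2.0</version>" (indentation kept via the \1 backreference)
```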
268 lines
10 KiB
Bash
Executable file
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Quick-and-dirty automation of making maven and binary releases. Not robust at all.
# Publishes releases to Maven and packages/copies binary release artifacts.
# Expects to be run in a totally empty directory.
#
# Options:
#   --skip-create-release  Assume the desired release tag already exists
#   --skip-publish         Do not publish to Maven central
#   --skip-package         Do not package and upload binary artifacts
# Would be nice to add:
#   - Send output to stderr and have useful logging in stdout

# Note: The following variables must be set before use!
ASF_USERNAME=${ASF_USERNAME:-pwendell}
ASF_PASSWORD=${ASF_PASSWORD:-XXX}
GPG_PASSPHRASE=${GPG_PASSPHRASE:-XXX}
GIT_BRANCH=${GIT_BRANCH:-branch-1.0}
RELEASE_VERSION=${RELEASE_VERSION:-1.2.0}
# Allows publishing under a different version identifier than
# was present in the actual release sources (e.g. rc-X)
PUBLISH_VERSION=${PUBLISH_VERSION:-$RELEASE_VERSION}
NEXT_VERSION=${NEXT_VERSION:-1.2.1}
RC_NAME=${RC_NAME:-rc2}

M2_REPO=~/.m2/repository
SPARK_REPO=$M2_REPO/org/apache/spark
NEXUS_ROOT=https://repository.apache.org/service/local/staging
NEXUS_PROFILE=d63f592e7eac0 # Profile for Spark staging uploads

if [ -z "$JAVA_HOME" ]; then
  echo "Error: JAVA_HOME is not set, cannot proceed."
  exit -1
fi
JAVA_7_HOME=${JAVA_7_HOME:-$JAVA_HOME}

set -e

GIT_TAG=v$RELEASE_VERSION-$RC_NAME

if [[ ! "$@" =~ --skip-create-release ]]; then
  echo "Creating release commit and publishing to Apache repository"
  # Artifact publishing
  git clone https://$ASF_USERNAME:$ASF_PASSWORD@git-wip-us.apache.org/repos/asf/spark.git \
    -b $GIT_BRANCH
  pushd spark
  export MAVEN_OPTS="-Xmx3g -XX:MaxPermSize=1g -XX:ReservedCodeCacheSize=1g"

  # Create release commits and push them to github
  # NOTE: This is done "eagerly", i.e. we don't check whether we can successfully
  # build before we coin the release commit. This helps avoid races where
  # other people add commits to this branch while we are in the middle of building.
  cur_ver="${RELEASE_VERSION}-SNAPSHOT"
  rel_ver="${RELEASE_VERSION}"
  next_ver="${NEXT_VERSION}-SNAPSHOT"

  old="^\( \{2,4\}\)<version>${cur_ver}<\/version>$"
  new="\1<version>${rel_ver}<\/version>"
  find . -name pom.xml | grep -v dev | xargs -I {} sed -i \
    -e "s/${old}/${new}/" {}
  find . -name package.scala | grep -v dev | xargs -I {} sed -i \
    -e "s/${old}/${new}/" {}

  git commit -a -m "Preparing Spark release $GIT_TAG"
  echo "Creating tag $GIT_TAG at the head of $GIT_BRANCH"
  git tag $GIT_TAG

  old="^\( \{2,4\}\)<version>${rel_ver}<\/version>$"
  new="\1<version>${next_ver}<\/version>"
  find . -name pom.xml | grep -v dev | xargs -I {} sed -i \
    -e "s/${old}/${new}/" {}
  find . -name package.scala | grep -v dev | xargs -I {} sed -i \
    -e "s/${old}/${new}/" {}
  git commit -a -m "Preparing development version $next_ver"
  git push origin $GIT_TAG
  git push origin HEAD:$GIT_BRANCH
  popd
  rm -rf spark
fi

if [[ ! "$@" =~ --skip-publish ]]; then
  git clone https://$ASF_USERNAME:$ASF_PASSWORD@git-wip-us.apache.org/repos/asf/spark.git
  pushd spark
  git checkout --force $GIT_TAG

  # Substitute in case the published version is different from the released one
  old="^\( \{2,4\}\)<version>${RELEASE_VERSION}<\/version>$"
  new="\1<version>${PUBLISH_VERSION}<\/version>"
  find . -name pom.xml | grep -v dev | xargs -I {} sed -i \
    -e "s/${old}/${new}/" {}

  # Using the Nexus API documented here:
  # https://support.sonatype.com/entries/39720203-Uploading-to-a-Staging-Repository-via-REST-API
  echo "Creating Nexus staging repository"
  repo_request="<promoteRequest><data><description>Apache Spark $GIT_TAG (published as $PUBLISH_VERSION)</description></data></promoteRequest>"
  out=$(curl -X POST -d "$repo_request" -u $ASF_USERNAME:$ASF_PASSWORD \
    -H "Content-Type:application/xml" -v \
    $NEXUS_ROOT/profiles/$NEXUS_PROFILE/start)
  staged_repo_id=$(echo $out | sed -e "s/.*\(orgapachespark-[0-9]\{4\}\).*/\1/")
  echo "Created Nexus staging repository: $staged_repo_id"

  rm -rf $SPARK_REPO

  build/mvn -DskipTests -Pyarn -Phive \
    -Phive-thriftserver -Phadoop-2.2 -Pspark-ganglia-lgpl -Pkinesis-asl \
    clean install

  ./dev/change-version-to-2.11.sh

  build/mvn -DskipTests -Pyarn -Phive \
    -Dscala-2.11 -Phadoop-2.2 -Pspark-ganglia-lgpl -Pkinesis-asl \
    clean install

  ./dev/change-version-to-2.10.sh

  pushd $SPARK_REPO

  # Remove any extra files generated during install
  find . -type f | grep -v \.jar | grep -v \.pom | xargs rm

  echo "Creating hash and signature files"
  for file in $(find . -type f)
  do
    echo $GPG_PASSPHRASE | gpg --passphrase-fd 0 --output $file.asc --detach-sig --armour $file
    if [ $(command -v md5) ]; then
      # Available on OS X; -q to keep only the hash
      md5 -q $file > $file.md5
    else
      # Available on Linux; cut to keep only the hash
      md5sum $file | cut -f1 -d' ' > $file.md5
    fi
    shasum -a 1 $file | cut -f1 -d' ' > $file.sha1
  done

  nexus_upload=$NEXUS_ROOT/deployByRepositoryId/$staged_repo_id
  echo "Uploading files to $nexus_upload"
  for file in $(find . -type f)
  do
    # Strip the leading ./
    file_short=$(echo $file | sed -e "s/\.\///")
    dest_url="$nexus_upload/org/apache/spark/$file_short"
    echo "  Uploading $file_short"
    curl -u $ASF_USERNAME:$ASF_PASSWORD --upload-file $file_short $dest_url
  done

  echo "Closing Nexus staging repository"
  repo_request="<promoteRequest><data><stagedRepositoryId>$staged_repo_id</stagedRepositoryId><description>Apache Spark $GIT_TAG (published as $PUBLISH_VERSION)</description></data></promoteRequest>"
  out=$(curl -X POST -d "$repo_request" -u $ASF_USERNAME:$ASF_PASSWORD \
    -H "Content-Type:application/xml" -v \
    $NEXUS_ROOT/profiles/$NEXUS_PROFILE/finish)
  echo "Closed Nexus staging repository: $staged_repo_id"

  popd
  popd
  rm -rf spark
fi

if [[ ! "$@" =~ --skip-package ]]; then
  # Source and binary tarballs
  echo "Packaging release tarballs"
  git clone https://git-wip-us.apache.org/repos/asf/spark.git
  cd spark
  git checkout --force $GIT_TAG
  release_hash=`git rev-parse HEAD`

  rm .gitignore
  rm -rf .git
  cd ..

  cp -r spark spark-$RELEASE_VERSION
  tar cvzf spark-$RELEASE_VERSION.tgz spark-$RELEASE_VERSION
  echo $GPG_PASSPHRASE | gpg --passphrase-fd 0 --armour --output spark-$RELEASE_VERSION.tgz.asc \
    --detach-sig spark-$RELEASE_VERSION.tgz
  echo $GPG_PASSPHRASE | gpg --passphrase-fd 0 --print-md MD5 spark-$RELEASE_VERSION.tgz > \
    spark-$RELEASE_VERSION.tgz.md5
  echo $GPG_PASSPHRASE | gpg --passphrase-fd 0 --print-md SHA512 spark-$RELEASE_VERSION.tgz > \
    spark-$RELEASE_VERSION.tgz.sha
  rm -rf spark-$RELEASE_VERSION

  # Updated for each binary build
  make_binary_release() {
    NAME=$1
    FLAGS=$2
    ZINC_PORT=$3
    cp -r spark spark-$RELEASE_VERSION-bin-$NAME

    cd spark-$RELEASE_VERSION-bin-$NAME

    # TODO: there should probably be a flag to make-distribution to allow 2.11 support
    if [[ $FLAGS == *scala-2.11* ]]; then
      ./dev/change-version-to-2.11.sh
    fi

    export ZINC_PORT=$ZINC_PORT
    echo "Creating distribution: $NAME ($FLAGS)"
    ./make-distribution.sh --name $NAME --tgz $FLAGS -DzincPort=$ZINC_PORT 2>&1 > \
      ../binary-release-$NAME.log
    cd ..
    cp spark-$RELEASE_VERSION-bin-$NAME/spark-$RELEASE_VERSION-bin-$NAME.tgz .

    echo $GPG_PASSPHRASE | gpg --passphrase-fd 0 --armour \
      --output spark-$RELEASE_VERSION-bin-$NAME.tgz.asc \
      --detach-sig spark-$RELEASE_VERSION-bin-$NAME.tgz
    echo $GPG_PASSPHRASE | gpg --passphrase-fd 0 --print-md \
      MD5 spark-$RELEASE_VERSION-bin-$NAME.tgz > \
      spark-$RELEASE_VERSION-bin-$NAME.tgz.md5
    echo $GPG_PASSPHRASE | gpg --passphrase-fd 0 --print-md \
      SHA512 spark-$RELEASE_VERSION-bin-$NAME.tgz > \
      spark-$RELEASE_VERSION-bin-$NAME.tgz.sha
  }

  # We increment the Zinc port each time to avoid OOMs and other craziness if
  # multiple builds share the same Zinc server.
  make_binary_release "hadoop1" "-Phadoop-1 -Phive -Phive-thriftserver" "3030" &
  make_binary_release "hadoop1-scala2.11" "-Phadoop-1 -Phive -Dscala-2.11" "3031" &
  make_binary_release "cdh4" "-Phadoop-1 -Phive -Phive-thriftserver -Dhadoop.version=2.0.0-mr1-cdh4.2.0" "3032" &
  make_binary_release "hadoop2.3" "-Phadoop-2.3 -Phive -Phive-thriftserver -Pyarn" "3033" &
  make_binary_release "hadoop2.4" "-Phadoop-2.4 -Phive -Phive-thriftserver -Pyarn" "3034" &
  make_binary_release "mapr3" "-Pmapr3 -Phive -Phive-thriftserver" "3035" &
  make_binary_release "mapr4" "-Pmapr4 -Pyarn -Phive -Phive-thriftserver" "3036" &
  make_binary_release "hadoop2.4-without-hive" "-Phadoop-2.4 -Pyarn" "3037" &
  wait
  rm -rf spark-$RELEASE_VERSION-bin-*/

  # Copy data
  echo "Copying release tarballs"
  rc_folder=spark-$RELEASE_VERSION-$RC_NAME
  ssh $ASF_USERNAME@people.apache.org \
    mkdir /home/$ASF_USERNAME/public_html/$rc_folder
  scp spark-* \
    $ASF_USERNAME@people.apache.org:/home/$ASF_USERNAME/public_html/$rc_folder/

  # Docs
  cd spark
  sbt/sbt clean
  cd docs
  # Compile docs with Java 7 to use the nicer format
  JAVA_HOME="$JAVA_7_HOME" PRODUCTION=1 RELEASE_VERSION="$RELEASE_VERSION" jekyll build
  echo "Copying release documentation"
  rc_docs_folder=${rc_folder}-docs
  ssh $ASF_USERNAME@people.apache.org \
    mkdir /home/$ASF_USERNAME/public_html/$rc_docs_folder
  rsync -r _site/* $ASF_USERNAME@people.apache.org:/home/$ASF_USERNAME/public_html/$rc_docs_folder

  echo "Release $RELEASE_VERSION completed:"
  echo -e "Git tag:\t $GIT_TAG"
  echo -e "Release commit:\t $release_hash"
  echo -e "Binary location:\t http://people.apache.org/~$ASF_USERNAME/$rc_folder"
  echo -e "Doc location:\t http://people.apache.org/~$ASF_USERNAME/$rc_docs_folder"
fi