008a2ad1f8
### What changes were proposed in this pull request?

As of today,
- SPARK-30034: Apache Spark 3.0.0 switched its default Hive execution engine from Hive 1.2 to Hive 2.3. This removes the direct dependency on the forked Hive 1.2.1 in the Maven repository.
- SPARK-32981: Apache Spark 3.1.0 (`master` branch) removed Hive 1.2 related artifacts from Apache Spark binary distributions.

This PR (SPARK-20202) aims to completely remove the following usage of the unofficial Apache Hive fork from Apache Spark master for Apache Spark 3.1.0.

```
<hive.group>org.spark-project.hive</hive.group>
<hive.version>1.2.1.spark2</hive.version>
```

For users of the forked Hive 1.2.1.spark2, Apache Spark 2.4 (LTS) and 3.0 (~ 2021.12) will continue to provide it.

### Why are the changes needed?

- First, the Apache Spark community should not use an unofficial forked release of another Apache project.
- Second, Apache Hive 1.2.1 was released on 2015-06-26, and the forked Hive `1.2.1.spark2` exposed many unfixable bugs in Apache Spark because the fork is not maintained at all. Apache Hive 2.3.0 was released on 2017-07-19 and has been used with fewer bugs than `1.2.1.spark2`. Many bugs still exist in the `hive-1.2` profile, and new Apache Spark unit tests have so far been added with a `HiveUtils.isHive23` condition.

### Does this PR introduce _any_ user-facing change?

No. This is a dev-only change. PRBuilder will not accept `[test-hive1.2]` on master and `branch-3.1`.

### How was this patch tested?

1. SBT/Hadoop 3.2/Hive 2.3 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129366)
2. SBT/Hadoop 2.7/Hive 2.3 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129382)
3. SBT/Hadoop 3.2/Hive 1.2 (This combination was already unsupported because Hive 1.2 does not work with Hadoop 3.2.)
4. SBT/Hadoop 2.7/Hive 1.2 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129383, this is rejected)

Closes #29936 from dongjoon-hyun/SPARK-REMOVE-HIVE1.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
133 lines
5 KiB
Bash
Executable file
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

set -ex

# Resolve the Spark root directory (the parent of this script's directory).
FWDIR="$(cd "$(dirname "$0")"/..; pwd)"
cd "$FWDIR"

# Explicitly set locale in order to make `sort` output consistent across machines.
# See https://stackoverflow.com/questions/28881 for more details.
export LC_ALL=C

# TODO: This would be much nicer to do in SBT, once SBT supports Maven-style resolution.

# NOTE: These should match those in the release publishing script
HADOOP_MODULE_PROFILES="-Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive"
MVN="build/mvn"
HADOOP_HIVE_PROFILES=(
    hadoop-2.7-hive-2.3
    hadoop-3.2-hive-2.3
)

# We'll switch the version to a temp. one, publish POMs using that new version, then switch back to
# the old version. We need to do this because the `dependency:build-classpath` task needs to
# resolve Spark's internal submodule dependencies.

# From http://stackoverflow.com/a/26514030
set +e
OLD_VERSION=$($MVN -q \
    -Dexec.executable="echo" \
    -Dexec.args='${project.version}' \
    --non-recursive \
    org.codehaus.mojo:exec-maven-plugin:1.6.0:exec | grep -E '[0-9]+\.[0-9]+\.[0-9]+')
if [ $? != 0 ]; then
    echo -e "Error while getting version string from Maven:\n$OLD_VERSION"
    exit 1
fi
set -e
TEMP_VERSION="spark-$(python -S -c "import random; print(random.randrange(100000, 999999))")"

function reset_version {
  # Delete the temporary POMs that we wrote to the local Maven repo:
  find "$HOME/.m2/" | grep "$TEMP_VERSION" | xargs rm -rf

  # Restore the original version number:
  $MVN -q versions:set -DnewVersion="$OLD_VERSION" -DgenerateBackupPoms=false > /dev/null
}
# Always restore the original version, even if a later step fails.
trap reset_version EXIT

$MVN -q versions:set -DnewVersion="$TEMP_VERSION" -DgenerateBackupPoms=false > /dev/null

# Generate manifests for each Hadoop profile:
for HADOOP_HIVE_PROFILE in "${HADOOP_HIVE_PROFILES[@]}"; do
  if [[ $HADOOP_HIVE_PROFILE == **hadoop-3.2-hive-2.3** ]]; then
    HADOOP_PROFILE=hadoop-3.2
    HIVE_PROFILE=hive-2.3
  else
    HADOOP_PROFILE=hadoop-2.7
    HIVE_PROFILE=hive-2.3
  fi
  echo "Performing Maven install for $HADOOP_HIVE_PROFILE"
  $MVN $HADOOP_MODULE_PROFILES -P$HADOOP_PROFILE -P$HIVE_PROFILE jar:jar jar:test-jar install:install clean -q

  echo "Performing Maven validate for $HADOOP_HIVE_PROFILE"
  $MVN $HADOOP_MODULE_PROFILES -P$HADOOP_PROFILE -P$HIVE_PROFILE validate -q

  echo "Generating dependency manifest for $HADOOP_HIVE_PROFILE"
  mkdir -p dev/pr-deps
  $MVN $HADOOP_MODULE_PROFILES -P$HADOOP_PROFILE -P$HIVE_PROFILE dependency:build-classpath -pl assembly -am \
    | grep "Dependencies classpath:" -A 1 \
    | tail -n 1 | tr ":" "\n" | awk -F '/' '{
      # For each dependency classpath, we fetch the last three parts split by "/": artifact id, version, and jar name.
      # Since classifier, if exists, always sits between "artifact_id-version-" and ".jar" suffix in the jar name,
      # we extract classifier and put it right before the jar name explicitly.
      # For example, `orc-core/1.5.5/nohive/orc-core-1.5.5-nohive.jar`
      #                              ^^^^^^
      #                              extracted classifier
      #               `okio/1.15.0//okio-1.15.0.jar`
      #                           ^
      #                           empty for dependencies without classifier
      artifact_id=$(NF-2);
      version=$(NF-1);
      jar_name=$NF;
      classifier_start_index=length(artifact_id"-"version"-") + 1;
      classifier_end_index=index(jar_name, ".jar") - 1;
      classifier=substr(jar_name, classifier_start_index, classifier_end_index - classifier_start_index + 1);
      print artifact_id"/"version"/"classifier"/"jar_name
    }' | sort | grep -v spark > dev/pr-deps/spark-deps-$HADOOP_HIVE_PROFILE
done

# With --replace-manifest, accept the freshly generated manifests as the new baseline.
if [[ "$*" == *replace-manifest* ]]; then
  echo "Replacing manifests and creating new files at dev/deps"
  rm -rf dev/deps
  mv dev/pr-deps dev/deps
  exit 0
fi

# Compare each generated manifest against the checked-in one.
for HADOOP_HIVE_PROFILE in "${HADOOP_HIVE_PROFILES[@]}"; do
  # `git diff --no-index` exits non-zero when the files differ, so suspend `set -e` here.
  set +e
  dep_diff="$(
    git diff \
    --no-index \
    dev/deps/spark-deps-$HADOOP_HIVE_PROFILE \
    dev/pr-deps/spark-deps-$HADOOP_HIVE_PROFILE \
  )"
  set -e
  if [ "$dep_diff" != "" ]; then
    echo "Spark's published dependencies DO NOT MATCH the manifest file (dev/deps/spark-deps-$HADOOP_HIVE_PROFILE)."
    echo "To update the manifest file, run './dev/test-dependencies.sh --replace-manifest'."
    echo "$dep_diff"
    rm -rf dev/pr-deps
    exit 1
  fi
done

exit 0
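
The classifier extraction in the `awk` step above is easier to follow when run in isolation. The following is a minimal sketch, not part of the script itself: the helper name `format_classpath` and the `/repo/...` sample paths are assumptions made up for illustration. It feeds a `:`-separated classpath through the same `tr` and `awk` logic and prints one `artifact_id/version/classifier/jar_name` line per jar.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: `format_classpath` is a hypothetical helper name,
# and the sample paths below are invented for the demonstration.

format_classpath() {
  # Read a ":"-separated classpath on stdin and print one manifest line per jar.
  tr ":" "\n" | awk -F '/' '{
    artifact_id = $(NF-2)
    version = $(NF-1)
    jar_name = $NF
    # The classifier, if present, sits between "artifact_id-version-" and ".jar".
    cls_start = length(artifact_id "-" version "-") + 1
    cls_end = index(jar_name, ".jar") - 1
    # A non-positive length yields an empty string, i.e. no classifier.
    classifier = substr(jar_name, cls_start, cls_end - cls_start + 1)
    print artifact_id "/" version "/" classifier "/" jar_name
  }' | LC_ALL=C sort
}

sample="/repo/org/apache/orc/orc-core/1.5.5/orc-core-1.5.5-nohive.jar:/repo/com/squareup/okio/okio/1.15.0/okio-1.15.0.jar"
printf '%s\n' "$sample" | format_classpath
# Prints:
#   okio/1.15.0//okio-1.15.0.jar
#   orc-core/1.5.5/nohive/orc-core-1.5.5-nohive.jar
```

Sorting under `LC_ALL=C` mirrors the locale setting at the top of the script, so the line order does not depend on the machine's locale.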