spark-instrumented-optimizer/dev/test-dependencies.sh
Dongjoon Hyun 008a2ad1f8 [SPARK-20202][BUILD][SQL] Remove references to org.spark-project.hive (Hive 1.2.1)
### What changes were proposed in this pull request?

As of today,
- SPARK-30034 Apache Spark 3.0.0 switched its default Hive execution engine from Hive 1.2 to Hive 2.3. This removed the direct dependency on the forked Hive 1.2.1 in the Maven repository.
- SPARK-32981 Apache Spark 3.1.0 (`master` branch) removed the Hive 1.2 related artifacts from the Apache Spark binary distributions.

This PR (SPARK-20202) aims to completely remove the following usage of the unofficial Apache Hive fork from the Apache Spark `master` branch for Apache Spark 3.1.0.
```
<hive.group>org.spark-project.hive</hive.group>
<hive.version>1.2.1.spark2</hive.version>
```
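
As a quick sanity check after this change (an illustrative one-liner, not part of the PR itself), a recursive search of the build definitions should no longer turn up any reference to the fork:

```
# Illustrative: list any remaining pom.xml references to the forked Hive group.
grep -r "org.spark-project.hive" --include=pom.xml . \
  || echo "No references to the fork remain."
```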

For users of the forked Hive 1.2.1.spark2, Apache Spark 2.4 (LTS) and 3.0 (supported until ~2021.12) will continue to provide it.

### Why are the changes needed?

- First, the Apache Spark community should not use an unofficial forked release of another Apache project.
- Second, Apache Hive 1.2.1 was released on 2015-06-26, and the forked Hive `1.2.1.spark2` exposed many unfixable bugs in Apache Spark because the fork is not maintained at all. Apache Hive 2.3.0 was released on 2017-07-19 and has exhibited fewer bugs than `1.2.1.spark2`. Many bugs still exist in the `hive-1.2` profile, and new Apache Spark unit tests have so far been added behind the `HiveUtils.isHive23` condition.

### Does this PR introduce _any_ user-facing change?

No. This is a dev-only change. PRBuilder will not accept `[test-hive1.2]` on `master` and `branch-3.1`.

### How was this patch tested?

1. SBT/Hadoop 3.2/Hive 2.3 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129366)
2. SBT/Hadoop 2.7/Hive 2.3 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129382)
3. SBT/Hadoop 3.2/Hive 1.2 (not run; this combination is already unsupported because Hive 1.2 does not work with Hadoop 3.2)
4. SBT/Hadoop 2.7/Hive 1.2 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129383, this combination is rejected as intended)

Closes #29936 from dongjoon-hyun/SPARK-REMOVE-HIVE1.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-10-05 15:29:56 -07:00

#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
set -ex
FWDIR="$(cd "$(dirname "$0")"/.. && pwd)"
cd "$FWDIR"
# Explicitly set locale in order to make `sort` output consistent across machines.
# See https://stackoverflow.com/questions/28881 for more details.
export LC_ALL=C
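# For example (illustrative): many UTF-8 locales collate case-insensitively, so the two
# commands below can disagree; pinning LC_ALL=C keeps the generated manifests stable:
#   printf 'B\na\n' | LC_ALL=en_US.UTF-8 sort    # may print "a" before "B"
#   printf 'B\na\n' | LC_ALL=C sort              # always prints "B" before "a"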
# TODO: This would be much nicer to do in SBT, once SBT supports Maven-style resolution.
# NOTE: These should match those in the release publishing script
HADOOP_MODULE_PROFILES="-Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive"
MVN="build/mvn"
HADOOP_HIVE_PROFILES=(
hadoop-2.7-hive-2.3
hadoop-3.2-hive-2.3
)
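# Each profile above gets its own dependency manifest, dev/deps/spark-deps-<profile>.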
# We'll switch the version to a temp. one, publish POMs using that new version, then switch back to
# the old version. We need to do this because the `dependency:build-classpath` task needs to
# resolve Spark's internal submodule dependencies.
# From http://stackoverflow.com/a/26514030
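# (Illustrative) the temporary coordinates look like org.apache.spark:spark-core_2.12:spark-123456,
# which cannot collide with any real Spark release already in the local repository.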
set +e
OLD_VERSION=$($MVN -q \
    -Dexec.executable="echo" \
    -Dexec.args='${project.version}' \
    --non-recursive \
    org.codehaus.mojo:exec-maven-plugin:1.6.0:exec | grep -E '[0-9]+\.[0-9]+\.[0-9]+')
if [ $? != 0 ]; then
  echo -e "Error while getting version string from Maven:\n$OLD_VERSION"
  exit 1
fi
set -e
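# At this point OLD_VERSION holds the current development version, e.g. "3.1.0-SNAPSHOT"
# (illustrative; the exact value depends on the branch being tested).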
TEMP_VERSION="spark-$(python -S -c "import random; print(random.randrange(100000, 999999))")"
function reset_version {
  # Delete the temporary POMs that we wrote to the local Maven repo:
  find "$HOME/.m2/" | grep "$TEMP_VERSION" | xargs rm -rf

  # Restore the original version number:
  $MVN -q versions:set -DnewVersion=$OLD_VERSION -DgenerateBackupPoms=false > /dev/null
}
trap reset_version EXIT
$MVN -q versions:set -DnewVersion=$TEMP_VERSION -DgenerateBackupPoms=false > /dev/null
# Generate manifests for each Hadoop profile:
for HADOOP_HIVE_PROFILE in "${HADOOP_HIVE_PROFILES[@]}"; do
  if [[ $HADOOP_HIVE_PROFILE == *hadoop-3.2-hive-2.3* ]]; then
    HADOOP_PROFILE=hadoop-3.2
    HIVE_PROFILE=hive-2.3
  else
    HADOOP_PROFILE=hadoop-2.7
    HIVE_PROFILE=hive-2.3
  fi
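  # `jar:jar` and `jar:test-jar` package each module's main and test jars, `install:install`
  # copies them into the local Maven repo under the temporary version, and the trailing
  # `clean` then removes the build output (presumably so later builds start fresh).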
echo "Performing Maven install for $HADOOP_HIVE_PROFILE"
$MVN $HADOOP_MODULE_PROFILES -P$HADOOP_PROFILE -P$HIVE_PROFILE jar:jar jar:test-jar install:install clean -q
echo "Performing Maven validate for $HADOOP_HIVE_PROFILE"
$MVN $HADOOP_MODULE_PROFILES -P$HADOOP_PROFILE -P$HIVE_PROFILE validate -q
echo "Generating dependency manifest for $HADOOP_HIVE_PROFILE"
mkdir -p dev/pr-deps
  $MVN $HADOOP_MODULE_PROFILES -P$HADOOP_PROFILE -P$HIVE_PROFILE dependency:build-classpath -pl assembly -am \
    | grep "Dependencies classpath:" -A 1 \
    | tail -n 1 | tr ":" "\n" | awk -F '/' '{
      # For each classpath entry, take the last three "/"-separated parts:
      # the artifact id, the version, and the jar name.
      # Since the classifier, if present, always sits between "artifact_id-version-"
      # and the ".jar" suffix in the jar name, we extract the classifier and print it
      # right before the jar name explicitly. For example:
      #   orc-core/1.5.5/nohive/orc-core-1.5.5-nohive.jar
      #                  ^^^^^^
      #                  extracted classifier
      #   okio/1.15.0//okio-1.15.0.jar
      #              ^
      #              empty for dependencies without classifier
      artifact_id=$(NF-2);
      version=$(NF-1);
      jar_name=$NF;
      classifier_start_index=length(artifact_id"-"version"-") + 1;
      classifier_end_index=index(jar_name, ".jar") - 1;
      classifier=substr(jar_name, classifier_start_index, classifier_end_index - classifier_start_index + 1);
      print artifact_id"/"version"/"classifier"/"jar_name
    }' | sort | grep -v spark > dev/pr-deps/spark-deps-$HADOOP_HIVE_PROFILE
done
if [[ $@ == *replace-manifest* ]]; then
  echo "Replacing manifests and creating new files at dev/deps"
  rm -rf dev/deps
  mv dev/pr-deps dev/deps
  exit 0
fi
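# `git diff --no-index` compares two paths on disk rather than tracked files, so it works
# even though dev/pr-deps/ is never checked in; a non-empty diff fails the check below.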
for HADOOP_HIVE_PROFILE in "${HADOOP_HIVE_PROFILES[@]}"; do
  set +e
  dep_diff="$(
    git diff \
      --no-index \
      dev/deps/spark-deps-$HADOOP_HIVE_PROFILE \
      dev/pr-deps/spark-deps-$HADOOP_HIVE_PROFILE \
  )"
  set -e
  if [ "$dep_diff" != "" ]; then
    echo "Spark's published dependencies DO NOT MATCH the manifest file (dev/deps/spark-deps-$HADOOP_HIVE_PROFILE)."
    echo "To update the manifest files, run './dev/test-dependencies.sh --replace-manifest'."
    echo "$dep_diff"
    rm -rf dev/pr-deps
    exit 1
  fi
done
exit 0