[SPARK-34202][SQL][TEST] Add ability to fetch spark release package from internal environment in HiveExternalCatalogVersionsSuite

### What changes were proposed in this pull request?
`HiveExternalCatalogVersionsSuite` can't run in orgs internal environment where access to outside internet is not allowed because `HiveExternalCatalogVersionsSuite` will download spark release package from internet.

Similar to SPARK-32998, this pr add 1 environment variables `SPARK_RELEASE_MIRROR` to let user can specify an accessible download address of spark release package and run `HiveExternalCatalogVersionsSuite`  in orgs internal environment.

### Why are the changes needed?
Let `HiveExternalCatalogVersionsSuite` can run in orgs internal environment without relying on external spark release download address.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- Pass the Jenkins or GitHub Action

- Manual test  with and without env variables set in internal environment can't access internet.

execute
```
mvn clean install -Dhadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -PhPhive -pl  sql/hive -am -DskipTests

mvn clean install -Dhadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -PhPhive -pl  sql/hive -DwildcardSuites=org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite -Dtest=none
```

**Without env**

```
HiveExternalCatalogVersionsSuite:
19:50:35.123 WARN org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite: Failed to download Spark 3.0.1 from https://archive.apache.org/dist/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz: Network is unreachable (connect failed)
19:50:35.126 WARN org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite: Failed to download Spark 3.0.1 from https://dist.apache.org/repos/dist/release/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz: Network is unreachable (connect failed)
org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED ***
  Exception encountered when invoking run on a nested suite - Unable to download Spark 3.0.1 (HiveExternalCatalogVersionsSuite.scala:125)
Run completed in 2 seconds, 669 milliseconds.
Total number of tests run: 0
Suites: completed 1, aborted 1
Tests: succeeded 0, failed 0, canceled 0, ignored 0, pending 0
```

**With env**

```
export SPARK_RELEASE_MIRROR=${spark-release.internal.com}/dist/release/
```

```
HiveExternalCatalogVersionsSuite
- backward compatibility
Run completed in 1 minute, 32 seconds.
Total number of tests run: 1
Suites: completed 2, aborted 0
Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Closes #31294 from LuciferYang/SPARK-34202.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
This commit is contained in:
yangjie01 2021-01-23 08:02:52 -08:00 committed by Dongjoon Hyun
parent 134a7d7eb9
commit e48a8ad1a2

View file

@ -237,14 +237,15 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
}
object PROCESS_TABLES extends QueryTest with SQLTestUtils {
val releaseMirror = "https://dist.apache.org/repos/dist/release"
val releaseMirror = sys.env.getOrElse("SPARK_RELEASE_MIRROR",
"https://dist.apache.org/repos/dist/release")
// Tests the latest version of every release line.
val testingVersions: Seq[String] = {
import scala.io.Source
val versions: Seq[String] = try {
Source.fromURL(s"${releaseMirror}/spark").mkString
.split("\n")
.filter(_.contains("""<li><a href="spark-"""))
.filter(_.contains("""<a href="spark-"""))
.filterNot(_.contains("preview"))
.map("""<a href="spark-(\d.\d.\d)/">""".r.findFirstMatchIn(_).get.group(1))
.filter(_ < org.apache.spark.SPARK_VERSION)