spark-instrumented-optimizer/sql/core
Dongjoon Hyun 1466863cee [SPARK-30015][BUILD] Move hive-storage-api dependency from hive-2.3 to sql/core
# What changes were proposed in this pull request?

This PR aims to relocate the following internal dependencies to compile `sql/core` without `-Phive-2.3` profile.

1. Move the `hive-storage-api` to `sql/core` which is using `hive-storage-api` really.

**BEFORE (sql/core compilation)**
```
$ ./build/mvn -DskipTests --pl sql/core --am compile
...
[ERROR] [Error] /Users/dongjoon/APACHE/spark/sql/core/v2.3/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala:21: object hive is not a member of package org.apache.hadoop
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
```
**AFTER (sql/core compilation)**
```
$ ./build/mvn -DskipTests --pl sql/core --am compile
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  02:04 min
[INFO] Finished at: 2019-11-25T00:20:11-08:00
[INFO] ------------------------------------------------------------------------
```

2. For (1), add `commons-lang:commons-lang` test dependency to `spark-core` module to manage the dependency explicitly. Without this, `core` module fails to build the test classes.

```
$ ./build/mvn -DskipTests --pl core --am package -Phadoop-3.2
...
[INFO] --- scala-maven-plugin:4.3.0:testCompile (scala-test-compile-first)  spark-core_2.12 ---
[INFO] Using incremental compilation using Mixed compile order
[INFO] Compiler bridge file: /Users/dongjoon/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.12-1.3.1-bin_2.12.10__52.0-1.3.1_20191012T045515.jar
[INFO] Compiling 271 Scala sources and 26 Java sources to /spark/core/target/scala-2.12/test-classes ...
[ERROR] [Error] /spark/core/src/test/scala/org/apache/spark/util/PropertiesCloneBenchmark.scala:23: object lang is not a member of package org.apache.commons
[ERROR] [Error] /spark/core/src/test/scala/org/apache/spark/util/PropertiesCloneBenchmark.scala:49: not found: value SerializationUtils
[ERROR] two errors found
```

**BEFORE (commons-lang:commons-lang)**
The following is the previous `core` module's `commons-lang:commons-lang` dependency.

1. **branch-2.4**
```
$ mvn dependency:tree -Dincludes=commons-lang:commons-lang
[INFO] --- maven-dependency-plugin:3.0.2:tree (default-cli)  spark-core_2.11 ---
[INFO] org.apache.spark:spark-core_2.11🫙2.4.5-SNAPSHOT
[INFO] \- org.spark-project.hive:hive-exec:jar:1.2.1.spark2:provided
[INFO]    \- commons-lang:commons-lang:jar:2.6:compile
```

2. **v3.0.0-preview (-Phadoop-3.2)**
```
$ mvn dependency:tree -Dincludes=commons-lang:commons-lang -Phadoop-3.2
[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli)  spark-core_2.12 ---
[INFO] org.apache.spark:spark-core_2.12🫙3.0.0-preview
[INFO] \- org.apache.hive:hive-storage-api:jar:2.6.0:compile
[INFO]    \- commons-lang:commons-lang:jar:2.6:compile
```

3. **v3.0.0-preview(default)**
```
$ mvn dependency:tree -Dincludes=commons-lang:commons-lang
[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli)  spark-core_2.12 ---
[INFO] org.apache.spark:spark-core_2.12🫙3.0.0-preview
[INFO] \- org.apache.hadoop:hadoop-client:jar:2.7.4:compile
[INFO]    \- org.apache.hadoop:hadoop-common:jar:2.7.4:compile
[INFO]       \- commons-lang:commons-lang:jar:2.6:compile
```

**AFTER (commons-lang:commons-lang)**
```
$ mvn dependency:tree -Dincludes=commons-lang:commons-lang
[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli)  spark-core_2.12 ---
[INFO] org.apache.spark:spark-core_2.12🫙3.0.0-SNAPSHOT
[INFO] \- commons-lang:commons-lang:jar:2.6:test
```

Since we wanted to verify that this PR doesn't change `hive-1.2` profile, we merged
[SPARK-30005 Update `test-dependencies.sh` to check `hive-1.2/2.3` profile](a1706e2fa7) before this PR.

### Why are the changes needed?

- Apache Spark 2.4's `sql/core` is using `Apache ORC (nohive)` jars including shaded `hive-storage-api` to access ORC data sources.

- Apache Spark 3.0's `sql/core` is using `Apache Hive` jars directly. Previously, `-Phadoop-3.2` hid this `hive-storage-api` dependency. Now, we are using `-Phive-2.3` instead. As I mentioned [previously](https://github.com/apache/spark/pull/26619#issuecomment-556926064), this PR is required to compile `sql/core` module without `-Phive-2.3`.

- For `sql/hive` and `sql/hive-thriftserver`, it's natural that we need `-Phive-1.2` or `-Phive-2.3`.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

This will pass the Jenkins (with the dependency check and unit tests).

We need to check manually with `./build/mvn -DskipTests --pl sql/core --am compile`.

This closes #26657 .

Closes #26658 from dongjoon-hyun/SPARK-30015.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-25 10:54:14 -08:00
..
benchmarks [SPARK-29822][SQL] Fix cast error when there are white spaces between signs and values 2019-11-11 21:53:33 +08:00
src [SPARK-29979][SQL] Add basic/reserved property key constants in TableCatalog and SupportsNamespaces 2019-11-26 01:24:43 +08:00
v1.2/src [SPARK-29981][BUILD][FOLLOWUP] Change hive.version.short 2019-11-23 12:50:50 -08:00
v2.3/src [SPARK-29981][BUILD][FOLLOWUP] Change hive.version.short 2019-11-23 12:50:50 -08:00
pom.xml [SPARK-30015][BUILD] Move hive-storage-api dependency from hive-2.3 to sql/core 2019-11-25 10:54:14 -08:00