[SPARK-36573][BUILD][TEST] Add a default value to ORACLE_DOCKER_IMAGE

### What changes were proposed in this pull request?
Currently, running the Oracle Integration Suite requires building an Oracle RDBMS image from the Dockerfiles provided by Oracle.
Recently, Oracle has started providing prebuilt database images; see https://container-registry.oracle.com
Moreover, an Oracle employee maintains Oracle XE images that are streamlined for testing: https://hub.docker.com/r/gvenzl/oracle-xe and https://github.com/gvenzl/oci-oracle-xe. These images address the problem that the official images are quite large, which makes testing resource-intensive and slow.
This PR documents the available options and introduces a default value for ORACLE_DOCKER_IMAGE_NAME.
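The default-value lookup, together with the password being passed under two environment variable names so that both image families work, can be sketched in isolation. This is a minimal sketch, not Spark's code: the object `OracleDockerSettings` and its methods are hypothetical, while the default image, the variable names, the test password and the JDBC URL format are copied from the diff in this commit.

```scala
// Hypothetical standalone sketch mirroring the settings added in this commit.
object OracleDockerSettings {
  // Default image introduced by this commit (streamlined Oracle XE on Docker Hub).
  val DefaultImage = "gvenzl/oracle-xe:18.4.0"

  // Test-only password taken from the diff.
  val OraclePassword = "Th1s1sThe0racle#Pass"

  // Use ORACLE_DOCKER_IMAGE_NAME when it is set; otherwise fall back to the default.
  def imageName(env: Map[String, String]): String =
    env.getOrElse("ORACLE_DOCKER_IMAGE_NAME", DefaultImage)

  // Images built from oracle/docker-images read ORACLE_PWD, while gvenzl/oracle-xe
  // reads ORACLE_PASSWORD; passing both lets either image family start unchanged.
  val containerEnv: Map[String, String] = Map(
    "ORACLE_PWD" -> OraclePassword,
    "ORACLE_PASSWORD" -> OraclePassword)

  // Thin-driver URL connecting as SYSTEM to the "xe" service on the given host and port.
  def jdbcUrl(ip: String, port: Int): String =
    s"jdbc:oracle:thin:system/${OraclePassword}@//$ip:$port/xe"

  def main(args: Array[String]): Unit = {
    println(s"image: ${imageName(sys.env)}")
    println(s"url:   ${jdbcUrl("localhost", 1521)}")
  }
}
```

With ORACLE_DOCKER_IMAGE_NAME unset, `imageName(sys.env)` yields `gvenzl/oracle-xe:18.4.0`; exporting the variable (for example to `oracle/database:18.4.0-xe` for a locally built image) overrides the default without any code change.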

### Why are the changes needed?
This change will make it easier and faster to run the Oracle Integration Suite, removing the need to manually build an Oracle DB image.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually tested:
```
export ENABLE_DOCKER_INTEGRATION_TESTS=1
./build/sbt -Pdocker-integration-tests "testOnly org.apache.spark.sql.jdbc.OracleIntegrationSuite"
./build/sbt -Pdocker-integration-tests "testOnly org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite"
```

Closes #33821 from LucaCanali/oracleDockerIntegration.

Authored-by: Luca Canali <luca.canali@cern.ch>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Luca Canali 2021-08-24 13:30:21 -07:00 committed by Dongjoon Hyun
parent 5b4c216478
commit e03afc906f
3 changed files with 45 additions and 55 deletions

@@ -686,7 +686,7 @@ jobs:
HIVE_PROFILE: hive2.3
GITHUB_PREV_SHA: ${{ github.event.before }}
SPARK_LOCAL_IP: localhost
ORACLE_DOCKER_IMAGE_NAME: oracle/database:18.4.0-xe
ORACLE_DOCKER_IMAGE_NAME: gvenzl/oracle-xe:18.4.0
SKIP_MIMA: true
steps:
- name: Checkout Spark repository
@@ -724,24 +724,6 @@ jobs:
uses: actions/setup-java@v1
with:
java-version: 8
- name: Cache Oracle docker-images repository
id: cache-oracle-docker-images
uses: actions/cache@v2
with:
path: ./oracle/docker-images
# key should contains the commit hash of the Oracle docker images to be checkout.
key: oracle-docker-images-3f422c4a35b423dfcdbcc57a84f01db6c82eb6c1
- name: Checkout Oracle docker-images repository
uses: actions/checkout@v2
with:
fetch-depth: 0
repository: oracle/docker-images
ref: 3f422c4a35b423dfcdbcc57a84f01db6c82eb6c1
path: ./oracle/docker-images
- name: Install Oracle Docker image
run: |
cd oracle/docker-images/OracleDatabase/SingleInstance/dockerfiles
./buildContainerImage.sh -v 18.4.0 -x
- name: Run tests
run: |
./dev/run-tests --parallelism 1 --modules docker-integration-tests --included-tags org.apache.spark.tags.DockerTest

@@ -34,42 +34,46 @@ import org.apache.spark.sql.types._
import org.apache.spark.tags.DockerTest
/**
* The following would be the steps to test this
* 1. Build Oracle database in Docker, please refer below link about how to.
* https://github.com/oracle/docker-images/blob/master/OracleDatabase/SingleInstance/README.md
* 2. export ORACLE_DOCKER_IMAGE_NAME=$ORACLE_DOCKER_IMAGE_NAME
* Pull oracle $ORACLE_DOCKER_IMAGE_NAME image - docker pull $ORACLE_DOCKER_IMAGE_NAME
* 3. Start docker - sudo service docker start
* 4. Run spark test - ./build/sbt -Pdocker-integration-tests
* The following are the steps to test this:
*
* 1. Choose to use a prebuilt image or build Oracle database in a container
* - The documentation on how to build Oracle RDBMS in a container is at
* https://github.com/oracle/docker-images/blob/master/OracleDatabase/SingleInstance/README.md
* - Official Oracle container images can be found at https://container-registry.oracle.com
* - A trustable and streamlined Oracle XE database image can be found on Docker Hub at
* https://hub.docker.com/r/gvenzl/oracle-xe see also https://github.com/gvenzl/oci-oracle-xe
* 2. Run: export ORACLE_DOCKER_IMAGE_NAME=image_you_want_to_use_for_testing
* - Example: export ORACLE_DOCKER_IMAGE_NAME=gvenzl/oracle-xe:latest
* 3. Run: export ENABLE_DOCKER_INTEGRATION_TESTS=1
* 4. Start docker: sudo service docker start
* - Optionally, docker pull $ORACLE_DOCKER_IMAGE_NAME
* 5. Run Spark integration tests for Oracle with: ./build/sbt -Pdocker-integration-tests
* "testOnly org.apache.spark.sql.jdbc.OracleIntegrationSuite"
*
* An actual sequence of commands to run the test is as follows
*
* A sequence of commands to build the Oracle XE database container image:
* $ git clone https://github.com/oracle/docker-images.git
* // Head SHA: 3f422c4a35b423dfcdbcc57a84f01db6c82eb6c1
* $ cd docker-images/OracleDatabase/SingleInstance/dockerfiles
* $ ./buildContainerImage.sh -v 18.4.0 -x
* $ export ORACLE_DOCKER_IMAGE_NAME=oracle/database:18.4.0-xe
* $ export ENABLE_DOCKER_INTEGRATION_TESTS=1
* $ cd $SPARK_HOME
* $ ./build/sbt -Pdocker-integration-tests
* "testOnly org.apache.spark.sql.jdbc.OracleIntegrationSuite"
*
* It has been validated with 18.4.0 Express Edition.
* This procedure has been validated with Oracle 18.4.0 Express Edition.
*/
@DockerTest
class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSparkSession {
import testImplicits._
override val db = new DatabaseOnDocker {
lazy override val imageName = sys.env("ORACLE_DOCKER_IMAGE_NAME")
lazy override val imageName =
sys.env.getOrElse("ORACLE_DOCKER_IMAGE_NAME", "gvenzl/oracle-xe:18.4.0")
val oracle_password = "Th1s1sThe0racle#Pass"
override val env = Map(
"ORACLE_PWD" -> "oracle"
"ORACLE_PWD" -> oracle_password, // oracle images uses this
"ORACLE_PASSWORD" -> oracle_password // gvenzl/oracle-xe uses this
)
override val usesIpc = false
override val jdbcPort: Int = 1521
override def getJdbcUrl(ip: String, port: Int): String =
s"jdbc:oracle:thin:system/oracle@//$ip:$port/xe"
s"jdbc:oracle:thin:system/$oracle_password@//$ip:$port/xe"
}
override val connectionTimeout = timeout(7.minutes)

@@ -29,41 +29,45 @@ import org.apache.spark.sql.types._
import org.apache.spark.tags.DockerTest
/**
* The following would be the steps to test this
* 1. Build Oracle database in Docker, please refer below link about how to.
* https://github.com/oracle/docker-images/blob/master/OracleDatabase/SingleInstance/README.md
* 2. export ORACLE_DOCKER_IMAGE_NAME=$ORACLE_DOCKER_IMAGE_NAME
* Pull oracle $ORACLE_DOCKER_IMAGE_NAME image - docker pull $ORACLE_DOCKER_IMAGE_NAME
* 3. Start docker - sudo service docker start
* 4. Run spark test - ./build/sbt -Pdocker-integration-tests
* The following are the steps to test this:
*
* 1. Choose to use a prebuilt image or build Oracle database in a container
* - The documentation on how to build Oracle RDBMS in a container is at
* https://github.com/oracle/docker-images/blob/master/OracleDatabase/SingleInstance/README.md
* - Official Oracle container images can be found at https://container-registry.oracle.com
* - A trustable and streamlined Oracle XE database image can be found on Docker Hub at
* https://hub.docker.com/r/gvenzl/oracle-xe see also https://github.com/gvenzl/oci-oracle-xe
* 2. Run: export ORACLE_DOCKER_IMAGE_NAME=image_you_want_to_use_for_testing
* - Example: export ORACLE_DOCKER_IMAGE_NAME=gvenzl/oracle-xe:latest
* 3. Run: export ENABLE_DOCKER_INTEGRATION_TESTS=1
* 4. Start docker: sudo service docker start
* - Optionally, docker pull $ORACLE_DOCKER_IMAGE_NAME
* 5. Run Spark integration tests for Oracle with: ./build/sbt -Pdocker-integration-tests
* "testOnly org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite"
*
* An actual sequence of commands to run the test is as follows
*
* A sequence of commands to build the Oracle XE database container image:
* $ git clone https://github.com/oracle/docker-images.git
* // Head SHA: 3f422c4a35b423dfcdbcc57a84f01db6c82eb6c1
* $ cd docker-images/OracleDatabase/SingleInstance/dockerfiles
* $ ./buildContainerImage.sh -v 18.4.0 -x
* $ export ORACLE_DOCKER_IMAGE_NAME=oracle/database:18.4.0-xe
* $ export ENABLE_DOCKER_INTEGRATION_TESTS=1
* $ cd $SPARK_HOME
* $ ./build/sbt -Pdocker-integration-tests
* "testOnly org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite"
*
* It has been validated with 18.4.0 Express Edition.
* This procedure has been validated with Oracle 18.4.0 Express Edition.
*/
@DockerTest
class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with V2JDBCTest {
override val catalogName: String = "oracle"
override val db = new DatabaseOnDocker {
lazy override val imageName = sys.env("ORACLE_DOCKER_IMAGE_NAME")
lazy override val imageName =
sys.env.getOrElse("ORACLE_DOCKER_IMAGE_NAME", "gvenzl/oracle-xe:18.4.0")
val oracle_password = "Th1s1sThe0racle#Pass"
override val env = Map(
"ORACLE_PWD" -> "oracle"
"ORACLE_PWD" -> oracle_password, // oracle images uses this
"ORACLE_PASSWORD" -> oracle_password // gvenzl/oracle-xe uses this
)
override val usesIpc = false
override val jdbcPort: Int = 1521
override def getJdbcUrl(ip: String, port: Int): String =
s"jdbc:oracle:thin:system/oracle@//$ip:$port/xe"
s"jdbc:oracle:thin:system/$oracle_password@//$ip:$port/xe"
}
override def sparkConf: SparkConf = super.sparkConf