[SPARK-36573][BUILD][TEST] Add a default value to ORACLE_DOCKER_IMAGE

### What changes were proposed in this pull request?
Currently, the procedure to run the Oracle Integration Suite is based on building the Oracle RDBMS image from the Dockerfiles provided by Oracle.
Recently, Oracle has started providing database images; see https://container-registry.oracle.com
Moreover, an Oracle employee maintains Oracle XE images that are streamlined for testing at https://hub.docker.com/r/gvenzl/oracle-xe and https://github.com/gvenzl/oci-oracle-xe. These address the issue that the official images are quite large, which makes testing resource-intensive and slow.
This PR proposes to document the available options and to introduce a default value for ORACLE_DOCKER_IMAGE_NAME.
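The default amounts to a fallback lookup. The previous `sys.env("ORACLE_DOCKER_IMAGE_NAME")` threw `NoSuchElementException` when the variable was unset, whereas `getOrElse` returns the default tag. Below is a minimal sketch of that behavior, not the code in this PR. The helper `resolveOracleImage` is a hypothetical name, and it takes the environment as a parameter so the fallback can be exercised without touching the process environment.

```scala
object OracleImageDefault {
  // Default tag the suites fall back to when ORACLE_DOCKER_IMAGE_NAME is unset.
  val DefaultImage: String = "gvenzl/oracle-xe:18.4.0"

  // Hypothetical helper mirroring sys.env.getOrElse(...) in the suites; the
  // environment is a parameter so the fallback can be exercised in isolation.
  def resolveOracleImage(env: Map[String, String] = sys.env): String =
    env.getOrElse("ORACLE_DOCKER_IMAGE_NAME", DefaultImage)

  def main(args: Array[String]): Unit = {
    // Unset: falls back to the default instead of throwing.
    println(resolveOracleImage(Map.empty))
    // Set: the user's choice wins, e.g. a locally built Oracle image.
    println(resolveOracleImage(Map("ORACLE_DOCKER_IMAGE_NAME" -> "oracle/database:18.4.0-xe")))
  }
}
```

Calling `resolveOracleImage()` with no argument reads `sys.env`, which matches what the suites do.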

### Why are the changes needed?
This change will make it easier and faster to run the Oracle Integration Suite, removing the need to manually build an Oracle DB image.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually tested:
```
export ENABLE_DOCKER_INTEGRATION_TESTS=1
./build/sbt -Pdocker-integration-tests "testOnly org.apache.spark.sql.jdbc.OracleIntegrationSuite"
./build/sbt -Pdocker-integration-tests "testOnly org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite"
```
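The manual test above runs the two suites one after the other. As a hedged sketch, the same invocations can be assembled from Scala with `scala.sys.process`. `OracleSuiteRunner` and the `RUN_ORACLE_SUITES` guard variable are hypothetical names introduced here. The process is started in the current directory, which is assumed to be the Spark checkout.

```scala
import scala.sys.process.Process

object OracleSuiteRunner {
  // The two suites exercised in the manual test above.
  val Suites: Seq[String] = Seq(
    "org.apache.spark.sql.jdbc.OracleIntegrationSuite",
    "org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite")

  // Argument vector equivalent to:
  //   ./build/sbt -Pdocker-integration-tests "testOnly <suite>"
  def sbtCommand(suite: String): Seq[String] =
    Seq("./build/sbt", "-Pdocker-integration-tests", s"testOnly $suite")

  // Environment for the run; the image override is optional because the
  // suites fall back to a default image when it is absent.
  def testEnv(image: Option[String]): Seq[(String, String)] =
    Seq("ENABLE_DOCKER_INTEGRATION_TESTS" -> "1") ++
      image.map(i => "ORACLE_DOCKER_IMAGE_NAME" -> i)

  def main(args: Array[String]): Unit = {
    val image = args.headOption // e.g. oracle/database:18.4.0-xe
    for (suite <- Suites) {
      val cmd = sbtCommand(suite)
      println(cmd.mkString(" "))
      // Hypothetical guard: only launch sbt when explicitly requested.
      if (sys.env.contains("RUN_ORACLE_SUITES")) {
        val exitCode = Process(cmd, None, testEnv(image): _*).!
        println(s"exit code: $exitCode")
      }
    }
  }
}
```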

Closes #33821 from LucaCanali/oracleDockerIntegration.

Authored-by: Luca Canali <luca.canali@cern.ch>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Luca Canali 2021-08-24 13:30:21 -07:00 committed by Dongjoon Hyun
parent 5b4c216478
commit e03afc906f
3 changed files with 45 additions and 55 deletions

@@ -686,7 +686,7 @@ jobs:
       HIVE_PROFILE: hive2.3
       GITHUB_PREV_SHA: ${{ github.event.before }}
       SPARK_LOCAL_IP: localhost
-      ORACLE_DOCKER_IMAGE_NAME: oracle/database:18.4.0-xe
+      ORACLE_DOCKER_IMAGE_NAME: gvenzl/oracle-xe:18.4.0
       SKIP_MIMA: true
     steps:
     - name: Checkout Spark repository
@@ -724,24 +724,6 @@ jobs:
       uses: actions/setup-java@v1
       with:
         java-version: 8
-    - name: Cache Oracle docker-images repository
-      id: cache-oracle-docker-images
-      uses: actions/cache@v2
-      with:
-        path: ./oracle/docker-images
-        # key should contains the commit hash of the Oracle docker images to be checkout.
-        key: oracle-docker-images-3f422c4a35b423dfcdbcc57a84f01db6c82eb6c1
-    - name: Checkout Oracle docker-images repository
-      uses: actions/checkout@v2
-      with:
-        fetch-depth: 0
-        repository: oracle/docker-images
-        ref: 3f422c4a35b423dfcdbcc57a84f01db6c82eb6c1
-        path: ./oracle/docker-images
-    - name: Install Oracle Docker image
-      run: |
-        cd oracle/docker-images/OracleDatabase/SingleInstance/dockerfiles
-        ./buildContainerImage.sh -v 18.4.0 -x
     - name: Run tests
       run: |
         ./dev/run-tests --parallelism 1 --modules docker-integration-tests --included-tags org.apache.spark.tags.DockerTest

@@ -34,42 +34,46 @@ import org.apache.spark.sql.types._
 import org.apache.spark.tags.DockerTest
 
 /**
- * The following would be the steps to test this
- * 1. Build Oracle database in Docker, please refer below link about how to.
- *    https://github.com/oracle/docker-images/blob/master/OracleDatabase/SingleInstance/README.md
- * 2. export ORACLE_DOCKER_IMAGE_NAME=$ORACLE_DOCKER_IMAGE_NAME
- *    Pull oracle $ORACLE_DOCKER_IMAGE_NAME image - docker pull $ORACLE_DOCKER_IMAGE_NAME
- * 3. Start docker - sudo service docker start
- * 4. Run spark test - ./build/sbt -Pdocker-integration-tests
+ * The following are the steps to test this:
+ *
+ * 1. Choose to use a prebuilt image or build Oracle database in a container
+ *  - The documentation on how to build Oracle RDBMS in a container is at
+ *    https://github.com/oracle/docker-images/blob/master/OracleDatabase/SingleInstance/README.md
+ *  - Official Oracle container images can be found at https://container-registry.oracle.com
+ *  - A trustable and streamlined Oracle XE database image can be found on Docker Hub at
+ *    https://hub.docker.com/r/gvenzl/oracle-xe see also https://github.com/gvenzl/oci-oracle-xe
+ * 2. Run: export ORACLE_DOCKER_IMAGE_NAME=image_you_want_to_use_for_testing
+ *  - Example: export ORACLE_DOCKER_IMAGE_NAME=gvenzl/oracle-xe:latest
+ * 3. Run: export ENABLE_DOCKER_INTEGRATION_TESTS=1
+ * 4. Start docker: sudo service docker start
+ *  - Optionally, docker pull $ORACLE_DOCKER_IMAGE_NAME
+ * 5. Run Spark integration tests for Oracle with: ./build/sbt -Pdocker-integration-tests
  *    "testOnly org.apache.spark.sql.jdbc.OracleIntegrationSuite"
  *
- * An actual sequence of commands to run the test is as follows
- *
+ * A sequence of commands to build the Oracle XE database container image:
  *  $ git clone https://github.com/oracle/docker-images.git
- *  // Head SHA: 3f422c4a35b423dfcdbcc57a84f01db6c82eb6c1
  *  $ cd docker-images/OracleDatabase/SingleInstance/dockerfiles
  *  $ ./buildContainerImage.sh -v 18.4.0 -x
  *  $ export ORACLE_DOCKER_IMAGE_NAME=oracle/database:18.4.0-xe
- *  $ export ENABLE_DOCKER_INTEGRATION_TESTS=1
- *  $ cd $SPARK_HOME
- *  $ ./build/sbt -Pdocker-integration-tests
- *    "testOnly org.apache.spark.sql.jdbc.OracleIntegrationSuite"
  *
- * It has been validated with 18.4.0 Express Edition.
+ * This procedure has been validated with Oracle 18.4.0 Express Edition.
  */
 @DockerTest
 class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSparkSession {
   import testImplicits._
 
   override val db = new DatabaseOnDocker {
-    lazy override val imageName = sys.env("ORACLE_DOCKER_IMAGE_NAME")
+    lazy override val imageName =
+      sys.env.getOrElse("ORACLE_DOCKER_IMAGE_NAME", "gvenzl/oracle-xe:18.4.0")
+    val oracle_password = "Th1s1sThe0racle#Pass"
     override val env = Map(
-      "ORACLE_PWD" -> "oracle"
+      "ORACLE_PWD" -> oracle_password, // oracle images uses this
+      "ORACLE_PASSWORD" -> oracle_password // gvenzl/oracle-xe uses this
     )
     override val usesIpc = false
     override val jdbcPort: Int = 1521
     override def getJdbcUrl(ip: String, port: Int): String =
-      s"jdbc:oracle:thin:system/oracle@//$ip:$port/xe"
+      s"jdbc:oracle:thin:system/$oracle_password@//$ip:$port/xe"
   }
 
   override val connectionTimeout = timeout(7.minutes)
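The new `env` map sets one password under two keys because, per the inline comments, images built from oracle/docker-images read `ORACLE_PWD` while gvenzl/oracle-xe reads `ORACLE_PASSWORD`. The JDBC URL must then embed that same password for the `system` user. The sketch below makes this coupling explicit. `OracleTestDb` is a hypothetical case class, not a Spark type, and it only mirrors the fields changed in this diff.

```scala
// Hypothetical stand-in for the DatabaseOnDocker settings changed above.
final case class OracleTestDb(password: String, jdbcPort: Int = 1521) {
  // One password exported under both variable names, so either image family
  // starts with credentials that match the JDBC URL below.
  def containerEnv: Map[String, String] = Map(
    "ORACLE_PWD" -> password,     // read by images built from oracle/docker-images
    "ORACLE_PASSWORD" -> password // read by gvenzl/oracle-xe
  )

  // Same shape as getJdbcUrl in the suites: thin driver, system user, service xe.
  def jdbcUrl(ip: String, port: Int): String =
    s"jdbc:oracle:thin:system/$password@//$ip:$port/xe"
}

object OracleTestDbDemo {
  def main(args: Array[String]): Unit = {
    val db = OracleTestDb("Th1s1sThe0racle#Pass")
    println(db.containerEnv)
    println(db.jdbcUrl("localhost", db.jdbcPort))
  }
}
```

Deriving both the container environment and the URL from a single value avoids the two drifting apart, which is what the `oracle_password` val achieves in the suites.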

@@ -29,41 +29,45 @@ import org.apache.spark.sql.types._
 import org.apache.spark.tags.DockerTest
 
 /**
- * The following would be the steps to test this
- * 1. Build Oracle database in Docker, please refer below link about how to.
- *    https://github.com/oracle/docker-images/blob/master/OracleDatabase/SingleInstance/README.md
- * 2. export ORACLE_DOCKER_IMAGE_NAME=$ORACLE_DOCKER_IMAGE_NAME
- *    Pull oracle $ORACLE_DOCKER_IMAGE_NAME image - docker pull $ORACLE_DOCKER_IMAGE_NAME
- * 3. Start docker - sudo service docker start
- * 4. Run spark test - ./build/sbt -Pdocker-integration-tests
+ * The following are the steps to test this:
+ *
+ * 1. Choose to use a prebuilt image or build Oracle database in a container
+ *  - The documentation on how to build Oracle RDBMS in a container is at
+ *    https://github.com/oracle/docker-images/blob/master/OracleDatabase/SingleInstance/README.md
+ *  - Official Oracle container images can be found at https://container-registry.oracle.com
+ *  - A trustable and streamlined Oracle XE database image can be found on Docker Hub at
+ *    https://hub.docker.com/r/gvenzl/oracle-xe see also https://github.com/gvenzl/oci-oracle-xe
+ * 2. Run: export ORACLE_DOCKER_IMAGE_NAME=image_you_want_to_use_for_testing
+ *  - Example: export ORACLE_DOCKER_IMAGE_NAME=gvenzl/oracle-xe:latest
+ * 3. Run: export ENABLE_DOCKER_INTEGRATION_TESTS=1
+ * 4. Start docker: sudo service docker start
+ *  - Optionally, docker pull $ORACLE_DOCKER_IMAGE_NAME
+ * 5. Run Spark integration tests for Oracle with: ./build/sbt -Pdocker-integration-tests
  *    "testOnly org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite"
  *
- * An actual sequence of commands to run the test is as follows
- *
+ * A sequence of commands to build the Oracle XE database container image:
  *  $ git clone https://github.com/oracle/docker-images.git
- *  // Head SHA: 3f422c4a35b423dfcdbcc57a84f01db6c82eb6c1
  *  $ cd docker-images/OracleDatabase/SingleInstance/dockerfiles
  *  $ ./buildContainerImage.sh -v 18.4.0 -x
  *  $ export ORACLE_DOCKER_IMAGE_NAME=oracle/database:18.4.0-xe
- *  $ export ENABLE_DOCKER_INTEGRATION_TESTS=1
- *  $ cd $SPARK_HOME
- *  $ ./build/sbt -Pdocker-integration-tests
- *    "testOnly org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite"
  *
- * It has been validated with 18.4.0 Express Edition.
+ * This procedure has been validated with Oracle 18.4.0 Express Edition.
  */
 @DockerTest
 class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with V2JDBCTest {
   override val catalogName: String = "oracle"
   override val db = new DatabaseOnDocker {
-    lazy override val imageName = sys.env("ORACLE_DOCKER_IMAGE_NAME")
+    lazy override val imageName =
+      sys.env.getOrElse("ORACLE_DOCKER_IMAGE_NAME", "gvenzl/oracle-xe:18.4.0")
+    val oracle_password = "Th1s1sThe0racle#Pass"
     override val env = Map(
-      "ORACLE_PWD" -> "oracle"
+      "ORACLE_PWD" -> oracle_password, // oracle images uses this
+      "ORACLE_PASSWORD" -> oracle_password // gvenzl/oracle-xe uses this
     )
     override val usesIpc = false
     override val jdbcPort: Int = 1521
     override def getJdbcUrl(ip: String, port: Int): String =
-      s"jdbc:oracle:thin:system/oracle@//$ip:$port/xe"
+      s"jdbc:oracle:thin:system/$oracle_password@//$ip:$port/xe"
   }
 
   override def sparkConf: SparkConf = super.sparkConf