Commit graph

130 commits

Author SHA1 Message Date
Gengliang Wang 0fb7127f85 Preparing development version 3.2.1-SNAPSHOT 2021-09-23 08:46:28 +00:00
Gengliang Wang b609f2fe0c Preparing Spark release v3.2.0-rc4 2021-09-23 08:46:22 +00:00
Gengliang Wang b0249851f6 Preparing development version 3.2.1-SNAPSHOT 2021-09-18 11:30:12 +00:00
Gengliang Wang 96044e9735 Preparing Spark release v3.2.0-rc3 2021-09-18 11:30:06 +00:00
Gengliang Wang 1bad04d028 Preparing development version 3.2.1-SNAPSHOT 2021-08-31 17:04:14 +00:00
Gengliang Wang 03f5d23e96 Preparing Spark release v3.2.0-rc2 2021-08-31 17:04:08 +00:00
Gengliang Wang 69be513c5e Preparing development version 3.2.1-SNAPSHOT 2021-08-20 12:40:47 +00:00
Gengliang Wang 6bb3523d8e Preparing Spark release v3.2.0-rc1 2021-08-20 12:40:40 +00:00
Gengliang Wang fafdc1482b Revert "Preparing Spark release v3.2.0-rc1"
This reverts commit 8e58fafb05.
2021-08-20 20:07:02 +08:00
Gengliang Wang c829ed53ff Revert "Preparing development version 3.2.1-SNAPSHOT"
This reverts commit 4f1d21571d.
2021-08-20 20:07:01 +08:00
Gengliang Wang 4f1d21571d Preparing development version 3.2.1-SNAPSHOT 2021-08-19 14:08:32 +00:00
Gengliang Wang 8e58fafb05 Preparing Spark release v3.2.0-rc1 2021-08-19 14:08:26 +00:00
Wenchen Fan 0df89d8999 [SPARK-34302][SQL][FOLLOWUP] More code cleanup
### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/33113, to do some code cleanup:
1. `UnresolvedFieldPosition` doesn't need to include the field name. We can get it through "context" (`AlterTableAlterColumn.column.name`).
2. Run `ResolveAlterTableCommands` in the main resolution batch, so that the column/field resolution is also unified between v1 and v2 commands (same error message).
3. Fail immediately in `ResolveAlterTableCommands` if we can't resolve the field, instead of waiting until `CheckAnalysis`. We don't expect other rules to resolve fields in ALTER TABLE commands, so failing immediately is simpler and we can remove duplicated code in `CheckAnalysis`.

### Why are the changes needed?

Code simplification.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes #33213 from cloud-fan/follow.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 8b46e26fc6)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-07-06 03:43:54 +08:00
Dongjoon Hyun 16e50356ee [SPARK-34302][FOLLOWUP][SQL][TESTS] Update jdbc.v2.*IntegrationSuite
### What changes were proposed in this pull request?

This PR aims to update the JDBC v2 integration suites by adding `catalogName`.

### Why are the changes needed?

To recover the integration test suite.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the GitHub Action.

Closes #33124 from dongjoon-hyun/SPARK-34302.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-06-28 23:01:54 -07:00
Kousuke Saruta c562c1674e [SPARK-34320][SQL][FOLLOWUP] Modify V2JDBCTest to follow the change of the error message
### What changes were proposed in this pull request?

This is a followup PR for SPARK-34320 (#32854).
That PR changed the error message of `ALTER TABLE`, but `V2JDBCTest` wasn't updated to follow the change.

### Why are the changes needed?

To fix `v2.*JDBCSuite` failures.
```
[info] - SPARK-33034: ALTER TABLE ... add new columns (173 milliseconds)
[info] - SPARK-33034: ALTER TABLE ... drop column *** FAILED *** (126 milliseconds)
[info]   "Cannot delete missing field bad_column in postgresql.alt_table schema: root
[info]    |-- C2: string (nullable = true)
[info]   ; line 1 pos 0;
[info]   'AlterTableDropColumns [unresolvedfieldname(bad_column)]
[info]   +- ResolvedTable org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog@7f4b7516, alt_table, JDBCTable(alt_table,StructType(StructField(C2,StringType,true)),org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions@5842301d), [C2#1879]
[info]   " did not contain "Cannot delete missing field bad_column in alt_table schema" (V2JDBCTest.scala:106)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
[info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
[info]   at org.apache.spark.sql.jdbc.v2.V2JDBCTest.$anonfun$$init$$6(V2JDBCTest.scala:106)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1461)
[info]   at org.apache.spark.sql.test.SQLTestUtilsBase.withTable(SQLTestUtils.scala:305)
[info]   at org.apache.spark.sql.test.SQLTestUtilsBase.withTable$(SQLTestUtils.scala:303)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.withTable(DockerJDBCIntegrationSuite.scala:95)
[info]   at org.apache.spark.sql.jdbc.v2.V2JDBCTest.$anonfun$$init$$5(V2JDBCTest.scala:95)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182)
[info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62)
[info]   at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
[info]   at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
[info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233)
[info]   at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
[info]   at scala.collection.immutable.List.foreach(List.scala:431)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
[info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:233)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:232)
[info]   at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563)
[info]   at org.scalatest.Suite.run(Suite.scala:1112)
[info]   at org.scalatest.Suite.run$(Suite.scala:1094)
[info]   at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1563)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:237)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:237)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:236)
[info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:62)
[info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
[info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:62)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:318)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:513)
[info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]   at java.lang.Thread.run(Thread.java:748)
[info] - SPARK-33034: ALTER TABLE ... update column type (122 milliseconds)
[info] - SPARK-33034: ALTER TABLE ... rename column (93 milliseconds)
[info] - SPARK-33034: ALTER TABLE ... update column nullability (92 milliseconds)
[info] - CREATE TABLE with table comment (38 milliseconds)
[info] - CREATE TABLE with table property (52 milliseconds)
[info] MySQLIntegrationSuite:
[info] - Basic test (61 milliseconds)
[info] - Numeric types (67 milliseconds)
[info] - Date types (59 milliseconds)
[info] - String types (50 milliseconds)
[info] - Basic write test (216 milliseconds)
[info] - query JDBC option (64 milliseconds)
[info] Run completed in 19 minutes, 43 seconds.
[info] Total number of tests run: 89
[info] Suites: completed 14, aborted 0
[info] Tests: succeeded 84, failed 5, canceled 0, ignored 0, pending 0
[info] *** 5 TESTS FAILED ***
[error] Failed tests:
[error] 	org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite
[error] 	org.apache.spark.sql.jdbc.v2.MsSqlServerIntegrationSuite
[error] 	org.apache.spark.sql.jdbc.v2.DB2IntegrationSuite
[error] 	org.apache.spark.sql.jdbc.v2.MySQLIntegrationSuite
[error] 	org.apache.spark.sql.jdbc.v2.PostgresIntegrationSuite
[error] (docker-integration-tests / Test / test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 1223 s (20:23), completed Jun 25, 2021 1:31:04 AM
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

docker-integration-tests on GA.

Closes #33074 from sarutak/followup-SPARK-34320.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
2021-06-25 12:58:38 +09:00
Kousuke Saruta 08e6f633b5 [SPARK-35577][TESTS] Allow to log container output for docker integration tests
### What changes were proposed in this pull request?

This PR proposes to add a feature that logs container output for docker integration tests.
With this change, if we run tests with SBT, we will get a log like the following in `unit-tests.log`.
```
===== CONTAINER LOGS FOR container Id: 3360c98eb28337d8b217fb614e47bf49aafa18a6cb60ecadf3178aee0c663021 =====
21/05/31 20:54:56.433 pool-1-thread-1 INFO PostgresIntegrationSuite: The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... UTC
creating configuration files ... ok
running bootstrap script ... ok
sh: locale: not found
2021-05-31 11:54:49.892 UTC [29] WARNING:  no usable system locales were found
performing post-bootstrap initialization ... ok
initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
syncing data to disk ... ok

Success. You can now start the database server using:

    pg_ctl -D /var/lib/postgresql/data -l logfile start

waiting for server to start....2021-05-31 11:54:50.284 UTC [34] LOG:  starting PostgreSQL 13.0 on x86_64-pc-linux-musl, compiled by gcc (Alpine 9.3.0) 9.3.0, 64-bit
2021-05-31 11:54:50.287 UTC [34] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-05-31 11:54:50.296 UTC [35] LOG:  database system was shut down at 2021-05-31 11:54:50 UTC
2021-05-31 11:54:50.301 UTC [34] LOG:  database system is ready to accept connections
 done
server started

/usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*

waiting for server to shut down....2021-05-31 11:54:50.363 UTC [34] LOG:  received fast shutdown request
2021-05-31 11:54:50.366 UTC [34] LOG:  aborting any active transactions
2021-05-31 11:54:50.368 UTC [34] LOG:  background worker "logical replication launcher" (PID 41) exited with exit code 1
2021-05-31 11:54:50.368 UTC [36] LOG:  shutting down
2021-05-31 11:54:50.402 UTC [34] LOG:  database system is shut down
 done
server stopped

PostgreSQL init process complete; ready for start up.

2021-05-31 11:54:50.510 UTC [1] LOG:  starting PostgreSQL 13.0 on x86_64-pc-linux-musl, compiled by gcc (Alpine 9.3.0) 9.3.0, 64-bit
2021-05-31 11:54:50.510 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-05-31 11:54:50.510 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2021-05-31 11:54:50.517 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-05-31 11:54:50.526 UTC [43] LOG:  database system was shut down at 2021-05-31 11:54:50 UTC
2021-05-31 11:54:50.531 UTC [1] LOG:  database system is ready to accept connections
2021-05-31 11:54:54.226 UTC [54] ERROR:  relation "public.barcopy" does not exist at character 15
2021-05-31 11:54:54.226 UTC [54] STATEMENT:  SELECT 1 FROM public.barcopy LIMIT 1
2021-05-31 11:54:54.610 UTC [59] ERROR:  relation "public.barcopy2" does not exist at character 15
2021-05-31 11:54:54.610 UTC [59] STATEMENT:  SELECT 1 FROM public.barcopy2 LIMIT 1
2021-05-31 11:54:54.934 UTC [63] ERROR:  relation "shortfloat" does not exist at character 15
2021-05-31 11:54:54.934 UTC [63] STATEMENT:  SELECT 1 FROM shortfloat LIMIT 1
2021-05-31 11:54:55.675 UTC [75] ERROR:  relation "byte_to_smallint_test" does not exist at character 15
2021-05-31 11:54:55.675 UTC [75] STATEMENT:  SELECT 1 FROM byte_to_smallint_test LIMIT 1

21/05/31 20:54:56.434 pool-1-thread-1 INFO PostgresIntegrationSuite:

===== END OF CONTAINER LOGS FOR container Id: 3360c98eb28337d8b217fb614e47bf49aafa18a6cb60ecadf3178aee0c663021 =====
```
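
Conceptually, the container logs can be collected through the docker-java client the suites already use; a rough sketch of the idea (illustrative, not this PR's exact code):

```
import com.github.dockerjava.api.DockerClient
import com.github.dockerjava.api.model.Frame
import com.github.dockerjava.core.command.LogContainerResultCallback

// Collect a container's stdout/stderr into a string so the suite can
// emit it to unit-tests.log. Illustrative sketch only.
def containerLogs(client: DockerClient, containerId: String): String = {
  val sb = new StringBuilder
  client.logContainerCmd(containerId)
    .withStdOut(true)
    .withStdErr(true)
    .exec(new LogContainerResultCallback {
      override def onNext(frame: Frame): Unit =
        sb.append(new String(frame.getPayload))
    })
    .awaitCompletion()
  sb.toString
}
```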

### Why are the changes needed?

Container logs are useful for debugging, especially on GA.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I ran docker integration tests and got the logs. The example shown above is for `PostgresIntegrationSuite`.

Closes #32715 from sarutak/log-docker-container.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2021-06-01 22:44:48 +09:00
Kousuke Saruta 2de19e460b [SPARK-35483][INFRA] Add docker-integration-tests to run-tests.py and GA
### What changes were proposed in this pull request?

This PR proposes to add `docker-integration-tests` to `run-tests.py` and GA.
#32631 was merged once, but it lacked some consideration.

The diff between this change and 692d95d145, merged in #32631, is as follows.

```
       if: github.repository != 'apache/spark'
       id: sync-branch
       run: |
+        apache_spark_ref=`git rev-parse HEAD`
         git fetch https://github.com/$GITHUB_REPOSITORY.git ${GITHUB_REF#refs/heads/}
         git -c user.name='Apache Spark Test Account' -c user.email='sparktestacc@gmail.com' merge --no-commit --progress --squash FETCH_HEAD
         git -c user.name='Apache Spark Test Account' -c user.email='sparktestacc@gmail.com' commit -m "Merged commit"
+        echo "::set-output name=APACHE_SPARK_REF::$apache_spark_ref"
     - name: Cache Scala, SBT and Maven
       uses: actions/cache@v2
       with:
```

### Why are the changes needed?

CI for `docker-integration-tests` is absent for now.

### Does this PR introduce _any_ user-facing change?

GA.

### How was this patch tested?

Closes #32691 from sarutak/docker-integration-test-ga-take2.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-05-28 16:54:47 +09:00
Hyukjin Kwon d189cf75f9 Revert "[SPARK-35483][INFRA] Add docker-integration-tests to run-tests.py and GA"
This reverts commit 0a74ad66b3.
2021-05-28 14:29:12 +09:00
Kousuke Saruta 0a74ad66b3 [SPARK-35483][INFRA] Add docker-integration-tests to run-tests.py and GA
### What changes were proposed in this pull request?

This PR proposes to add `docker-integration-tests` to `run-tests.py` and GA.
`docker-integration-tests` can't run if Docker is not installed, so it runs only if `docker-integration-tests` is specified with `--module`.

### Why are the changes needed?

CI for `docker-integration-tests` is absent for now.

### Does this PR introduce _any_ user-facing change?

GA.

### How was this patch tested?

Closes #32631 from sarutak/docker-integration-test-ga.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-05-28 07:56:37 +09:00
Kousuke Saruta 116a97e153 [SPARK-35501][SQL][TESTS] Add a feature for removing pulled container image for docker integration tests
### What changes were proposed in this pull request?

This PR adds a feature for removing the pulled container image after every docker integration test finishes.
This feature is enabled by the new property `spark.test.docker.removePulledImage`.

### Why are the changes needed?

For idempotency.
I'm trying to add docker integration tests to GA in SPARK-35483 (#32631), but I noticed that `jdbc.OracleIntegrationSuite` consistently fails (https://github.com/sarutak/spark/runs/2646707235?check_suite_focus=true).
I investigated the reason and found that the host on GA is short of storage capacity.
```
 ORACLE PASSWORD FOR SYS AND SYSTEM: oracle
The location '/opt/oracle' specified for database files has insufficient space.
Database creation needs at least '4.5GB' disk space.
Specify a different database file destination that has enough space in the configuration file '/etc/sysconfig/oracle-xe-18c.conf'.
mv: cannot stat '/opt/oracle/product/18c/dbhomeXE/dbs/spfileXE.ora': No such file or directory
mv: cannot stat '/opt/oracle/product/18c/dbhomeXE/dbs/orapwXE': No such file or directory
ORACLE_HOME = [/home/oracle] ? ORACLE_BASE environment variable is not being set since this
information is not available for the current user ID .
You can set ORACLE_BASE manually if it is required.
Resetting ORACLE_BASE to its previous value or ORACLE_HOME
The Oracle base remains unchanged with value /opt/oracle
#####################################
########### E R R O R ###############
DATABASE SETUP WAS NOT SUCCESSFUL!
Please check output for further info!
########### E R R O R ###############
#####################################
The following output is now a tail of the alert.log:
tail: cannot open '/opt/oracle/diag/rdbms/*/*/trace/alert*.log' for reading: No such file or directory
tail: no files remaining
```

With this feature, the pulled container image is removed, keeping enough capacity for `jdbc.OracleIntegrationSuite` on GA.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed the following things.

* A container image that was absent from the local repository is removed after the test finishes if `spark.test.docker.removePulledImage` is `true`.
* A container image that was already present in the local repository is not removed after the test finishes, even if `spark.test.docker.removePulledImage` is `true`.
* A container image is not removed, regardless of its presence in the local repository, if `spark.test.docker.removePulledImage` is `false`.

Closes #32652 from sarutak/docker-image-rm.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-05-26 17:24:29 +09:00
Kousuke Saruta 1a43415d8d [SPARK-35226][SQL][FOLLOWUP] Fix test added in SPARK-35226 for DB2KrbIntegrationSuite
### What changes were proposed in this pull request?

This PR fixes a test added in SPARK-35226 (#32344).

### Why are the changes needed?

`SELECT 1` seems to be an invalid query for DB2.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

DB2KrbIntegrationSuite passes on my laptop.

I also confirmed all the KrbIntegrationSuites pass with the following command.
```
build/sbt -Phive -Phive-thriftserver -Pdocker-integration-tests "testOnly org.apache.spark.sql.jdbc.*KrbIntegrationSuite"
```

Closes #32632 from sarutak/followup-SPARK-35226.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-05-22 22:31:43 -07:00
Kousuke Saruta a59a214610 [MINOR][FOLLOWUP] Update SHA for the oracle docker image
### What changes were proposed in this pull request?

This PR updates SHA for the oracle docker image in the comment in `OracleIntegrationSuite` and `v2.OracleIntegrationSuite`.
The SHA for the latest image is `3f422c4a35b423dfcdbcc57a84f01db6c82eb6c1`

### Why are the changes needed?

The script name for creating the oracle docker image was changed in #32629 to follow the latest image, so we also need to update the corresponding SHA.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The value is from `git log`.

Closes #32630 from sarutak/followup-oracle-script-name.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-05-22 19:52:24 -07:00
Kousuke Saruta 0549caf07a [MINOR][SQL] Change the script name for creating oracle docker image
### What changes were proposed in this pull request?

This PR changes the script name in the comments in `jdbc.OracleIntegrationSuite` and `v2.OracleIntegrationSuite`.
The script is for creating the oracle docker image.

### Why are the changes needed?

The name of the script is `buildContainerImage`, not `buildDockerImage` now.
- d918f5a4c6 (diff-be303ab32e74192aca829e5ea259a0aec07aac23a6049120fb337ec4efa601b0)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed that I can build the image with `./buildContainerImage.sh -v 18.4.0 -x`.

Closes #32629 from sarutak/change-oracle-container-script-name.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-05-21 22:58:44 -07:00
Kousuke Saruta 529b875901 [SPARK-35226][SQL] Support refreshKrb5Config option in JDBC datasources
### What changes were proposed in this pull request?

This PR proposes to introduce a new JDBC option, `refreshKrb5Config`, which allows changes to `krb5.conf` to be reflected.

### Why are the changes needed?

In the current master, JDBC datasources can't accept `refreshKrb5Config`, which is defined in `Krb5LoginModule`.
So even if we change `krb5.conf` after establishing a connection, the change will not be reflected.

A similar issue happens when we run multiple `*KrbIntegrationSuite`s at the same time.
`MiniKDC` starts and stops for every KrbIntegrationSuite, and a different port number is recorded to `krb5.conf` each time.
Because `SecureConnectionProvider.JDBCConfiguration` doesn't take `refreshKrb5Config`, every KrbIntegrationSuite except the first one to run sees the wrong port, so those suites fail.
You can easily confirm with the following command.
```
build/sbt -Phive -Phive-thriftserver -Pdocker-integration-tests "testOnly org.apache.spark.sql.jdbc.*KrbIntegrationSuite"
```

### Does this PR introduce _any_ user-facing change?

Yes. Users can set `refreshKrb5Config` to refresh the krb5-related configuration.
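
For illustration, a user might enable the option like this (the URL, table, principal, and keytab values below are placeholders):

```
// Hedged usage sketch; connection details are placeholders.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://example.com:5432/mydb")
  .option("dbtable", "mytable")
  .option("keytab", "/path/to/client.keytab")
  .option("principal", "client@EXAMPLE.COM")
  // pick up changes made to krb5.conf instead of reusing the cached config
  .option("refreshKrb5Config", "true")
  .load()
```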

### How was this patch tested?

New test.

Closes #32344 from sarutak/kerberos-refresh-issue.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
2021-04-29 13:55:53 +09:00
Kousuke Saruta ba92de0ae5 [SPARK-34843][SQL][FOLLOWUP] Fix a test failure in OracleIntegrationSuite
### What changes were proposed in this pull request?

This PR fixes a test failure in `OracleIntegrationSuite`.
After SPARK-34843 (#31965), the way partitions are divided changed, and `OracleIntegrationSuite` is affected.
```
[info] - SPARK-22814 support date/timestamp types in partitionColumn *** FAILED *** (230 milliseconds)
[info]   Set(""D" < '2018-07-11' or "D" is null", ""D" >= '2018-07-11' AND "D" < '2018-07-15'", ""D" >= '2018-07-15'") did not equal Set(""D" < '2018-07-10' or "D" is null", ""D" >= '2018-07-10' AND "D" < '2018-07-14'", ""D" >= '2018-07-14'") (OracleIntegrationSuite.scala:448)
[info]   Analysis:
[info]   Set(missingInLeft: ["D" < '2018-07-10' or "D" is null, "D" >= '2018-07-10' AND "D" < '2018-07-14', "D" >= '2018-07-14'], missingInRight: ["D" < '2018-07-11' or "D" is null, "D" >= '2018-07-11' AND "D" < '2018-07-15', "D" >= '2018-07-15'])
```
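
For reference, the test exercises Spark's date-based JDBC partitioning (SPARK-22814), which derives WHERE clauses like the ones compared above from read options such as these (table name, bounds, and URL are placeholders):

```
// Sketch of date-based JDBC partitioning; connection details are
// placeholders. Spark splits [lowerBound, upperBound) into numPartitions
// ranges over column "d" and generates one WHERE clause per partition.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//localhost:1521/xe")
  .option("dbtable", "datetime_partition_test")
  .option("partitionColumn", "d")
  .option("lowerBound", "2018-07-06")
  .option("upperBound", "2018-07-20")
  .option("numPartitions", "3")
  .load()
```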

### Why are the changes needed?

To follow the previous change.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The modified test.

Closes #32186 from sarutak/fix-oracle-date-error.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-04-15 07:07:34 -07:00
William Hyun c799d049fc [SPARK-34810][TEST] Update PostgreSQL test with the latest results
### What changes were proposed in this pull request?

This PR aims to update `PostgresIntegrationSuite` with the latest results.

### Why are the changes needed?

The latest PostgreSQL jar version is 42.2.19. Since 42.2.9, the test is broken because it returns `0.0` instead of `0.00`.
- https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.19

42.2.9 (2019-12-06)
42.2.10 (2020-01-30)
42.2.11 (2020-03-09)
42.2.12 (2020-03-31)
42.2.13 (2020-06-04)
42.2.14 (2020-06-10)
42.2.15 (2020-08-14)
42.2.16 (2020-08-20)
42.2.17 (2020-10-09)
42.2.18 (2020-10-15)
42.2.19 (2021-02-18)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CI with the updated test cases.

```
build/sbt -Pdocker-integration-tests 'docker-integration-tests/testOnly org.apache.spark.sql.jdbc.PostgresIntegrationSuite'
```

Closes #31910 from williamhyun/pg.

Authored-by: William Hyun <williamhyun3@gmail.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
2021-03-21 13:36:45 +08:00
Kousuke Saruta dd6383f0a3 [SPARK-34333][SQL] Fix PostgresDialect to handle money types properly
### What changes were proposed in this pull request?

This PR changes the type mapping for the `money` and `money[]` types for PostgreSQL.
Currently, Spark tries to convert those types to `DoubleType` and `ArrayType` of `double`, respectively.
But the JDBC driver does not seem to handle those types properly.

https://github.com/pgjdbc/pgjdbc/issues/100
https://github.com/pgjdbc/pgjdbc/issues/1405

Due to these issues, we can get errors like the following.

For the money type:
```
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (192.168.1.204 executor driver): org.postgresql.util.PSQLException: Bad value for type double : 1,000.00
[info] 	at org.postgresql.jdbc.PgResultSet.toDouble(PgResultSet.java:3104)
[info] 	at org.postgresql.jdbc.PgResultSet.getDouble(PgResultSet.java:2432)
[info] 	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$5(JdbcUtils.scala:418)
```

For the money[] type:
```
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (192.168.1.204 executor driver): org.postgresql.util.PSQLException: Bad value for type double : $2,000.00
[info] 	at org.postgresql.jdbc.PgResultSet.toDouble(PgResultSet.java:3104)
[info] 	at org.postgresql.jdbc.ArrayDecoding$5.parseValue(ArrayDecoding.java:235)
[info] 	at org.postgresql.jdbc.ArrayDecoding$AbstractObjectStringArrayDecoder.populateFromString(ArrayDecoding.java:122)
[info] 	at org.postgresql.jdbc.ArrayDecoding.readStringArray(ArrayDecoding.java:764)
[info] 	at org.postgresql.jdbc.PgArray.buildArray(PgArray.java:310)
[info] 	at org.postgresql.jdbc.PgArray.getArrayImpl(PgArray.java:171)
[info] 	at org.postgresql.jdbc.PgArray.getArray(PgArray.java:111)
```

For the money type, a known workaround is to treat it as a string, so this PR does that.
For money[], however, there is no reasonable workaround, so this PR removes the support.
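
To illustrate the user-visible effect: a money column now arrives as a formatted string, and one possible way to get numbers back is to strip the formatting and cast (a sketch; the table and column names are hypothetical):

```
import org.apache.spark.sql.functions.{col, regexp_replace}

// The money column "price" is now read as StringType, e.g. "$1,000.00".
// Strip the currency symbol and thousands separators, then cast.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/mydb")
  .option("dbtable", "sales")
  .load()

val withNumeric = df.withColumn(
  "price_num",
  regexp_replace(col("price"), "[$,]", "").cast("decimal(12,2)"))
```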

### Why are the changes needed?

This is a bug.

### Does this PR introduce _any_ user-facing change?

Yes. Once this PR is merged, the money type is mapped to `StringType` rather than `DoubleType`, and support for money[] is dropped.
For the money type, a value below one thousand (`$100.00`, for instance) works even without this change, so I also updated the migration guide because this is a behavior change for such small values.
On the other hand, money[] seems not to work with any value, but it is mentioned in the migration guide just in case.

### How was this patch tested?

New test.

Closes #31442 from sarutak/fix-for-money-type.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
2021-02-17 10:50:06 +09:00
Kousuke Saruta f79305a402 [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types
### What changes were proposed in this pull request?

This PR fixes the issue that `PostgresDialect` can't handle arrays of some types.
Though PostgreSQL supports a wide range of types (https://www.postgresql.org/docs/13/datatype.html), the current `PostgresDialect` can't handle arrays of the following types.

* xml
* tsvector
* tsquery
* macaddr
* macaddr8
* txid_snapshot
* pg_snapshot
* point
* line
* lseg
* box
* path
* polygon
* circle
* pg_lsn
* bit varying
* interval

NOTE: PostgreSQL doesn't implement arrays of serial types so this PR doesn't care about them.
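
The core of the mapping can be sketched like this: PostgreSQL's JDBC metadata reports an array column's type name with a leading underscore (e.g. `_xml` for `xml[]`), so an element-type mapping can be reused and wrapped in `ArrayType` (an illustrative sketch, not `PostgresDialect`'s exact code; the element mappings shown are a small, assumed subset):

```
import java.sql.Types
import org.apache.spark.sql.types._

// Illustrative subset of element-type mappings; the real dialect covers
// many more types.
def pgElementType(name: String): Option[DataType] = name match {
  case "xml" | "tsvector" | "tsquery" | "macaddr" | "point" | "line" =>
    Some(StringType)
  case _ => None
}

// Array type names arrive underscore-prefixed, e.g. "_xml" for xml[].
def pgCatalystType(sqlType: Int, typeName: String): Option[DataType] =
  if (sqlType == Types.ARRAY) {
    pgElementType(typeName.stripPrefix("_")).map(ArrayType(_))
  } else {
    pgElementType(typeName)
  }
```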

### Why are the changes needed?

To provide better support for PostgreSQL.

### Does this PR introduce _any_ user-facing change?

Yes. `PostgresDialect` can handle arrays of the types shown above.

### How was this patch tested?

New test.

Closes #31419 from sarutak/postgres-array-types.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2021-02-10 11:29:14 +09:00
Kousuke Saruta 1f4135c4bd [SPARK-34350][SQL][TESTS] replace withTimeZone defined in OracleIntegrationSuite with DateTimeTestUtils.withDefaultTimeZone
### What changes were proposed in this pull request?

This PR replaces `withTimeZone` defined and used in `OracleIntegrationSuite` with `DateTimeTestUtils.withDefaultTimeZone` which is defined as a utility method.

### Why are the changes needed?

Both methods are semantically the same so it might be better to use the utility one.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`OracleIntegrationSuite` passes.

Closes #31465 from sarutak/oracle-timezone-util.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2021-02-05 23:26:22 +09:00
Kousuke Saruta 55399eb3b2 [SPARK-34343][SQL][TESTS] Add missing test for some non-array types in PostgreSQL
### What changes were proposed in this pull request?

This PR adds tests for some non-array types in PostgreSQL.
PostgreSQL supports a wide range of types (https://www.postgresql.org/docs/13/datatype.html), and `PostgresIntegrationSuite` contains tests for some of them, but tests for the following types are missing.

* bit varying
* point
* line
* lseg
* box
* path
* polygon
* circle
* pg_lsn
* macaddr
* macaddr8
* numeric
* pg_snapshot
* real
* time
* timestamp
* tsquery
* tsvector
* txid_snapshot
* xml

NOTE: Handling money types can be buggy so this PR doesn't add tests for those types.

### Why are the changes needed?

To ensure those types work with Spark well.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Extended `PostgresIntegrationSuite`.

Closes #31456 from sarutak/test-for-some-types-postgresql.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2021-02-05 14:27:14 +09:00
Hoa 7675582dab [SPARK-34357][SQL] Map JDBC SQL TIME type to TimestampType with time portion fixed regardless of timezone
### What changes were proposed in this pull request?

For user-experience reasons (it is confusing to Spark users that java.sql.Time uses milliseconds while Spark uses microseconds, and users lose useful functions like hour() and minute() on the column), we have decided to revert to TimestampType, but this time the hour is enforced consistently across system timezones (via offset manipulation) and the date part is fixed to the zero epoch.

The full discussion with Wenchen Fan regarding this ticket is here: https://github.com/apache/spark/pull/30902#discussion_r569186823

### Why are the changes needed?

A revert and an improvement to sql.Time handling.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests and integration tests

Closes #31473 from saikocat/SPARK-34357.

Authored-by: Hoa <hoameomu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-02-04 17:16:39 +00:00
Kousuke Saruta 144ee9eb30 [SPARK-34186][SQL][TESTS] Fix DockerJDBCIntegrationSuites to reflect the change of SPARK-33888
### What changes were proposed in this pull request?

This PR fixes the following integration suites to reflect the change of SPARK-33888.

* PostgresIntegrationSuite
* MySQLIntegrationSuite
* MsSqlServerIntegrationSuite
* DB2IntegrationSuite

### Why are the changes needed?

Those suites currently don't pass on the `master` branch.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Those suites pass.

Closes #31274 from sarutak/fix-integration-suites.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-01-24 18:44:14 -08:00
Kousuke Saruta 45f076336b [SPARK-33813][SQL] Fix the issue that JDBC source can't treat MS SQL Server's spatial types
### What changes were proposed in this pull request?

This PR fixes the issue that reading tables containing spatial datatypes from MS SQL Server fails.
MS SQL Server supports two non-standard spatial JDBC types, `geometry` and `geography`, but Spark SQL can't handle them:

```
java.sql.SQLException: Unrecognized SQL type -157
 at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:251)
 at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:321)
 at scala.Option.getOrElse(Option.scala:189)
 at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:321)
 at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:63)
 at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
 at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
 at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:364)
 at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:366)
 at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:355)
 at scala.Option.getOrElse(Option.scala:189)
 at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:355)
 at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240)
 at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:381)
```

Considering what the [data type mapping](https://docs.microsoft.com/ja-jp/sql/connect/jdbc/using-basic-data-types?view=sql-server-ver15) documentation says, I think those spatial types can be mapped to Catalyst's `BinaryType`.
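
A sketch of that mapping idea (illustrative, not the exact `MsSqlServerDialect` change):

```
import org.apache.spark.sql.types.{BinaryType, DataType}

// geometry/geography arrive with vendor-specific JDBC type codes (such as
// the -157 in the stack trace above), so match on the reported type name
// and map both to Catalyst's BinaryType. Illustrative sketch only.
def msSqlCatalystType(sqlType: Int, typeName: String): Option[DataType] =
  typeName match {
    case "geometry" | "geography" => Some(BinaryType)
    case _ => None
  }
```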

### Why are the changes needed?

To provide better support.

### Does this PR introduce _any_ user-facing change?

Yes. MS SQL Server users can use `geometry` and `geography` types in datasource tables.

### How was this patch tested?

New test case added to `MsSqlServerIntegrationSuite`.

Closes #31283 from sarutak/mssql-spatial-types.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-01-22 04:28:22 +00:00
Kousuke Saruta 842902154a [SPARK-34180][SQL] Fix the regression brought by SPARK-33888 for PostgresDialect
### What changes were proposed in this pull request?

This PR fixes the regression bug brought by SPARK-33888 (#30902).
After that PR was merged, `PostgresDialect#getCatalystType` throws an exception for array types.
```
[info] - Type mapping for various types *** FAILED *** (551 milliseconds)
[info]   java.util.NoSuchElementException: key not found: scale
[info]   at scala.collection.immutable.Map$EmptyMap$.apply(Map.scala:106)
[info]   at scala.collection.immutable.Map$EmptyMap$.apply(Map.scala:104)
[info]   at org.apache.spark.sql.types.Metadata.get(Metadata.scala:111)
[info]   at org.apache.spark.sql.types.Metadata.getLong(Metadata.scala:51)
[info]   at org.apache.spark.sql.jdbc.PostgresDialect$.getCatalystType(PostgresDialect.scala:43)
[info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:321)
```

### Why are the changes needed?

To fix the regression bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed the test case `SPARK-22291: Conversion error when transforming array types of uuid, inet and cidr to StingType in PostgreSQL` in `PostgresIntegrationSuite` passed.

I also confirmed that all the `v2.*IntegrationSuite`s pass, since this PR changed them.

Closes #31262 from sarutak/fix-postgres-dialect-regression.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2021-01-22 13:03:02 +09:00
Kousuke Saruta 8bcebfa59a [SPARK-33698][BUILD][TESTS] Fix the build error of OracleIntegrationSuite for Scala 2.13
### What changes were proposed in this pull request?

This PR fixes a build error of `OracleIntegrationSuite` with Scala 2.13.

### Why are the changes needed?

Build should pass with Scala 2.13.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed that the build pass with the following command.
```
$ build/sbt -Pdocker-integration-tests -Pscala-2.13  "docker-integration-tests/test:compile"
```

Closes #30660 from sarutak/fix-docker-integration-tests-for-scala-2.13.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-07 19:09:59 -08:00
Dongjoon Hyun de9818f043 [SPARK-33662][BUILD] Setting version to 3.2.0-SNAPSHOT
### What changes were proposed in this pull request?

This PR aims to update `master` branch version to 3.2.0-SNAPSHOT.

### Why are the changes needed?

Start to prepare Apache Spark 3.2.0.

### Does this PR introduce _any_ user-facing change?

N/A.

### How was this patch tested?

Pass the CIs.

Closes #30606 from dongjoon-hyun/SPARK-3.2.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-04 14:10:42 -08:00
Kousuke Saruta 976e897039 [SPARK-33640][TESTS] Extend connection timeout to DB server for DB2IntegrationSuite and its variants
### What changes were proposed in this pull request?

This PR extends the connection timeout to the DB server for DB2IntegrationSuite and its variants.

The container image ibmcom/db2 creates a database when it starts up.
The database creation can take over 2 minutes.

DB2IntegrationSuite and its variants use the container image, but the connection timeout is set to 2 minutes, so these suites almost always fail.

### Why are the changes needed?

To pass those suites.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed the suites pass with the following commands.
```
$ build/sbt -Pdocker-integration-tests -Phive -Phive-thriftserver package "testOnly org.apache.spark.sql.jdbc.DB2IntegrationSuite"
$ build/sbt -Pdocker-integration-tests -Phive -Phive-thriftserver package "testOnly org.apache.spark.sql.jdbc.v2.DB2IntegrationSuite"
$ build/sbt -Pdocker-integration-tests -Phive -Phive-thriftserver package "testOnly org.apache.spark.sql.jdbc.DB2KrbIntegrationSuite"

Closes #30583 from sarutak/extend-timeout-for-db2.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-04 00:12:04 -08:00
Kousuke Saruta 91baab77f7 [SPARK-33656][TESTS] Add option to keep container after tests finish for DockerJDBCIntegrationSuites for debug
### What changes were proposed in this pull request?

This PR adds an option to keep the container after DockerJDBCIntegrationSuites (e.g. DB2IntegrationSuite, PostgresIntegrationSuite) finish.
By setting the system property `spark.test.docker.keepContainer` to `true`, we can use this option.

### Why are the changes needed?

If some error occurs during the tests, it is useful to keep the container for debugging.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed that the container is kept after the test by the following commands.
```
# With sbt
$ build/sbt -Dspark.test.docker.keepContainer=true -Pdocker-integration-tests -Phive -Phive-thriftserver package "testOnly org.apache.spark.sql.jdbc.MariaDBKrbIntegrationSuite"

# With Maven
$ build/mvn -Dspark.test.docker.keepContainer=true -Pdocker-integration-tests -Phive -Phive-thriftserver -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.MariaDBKrbIntegrationSuite test

$ docker container ls
```

I also confirmed that there are no regression for all the subclasses of `DockerJDBCIntegrationSuite` with sbt/Maven.
* MariaDBKrbIntegrationSuite
* DB2KrbIntegrationSuite
* PostgresKrbIntegrationSuite
* MySQLIntegrationSuite
* PostgresIntegrationSuite
* DB2IntegrationSuite
* MsSqlServerIntegrationSuite
* OracleIntegrationSuite
* v2.MySQLIntegrationSuite
* v2.PostgresIntegrationSuite
* v2.DB2IntegrationSuite
* v2.MsSqlServerIntegrationSuite
* v2.OracleIntegrationSuite

NOTE: `DB2IntegrationSuite`, `v2.DB2IntegrationSuite` and `DB2KrbIntegrationSuite` can fail due to the too-short connection timeout. It's a separate issue and I'll fix it in #30583.

Closes #30601 from sarutak/keepContainer.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-03 23:47:43 -08:00
Huaxin Gao 15579ba1f8 [SPARK-33430][SQL] Support namespaces in JDBC v2 Table Catalog
### What changes were proposed in this pull request?
Add namespaces support in the JDBC v2 Table Catalog by making ```JDBCTableCatalog``` extend ```SupportsNamespaces```.

### Why are the changes needed?
To make the v2 JDBC implementation complete.

### Does this PR introduce _any_ user-facing change?
Yes. Adds the following to ```JDBCTableCatalog``` (a usage sketch follows the list):

- listNamespaces
- listNamespaces(String[] namespace)
- namespaceExists(String[] namespace)
- loadNamespaceMetadata(String[] namespace)
- createNamespace
- alterNamespace
- dropNamespace
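
A hedged usage sketch of what this enables (the catalog name, URL, and namespace below are placeholders):

```
// Register a JDBC v2 catalog, then run namespace commands against it.
// Catalog name, URL, and namespace are placeholders.
spark.conf.set("spark.sql.catalog.pg",
  "org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog")
spark.conf.set("spark.sql.catalog.pg.url", "jdbc:postgresql://localhost:5432/mydb")
spark.conf.set("spark.sql.catalog.pg.driver", "org.postgresql.Driver")

spark.sql("CREATE NAMESPACE pg.test_ns")
spark.sql("SHOW NAMESPACES IN pg").show()
spark.sql("DROP NAMESPACE pg.test_ns")
```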

### How was this patch tested?
Add new docker tests

Closes #30473 from huaxingao/name_space.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-12-04 07:23:35 +00:00
Huaxin Gao e22ddb6740 [SPARK-32405][SQL][FOLLOWUP] Remove USING _ in CREATE TABLE in JDBCTableCatalog docker tests
### What changes were proposed in this pull request?
Remove `USING _` in CREATE TABLE in the JDBCTableCatalog docker tests.

### Why are the changes needed?
Previously, the CREATE TABLE syntax forced users to specify a provider, so we had to add a `USING _`. Now the problem has been fixed and we need to remove it.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests

Closes #30599 from huaxingao/remove_USING.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-12-04 05:43:05 +00:00
Kousuke Saruta cf98a761de [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationSuite
### What changes were proposed in this pull request?

This PR changes mariadb_docker_entrypoint.sh to set the proper version of mariadb-plugin-gssapi-server automatically.
The proper version is based on that of mariadb-server.
Also, this PR enables using an arbitrary docker image by setting the environment variable `MARIADB_CONTAINER_IMAGE_NAME`.

### Why are the changes needed?

For `MariaDBKrbIntegrationSuite`, the version of `mariadb-plugin-gssapi-server` is currently set to `10.5.5` in `mariadb_docker_entrypoint.sh`, but that version is no longer available in the official apt repository, so `MariaDBKrbIntegrationSuite` doesn't pass for now.
It seems that only the three most recent versions are available for each major version, currently `10.5.6`, `10.5.7` and `10.5.8`.
Further, the release cycle of MariaDB seems very rapid (1-2 months), so I don't think it's a good idea to pin a specific version of `mariadb-plugin-gssapi-server`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Confirmed that `MariaDBKrbIntegrationSuite` passes with the following commands.
```
$  build/sbt -Pdocker-integration-tests -Phive -Phive-thriftserver package "testOnly org.apache.spark.sql.jdbc.MariaDBKrbIntegrationSuite"
```
In this case, we can see what version of `mariadb-plugin-gssapi-server` is going to be installed in the following container log message.
```
Installing mariadb-plugin-gssapi-server=1:10.5.8+maria~focal
```

Or, we can set MARIADB_CONTAINER_IMAGE_NAME for a specific version of MariaDB.
```
$ MARIADB_CONTAINER_IMAGE_NAME=mariadb:10.5.6 build/sbt -Pdocker-integration-tests -Phive -Phive-thriftserver package "testOnly org.apache.spark.sql.jdbc.MariaDBKrbIntegrationSuite"
```
```
Installing mariadb-plugin-gssapi-server=1:10.5.6+maria~focal
```

Closes #30515 from sarutak/fix-MariaDBKrbIntegrationSuite.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-11-28 23:38:11 +09:00
Huaxin Gao a1a3d5cb02 [MINOR][TESTS][DOCS] Use fully-qualified class name in docker integration test
### What changes were proposed in this pull request?
change
```
./build/sbt -Pdocker-integration-tests "testOnly *xxxIntegrationSuite"
```
to
```
./build/sbt -Pdocker-integration-tests "testOnly org.apache.spark.sql.jdbc.xxxIntegrationSuite"
```

### Why are the changes needed?
We only want to start the v1 ```xxxIntegrationSuite```, not the newly added ```v2.xxxIntegrationSuite```.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually checked

Closes #30448 from huaxingao/dockertest.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-11-20 10:14:37 -08:00
Josh Soref 9d58a2f0f0 [MINOR][GRAPHX] Correct typos in the sub-modules: graphx, external, and examples
### What changes were proposed in this pull request?

This PR intends to fix typos in the sub-modules: graphx, external, and examples.
Split per holdenk https://github.com/apache/spark/pull/30323#issuecomment-725159710

NOTE: The misspellings have been reported at 706a726f87 (commitcomment-44064356)

### Why are the changes needed?

Misspelled words make it harder to read / understand content.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

No testing was performed

Closes #30326 from jsoref/spelling-graphx.

Authored-by: Josh Soref <jsoref@users.noreply.github.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-11-12 08:29:22 +09:00
Huaxin Gao bfb257f078 [SPARK-32405][SQL] Apply table options while creating tables in JDBC Table Catalog
### What changes were proposed in this pull request?
Currently in JDBCTableCatalog, we ignore the table options when creating tables:
```
    // TODO (SPARK-32405): Apply table options while creating tables in JDBC Table Catalog
    if (!properties.isEmpty) {
      logWarning("Cannot create JDBC table with properties, these properties will be " +
        "ignored: " + properties.asScala.map { case (k, v) => s"$k=$v" }.mkString("[", ", ", "]"))
    }
```
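
For illustration, with this change a property supplied at table creation should reach the remote database instead of being dropped with the warning above (a hedged sketch; the catalog name and property are placeholders):

```
// Assumes a MySQL-backed JDBC v2 catalog named "mysql" is registered;
// the ENGINE property is a placeholder example of a table option.
spark.sql("""
  CREATE TABLE mysql.db.people (id INT, name STRING)
  TBLPROPERTIES ('ENGINE'='InnoDB')
""")
```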

### Why are the changes needed?
We need to apply the table options when we create tables.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
add new test

Closes #30154 from huaxingao/table_options.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-11-09 07:02:14 +00:00
yangjie01 02fd52cfbc [SPARK-33352][CORE][SQL][SS][MLLIB][AVRO][K8S] Fix procedure-like declaration compilation warnings in Scala 2.13
### What changes were proposed in this pull request?
There are two similar compilation warnings about procedure-like declaration in Scala 2.13:

```
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70: procedure syntax is deprecated for constructors: add `=`, as in method definition
```
and

```
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala:211: procedure syntax is deprecated: instead, add `: Unit =` to explicitly declare `run`'s return type
```

This PR is the first part of resolving SPARK-33352 (see the sketch after this list):

- For constructor definitions, add `=` to convert to function syntax

- For method definitions without a return type, add `: Unit =` to explicitly declare the return type
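
A made-up class showing both deprecated forms and their fixes (illustrative, not Spark code):

```
class Worker(val name: String) {
  // Before: def this(n: Int) { this(n.toString) }   <- procedure syntax
  def this(n: Int) = { this(n.toString) }  // constructors need `=`

  // Before: def run() { println(name) }             <- procedure syntax
  def run(): Unit = { println(name) }      // methods need `: Unit =`
}
```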

### Why are the changes needed?
Eliminates compilation warnings in Scala 2.13; this change should be compatible with Scala 2.12.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass the Jenkins or GitHub Action

Closes #30255 from LuciferYang/SPARK-29392-FOLLOWUP.1.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-11-08 12:51:48 -06:00
Prashant Sharma 733a468726 [SPARK-33130][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MsSqlServer dialect)
### What changes were proposed in this pull request?

Override the default SQL strings for:

- ALTER TABLE RENAME COLUMN
- ALTER TABLE UPDATE COLUMN NULLABILITY

in the MsSqlServer JDBC dialect, according to the official documentation.
Write MsSqlServer integration tests for JDBC.
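
A sketch of the SQL the dialect needs to emit (method names are approximate renderings of Spark's `JdbcDialect` hooks, not exact signatures):

```
// MsSqlServer renames columns via the sp_rename stored procedure, and its
// ALTER COLUMN must restate the data type when changing nullability.
def renameColumnSql(table: String, col: String, newName: String): String =
  s"EXEC sp_rename '$table.$col', '$newName', 'COLUMN'"

def updateNullabilitySql(
    table: String, col: String, dataType: String, nullable: Boolean): String = {
  val nullability = if (nullable) "NULL" else "NOT NULL"
  s"ALTER TABLE $table ALTER COLUMN $col $dataType $nullability"
}
```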

### Why are the changes needed?

To add the support for alter table when interacting with MSSql Server.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

added tests

Closes #30038 from ScrapCodes/mssql-dialect.

Authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-11-06 05:46:38 +00:00
Kousuke Saruta 0b557b3290 [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13
### What changes were proposed in this pull request?

This PR renames some occurrences of `Seq` in `PostgresIntegrationSuite` to `scala.collection.Seq`.
When I ran `docker-integration-test`, I noticed that `PostgresIntegrationSuite` failed due to a `ClassCastException`.
The reason is the same as the one resolved in SPARK-29292.
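
The pitfall in one snippet (illustrative, not the suite's exact code): in Scala 2.13, `Seq` means `scala.collection.immutable.Seq`, so values that are really mutable sequences no longer pass an immutable-`Seq` check:

```
// A mutable sequence is a scala.collection.Seq but, on 2.13, not a `Seq`
// (= scala.collection.immutable.Seq).
val xs: scala.collection.Seq[Int] = scala.collection.mutable.ArrayBuffer(1, 2, 3)
xs.isInstanceOf[Seq[Int]]                   // true on 2.12, false on 2.13
xs.isInstanceOf[scala.collection.Seq[Int]]  // true on both
```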

### Why are the changes needed?

To pass `docker-integration-test` for Scala 2.13.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Ran the fixed `PostgresIntegrationSuite` and confirmed that it finished successfully.

Closes #30166 from sarutak/fix-toseq-postgresql.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-11-04 17:39:06 +09:00
Prashant Sharma 6226ccc092 [SPARK-33095] Follow up, support alter table column rename
### What changes were proposed in this pull request?

Support column rename for the MySQL dialect.

### Why are the changes needed?

At the moment, it does not work for MySQL version 5.x, so we should throw a proper exception for that case (see the sketch below).
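
A sketch of the version check this implies (illustrative; the method shape approximates Spark's dialect hook):

```
// MySQL 8.0 has ALTER TABLE ... RENAME COLUMN; 5.x does not, so raise a
// clear error instead of emitting invalid SQL. Illustrative sketch only.
def renameColumnSql(
    table: String, col: String, newName: String, dbMajorVersion: Int): String =
  if (dbMajorVersion >= 8) {
    s"ALTER TABLE $table RENAME COLUMN $col TO $newName"
  } else {
    throw new UnsupportedOperationException(
      "Rename column is only supported for MySQL version 8.0 and above.")
  }
```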

### Does this PR introduce _any_ user-facing change?

Yes, `column rename` with the MySQL dialect should work correctly.

### How was this patch tested?

Added tests for column rename.
Ran the tests against both versions of MySQL and they pass:

* `export MYSQL_DOCKER_IMAGE_NAME=mysql:5.7.31`

* `export MYSQL_DOCKER_IMAGE_NAME=mysql:8.0`

Closes #30142 from ScrapCodes/mysql-dialect-rename.

Authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-11-02 05:03:41 +00:00
Huaxin Gao f284218dae [SPARK-33137][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (Postgres dialect)
### What changes were proposed in this pull request?
Override the default SQL strings in Postgres Dialect for:

- ALTER TABLE UPDATE COLUMN TYPE
- ALTER TABLE UPDATE COLUMN NULLABILITY

Add new docker integration test suite `jdbc/v2/PostgreSQLIntegrationSuite.scala`
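
A sketch of the Postgres-specific SQL these overrides produce (method names approximate Spark's `JdbcDialect` hooks):

```
// Postgres changes a column's type with ALTER COLUMN ... TYPE and toggles
// nullability with SET/DROP NOT NULL. Illustrative sketch only.
def updateColumnTypeSql(table: String, col: String, newType: String): String =
  s"ALTER TABLE $table ALTER COLUMN $col TYPE $newType"

def updateNullabilitySql(table: String, col: String, nullable: Boolean): String = {
  val action = if (nullable) "DROP NOT NULL" else "SET NOT NULL"
  s"ALTER TABLE $table ALTER COLUMN $col $action"
}
```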

### Why are the changes needed?
Supports Postgres-specific ALTER TABLE syntax.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Add new test `PostgreSQLIntegrationSuite`

Closes #30089 from huaxingao/postgres_docker.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-10-27 15:04:53 +00:00
Prashant Sharma 8cae7f88b0 [SPARK-33095][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)
### What changes were proposed in this pull request?

Override the default SQL strings for:

- ALTER TABLE UPDATE COLUMN TYPE
- ALTER TABLE UPDATE COLUMN NULLABILITY

in the MySQL JDBC dialect, according to the official documentation.
Write MySQL integration tests for JDBC.
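
A sketch of the MySQL-specific SQL (approximate; MySQL folds the type and nullability into one MODIFY COLUMN clause):

```
// MySQL uses MODIFY COLUMN and restates the full column definition, so a
// nullability change must also carry the data type. Illustrative only.
def updateColumnSql(
    table: String, col: String, dataType: String, nullable: Boolean): String = {
  val nullability = if (nullable) "NULL" else "NOT NULL"
  s"ALTER TABLE $table MODIFY COLUMN $col $dataType $nullability"
}
```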

### Why are the changes needed?
Improves code coverage and supports the MySQL dialect for JDBC.

### Does this PR introduce _any_ user-facing change?

Yes, Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)

### How was this patch tested?

Added tests.

Closes #30025 from ScrapCodes/mysql-dialect.

Authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-10-22 13:51:42 +00:00