[SPARK-35446] Override getJDBCType in MySQLDialect to map FloatType to FLOAT
### What changes were proposed in this pull request?
Override the `getJDBCType` method in `MySQLDialect` so that `FloatType` is mapped to `FLOAT` instead of `REAL`.

### Why are the changes needed?
MySQL treats `REAL` as a synonym for `DOUBLE` by default (see https://dev.mysql.com/doc/refman/8.0/en/numeric-types.html). Therefore, when a table is created with a column of `REAL` type, the column is actually created as `DOUBLE`. Currently, `MySQLDialect` does not implement `getJDBCType`, so Spark falls back to `JdbcUtils.getCommonJDBCType`, which maps `FloatType` to `REAL`. This change is needed so that `FloatType` is properly mapped to `FLOAT` for MySQL.

### Does this PR introduce _any_ user-facing change?
Yes. Prior to this PR, writing a DataFrame with a `FloatType` column to a MySQL table created a `DOUBLE` column. After this PR, it creates a `FLOAT` column.

### How was this patch tested?
Added a test case in `JDBCSuite` that verifies the mapping.

Closes #32605 from mariosmeim-db/SPARK-35446.

Authored-by: Marios Meimaris <marios.meimaris@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
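The fallback described above (dialect-specific mapping first, then the shared `JdbcUtils.getCommonJDBCType`) can be sketched as a small dependency-free model. This is not Spark's actual code; `TypeMappingSketch`, `commonJdbcType`, and `mySqlJdbcType` are illustrative stand-ins, and only two data types are modeled:

```scala
// Toy model of Spark's JDBC type resolution (names are illustrative, not
// Spark's real classes).
sealed trait DataType
case object FloatType extends DataType
case object LongType extends DataType

final case class JdbcType(databaseTypeDefinition: String, jdbcNullType: Int)

object TypeMappingSketch {
  // Stand-in for JdbcUtils.getCommonJDBCType: the shared fallback maps
  // FloatType to REAL, which MySQL silently treats as DOUBLE.
  def commonJdbcType(dt: DataType): Option[JdbcType] = dt match {
    case FloatType => Some(JdbcType("REAL", java.sql.Types.REAL))
    case LongType  => Some(JdbcType("BIGINT", java.sql.Types.BIGINT))
  }

  // Stand-in for MySQLDialect.getJDBCType after this patch: FloatType is
  // handled explicitly; everything else falls back to the common mapping.
  def mySqlJdbcType(dt: DataType): Option[JdbcType] = dt match {
    case FloatType => Some(JdbcType("FLOAT", java.sql.Types.FLOAT))
    case other     => commonJdbcType(other)
  }
}
```

With this structure, a dialect only needs to pattern-match the types it wants to override and delegate the rest, which is exactly the shape the patch below gives `MySQLDialect`.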
Commit b5678bee1e (parent f2c0a049a6)
```diff
@@ -93,6 +93,8 @@ license: |
 
 - In Spark 3.2, special datetime values such as `epoch`, `today`, `yesterday`, `tomorrow`, and `now` are supported in typed literals only, for instance, `select timestamp'now'`. In Spark 3.1 and 3.0, such special values are supported in any casts of strings to dates/timestamps. To keep these special values as dates/timestamps in Spark 3.1 and 3.0, you should replace them manually, e.g. `if (c in ('now', 'today'), current_date(), cast(c as date))`.
 
+- In Spark 3.2, `FloatType` is mapped to `FLOAT` in MySQL. Prior to this, it used to be mapped to `REAL`, which is by default a synonym to `DOUBLE PRECISION` in MySQL.
+
 ## Upgrading from Spark SQL 3.0 to 3.1
 
 - In Spark 3.1, statistical aggregation function includes `std`, `stddev`, `stddev_samp`, `variance`, `var_samp`, `skewness`, `kurtosis`, `covar_samp`, `corr` will return `NULL` instead of `Double.NaN` when `DivideByZero` occurs during expression evaluation, for example, when `stddev_samp` applied on a single element set. In Spark version 3.0 and earlier, it will return `Double.NaN` in such case. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.statisticalAggregate` to `true`.
```
```diff
@@ -20,7 +20,8 @@ package org.apache.spark.sql.jdbc
 
 import java.sql.{SQLFeatureNotSupportedException, Types}
 import java.util.Locale
 
-import org.apache.spark.sql.types.{BooleanType, DataType, LongType, MetadataBuilder}
+import org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils
+import org.apache.spark.sql.types.{BooleanType, DataType, FloatType, LongType, MetadataBuilder}
 
 private case object MySQLDialect extends JdbcDialect {
```
```diff
@@ -94,4 +95,11 @@ private case object MySQLDialect extends JdbcDialect {
   override def getTableCommentQuery(table: String, comment: String): String = {
     s"ALTER TABLE $table COMMENT = '$comment'"
   }
+
+  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
+    // See SPARK-35446: MySQL treats REAL as a synonym to DOUBLE by default
+    // We override getJDBCType so that FloatType is mapped to FLOAT instead
+    case FloatType => Option(JdbcType("FLOAT", java.sql.Types.FLOAT))
+    case _ => JdbcUtils.getCommonJDBCType(dt)
+  }
 }
```
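The width difference behind this patch can be shown in plain Scala, with no MySQL involved: MySQL's `REAL` is by default an 8-byte `DOUBLE`, while `FLOAT` is 4-byte single precision, which is what Spark's `FloatType` actually holds. The object name below is illustrative:

```scala
// Single vs double precision: the distinction REAL-as-DOUBLE would blur.
object PrecisionSketch {
  val singlePrecision: Float = 0.1f   // 4 bytes, ~7 significant decimal digits
  val doublePrecision: Double = 0.1   // 8 bytes, ~15-16 significant digits

  // Widening a Float to Double keeps the rounding error made at 4 bytes;
  // it does not recover the digits a true Double would have.
  val widened: Double = singlePrecision.toDouble
}
```

So storing a `FloatType` column as `DOUBLE` advertises precision the data never had; mapping it to `FLOAT` keeps the column type honest about the 4-byte source.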
```diff
@@ -899,6 +899,11 @@ class JDBCSuite extends QueryTest
       Option(TimestampType))
   }
 
+  test("SPARK-35446: MySQLDialect type mapping of float") {
+    val mySqlDialect = JdbcDialects.get("jdbc:mysql://127.0.0.1/db")
+    assert(mySqlDialect.getJDBCType(FloatType).map(_.databaseTypeDefinition).get == "FLOAT")
+  }
+
   test("PostgresDialect type mapping") {
     val Postgres = JdbcDialects.get("jdbc:postgresql://127.0.0.1/db")
     val md = new MetadataBuilder().putLong("scale", 0)
```
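The test above resolves the dialect from a JDBC URL via `JdbcDialects.get`. A toy model of that lookup (not Spark's real registry; `DialectRegistrySketch` and its members are illustrative) is a prefix match over registered dialects with a no-op fallback:

```scala
// Toy model of URL-prefix dialect selection, as in JdbcDialects.get.
object DialectRegistrySketch {
  final case class Dialect(name: String, canHandle: String => Boolean)

  val dialects = Seq(
    Dialect("mysql", url => url.startsWith("jdbc:mysql")),
    Dialect("postgresql", url => url.startsWith("jdbc:postgresql"))
  )

  // First matching dialect wins; unknown URLs get a do-nothing dialect
  // that defers entirely to the common JDBC mappings.
  def get(url: String): Dialect =
    dialects.find(_.canHandle(url)).getOrElse(Dialect("noop", _ => true))
}
```

This is why the suite can exercise `MySQLDialect` purely from the URL string, without ever opening a connection to `127.0.0.1`.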