[SPARK-32594][SQL] Fix serialization of dates inserted to Hive tables
### What changes were proposed in this pull request?
Fix `DaysWritable` by overriding the parent's method `def get(doesTimeMatter: Boolean): Date` from `DateWritable` instead of `Date get()`, because the latter delegates to the former. The bug occurs because `HiveOutputWriter.write()` transitively calls `def get(doesTimeMatter: Boolean): Date` with the default implementation from the parent class `DateWritable`, which doesn't respect date rebases and uses the uninitialized `daysSinceEpoch` (0, which is `1970-01-01`).

### Why are the changes needed?
The changes fix the bug:
```sql
spark-sql> CREATE TABLE table1 (d date);
spark-sql> INSERT INTO table1 VALUES (date '2020-08-11');
spark-sql> SELECT * FROM table1;
1970-01-01
```
The expected result of the last SQL statement is **2020-08-11**, but the actual result is **1970-01-01**.

### Does this PR introduce _any_ user-facing change?
Yes. After the fix, `INSERT` works correctly:
```sql
spark-sql> SELECT * FROM table1;
2020-08-11
```

### How was this patch tested?
Added a new test to `HiveSerDeReadWriteSuite`.

Closes #29409 from MaxGekk/insert-date-into-hive-table.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
parent 5d130f0360
commit 0477d23467
@@ -54,7 +54,9 @@ class DaysWritable(
   }
 
   override def getDays: Int = julianDays
-  override def get(): Date = new Date(DateWritable.daysToMillis(julianDays))
+  override def get(doesTimeMatter: Boolean): Date = {
+    new Date(DateWritable.daysToMillis(julianDays, doesTimeMatter))
+  }
 
   override def set(d: Int): Unit = {
     gregorianDays = d
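Review note: the effect of this hunk can be reproduced without Hive. The sketch below is only an illustration under stated assumptions: `ParentWritable`, `BrokenDays`, `FixedDays`, and `write` are hypothetical stand-ins (not Hive or Spark classes), and `java.time.LocalDate` replaces `java.sql.Date` to keep the example free of time zone effects. It shows why overriding only the no-argument getter leaves a caller of the one-argument overload with the parent's uninitialized value.

```scala
import java.time.LocalDate

object OverloadSketch {
  // Hypothetical stand-in for the parent writable: the no-argument getter
  // delegates to the one-argument overload, which reads the parent's own field.
  class ParentWritable {
    protected var daysSinceEpoch: Long = 0L // never set by the subclasses below
    def get(): LocalDate = get(true)
    def get(doesTimeMatter: Boolean): LocalDate = LocalDate.ofEpochDay(daysSinceEpoch)
  }

  // Before the fix: only the no-argument getter is overridden, so the
  // one-argument overload still falls through to the parent's field.
  class BrokenDays(days: Long) extends ParentWritable {
    override def get(): LocalDate = LocalDate.ofEpochDay(days)
  }

  // After the fix: the one-argument overload is overridden, and the
  // no-argument getter picks it up through the parent's delegation.
  class FixedDays(days: Long) extends ParentWritable {
    override def get(doesTimeMatter: Boolean): LocalDate = LocalDate.ofEpochDay(days)
  }

  // Hypothetical writer that reaches the one-argument overload directly.
  def write(w: ParentWritable): LocalDate = w.get(false)

  def main(args: Array[String]): Unit = {
    val days = LocalDate.parse("2020-08-11").toEpochDay
    println(write(new BrokenDays(days))) // 1970-01-01
    println(write(new FixedDays(days)))  // 2020-08-11
    println(new FixedDays(days).get())   // 2020-08-11
  }
}
```

Overriding the overload that the other getter delegates to covers both entry points, which is consistent with the hunk dropping the `get()` override rather than keeping both.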
@@ -184,4 +184,12 @@ class HiveSerDeReadWriteSuite extends QueryTest with SQLTestUtils with TestHiveS
       checkComplexTypes(fileFormat)
     }
   }
+
+  test("SPARK-32594: insert dates to a Hive table") {
+    withTable("table1") {
+      sql("CREATE TABLE table1 (d date)")
+      sql("INSERT INTO table1 VALUES (date '2020-08-11')")
+      checkAnswer(spark.table("table1"), Row(Date.valueOf("2020-08-11")))
+    }
+  }
 }