f620996142
### What changes were proposed in this pull request?

In this PR, I propose to use the `CAST` logic when the pattern is not specified in `DateFormatter` or `TimestampFormatter`. In particular, `DateTimeUtils.stringToTimestampAnsi()` or `stringToDateAnsi()` is invoked in that case.

### Why are the changes needed?

1. This can improve the user experience with Spark SQL by making the default date/timestamp parsers more flexible and tolerant of their inputs.
2. It makes the default case consistent with the behavior of the `CAST` expression, which also makes the implementation more consistent.

### Does this PR introduce _any_ user-facing change?

The changes shouldn't introduce behavior changes in regular cases, but they can affect corner cases. The new implementation is able to parse more dates/timestamps by default. For instance, the old (current) date parser can recognize dates only in the format **yyyy-MM-dd**, but the new one can handle:

* `[+-]yyyy*`
* `[+-]yyyy*-[m]m`
* `[+-]yyyy*-[m]m-[d]d`
* `[+-]yyyy*-[m]m-[d]d `
* `[+-]yyyy*-[m]m-[d]d *`
* `[+-]yyyy*-[m]m-[d]dT*`

Similarly for timestamps. The old (current) timestamp formatter is able to parse timestamps only in the format **yyyy-MM-dd HH:mm:ss** by default, but the new implementation can handle:

* `[+-]yyyy*`
* `[+-]yyyy*-[m]m`
* `[+-]yyyy*-[m]m-[d]d`
* `[+-]yyyy*-[m]m-[d]d `
* `[+-]yyyy*-[m]m-[d]d [h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`
* `[+-]yyyy*-[m]m-[d]dT[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`
* `[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`
* `T[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`

### How was this patch tested?

By running the affected test suites:

```
$ build/sbt "test:testOnly *ImageFileFormatSuite"
$ build/sbt "test:testOnly *ParquetV2PartitionDiscoverySuite"
```

Closes #33709 from MaxGekk/datetime-cast-default-pattern.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
||
---|---|---|
benchmarks | ||
src | ||
pom.xml |
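
To make the date shapes listed in the PR description more concrete, here is a minimal Python sketch that approximates them with a regular expression. It is an illustration only, not Spark's implementation: the function name `parse_lenient_date` is hypothetical, and the sketch assumes that `yyyy*` means four or more year digits and that a missing month or day defaults to 1. Python's `datetime.date` only covers years 1 through 9999, so negative or very large years return `None` here even though the listed patterns allow a sign.

```python
import re
from datetime import date
from typing import Optional

# Approximates: [+-]yyyy*, [+-]yyyy*-[m]m, [+-]yyyy*-[m]m-[d]d,
# and the last form followed by a space or 'T' plus arbitrary text.
# Assumption: "yyyy*" is read as "four or more digits".
_DATE_RE = re.compile(
    r"^(?P<year>[+-]?\d{4,})"
    r"(?:-(?P<month>\d{1,2})"
    r"(?:-(?P<day>\d{1,2})(?:[ T].*)?)?)?$",
    re.DOTALL,
)


def parse_lenient_date(text: str) -> Optional[date]:
    """Return a date for inputs matching the lenient shapes, else None."""
    match = _DATE_RE.match(text.strip())
    if match is None:
        return None
    year = int(match.group("year"))
    # Assumption: omitted month/day fall back to 1.
    month = int(match.group("month") or 1)
    day = int(match.group("day") or 1)
    try:
        return date(year, month, day)
    except ValueError:
        # Invalid calendar values, or a year outside Python's 1..9999 range.
        return None
```

Under these assumptions, inputs such as `"2021"`, `"2021-8"`, and `"2021-08-12T10:00:00"` all yield a date, whereas a strict **yyyy-MM-dd** parser would accept none of them.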