[SPARK-32683][DOCS][SQL] Fix doc error and add migration guide for datetime pattern F

### What changes were proposed in this pull request?

This PR fixes the doc error and adds a migration guide for the datetime pattern `F`.

### Why are the changes needed?
This is a documentation bug we inherited from the JDK: https://bugs.openjdk.java.net/browse/JDK-8169482

The `SimpleDateFormat` pattern (**F: Day of week in month**) we used in 2.x and the `DateTimeFormatter` pattern (**F: week-of-month**) we use now both have the opposite meaning to what their Java docs declare. Unfortunately, this also leads to a silent data change in Spark.
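For context, a minimal plain-JVM Scala sketch (no Spark required) of the two formatters disagreeing on `2020-07-30`; the values shown are what JDK 8/11, the versions Spark supported at the time, produce:

```scala
import java.text.SimpleDateFormat
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Legacy java.text.SimpleDateFormat (backing Spark 2.x):
// its javadoc table labels 'F' as "Day of week in month".
val legacy = new SimpleDateFormat("F")
  .format(new SimpleDateFormat("yyyy-MM-dd").parse("2020-07-30"))
println(legacy) // "5" -- the week-based count Spark 2.x exposed

// java.time.DateTimeFormatter (backing Spark 3.x):
// its javadoc table labels 'F' as "week-of-month", yet it resolves
// the aligned day-of-week in month (JDK-8169482).
val modern = LocalDate.parse("2020-07-30")
  .format(DateTimeFormatter.ofPattern("F"))
println(modern) // "2"
```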

The real `week-of-month` field is the pattern `W` in `DateTimeFormatter`, which is banned in Spark 3.x.

If we want to keep pattern `F`, we need to accept the behavior change, document it in a proper migration guide, and fix the doc in Spark.
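As an illustration of the accepted behavior change, a hedged spark-shell sketch (assumes a running SparkSession `spark`):

```scala
// Spark 3.0: 'F' is the aligned day-of-week in month.
// 2020-07-30 is 30 days (4 aligned weeks + 2 days) into July, hence 2.
spark.sql("SELECT date_format(date '2020-07-30', 'F')").show() // 2

// Spark 2.4 evaluated the same expression to 5 (a week-based count).

// The true week-of-month letter 'W' is rejected in Spark 3.x; the error
// message points to spark.sql.legacy.timeParserPolicy=LEGACY as a fallback.
// spark.sql("SELECT date_format(date '2020-07-30', 'W')") // throws
```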

### Does this PR introduce _any_ user-facing change?

Yes, the documentation changed.

### How was this patch tested?

Passing the CI doc generation job.

Closes #29538 from yaooqinn/SPARK-32683.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

docs/sql-migration-guide.md

@@ -191,6 +191,8 @@ license: |
 - Since Spark 3.0, when using the `EXTRACT` expression to extract the second field from date/timestamp values, the result is a `DecimalType(8, 6)` value with 2 digits for the second part and 6 digits for the fractional part with microsecond precision, e.g. `extract(second from to_timestamp('2019-09-20 10:10:10.1'))` results in `10.100000`. In Spark version 2.4 and earlier, it returns an `IntegerType` value, and the result for the former example is `10`.
+- In Spark 3.0, datetime pattern letter `F` is **aligned day of week in month**, which represents the count of days within the period of a week where the weeks are aligned to the start of the month. In Spark version 2.4 and earlier, it is **week of month**, which represents the count of weeks within the month where weeks start on a fixed day-of-week. E.g., `2020-07-30` is 30 days (4 weeks and 2 days) after the first day of the month, so `date_format(date '2020-07-30', 'F')` returns 2 in Spark 3.0; as a week count in Spark 2.x it returns 5, because 2020-07-30 falls in the 5th week of July 2020, where week one is 2020-07-01 to 2020-07-04.
 ### Data Sources
 - In Spark version 2.4 and below, when reading a Hive SerDe table with Spark native data sources (parquet/orc), Spark infers the actual file schema and updates the table schema in the metastore. In Spark 3.0, Spark doesn't infer the schema anymore. This should not cause any problems for end users, but if it does, set `spark.sql.hive.caseSensitiveInferenceMode` to `INFER_AND_SAVE`.

docs/sql-ref-datetime-pattern.md

@@ -37,7 +37,7 @@ Spark uses pattern letters in the following table for date and timestamp parsing
 |**d**|day-of-month|number(3)|28|
 |**Q/q**|quarter-of-year|number/text|3; 03; Q3; 3rd quarter|
 |**E**|day-of-week|text|Tue; Tuesday|
-|**F**|week-of-month|number(1)|3|
+|**F**|aligned day of week in month|number(1)|3|
 |**a**|am-pm-of-day|am-pm|PM|
 |**h**|clock-hour-of-am-pm (1-12)|number(2)|12|
 |**K**|hour-of-am-pm (0-11)|number(2)|0|
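
Not part of the commit, but for a quick sanity check, a spark-shell sketch exercising a few letters from the table above (assumes a running SparkSession `spark`):

```scala
spark.sql("""
  SELECT
    date_format(date '2020-07-30', 'd') AS day_of_month, -- 30
    date_format(date '2020-07-30', 'E') AS day_of_week,  -- Thu
    date_format(date '2020-07-30', 'F') AS aligned_dow   -- 2 (was 5 in Spark 2.x)
""").show()
```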