spark-instrumented-optimizer/external
Maxim Gekk c5323d2e8d [SPARK-31318][SQL] Split Parquet/Avro configs for rebasing dates/timestamps in read and in write
### What changes were proposed in this pull request?
In this PR, I propose replacing the following SQL configs:
1. `spark.sql.legacy.parquet.rebaseDateTime.enabled` with
    - `spark.sql.legacy.parquet.rebaseDateTimeInWrite.enabled` (`false` by default). The config enables rebasing dates/timestamps while saving to Parquet files. If it is set to `true`, dates/timestamps are converted to local date-times in the Proleptic Gregorian calendar. Their date-time fields are then extracted and used to build new local date-times in the hybrid calendar (Julian + Gregorian). The resulting local date-times are converted to days or microseconds since the epoch.
    - `spark.sql.legacy.parquet.rebaseDateTimeInRead.enabled` (`false` by default). The config enables rebasing dates/timestamps from the hybrid calendar to the Proleptic Gregorian calendar while reading from Parquet files.
2. `spark.sql.legacy.avro.rebaseDateTime.enabled` with
    - `spark.sql.legacy.avro.rebaseDateTimeInWrite.enabled` (`false` by default). It enables rebasing dates/timestamps from the Proleptic Gregorian calendar to the hybrid calendar via local dates/timestamps while writing Avro files.
    - `spark.sql.legacy.avro.rebaseDateTimeInRead.enabled` (`false` by default). It enables rebasing dates/timestamps from the hybrid calendar to the Proleptic Gregorian calendar while reading Avro files. The rebasing converts micros/millis/days to a local date/timestamp in the source calendar, interprets the resulting date/timestamp in the target calendar, and then takes the number of micros/millis/days since the epoch 1970-01-01 00:00:00Z.
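
The date (day-count) rebasing described above can be sketched in plain Java. This is a minimal illustration, not Spark's implementation. It assumes `java.time.LocalDate` models the Proleptic Gregorian calendar and `java.util.GregorianCalendar` with its default 1582-10-15 cutover models the hybrid calendar. The class `RebaseDays` and its method names are hypothetical. Timestamps would follow the same steps on microseconds but also involve a time zone, which the sketch leaves out. Proleptic dates in the 1582-10-05..1582-10-14 gap are resolved by `GregorianCalendar`'s lenient default and may not match Spark's behavior.

```java
import java.time.LocalDate;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical helper (not Spark's API): rebases days since 1970-01-01 between
// the Proleptic Gregorian calendar (java.time) and the hybrid Julian + Gregorian
// calendar (java.util.GregorianCalendar, default 1582-10-15 cutover).
final class RebaseDays {
    private static final long MILLIS_PER_DAY = 86_400_000L;
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    private RebaseDays() {}

    // Write direction: Proleptic Gregorian days -> hybrid calendar days.
    static long rebaseGregorianToJulianDays(long days) {
        // 1. Interpret the day count as a local date in the Proleptic Gregorian calendar.
        LocalDate date = LocalDate.ofEpochDay(days);
        // 2. Build a date with the same year/month/day fields in the hybrid calendar.
        GregorianCalendar cal = new GregorianCalendar(UTC);
        cal.clear();
        int year = date.getYear();
        // java.time uses astronomical years (0 = 1 BC); GregorianCalendar uses eras.
        cal.set(Calendar.ERA, year > 0 ? GregorianCalendar.AD : GregorianCalendar.BC);
        cal.set(Calendar.YEAR, year > 0 ? year : 1 - year);
        cal.set(Calendar.MONTH, date.getMonthValue() - 1); // Calendar months are 0-based
        cal.set(Calendar.DAY_OF_MONTH, date.getDayOfMonth());
        // 3. Convert the hybrid local date (midnight UTC) back to days since the epoch.
        return Math.floorDiv(cal.getTimeInMillis(), MILLIS_PER_DAY);
    }

    // Read direction: hybrid calendar days -> Proleptic Gregorian days.
    static long rebaseJulianToGregorianDays(long days) {
        // 1. Interpret the day count as a local date in the hybrid calendar.
        GregorianCalendar cal = new GregorianCalendar(UTC);
        cal.clear();
        cal.setTimeInMillis(days * MILLIS_PER_DAY);
        int yearOfEra = cal.get(Calendar.YEAR);
        int year = cal.get(Calendar.ERA) == GregorianCalendar.AD ? yearOfEra : 1 - yearOfEra;
        // 2. Rebuild the same fields in the Proleptic Gregorian calendar.
        LocalDate date = LocalDate.of(year, cal.get(Calendar.MONTH) + 1, cal.get(Calendar.DAY_OF_MONTH));
        // 3. Convert back to days since the epoch.
        return date.toEpochDay();
    }

    public static void main(String[] args) {
        long proleptic = LocalDate.of(1000, 1, 1).toEpochDay();
        long hybrid = rebaseGregorianToJulianDays(proleptic);
        System.out.println(hybrid - proleptic);                                 // prints 5
        System.out.println(rebaseJulianToGregorianDays(hybrid) == proleptic);  // prints true
    }
}
```

For 1000-01-01 the two calendars are five days apart. In this sketch, writing therefore shifts the stored day count by 5 and reading shifts it back. Dates after the cutover, such as 2020-04-01, come out unchanged in both directions.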

### Why are the changes needed?
This allows users to load dates/timestamps saved by Spark 2.4 and save them to Parquet/Avro files without rebasing. It also covers the reverse use case: loading data saved by Spark 3.0 and saving it in a form that is compatible with Spark 2.4.

### Does this PR introduce any user-facing change?
Yes, users have to use the new SQL configs. The old SQL configs are removed by this PR.

### How was this patch tested?
By the existing test suites `AvroV1Suite`, `AvroV2Suite` and `ParquetIOSuite`.

Closes #28082 from MaxGekk/split-rebase-configs.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-04-01 04:56:05 +00:00
| Directory | Last commit | Date |
|---|---|---|
| avro | [SPARK-31318][SQL] Split Parquet/Avro configs for rebasing dates/timestamps in read and in write | 2020-04-01 04:56:05 +00:00 |
| docker | [SPARK-28683][BUILD] Upgrade Scala to 2.12.10 | 2019-09-18 13:30:36 -07:00 |
| docker-integration-tests | [SPARK-31135][BUILD][TESTS] Upgrdade docker-client version to 8.14.1 | 2020-03-15 23:55:04 -07:00 |
| kafka-0-10 | [SPARK-31228][DSTREAMS] Add version information to the configuration of Kafka | 2020-03-26 20:11:15 +09:00 |
| kafka-0-10-assembly | [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT | 2020-02-25 19:44:31 -08:00 |
| kafka-0-10-sql | [SPARK-31228][DSTREAMS] Add version information to the configuration of Kafka | 2020-03-26 20:11:15 +09:00 |
| kafka-0-10-token-provider | [SPARK-30874][SQL] Support Postgres Kerberos login in JDBC connector | 2020-03-12 19:04:35 -07:00 |
| kinesis-asl | [SPARK-31293][DSTREAMS][KINESIS][DOC] Fix wrong examples and help messages for Kinesis integration | 2020-03-29 14:27:19 -07:00 |
| kinesis-asl-assembly | [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT | 2020-02-25 19:44:31 -08:00 |
| spark-ganglia-lgpl | [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT | 2020-02-25 19:44:31 -08:00 |