bb0b416f0b
### What changes were proposed in this pull request?

In the PR, I propose to replace the current implementation of the `rebaseGregorianToJulianDays()` and `rebaseJulianToGregorianDays()` functions in `DateTimeUtils` with a new one. It relies on the fact that the difference between the Proleptic Gregorian and the hybrid (Julian+Gregorian) calendars changes only 14 times over the entire supported range of valid dates `[0001-01-01, 9999-12-31]`:

| date | Proleptic Greg. days | Hybrid (Julian+Greg) days | diff |
| ---- | ---- | ---- | ---- |
| 0001-01-01 | -719162 | -719164 | -2 |
| 0100-03-01 | -682944 | -682945 | -1 |
| 0200-03-01 | -646420 | -646420 | 0 |
| 0300-03-01 | -609896 | -609895 | 1 |
| 0500-03-01 | -536847 | -536845 | 2 |
| 0600-03-01 | -500323 | -500320 | 3 |
| 0700-03-01 | -463799 | -463795 | 4 |
| 0900-03-01 | -390750 | -390745 | 5 |
| 1000-03-01 | -354226 | -354220 | 6 |
| 1100-03-01 | -317702 | -317695 | 7 |
| 1300-03-01 | -244653 | -244645 | 8 |
| 1400-03-01 | -208129 | -208120 | 9 |
| 1500-03-01 | -171605 | -171595 | 10 |
| 1582-10-15 | -141427 | -141427 | 0 |

Here the day columns are days since the epoch, and diff is the hybrid day count minus the Proleptic Gregorian day count.

For a given number of days since the epoch, the proposed implementation finds the range of days that the input belongs to and adds the diff between the calendars for that range to the input. The result is the rebased number of days since the epoch in the target calendar.

For example, suppose we need to rebase -650000 days from the Proleptic Gregorian calendar to the hybrid calendar. The input falls into the bucket [-682944, -646420), and the diff associated with that range is -1. To get the rebased days in the Julian calendar, we add -1 to -650000, and the result is -650001.

### Why are the changes needed?

To make dates rebasing faster.

### Does this PR introduce any user-facing change?

No, the results should be the same for the valid range of the `DATE` type `[0001-01-01, 9999-12-31]`.

### How was this patch tested?

- Added 2 tests to `DateTimeUtilsSuite` for the `rebaseGregorianToJulianDays()` and `rebaseJulianToGregorianDays()` functions. The tests check that the results of the old and new (optimized) implementations are the same for all supported dates.
- Re-ran `DateTimeRebaseBenchmark` on:

| Item | Description |
| ---- | ---- |
| Region | us-west-2 (Oregon) |
| Instance | r3.xlarge |
| AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) |
| Java | OpenJDK8/11 |

Closes #28067 from MaxGekk/optimize-rebasing.

Lead-authored-by: Maxim Gekk <max.gekk@gmail.com>
Co-authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
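
The range lookup described in the commit message can be sketched as follows. This is a minimal illustration in Python, not the code from `DateTimeUtils`: the function and constant names are hypothetical, the binary search via `bisect` is one possible way to find the bucket, and raising an error for inputs before 0001-01-01 is an assumption. The thresholds and diffs are copied from the table in the commit message, and the reverse direction is assumed to use the hybrid-calendar day numbers as thresholds with the diffs negated.

```python
from bisect import bisect_right

# Day numbers (since the epoch) at which the diff changes, taken from the
# table in the commit message. One list per source calendar.
GREGORIAN_SWITCH_DAYS = [
    -719162, -682944, -646420, -609896, -536847, -500323, -463799,
    -390750, -354226, -317702, -244653, -208129, -171605, -141427,
]
HYBRID_SWITCH_DAYS = [
    -719164, -682945, -646420, -609895, -536845, -500320, -463795,
    -390745, -354220, -317695, -244645, -208120, -171595, -141427,
]
# diff = hybrid days - Proleptic Gregorian days, one entry per range.
DIFFS = [-2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0]


def _rebase(switch_days, diffs, days):
    """Find the range that `days` falls into and add that range's diff."""
    i = bisect_right(switch_days, days) - 1
    if i < 0:
        # Assumption: inputs before 0001-01-01 are rejected in this sketch.
        raise ValueError(f"{days} is before the supported range")
    return days + diffs[i]


def rebase_gregorian_to_julian_days(days):
    """Proleptic Gregorian days -> hybrid (Julian+Gregorian) days."""
    return _rebase(GREGORIAN_SWITCH_DAYS, DIFFS, days)


def rebase_julian_to_gregorian_days(days):
    """Hybrid days -> Proleptic Gregorian days (assumed: negated diffs)."""
    return _rebase(HYBRID_SWITCH_DAYS, [-d for d in DIFFS], days)


# The worked example from the commit message.
print(rebase_gregorian_to_julian_days(-650000))  # -650001
```

Note that the Proleptic Gregorian dates 1582-10-05 through 1582-10-14 have no counterpart in the hybrid calendar; this sketch does not treat them specially.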
AggregateBenchmark-jdk11-results.txt | ||
AggregateBenchmark-results.txt | ||
BloomFilterBenchmark-jdk11-results.txt | ||
BloomFilterBenchmark-results.txt | ||
BuiltInDataSourceWriteBenchmark-jdk11-results.txt | ||
BuiltInDataSourceWriteBenchmark-results.txt | ||
ColumnarBatchBenchmark-jdk11-results.txt | ||
ColumnarBatchBenchmark-results.txt | ||
CompressionSchemeBenchmark-jdk11-results.txt | ||
CompressionSchemeBenchmark-results.txt | ||
CSVBenchmark-jdk11-results.txt | ||
CSVBenchmark-results.txt | ||
DatasetBenchmark-jdk11-results.txt | ||
DatasetBenchmark-results.txt | ||
DataSourceReadBenchmark-jdk11-results.txt | ||
DataSourceReadBenchmark-results.txt | ||
DateTimeBenchmark-jdk11-results.txt | ||
DateTimeBenchmark-results.txt | ||
DateTimeRebaseBenchmark-jdk11-results.txt | ||
DateTimeRebaseBenchmark-results.txt | ||
ExternalAppendOnlyUnsafeRowArrayBenchmark-jdk11-results.txt | ||
ExternalAppendOnlyUnsafeRowArrayBenchmark-results.txt | ||
ExtractBenchmark-jdk11-results.txt | ||
ExtractBenchmark-results.txt | ||
FilterPushdownBenchmark-jdk11-results.txt | ||
FilterPushdownBenchmark-results.txt | ||
HashedRelationMetricsBenchmark-jdk11-results.txt | ||
HashedRelationMetricsBenchmark-results.txt | ||
InExpressionBenchmark-jdk11-results.txt | ||
InExpressionBenchmark-results.txt | ||
IntervalBenchmark-jdk11-results.txt | ||
IntervalBenchmark-results.txt | ||
JoinBenchmark-jdk11-results.txt | ||
JoinBenchmark-results.txt | ||
JsonBenchmark-jdk11-results.txt | ||
JsonBenchmark-results.txt | ||
MakeDateTimeBenchmark-jdk11-results.txt | ||
MakeDateTimeBenchmark-results.txt | ||
MetricsAggregationBenchmark-jdk11-results.txt | ||
MetricsAggregationBenchmark-results.txt | ||
MiscBenchmark-jdk11-results.txt | ||
MiscBenchmark-results.txt | ||
OrcNestedSchemaPruningBenchmark-jdk11-results.txt | ||
OrcNestedSchemaPruningBenchmark-results.txt | ||
OrcV2NestedSchemaPruningBenchmark-jdk11-results.txt | ||
OrcV2NestedSchemaPruningBenchmark-results.txt | ||
ParquetNestedSchemaPruningBenchmark-jdk11-results.txt | ||
ParquetNestedSchemaPruningBenchmark-results.txt | ||
PrimitiveArrayBenchmark-jdk11-results.txt | ||
PrimitiveArrayBenchmark-results.txt | ||
RangeBenchmark-jdk11-results.txt | ||
RangeBenchmark-results.txt | ||
SortBenchmark-jdk11-results.txt | ||
SortBenchmark-results.txt | ||
TPCDSQueryBenchmark-jdk11-results.txt | ||
TPCDSQueryBenchmark-results.txt | ||
UDFBenchmark-jdk11-results.txt | ||
UDFBenchmark-results.txt | ||
UnsafeArrayDataBenchmark-jdk11-results.txt | ||
UnsafeArrayDataBenchmark-results.txt | ||
WideSchemaBenchmark-jdk11-results.txt | ||
WideSchemaBenchmark-results.txt | ||
WideTableBenchmark-jdk11-results.txt | ||
WideTableBenchmark-results.txt |