spark-instrumented-optimizer/sql/core/benchmarks
Maxim Gekk bb0b416f0b [SPARK-31297][SQL] Speed up dates rebasing
### What changes were proposed in this pull request?
In the PR, I propose to replace current implementation of the `rebaseGregorianToJulianDays()` and `rebaseJulianToGregorianDays()` functions in `DateTimeUtils` by new one which is based on the fact that difference between Proleptic Gregorian and the hybrid (Julian+Gregorian) calendars was changed only 14 times for entire supported range of valid dates `[0001-01-01, 9999-12-31]`:

| date | Proleptic Greg. days | Hybrid (Julian+Greg) days | diff|
| ---- | ----|----|----|
|0001-01-01|-719162|-719164|-2|
|0100-03-01|-682944|-682945|-1|
|0200-03-01|-646420|-646420|0|
|0300-03-01|-609896|-609895|1|
|0500-03-01|-536847|-536845|2|
|0600-03-01|-500323|-500320|3|
|0700-03-01|-463799|-463795|4|
|0900-03-01|-390750|-390745|5|
|1000-03-01|-354226|-354220|6|
|1100-03-01|-317702|-317695|7|
|1300-03-01|-244653|-244645|8|
|1400-03-01|-208129|-208120|9|
|1500-03-01|-171605|-171595|10|
|1582-10-15|-141427|-141427|0|

For the given days since the epoch, the proposed implementation finds the range of days which the input days belongs to, and adds the diff in days between calendars to the input. The result is rebased days since the epoch in the target calendar.

For example, if need to rebase -650000 days from Proleptic Gregorian calendar to the hybrid calendar. In that case, the input falls to the bucket [-682944, -646420), the diff associated with the range is -1. To get the rebased days in Julian calendar, we should add -1 to -650000, and the result is -650001.

### Why are the changes needed?
To make dates rebasing faster.

### Does this PR introduce any user-facing change?
No, the results should be the same for valid range of the `DATE` type `[0001-01-01, 9999-12-31]`.

### How was this patch tested?
- Added 2 tests to `DateTimeUtilsSuite` for the `rebaseGregorianToJulianDays()` and `rebaseJulianToGregorianDays()` functions. The tests check that results of old and new implementation (optimized version) are the same for all supported dates.
- Re-run `DateTimeRebaseBenchmark` on:

| Item | Description |
| ---- | ----|
| Region | us-west-2 (Oregon) |
| Instance | r3.xlarge |
| AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) |
| Java | OpenJDK8/11 |

Closes #28067 from MaxGekk/optimize-rebasing.

Lead-authored-by: Maxim Gekk <max.gekk@gmail.com>
Co-authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-03-31 17:38:47 +08:00
..
AggregateBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
AggregateBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
BloomFilterBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
BloomFilterBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
BuiltInDataSourceWriteBenchmark-jdk11-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
BuiltInDataSourceWriteBenchmark-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
ColumnarBatchBenchmark-jdk11-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
ColumnarBatchBenchmark-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
CompressionSchemeBenchmark-jdk11-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
CompressionSchemeBenchmark-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
CSVBenchmark-jdk11-results.txt [SPARK-30323][SQL] Support filters pushdown in CSV datasource 2020-01-16 13:10:08 +09:00
CSVBenchmark-results.txt [SPARK-30323][SQL] Support filters pushdown in CSV datasource 2020-01-16 13:10:08 +09:00
DatasetBenchmark-jdk11-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
DatasetBenchmark-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
DataSourceReadBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
DataSourceReadBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
DateTimeBenchmark-jdk11-results.txt [SPARK-31150][SQL] Parsing seconds fraction with variable length for timestamp 2020-03-17 21:53:46 +08:00
DateTimeBenchmark-results.txt [SPARK-31150][SQL] Parsing seconds fraction with variable length for timestamp 2020-03-17 21:53:46 +08:00
DateTimeRebaseBenchmark-jdk11-results.txt [SPARK-31297][SQL] Speed up dates rebasing 2020-03-31 17:38:47 +08:00
DateTimeRebaseBenchmark-results.txt [SPARK-31297][SQL] Speed up dates rebasing 2020-03-31 17:38:47 +08:00
ExternalAppendOnlyUnsafeRowArrayBenchmark-jdk11-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
ExternalAppendOnlyUnsafeRowArrayBenchmark-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
ExtractBenchmark-jdk11-results.txt [SPARK-31119][SQL] Add interval value support for extract expression as extract source 2020-03-18 12:29:39 +08:00
ExtractBenchmark-results.txt [SPARK-31119][SQL] Add interval value support for extract expression as extract source 2020-03-18 12:29:39 +08:00
FilterPushdownBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
FilterPushdownBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
HashedRelationMetricsBenchmark-jdk11-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
HashedRelationMetricsBenchmark-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
InExpressionBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
InExpressionBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
IntervalBenchmark-jdk11-results.txt [SPARK-31129][SQL][TESTS] Fix IntervalBenchmark and DateTimeBenchmark 2020-03-12 12:59:29 -07:00
IntervalBenchmark-results.txt [SPARK-31129][SQL][TESTS] Fix IntervalBenchmark and DateTimeBenchmark 2020-03-12 12:59:29 -07:00
JoinBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
JoinBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
JsonBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
JsonBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
MakeDateTimeBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
MakeDateTimeBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
MetricsAggregationBenchmark-jdk11-results.txt [SPARK-29562][SQL] Speed up and slim down metric aggregation in SQL listener 2019-10-24 22:18:10 -07:00
MetricsAggregationBenchmark-results.txt [SPARK-29562][SQL] Speed up and slim down metric aggregation in SQL listener 2019-10-24 22:18:10 -07:00
MiscBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
MiscBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
OrcNestedSchemaPruningBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
OrcNestedSchemaPruningBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
OrcV2NestedSchemaPruningBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
OrcV2NestedSchemaPruningBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
ParquetNestedSchemaPruningBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
ParquetNestedSchemaPruningBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
PrimitiveArrayBenchmark-jdk11-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
PrimitiveArrayBenchmark-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
RangeBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
RangeBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
SortBenchmark-jdk11-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
SortBenchmark-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
TPCDSQueryBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
TPCDSQueryBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
UDFBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
UDFBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
UnsafeArrayDataBenchmark-jdk11-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
UnsafeArrayDataBenchmark-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
WideSchemaBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
WideSchemaBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
WideTableBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
WideTableBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00