spark-instrumented-optimizer/sql/core/benchmarks
Wenchen Fan f72220b8ab [SPARK-31606][SQL] Reduce the perf regression of vectorized parquet reader caused by datetime rebase
### What changes were proposed in this pull request?

Push the rebase logic to the lower level of the parquet vectorized reader, to make the final code more vectorization-friendly.

### Why are the changes needed?

The Parquet vectorized reader is carefully implemented to make it more likely to be vectorized by the JVM. However, the newly added datetime rebase degrades performance a lot because it breaks vectorization, even when the datetime values don't need rebasing (which is very likely, as dates before 1582 are rare).
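
To illustrate the idea, here is a minimal sketch and not the code in this patch. It assumes a batch of date values stored as days since 1970-01-01. The class `RebaseSketch`, its methods, and the fixed 10-day shift inside `rebaseDay` are hypothetical stand-ins; the real calendar conversion is more involved. The point is only the loop shape: a cheap scan over the batch decides once whether any value needs rebasing, so the common case becomes a plain bulk copy.

```java
import java.time.LocalDate;

final class RebaseSketch {
    // First day of the Gregorian calendar, as days since 1970-01-01.
    static final int SWITCH_DAY = (int) LocalDate.of(1582, 10, 15).toEpochDay();

    // Placeholder for the real per-value conversion (assumption: a fixed
    // shift stands in for the actual Julian-to-Gregorian arithmetic).
    static int rebaseDay(int day) {
        return day < SWITCH_DAY ? day + 10 : day;
    }

    // Per-value variant: a method call and a data-dependent branch sit
    // inside the hot loop for every value.
    static void readPerValue(int[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = rebaseDay(src[i]);
        }
    }

    // Batch variant: one branch-free scan checks whether any value is old
    // enough to need rebasing, then picks the slow or the fast path once.
    static void readBatch(int[] src, int[] dst) {
        boolean needsRebase = false;
        for (int i = 0; i < src.length; i++) {
            needsRebase |= src[i] < SWITCH_DAY;
        }
        if (needsRebase) {
            readPerValue(src, dst);
        } else {
            // Common case: no ancient dates, so a bulk copy is enough.
            System.arraycopy(src, 0, dst, 0, src.length);
        }
    }
}
```

Both variants produce the same output; the batch variant simply keeps the conversion out of the loop whenever the whole batch is modern.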

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

Ran part of the `DateTimeRebaseBenchmark` locally. The results:
Before this patch
```
[info] Load dates from parquet:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] after 1582, vec on, rebase off                     2677           2838         142         37.4          26.8       1.0X
[info] after 1582, vec on, rebase on                      3828           4331         805         26.1          38.3       0.7X
[info] before 1582, vec on, rebase off                    2903           2926          34         34.4          29.0       0.9X
[info] before 1582, vec on, rebase on                     4163           4197          38         24.0          41.6       0.6X

[info] Load timestamps from parquet:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] after 1900, vec on, rebase off                     3537           3627         104         28.3          35.4       1.0X
[info] after 1900, vec on, rebase on                      6891           7010         105         14.5          68.9       0.5X
[info] before 1900, vec on, rebase off                    3692           3770          72         27.1          36.9       1.0X
[info] before 1900, vec on, rebase on                     7588           7610          30         13.2          75.9       0.5X
```

After this patch
```
[info] Load dates from parquet:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] after 1582, vec on, rebase off                     2758           2944         197         36.3          27.6       1.0X
[info] after 1582, vec on, rebase on                      2908           2966          51         34.4          29.1       0.9X
[info] before 1582, vec on, rebase off                    2840           2878          37         35.2          28.4       1.0X
[info] before 1582, vec on, rebase on                     3407           3433          24         29.4          34.1       0.8X

[info] Load timestamps from parquet:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] after 1900, vec on, rebase off                     3861           4003         139         25.9          38.6       1.0X
[info] after 1900, vec on, rebase on                      4194           4283          77         23.8          41.9       0.9X
[info] before 1900, vec on, rebase off                    3849           3937          79         26.0          38.5       1.0X
[info] before 1900, vec on, rebase on                     7512           7546          55         13.3          75.1       0.5X
```

With rebase on, the date type is about 30% faster when the values don't need rebasing and about 20% faster when they do.
With rebase on, the timestamp type is about 60% faster when the values don't need rebasing, with no difference when they do.
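
For reference, these percentages can be reproduced from the Best Time(ms) column of the "rebase on" rows in the two tables. The sketch below assumes speedup is measured as the throughput ratio (time before divided by time after, minus one); the class and method names are only illustrative.

```java
final class SpeedupSketch {
    // Throughput gain in percent when the best time drops from beforeMs to afterMs.
    static double speedupPercent(double beforeMs, double afterMs) {
        return (beforeMs / afterMs - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Best Time(ms) of the "vec on, rebase on" rows, before and after the patch.
        System.out.printf("dates, after 1582:       %.0f%%%n", speedupPercent(3828, 2908));
        System.out.printf("dates, before 1582:      %.0f%%%n", speedupPercent(4163, 3407));
        System.out.printf("timestamps, after 1900:  %.0f%%%n", speedupPercent(6891, 4194));
        System.out.printf("timestamps, before 1900: %.0f%%%n", speedupPercent(7588, 7512));
    }
}
```

With these inputs the gains come out at roughly 32%, 22%, 64% and 1%, which matches the rounded figures in the summary.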

Closes #28406 from cloud-fan/perf.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-05-04 15:30:10 +09:00
AggregateBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
AggregateBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
BloomFilterBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
BloomFilterBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
BuiltInDataSourceWriteBenchmark-jdk11-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
BuiltInDataSourceWriteBenchmark-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
ColumnarBatchBenchmark-jdk11-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
ColumnarBatchBenchmark-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
CompressionSchemeBenchmark-jdk11-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
CompressionSchemeBenchmark-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
CSVBenchmark-jdk11-results.txt [SPARK-31414][SQL] Fix performance regression with new TimestampFormatter for json and csv time parsing 2020-04-13 03:11:28 +00:00
CSVBenchmark-results.txt [SPARK-31414][SQL] Fix performance regression with new TimestampFormatter for json and csv time parsing 2020-04-13 03:11:28 +00:00
DatasetBenchmark-jdk11-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
DatasetBenchmark-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
DataSourceReadBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
DataSourceReadBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
DateTimeBenchmark-jdk11-results.txt [SPARK-31527][SQL][TESTS][FOLLOWUP] Fix the number of rows in DateTimeBenchmark 2020-05-04 09:39:50 +09:00
DateTimeBenchmark-results.txt [SPARK-31527][SQL][TESTS][FOLLOWUP] Fix the number of rows in DateTimeBenchmark 2020-05-04 09:39:50 +09:00
DateTimeRebaseBenchmark-jdk11-results.txt [SPARK-31606][SQL] Reduce the perf regression of vectorized parquet reader caused by datetime rebase 2020-05-04 15:30:10 +09:00
DateTimeRebaseBenchmark-results.txt [SPARK-31606][SQL] Reduce the perf regression of vectorized parquet reader caused by datetime rebase 2020-05-04 15:30:10 +09:00
ExternalAppendOnlyUnsafeRowArrayBenchmark-jdk11-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
ExternalAppendOnlyUnsafeRowArrayBenchmark-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
ExtractBenchmark-jdk11-results.txt [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function 2020-04-22 10:24:49 +00:00
ExtractBenchmark-results.txt [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function 2020-04-22 10:24:49 +00:00
FilterPushdownBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
FilterPushdownBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
HashedRelationMetricsBenchmark-jdk11-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
HashedRelationMetricsBenchmark-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
InExpressionBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
InExpressionBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
IntervalBenchmark-jdk11-results.txt [SPARK-31129][SQL][TESTS] Fix IntervalBenchmark and DateTimeBenchmark 2020-03-12 12:59:29 -07:00
IntervalBenchmark-results.txt [SPARK-31129][SQL][TESTS] Fix IntervalBenchmark and DateTimeBenchmark 2020-03-12 12:59:29 -07:00
JoinBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
JoinBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
JsonBenchmark-jdk11-results.txt [SPARK-31414][SQL] Fix performance regression with new TimestampFormatter for json and csv time parsing 2020-04-13 03:11:28 +00:00
JsonBenchmark-results.txt [SPARK-31414][SQL] Fix performance regression with new TimestampFormatter for json and csv time parsing 2020-04-13 03:11:28 +00:00
MakeDateTimeBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
MakeDateTimeBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
MetricsAggregationBenchmark-jdk11-results.txt [SPARK-29562][SQL] Speed up and slim down metric aggregation in SQL listener 2019-10-24 22:18:10 -07:00
MetricsAggregationBenchmark-results.txt [SPARK-29562][SQL] Speed up and slim down metric aggregation in SQL listener 2019-10-24 22:18:10 -07:00
MiscBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
MiscBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
OrcNestedSchemaPruningBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
OrcNestedSchemaPruningBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
OrcV2NestedSchemaPruningBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
OrcV2NestedSchemaPruningBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
ParquetNestedPredicatePushDownBenchmark-jdk11-results.txt [SPARK-31364][SQL][TESTS] Benchmark Parquet Nested Field Predicate Pushdown 2020-04-24 22:10:58 +00:00
ParquetNestedPredicatePushDownBenchmark-results.txt [SPARK-31364][SQL][TESTS] Benchmark Parquet Nested Field Predicate Pushdown 2020-04-24 22:10:58 +00:00
ParquetNestedSchemaPruningBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
ParquetNestedSchemaPruningBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
PrimitiveArrayBenchmark-jdk11-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
PrimitiveArrayBenchmark-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
RangeBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
RangeBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
SortBenchmark-jdk11-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
SortBenchmark-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
TPCDSQueryBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
TPCDSQueryBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
UDFBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
UDFBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
UnsafeArrayDataBenchmark-jdk11-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
UnsafeArrayDataBenchmark-results.txt [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) 2019-10-03 08:58:25 -07:00
WideSchemaBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
WideSchemaBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
WideTableBenchmark-jdk11-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00
WideTableBenchmark-results.txt [SPARK-30409][SPARK-29173][SQL][TESTS] Use NoOp datasource in SQL benchmarks 2020-01-12 13:18:19 -08:00