spark-instrumented-optimizer/sql/core/benchmarks/DateTimeRebaseBenchmark-jdk11-results.txt
Max Gekk bef5828e12 [SPARK-31630][SQL] Fix perf regression by skipping timestamps rebasing after some threshold
### What changes were proposed in this pull request?
Skip timestamp rebasing after a global threshold, after which the Julian and Gregorian calendars no longer differ. This avoids checking the hash maps of switch points and fixes the perf regressions in `toJavaTimestamp()` and `fromJavaTimestamp()`.
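
A minimal Java sketch of the threshold idea follows. It assumes that switch points are stored per time zone in a hash map keyed by zone ID, and that the diff after each zone's last switch point is 0. `ThresholdRebaser`, `register`, `rebaseMicros`, and all numbers are hypothetical and used only for illustration. They are not Spark's actual implementation or data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names and values are illustrative assumptions,
// not Spark's real rebasing code or switch-point tables.
class ThresholdRebaser {
    // Switch points (assumed ascending) and the diff to add from each one on.
    private static final class RebaseInfo {
        final long[] switches;
        final long[] diffs;

        RebaseInfo(long[] switches, long[] diffs) {
            this.switches = switches;
            this.diffs = diffs;
        }
    }

    private final Map<String, RebaseInfo> infoByZone = new HashMap<>();
    // Maximum of every registered zone's last switch point. At or after it,
    // no registered zone needs a correction.
    private long globalThreshold = Long.MIN_VALUE;

    void register(String zoneId, long[] switches, long[] diffs) {
        if (switches.length == 0 || switches.length != diffs.length) {
            throw new IllegalArgumentException("switches and diffs must be non-empty and of equal length");
        }
        if (diffs[diffs.length - 1] != 0) {
            throw new IllegalArgumentException("the diff after the last switch point must be 0");
        }
        infoByZone.put(zoneId, new RebaseInfo(switches.clone(), diffs.clone()));
        globalThreshold = Math.max(globalThreshold, switches[switches.length - 1]);
    }

    long rebaseMicros(String zoneId, long micros) {
        // Fast path: one comparison, no hash-map lookup, no search.
        if (micros >= globalThreshold) {
            return micros;
        }
        RebaseInfo info = infoByZone.get(zoneId);
        if (info == null) {
            throw new IllegalArgumentException("unknown time zone: " + zoneId);
        }
        // Walk back to the last switch point <= micros; values before the
        // first switch point use the first diff.
        int i = info.switches.length - 1;
        while (i > 0 && micros < info.switches[i]) {
            i--;
        }
        return micros + info.diffs[i];
    }
}
```

In this sketch, `globalThreshold` is the maximum over all zones. A value at or after it therefore needs no per-zone data at all. Only older values pay for the map lookup and the search.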

### Why are the changes needed?
The changes fix perf regressions in conversions to/from the external type `java.sql.Timestamp`.

Before (see the PR's results https://github.com/apache/spark/pull/28440):
```
================================================================================================
Conversion from/to external types
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
To/from Java's date-time:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
From java.sql.Timestamp                             376            388          10         13.3          75.2       1.1X
Collect java.sql.Timestamp                         1878           1937          64          2.7         375.6       0.2X
```

After:
```
================================================================================================
Conversion from/to external types
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
To/from Java's date-time:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
From java.sql.Timestamp                             249            264          24         20.1          49.8       1.7X
Collect java.sql.Timestamp                         1503           1523          24          3.3         300.5       0.3X
```

Average perf improvements:

1. From `java.sql.Timestamp`: ~34%
2. To `java.sql.Timestamp`: ~16%

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By existing test suites `DateTimeUtilsSuite` and `RebaseDateTimeSuite`.

Closes #28441 from MaxGekk/opt-rebase-common-threshold.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-05-05 14:11:53 +00:00

================================================================================================
Rebasing dates/timestamps in Parquet datasource
================================================================================================

OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Save DATE to parquet:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, noop                                  20802          20802           0          4.8         208.0       1.0X
before 1582, noop                                 10728          10728           0          9.3         107.3       1.9X
after 1582, rebase off                            32924          32924           0          3.0         329.2       0.6X
after 1582, rebase on                             32627          32627           0          3.1         326.3       0.6X
before 1582, rebase off                           21576          21576           0          4.6         215.8       1.0X
before 1582, rebase on                            23115          23115           0          4.3         231.2       0.9X

OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Load DATE from parquet:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, vec off, rebase off                   12880          12984         178          7.8         128.8       1.0X
after 1582, vec off, rebase on                    13118          13255         174          7.6         131.2       1.0X
after 1582, vec on, rebase off                     3645           3698          76         27.4          36.4       3.5X
after 1582, vec on, rebase on                      3709           3727          15         27.0          37.1       3.5X
before 1582, vec off, rebase off                  13014          13051          36          7.7         130.1       1.0X
before 1582, vec off, rebase on                   14195          14242          48          7.0         142.0       0.9X
before 1582, vec on, rebase off                    3680           3773          92         27.2          36.8       3.5X
before 1582, vec on, rebase on                     4310           4381          87         23.2          43.1       3.0X

OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Save TIMESTAMP_INT96 to parquet:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1900, noop                                   3026           3026           0         33.1          30.3       1.0X
before 1900, noop                                  2995           2995           0         33.4          30.0       1.0X
after 1900, rebase off                            24294          24294           0          4.1         242.9       0.1X
after 1900, rebase on                             24480          24480           0          4.1         244.8       0.1X
before 1900, rebase off                           31120          31120           0          3.2         311.2       0.1X
before 1900, rebase on                            31201          31201           0          3.2         312.0       0.1X

OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Load TIMESTAMP_INT96 from parquet:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1900, vec off, rebase off                   18283          18309          39          5.5         182.8       1.0X
after 1900, vec off, rebase on                    18235          18269          53          5.5         182.4       1.0X
after 1900, vec on, rebase off                     9563           9589          27         10.5          95.6       1.9X
after 1900, vec on, rebase on                      9463           9554          81         10.6          94.6       1.9X
before 1900, vec off, rebase off                  21377          21469         118          4.7         213.8       0.9X
before 1900, vec off, rebase on                   21265          21422         156          4.7         212.7       0.9X
before 1900, vec on, rebase off                   12481          12524          46          8.0         124.8       1.5X
before 1900, vec on, rebase on                    12360          12482         105          8.1         123.6       1.5X

OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Save TIMESTAMP_MICROS to parquet:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1900, noop                                   2984           2984           0         33.5          29.8       1.0X
before 1900, noop                                  3003           3003           0         33.3          30.0       1.0X
after 1900, rebase off                            15814          15814           0          6.3         158.1       0.2X
after 1900, rebase on                             16250          16250           0          6.2         162.5       0.2X
before 1900, rebase off                           16026          16026           0          6.2         160.3       0.2X
before 1900, rebase on                            19735          19735           0          5.1         197.3       0.2X

OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Load TIMESTAMP_MICROS from parquet:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1900, vec off, rebase off                   15292          15351          57          6.5         152.9       1.0X
after 1900, vec off, rebase on                    15753          15886         173          6.3         157.5       1.0X
after 1900, vec on, rebase off                     4879           4923          52         20.5          48.8       3.1X
after 1900, vec on, rebase on                      5018           5038          18         19.9          50.2       3.0X
before 1900, vec off, rebase off                  15257          15311          53          6.6         152.6       1.0X
before 1900, vec off, rebase on                   18459          18537          90          5.4         184.6       0.8X
before 1900, vec on, rebase off                    4929           4946          15         20.3          49.3       3.1X
before 1900, vec on, rebase on                     8254           8339          93         12.1          82.5       1.9X

OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Save TIMESTAMP_MILLIS to parquet:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1900, noop                                   2987           2987           0         33.5          29.9       1.0X
before 1900, noop                                  3002           3002           0         33.3          30.0       1.0X
after 1900, rebase off                            15215          15215           0          6.6         152.1       0.2X
after 1900, rebase on                             15577          15577           0          6.4         155.8       0.2X
before 1900, rebase off                           15505          15505           0          6.4         155.1       0.2X
before 1900, rebase on                            19143          19143           0          5.2         191.4       0.2X

OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Load TIMESTAMP_MILLIS from parquet:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1900, vec off, rebase off                   15330          15436         113          6.5         153.3       1.0X
after 1900, vec off, rebase on                    15515          15549          30          6.4         155.1       1.0X
after 1900, vec on, rebase off                     6056           6074          19         16.5          60.6       2.5X
after 1900, vec on, rebase on                      6376           6390          14         15.7          63.8       2.4X
before 1900, vec off, rebase off                  15490          15523          36          6.5         154.9       1.0X
before 1900, vec off, rebase on                   18613          18685         118          5.4         186.1       0.8X
before 1900, vec on, rebase off                    6065           6109          41         16.5          60.6       2.5X
before 1900, vec on, rebase on                     9052           9082          32         11.0          90.5       1.7X


================================================================================================
Rebasing dates/timestamps in ORC datasource
================================================================================================

OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Save DATE to ORC:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, noop                                  20653          20653           0          4.8         206.5       1.0X
before 1582, noop                                 10707          10707           0          9.3         107.1       1.9X
after 1582                                        28288          28288           0          3.5         282.9       0.7X
before 1582                                       19196          19196           0          5.2         192.0       1.1X

OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Load DATE from ORC:                       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, vec off                               10596          10621          37          9.4         106.0       1.0X
after 1582, vec on                                 3886           3938          61         25.7          38.9       2.7X
before 1582, vec off                              10955          10984          26          9.1         109.6       1.0X
before 1582, vec on                                4236           4258          24         23.6          42.4       2.5X

OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Save TIMESTAMP to ORC:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1900, noop                                   2988           2988           0         33.5          29.9       1.0X
before 1900, noop                                  3007           3007           0         33.3          30.1       1.0X
after 1900                                        18082          18082           0          5.5         180.8       0.2X
before 1900                                       22669          22669           0          4.4         226.7       0.1X

OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Load TIMESTAMP from ORC:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1900, vec off                               12029          12035           9          8.3         120.3       1.0X
after 1900, vec on                                 5194           5197           3         19.3          51.9       2.3X
before 1900, vec off                              14853          14875          23          6.7         148.5       0.8X
before 1900, vec on                                7797           7836          60         12.8          78.0       1.5X