spark-instrumented-optimizer/sql/core/benchmarks/DateTimeRebaseBenchmark-results.txt
Max Gekk bef5828e12 [SPARK-31630][SQL] Fix perf regression by skipping timestamps rebasing after some threshold
### What changes were proposed in this pull request?
Skip timestamp rebasing after a global threshold beyond which the Julian and Gregorian calendars no longer differ. This avoids checking the hash maps of switch points, and fixes perf regressions in `toJavaTimestamp()` and `fromJavaTimestamp()`.
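
The idea can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Scala code from the patch: the table contents and the names `REBASE_TABLES`, `NO_REBASE_AFTER` and `rebase_micros` are hypothetical.

```python
from bisect import bisect_right

MICROS_PER_DAY = 86_400_000_000

# Hypothetical per-time-zone tables: sorted switch points (microseconds since
# the epoch) and the offset that applies from each switch point onward.
REBASE_TABLES = {
    "UTC": (
        [-12_219_292_800_000_000, -2_208_988_800_000_000],
        [10 * MICROS_PER_DAY, 0],
    ),
}

# Hypothetical global threshold: the latest switch point across all zones.
# At or after it, no zone needs any adjustment.
NO_REBASE_AFTER = max(switches[-1] for switches, _ in REBASE_TABLES.values())


def rebase_micros(zone_id: str, micros: int) -> int:
    """Return the rebased value, skipping all lookups for recent timestamps."""
    if micros >= NO_REBASE_AFTER:
        # Fast path: no hash map access and no search over switch points.
        return micros
    switches, diffs = REBASE_TABLES[zone_id]
    # Slow path: find the last switch point that is <= micros.
    # Values before the first switch point reuse the first offset
    # (a simplification made for this sketch).
    i = bisect_right(switches, micros) - 1
    return micros + diffs[max(i, 0)]
```

Modern timestamps take only the single comparison against the threshold, which is where the saving for `java.sql.Timestamp` conversions would come from in this sketch.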

### Why are the changes needed?
The changes fix perf regressions in conversions to/from the external type `java.sql.Timestamp`.

Before (see the PR's results https://github.com/apache/spark/pull/28440):
```
================================================================================================
Conversion from/to external types
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
To/from Java's date-time:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
From java.sql.Timestamp                             376            388          10         13.3          75.2       1.1X
Collect java.sql.Timestamp                         1878           1937          64          2.7         375.6       0.2X
```

After:
```
================================================================================================
Conversion from/to external types
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
To/from Java's date-time:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
From java.sql.Timestamp                             249            264          24         20.1          49.8       1.7X
Collect java.sql.Timestamp                         1503           1523          24          3.3         300.5       0.3X
```

Perf improvements, on average:

1. From java.sql.Timestamp: ~34%
2. To java.sql.Timestamp: ~16%

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By existing test suites `DateTimeUtilsSuite` and `RebaseDateTimeSuite`.

Closes #28441 from MaxGekk/opt-rebase-common-threshold.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-05-05 14:11:53 +00:00

================================================================================================
Rebasing dates/timestamps in Parquet datasource
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Save DATE to parquet:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, noop                                  23567          23567           0          4.2         235.7       1.0X
before 1582, noop                                 10570          10570           0          9.5         105.7       2.2X
after 1582, rebase off                            35335          35335           0          2.8         353.3       0.7X
after 1582, rebase on                             35645          35645           0          2.8         356.5       0.7X
before 1582, rebase off                           21824          21824           0          4.6         218.2       1.1X
before 1582, rebase on                            22532          22532           0          4.4         225.3       1.0X

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Load DATE from parquet:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, vec off, rebase off                   13194          13266          81          7.6         131.9       1.0X
after 1582, vec off, rebase on                    13402          13466          89          7.5         134.0       1.0X
after 1582, vec on, rebase off                     3627           3657          29         27.6          36.3       3.6X
after 1582, vec on, rebase on                      3818           3839          26         26.2          38.2       3.5X
before 1582, vec off, rebase off                  13075          13146         115          7.6         130.7       1.0X
before 1582, vec off, rebase on                   13794          13804          13          7.2         137.9       1.0X
before 1582, vec on, rebase off                    3655           3675          21         27.4          36.6       3.6X
before 1582, vec on, rebase on                     4579           4634          72         21.8          45.8       2.9X

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Save TIMESTAMP_INT96 to parquet:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1900, noop                                   2671           2671           0         37.4          26.7       1.0X
before 1900, noop                                  2685           2685           0         37.2          26.8       1.0X
after 1900, rebase off                            23899          23899           0          4.2         239.0       0.1X
after 1900, rebase on                             24030          24030           0          4.2         240.3       0.1X
before 1900, rebase off                           30178          30178           0          3.3         301.8       0.1X
before 1900, rebase on                            30127          30127           0          3.3         301.3       0.1X

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Load TIMESTAMP_INT96 from parquet:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1900, vec off, rebase off                   16613          16685          75          6.0         166.1       1.0X
after 1900, vec off, rebase on                    16487          16541          47          6.1         164.9       1.0X
after 1900, vec on, rebase off                     8840           8870          49         11.3          88.4       1.9X
after 1900, vec on, rebase on                      8795           8813          20         11.4          87.9       1.9X
before 1900, vec off, rebase off                  20400          20441          62          4.9         204.0       0.8X
before 1900, vec off, rebase on                   20430          20481          60          4.9         204.3       0.8X
before 1900, vec on, rebase off                   12211          12290          73          8.2         122.1       1.4X
before 1900, vec on, rebase on                    12231          12321          95          8.2         122.3       1.4X

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Save TIMESTAMP_MICROS to parquet:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1900, noop                                   2836           2836           0         35.3          28.4       1.0X
before 1900, noop                                  2812           2812           0         35.6          28.1       1.0X
after 1900, rebase off                            15976          15976           0          6.3         159.8       0.2X
after 1900, rebase on                             16197          16197           0          6.2         162.0       0.2X
before 1900, rebase off                           16140          16140           0          6.2         161.4       0.2X
before 1900, rebase on                            20410          20410           0          4.9         204.1       0.1X

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Load TIMESTAMP_MICROS from parquet:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1900, vec off, rebase off                   15297          15324          40          6.5         153.0       1.0X
after 1900, vec off, rebase on                    15771          15832          59          6.3         157.7       1.0X
after 1900, vec on, rebase off                     4922           4949          32         20.3          49.2       3.1X
after 1900, vec on, rebase on                      5392           5411          17         18.5          53.9       2.8X
before 1900, vec off, rebase off                  15227          15385         141          6.6         152.3       1.0X
before 1900, vec off, rebase on                   19611          19658          41          5.1         196.1       0.8X
before 1900, vec on, rebase off                    4965           5013          54         20.1          49.6       3.1X
before 1900, vec on, rebase on                     9847           9873          43         10.2          98.5       1.6X

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Save TIMESTAMP_MILLIS to parquet:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1900, noop                                   2818           2818           0         35.5          28.2       1.0X
before 1900, noop                                  2805           2805           0         35.6          28.1       1.0X
after 1900, rebase off                            15182          15182           0          6.6         151.8       0.2X
after 1900, rebase on                             15614          15614           0          6.4         156.1       0.2X
before 1900, rebase off                           15404          15404           0          6.5         154.0       0.2X
before 1900, rebase on                            19747          19747           0          5.1         197.5       0.1X

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Load TIMESTAMP_MILLIS from parquet:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1900, vec off, rebase off                   15622          15649          24          6.4         156.2       1.0X
after 1900, vec off, rebase on                    15572          15677         119          6.4         155.7       1.0X
after 1900, vec on, rebase off                     6345           6358          15         15.8          63.5       2.5X
after 1900, vec on, rebase on                      6780           6834          92         14.8          67.8       2.3X
before 1900, vec off, rebase off                  15540          15584          38          6.4         155.4       1.0X
before 1900, vec off, rebase on                   19590          19653          55          5.1         195.9       0.8X
before 1900, vec on, rebase off                    6374           6381          10         15.7          63.7       2.5X
before 1900, vec on, rebase on                    10530          10544          25          9.5         105.3       1.5X

================================================================================================
Rebasing dates/timestamps in ORC datasource
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Save DATE to ORC:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, noop                                  23825          23825           0          4.2         238.2       1.0X
before 1582, noop                                 10501          10501           0          9.5         105.0       2.3X
after 1582                                        32134          32134           0          3.1         321.3       0.7X
before 1582                                       19947          19947           0          5.0         199.5       1.2X

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Load DATE from ORC:                       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, vec off                               10034          10056          22         10.0         100.3       1.0X
after 1582, vec on                                 3664           3698          30         27.3          36.6       2.7X
before 1582, vec off                              10472          10502          30          9.5         104.7       1.0X
before 1582, vec on                                4052           4098          42         24.7          40.5       2.5X

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Save TIMESTAMP to ORC:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1900, noop                                   2812           2812           0         35.6          28.1       1.0X
before 1900, noop                                  2801           2801           0         35.7          28.0       1.0X
after 1900                                        18290          18290           0          5.5         182.9       0.2X
before 1900                                       22344          22344           0          4.5         223.4       0.1X

OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Load TIMESTAMP from ORC:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1900, vec off                               11257          11279          32          8.9         112.6       1.0X
after 1900, vec on                                 5296           5310          15         18.9          53.0       2.1X
before 1900, vec off                              14700          14758          72          6.8         147.0       0.8X
before 1900, vec on                                8576           8665         150         11.7          85.8       1.3X
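
The derived columns in the tables above appear to follow from the Best Time figures. The sketch below reproduces them under two assumptions that are inferred from the numbers rather than stated in this file: each case processes 100,000,000 rows, and Relative is the first row's best time divided by the current row's best time. The function name `derived_columns` is hypothetical.

```python
def derived_columns(best_ms: float, baseline_best_ms: float,
                    num_rows: int = 100_000_000) -> tuple:
    """Recompute Rate(M/s), Per Row(ns) and Relative from Best Time(ms)."""
    # rows per millisecond / 1000 gives millions of rows per second
    rate_m_per_s = num_rows / (best_ms * 1000.0)
    # milliseconds to nanoseconds, spread over all rows
    per_row_ns = best_ms * 1_000_000.0 / num_rows
    # speed relative to the first (baseline) row of the table
    relative = baseline_best_ms / best_ms
    return round(rate_m_per_s, 1), round(per_row_ns, 1), round(relative, 1)


# "Save DATE to parquet": baseline row, then "before 1582, noop"
print(derived_columns(23567, 23567))
print(derived_columns(10570, 23567))
```

Under these assumptions the two calls give the Rate, Per Row and Relative values listed for the first two rows of the "Save DATE to parquet" table.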