bef5828e12
### What changes were proposed in this pull request? Skip timestamps rebasing after a global threshold when there is no difference between Julian and Gregorian calendars. This allows to avoid checking hash maps of switch points, and fixes perf regressions in `toJavaTimestamp()` and `fromJavaTimestamp()`. ### Why are the changes needed? The changes fix perf regressions of conversions to/from external type `java.sql.Timestamp`. Before (see the PR's results https://github.com/apache/spark/pull/28440): ``` ================================================================================================ Conversion from/to external types ================================================================================================ OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws Intel(R) Xeon(R) CPU E5-2670 v2 2.50GHz To/from Java's date-time: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ From java.sql.Timestamp 376 388 10 13.3 75.2 1.1X Collect java.sql.Timestamp 1878 1937 64 2.7 375.6 0.2X ``` After: ``` ================================================================================================ Conversion from/to external types ================================================================================================ OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws Intel(R) Xeon(R) CPU E5-2670 v2 2.50GHz To/from Java's date-time: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ From java.sql.Timestamp 249 264 24 20.1 49.8 1.7X Collect java.sql.Timestamp 1503 1523 24 3.3 300.5 0.3X ``` Perf improvements in average of: 1. From java.sql.Timestamp is ~ 34% 2. To java.sql.Timestamps is ~16% ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By existing test suites `DateTimeUtilsSuite` and `RebaseDateTimeSuite`. Closes #28441 from MaxGekk/opt-rebase-common-threshold. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
143 lines
13 KiB
Plaintext
143 lines
13 KiB
Plaintext
================================================================================================
|
|
Rebasing dates/timestamps in Parquet datasource
|
|
================================================================================================
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Save DATE to parquet: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1582, noop 23567 23567 0 4.2 235.7 1.0X
|
|
before 1582, noop 10570 10570 0 9.5 105.7 2.2X
|
|
after 1582, rebase off 35335 35335 0 2.8 353.3 0.7X
|
|
after 1582, rebase on 35645 35645 0 2.8 356.5 0.7X
|
|
before 1582, rebase off 21824 21824 0 4.6 218.2 1.1X
|
|
before 1582, rebase on 22532 22532 0 4.4 225.3 1.0X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Load DATE from parquet: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1582, vec off, rebase off 13194 13266 81 7.6 131.9 1.0X
|
|
after 1582, vec off, rebase on 13402 13466 89 7.5 134.0 1.0X
|
|
after 1582, vec on, rebase off 3627 3657 29 27.6 36.3 3.6X
|
|
after 1582, vec on, rebase on 3818 3839 26 26.2 38.2 3.5X
|
|
before 1582, vec off, rebase off 13075 13146 115 7.6 130.7 1.0X
|
|
before 1582, vec off, rebase on 13794 13804 13 7.2 137.9 1.0X
|
|
before 1582, vec on, rebase off 3655 3675 21 27.4 36.6 3.6X
|
|
before 1582, vec on, rebase on 4579 4634 72 21.8 45.8 2.9X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Save TIMESTAMP_INT96 to parquet: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1900, noop 2671 2671 0 37.4 26.7 1.0X
|
|
before 1900, noop 2685 2685 0 37.2 26.8 1.0X
|
|
after 1900, rebase off 23899 23899 0 4.2 239.0 0.1X
|
|
after 1900, rebase on 24030 24030 0 4.2 240.3 0.1X
|
|
before 1900, rebase off 30178 30178 0 3.3 301.8 0.1X
|
|
before 1900, rebase on 30127 30127 0 3.3 301.3 0.1X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Load TIMESTAMP_INT96 from parquet: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1900, vec off, rebase off 16613 16685 75 6.0 166.1 1.0X
|
|
after 1900, vec off, rebase on 16487 16541 47 6.1 164.9 1.0X
|
|
after 1900, vec on, rebase off 8840 8870 49 11.3 88.4 1.9X
|
|
after 1900, vec on, rebase on 8795 8813 20 11.4 87.9 1.9X
|
|
before 1900, vec off, rebase off 20400 20441 62 4.9 204.0 0.8X
|
|
before 1900, vec off, rebase on 20430 20481 60 4.9 204.3 0.8X
|
|
before 1900, vec on, rebase off 12211 12290 73 8.2 122.1 1.4X
|
|
before 1900, vec on, rebase on 12231 12321 95 8.2 122.3 1.4X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Save TIMESTAMP_MICROS to parquet: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1900, noop 2836 2836 0 35.3 28.4 1.0X
|
|
before 1900, noop 2812 2812 0 35.6 28.1 1.0X
|
|
after 1900, rebase off 15976 15976 0 6.3 159.8 0.2X
|
|
after 1900, rebase on 16197 16197 0 6.2 162.0 0.2X
|
|
before 1900, rebase off 16140 16140 0 6.2 161.4 0.2X
|
|
before 1900, rebase on 20410 20410 0 4.9 204.1 0.1X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Load TIMESTAMP_MICROS from parquet: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1900, vec off, rebase off 15297 15324 40 6.5 153.0 1.0X
|
|
after 1900, vec off, rebase on 15771 15832 59 6.3 157.7 1.0X
|
|
after 1900, vec on, rebase off 4922 4949 32 20.3 49.2 3.1X
|
|
after 1900, vec on, rebase on 5392 5411 17 18.5 53.9 2.8X
|
|
before 1900, vec off, rebase off 15227 15385 141 6.6 152.3 1.0X
|
|
before 1900, vec off, rebase on 19611 19658 41 5.1 196.1 0.8X
|
|
before 1900, vec on, rebase off 4965 5013 54 20.1 49.6 3.1X
|
|
before 1900, vec on, rebase on 9847 9873 43 10.2 98.5 1.6X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Save TIMESTAMP_MILLIS to parquet: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1900, noop 2818 2818 0 35.5 28.2 1.0X
|
|
before 1900, noop 2805 2805 0 35.6 28.1 1.0X
|
|
after 1900, rebase off 15182 15182 0 6.6 151.8 0.2X
|
|
after 1900, rebase on 15614 15614 0 6.4 156.1 0.2X
|
|
before 1900, rebase off 15404 15404 0 6.5 154.0 0.2X
|
|
before 1900, rebase on 19747 19747 0 5.1 197.5 0.1X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Load TIMESTAMP_MILLIS from parquet: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1900, vec off, rebase off 15622 15649 24 6.4 156.2 1.0X
|
|
after 1900, vec off, rebase on 15572 15677 119 6.4 155.7 1.0X
|
|
after 1900, vec on, rebase off 6345 6358 15 15.8 63.5 2.5X
|
|
after 1900, vec on, rebase on 6780 6834 92 14.8 67.8 2.3X
|
|
before 1900, vec off, rebase off 15540 15584 38 6.4 155.4 1.0X
|
|
before 1900, vec off, rebase on 19590 19653 55 5.1 195.9 0.8X
|
|
before 1900, vec on, rebase off 6374 6381 10 15.7 63.7 2.5X
|
|
before 1900, vec on, rebase on 10530 10544 25 9.5 105.3 1.5X
|
|
|
|
|
|
================================================================================================
|
|
Rebasing dates/timestamps in ORC datasource
|
|
================================================================================================
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Save DATE to ORC: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1582, noop 23825 23825 0 4.2 238.2 1.0X
|
|
before 1582, noop 10501 10501 0 9.5 105.0 2.3X
|
|
after 1582 32134 32134 0 3.1 321.3 0.7X
|
|
before 1582 19947 19947 0 5.0 199.5 1.2X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Load DATE from ORC: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1582, vec off 10034 10056 22 10.0 100.3 1.0X
|
|
after 1582, vec on 3664 3698 30 27.3 36.6 2.7X
|
|
before 1582, vec off 10472 10502 30 9.5 104.7 1.0X
|
|
before 1582, vec on 4052 4098 42 24.7 40.5 2.5X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Save TIMESTAMP to ORC: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1900, noop 2812 2812 0 35.6 28.1 1.0X
|
|
before 1900, noop 2801 2801 0 35.7 28.0 1.0X
|
|
after 1900 18290 18290 0 5.5 182.9 0.2X
|
|
before 1900 22344 22344 0 4.5 223.4 0.1X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Load TIMESTAMP from ORC: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1900, vec off 11257 11279 32 8.9 112.6 1.0X
|
|
after 1900, vec on 5296 5310 15 18.9 53.0 2.1X
|
|
before 1900, vec off 14700 14758 72 6.8 147.0 0.8X
|
|
before 1900, vec on 8576 8665 150 11.7 85.8 1.3X
|
|
|
|
|