a1dbcd13a3
### What changes were proposed in this pull request? In the PR, I propose to add new benchmark `DateTimeRebaseBenchmark` which should measure the performance of rebasing of dates/timestamps from/to to the hybrid calendar (Julian+Gregorian) to/from Proleptic Gregorian calendar: 1. In write, it saves separately dates and timestamps before and after 1582 year w/ and w/o rebasing. 2. In read, it loads previously saved parquet files by vectorized reader and by regular reader. Here is the summary of benchmarking: - Saving timestamps is **~6 times slower** - Loading timestamps w/ vectorized **off** is **~4 times slower** - Loading timestamps w/ vectorized **on** is **~10 times slower** ### Why are the changes needed? To know the impact of date-time rebasing introduced by #27915, #27953, #27807. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Run the `DateTimeRebaseBenchmark` benchmark using Amazon EC2: | Item | Description | | ---- | ----| | Region | us-west-2 (Oregon) | | Instance | r3.xlarge | | AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) | | Java | OpenJDK8/11 | Closes #28057 from MaxGekk/rebase-bechmark. Lead-authored-by: Maxim Gekk <max.gekk@gmail.com> Co-authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
54 lines
5 KiB
Plaintext
54 lines
5 KiB
Plaintext
================================================================================================
|
|
Rebasing dates/timestamps in Parquet datasource
|
|
================================================================================================
|
|
|
|
OpenJDK 64-Bit Server VM 11.0.6+10-post-Ubuntu-1ubuntu118.04.1 on Linux 4.15.0-1058-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Save dates to parquet: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1582, noop 9272 9272 0 10.8 92.7 1.0X
|
|
before 1582, noop 9142 9142 0 10.9 91.4 1.0X
|
|
after 1582, rebase off 21841 21841 0 4.6 218.4 0.4X
|
|
after 1582, rebase on 58245 58245 0 1.7 582.4 0.2X
|
|
before 1582, rebase off 19813 19813 0 5.0 198.1 0.5X
|
|
before 1582, rebase on 63737 63737 0 1.6 637.4 0.1X
|
|
|
|
OpenJDK 64-Bit Server VM 11.0.6+10-post-Ubuntu-1ubuntu118.04.1 on Linux 4.15.0-1058-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Load dates from parquet: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1582, vec off, rebase off 13004 13063 67 7.7 130.0 1.0X
|
|
after 1582, vec off, rebase on 36224 36253 26 2.8 362.2 0.4X
|
|
after 1582, vec on, rebase off 3596 3654 54 27.8 36.0 3.6X
|
|
after 1582, vec on, rebase on 26144 26253 112 3.8 261.4 0.5X
|
|
before 1582, vec off, rebase off 12872 12914 51 7.8 128.7 1.0X
|
|
before 1582, vec off, rebase on 37762 37904 153 2.6 377.6 0.3X
|
|
before 1582, vec on, rebase off 3522 3592 94 28.4 35.2 3.7X
|
|
before 1582, vec on, rebase on 27580 27615 59 3.6 275.8 0.5X
|
|
|
|
OpenJDK 64-Bit Server VM 11.0.6+10-post-Ubuntu-1ubuntu118.04.1 on Linux 4.15.0-1058-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Save timestamps to parquet: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1582, noop 3113 3113 0 32.1 31.1 1.0X
|
|
before 1582, noop 3078 3078 0 32.5 30.8 1.0X
|
|
after 1582, rebase off 15749 15749 0 6.3 157.5 0.2X
|
|
after 1582, rebase on 69106 69106 0 1.4 691.1 0.0X
|
|
before 1582, rebase off 15967 15967 0 6.3 159.7 0.2X
|
|
before 1582, rebase on 76843 76843 0 1.3 768.4 0.0X
|
|
|
|
OpenJDK 64-Bit Server VM 11.0.6+10-post-Ubuntu-1ubuntu118.04.1 on Linux 4.15.0-1058-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Load timestamps from parquet: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1582, vec off, rebase off 15070 15172 94 6.6 150.7 1.0X
|
|
after 1582, vec off, rebase on 43748 43867 157 2.3 437.5 0.3X
|
|
after 1582, vec on, rebase off 4805 4859 60 20.8 48.1 3.1X
|
|
after 1582, vec on, rebase on 33960 34027 61 2.9 339.6 0.4X
|
|
before 1582, vec off, rebase off 15037 15071 52 6.7 150.4 1.0X
|
|
before 1582, vec off, rebase on 44590 44749 156 2.2 445.9 0.3X
|
|
before 1582, vec on, rebase off 4831 4852 30 20.7 48.3 3.1X
|
|
before 1582, vec on, rebase on 35460 35481 18 2.8 354.6 0.4X
|
|
|
|
|