spark-instrumented-optimizer/sql/core/benchmarks/DateTimeRebaseBenchmark-jdk11-results.txt
Maxim Gekk a1dbcd13a3 [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
### What changes were proposed in this pull request?
In the PR, I propose to add new benchmark `DateTimeRebaseBenchmark` which should measure the performance of rebasing of dates/timestamps from/to to the hybrid calendar (Julian+Gregorian) to/from Proleptic Gregorian calendar:
1. In write, it saves separately dates and timestamps before and after 1582 year w/ and w/o rebasing.
2. In read, it loads previously saved parquet files by vectorized reader and by regular reader.

Here is the summary of benchmarking:
- Saving timestamps is **~6 times slower**
- Loading timestamps w/ vectorized **off** is **~4 times slower**
- Loading timestamps w/ vectorized **on** is **~10 times slower**

### Why are the changes needed?
To know the impact of date-time rebasing introduced by #27915, #27953, #27807.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Run the `DateTimeRebaseBenchmark` benchmark using Amazon EC2:

| Item | Description |
| ---- | ----|
| Region | us-west-2 (Oregon) |
| Instance | r3.xlarge |
| AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) |
| Java | OpenJDK8/11 |

Closes #28057 from MaxGekk/rebase-bechmark.

Lead-authored-by: Maxim Gekk <max.gekk@gmail.com>
Co-authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-03-30 16:46:31 +08:00

54 lines
5 KiB
Plaintext

================================================================================================
Rebasing dates/timestamps in Parquet datasource
================================================================================================
OpenJDK 64-Bit Server VM 11.0.6+10-post-Ubuntu-1ubuntu118.04.1 on Linux 4.15.0-1058-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Save dates to parquet: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, noop 9272 9272 0 10.8 92.7 1.0X
before 1582, noop 9142 9142 0 10.9 91.4 1.0X
after 1582, rebase off 21841 21841 0 4.6 218.4 0.4X
after 1582, rebase on 58245 58245 0 1.7 582.4 0.2X
before 1582, rebase off 19813 19813 0 5.0 198.1 0.5X
before 1582, rebase on 63737 63737 0 1.6 637.4 0.1X
OpenJDK 64-Bit Server VM 11.0.6+10-post-Ubuntu-1ubuntu118.04.1 on Linux 4.15.0-1058-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Load dates from parquet: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, vec off, rebase off 13004 13063 67 7.7 130.0 1.0X
after 1582, vec off, rebase on 36224 36253 26 2.8 362.2 0.4X
after 1582, vec on, rebase off 3596 3654 54 27.8 36.0 3.6X
after 1582, vec on, rebase on 26144 26253 112 3.8 261.4 0.5X
before 1582, vec off, rebase off 12872 12914 51 7.8 128.7 1.0X
before 1582, vec off, rebase on 37762 37904 153 2.6 377.6 0.3X
before 1582, vec on, rebase off 3522 3592 94 28.4 35.2 3.7X
before 1582, vec on, rebase on 27580 27615 59 3.6 275.8 0.5X
OpenJDK 64-Bit Server VM 11.0.6+10-post-Ubuntu-1ubuntu118.04.1 on Linux 4.15.0-1058-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Save timestamps to parquet: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, noop 3113 3113 0 32.1 31.1 1.0X
before 1582, noop 3078 3078 0 32.5 30.8 1.0X
after 1582, rebase off 15749 15749 0 6.3 157.5 0.2X
after 1582, rebase on 69106 69106 0 1.4 691.1 0.0X
before 1582, rebase off 15967 15967 0 6.3 159.7 0.2X
before 1582, rebase on 76843 76843 0 1.3 768.4 0.0X
OpenJDK 64-Bit Server VM 11.0.6+10-post-Ubuntu-1ubuntu118.04.1 on Linux 4.15.0-1058-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Load timestamps from parquet: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, vec off, rebase off 15070 15172 94 6.6 150.7 1.0X
after 1582, vec off, rebase on 43748 43867 157 2.3 437.5 0.3X
after 1582, vec on, rebase off 4805 4859 60 20.8 48.1 3.1X
after 1582, vec on, rebase on 33960 34027 61 2.9 339.6 0.4X
before 1582, vec off, rebase off 15037 15071 52 6.7 150.4 1.0X
before 1582, vec off, rebase on 44590 44749 156 2.2 445.9 0.3X
before 1582, vec on, rebase off 4831 4852 30 20.7 48.3 3.1X
before 1582, vec on, rebase on 35460 35481 18 2.8 354.6 0.4X