91af87d34e
### What changes were proposed in this pull request? In the PR, I propose to add new benchmarks to `DateTimeRebaseBenchmark` for saving and loading dates/timestamps to/from ORC files. I extracted common code from the benchmark for Parquet datasource and place it to the methods `caseName()` and `getPath()`. Added benchmarks for ORC save/load dates before and after 1582-10-15 because an implementation may have different performance for dates before the Julian calendar cutover day, see #28067 as an example. ### Why are the changes needed? To have the base line for future optimizations of `fromJavaDate()`/`toJavaDate()` and `toJavaTimestamp()`/`fromJavaTimestamp()` in `DateTimeUtils`. The methods are used while saving/loading dates/timestamps by ORC datasource. ### Does this PR introduce any user-facing change? No ### How was this patch tested? By running the updated benchmark `DateTimeRebaseBenchmark` via the command: ``` SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.DateTimeRebaseBenchmark" ``` in the environment: | Item | Description | | ---- | ----| | Region | us-west-2 (Oregon) | | Instance | r3.xlarge | | AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) | | Java | OpenJDK 1.8.0_242-8u242/11.0.6+10 | Closes #28076 from MaxGekk/rebase-benchmark-orc. Lead-authored-by: Max Gekk <max.gekk@gmail.com> Co-authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
95 lines
8.6 KiB
Plaintext
95 lines
8.6 KiB
Plaintext
================================================================================================
|
|
Rebasing dates/timestamps in Parquet datasource
|
|
================================================================================================
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Save dates to parquet: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1582, noop 9691 9691 0 10.3 96.9 1.0X
|
|
before 1582, noop 9024 9024 0 11.1 90.2 1.1X
|
|
after 1582, rebase off 21195 21195 0 4.7 211.9 0.5X
|
|
after 1582, rebase on 20045 20045 0 5.0 200.4 0.5X
|
|
before 1582, rebase off 20039 20039 0 5.0 200.4 0.5X
|
|
before 1582, rebase on 20451 20451 0 4.9 204.5 0.5X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Load dates from parquet: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1582, vec off, rebase off 13207 13339 116 7.6 132.1 1.0X
|
|
after 1582, vec off, rebase on 13408 13446 57 7.5 134.1 1.0X
|
|
after 1582, vec on, rebase off 3680 3712 39 27.2 36.8 3.6X
|
|
after 1582, vec on, rebase on 5229 5261 29 19.1 52.3 2.5X
|
|
before 1582, vec off, rebase off 13135 13164 25 7.6 131.4 1.0X
|
|
before 1582, vec off, rebase on 13946 14033 94 7.2 139.5 0.9X
|
|
before 1582, vec on, rebase off 3689 3726 49 27.1 36.9 3.6X
|
|
before 1582, vec on, rebase on 5679 5687 9 17.6 56.8 2.3X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Save timestamps to parquet: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1582, noop 2720 2720 0 36.8 27.2 1.0X
|
|
before 1582, noop 2712 2712 0 36.9 27.1 1.0X
|
|
after 1582, rebase off 16626 16626 0 6.0 166.3 0.2X
|
|
after 1582, rebase on 85136 85136 0 1.2 851.4 0.0X
|
|
before 1582, rebase off 16855 16855 0 5.9 168.6 0.2X
|
|
before 1582, rebase on 106121 106121 0 0.9 1061.2 0.0X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Load timestamps from parquet: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1582, vec off, rebase off 15198 15301 90 6.6 152.0 1.0X
|
|
after 1582, vec off, rebase on 55210 55370 140 1.8 552.1 0.3X
|
|
after 1582, vec on, rebase off 4859 4880 19 20.6 48.6 3.1X
|
|
after 1582, vec on, rebase on 44758 44824 85 2.2 447.6 0.3X
|
|
before 1582, vec off, rebase off 15206 15316 112 6.6 152.1 1.0X
|
|
before 1582, vec off, rebase on 60452 60588 222 1.7 604.5 0.3X
|
|
before 1582, vec on, rebase off 4892 4933 36 20.4 48.9 3.1X
|
|
before 1582, vec on, rebase on 46871 46950 82 2.1 468.7 0.3X
|
|
|
|
|
|
================================================================================================
|
|
Rebasing dates/timestamps in ORC datasource
|
|
================================================================================================
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Save dates to ORC: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1582, noop 9102 9102 0 11.0 91.0 1.0X
|
|
before 1582, noop 9099 9099 0 11.0 91.0 1.0X
|
|
after 1582 17652 17652 0 5.7 176.5 0.5X
|
|
before 1582 18284 18284 0 5.5 182.8 0.5X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Load dates from ORC: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1582, vec off 25169 25215 48 4.0 251.7 1.0X
|
|
after 1582, vec on 3701 3717 16 27.0 37.0 6.8X
|
|
before 1582, vec off 26919 27045 182 3.7 269.2 0.9X
|
|
before 1582, vec on 4169 4192 31 24.0 41.7 6.0X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Save timestamps to ORC: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1582, noop 2906 2906 0 34.4 29.1 1.0X
|
|
before 1582, noop 2863 2863 0 34.9 28.6 1.0X
|
|
after 1582 48858 48858 0 2.0 488.6 0.1X
|
|
before 1582 50945 50945 0 2.0 509.5 0.1X
|
|
|
|
OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
|
|
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
|
|
Load timestamps from ORC: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
after 1582, vec off 40925 40955 26 2.4 409.2 1.0X
|
|
after 1582, vec on 31246 31404 164 3.2 312.5 1.3X
|
|
before 1582, vec off 44634 44680 40 2.2 446.3 0.9X
|
|
before 1582, vec on 35578 35834 282 2.8 355.8 1.2X
|
|
|
|
|