spark-instrumented-optimizer/sql/core/benchmarks/DateTimeRebaseBenchmark-results.txt
Max Gekk 91af87d34e [SPARK-31311][SQL][TESTS] Benchmark date-time rebasing in ORC datasource
### What changes were proposed in this pull request?
In the PR, I propose to add new benchmarks to `DateTimeRebaseBenchmark` for saving and loading dates/timestamps to/from ORC files. I extracted common code from the benchmark for the Parquet datasource and placed it in the methods `caseName()` and `getPath()`. I added benchmarks for saving/loading dates to/from ORC before and after 1582-10-15 because an implementation may perform differently for dates before the Julian-to-Gregorian calendar cutover day; see #28067 as an example.
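
Some background on why the cutover date matters. `java.sql.Date` and `java.util.GregorianCalendar` use a hybrid calendar: Julian before 1582-10-15 and Gregorian from then on. `java.time.LocalDate` uses the proleptic Gregorian calendar. The Java sketch below computes days since 1970-01-01 for the same year/month/day in both calendars. It is illustrative only and is not taken from `DateTimeRebaseBenchmark`. The class `CalendarGap` and its methods are hypothetical names. UTC is used so that time zone offsets do not affect the result.

```java
import java.time.LocalDate;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Illustrative sketch (hypothetical names): compare day numbers for the same
// calendar fields in the hybrid Julian/Gregorian calendar and in the
// proleptic Gregorian calendar.
class CalendarGap {
    private static final long MILLIS_PER_DAY = 86_400_000L;

    // Days since 1970-01-01 when y-m-d is read in the hybrid calendar
    // (Julian before 1582-10-15, Gregorian after), evaluated in UTC.
    static long hybridEpochDay(int year, int month, int day) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();                    // reset time-of-day fields to zero
        cal.set(year, month - 1, day);  // Calendar months are 0-based
        return Math.floorDiv(cal.getTimeInMillis(), MILLIS_PER_DAY);
    }

    // Days since 1970-01-01 when y-m-d is read in the proleptic Gregorian calendar.
    static long prolepticEpochDay(int year, int month, int day) {
        return LocalDate.of(year, month, day).toEpochDay();
    }

    public static void main(String[] args) {
        int[][] dates = {{1000, 1, 1}, {1582, 10, 15}, {2020, 4, 1}};
        for (int[] d : dates) {
            long diff = hybridEpochDay(d[0], d[1], d[2]) - prolepticEpochDay(d[0], d[1], d[2]);
            System.out.printf("%04d-%02d-%02d: hybrid - proleptic = %d days%n", d[0], d[1], d[2], diff);
        }
    }
}
```

With these definitions, 1582-10-15 and 2020-04-01 map to the same day number in both calendars. In contrast, 1000-01-01 in the hybrid calendar falls 5 days later than 1000-01-01 in the proleptic Gregorian calendar. Converting between these day numbers is what "rebasing" refers to in these benchmarks.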

### Why are the changes needed?
To have a baseline for future optimizations of `fromJavaDate()`/`toJavaDate()` and `toJavaTimestamp()`/`fromJavaTimestamp()` in `DateTimeUtils`. These methods are used by the ORC datasource when saving/loading dates/timestamps.
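
Spark stores `DateType` values as days since 1970-01-01 and `TimestampType` values as microseconds since the epoch. By contrast, `java.sql.Date` and `java.sql.Timestamp` extend `java.util.Date`, which counts milliseconds since the epoch and interprets fields with the hybrid Julian/Gregorian calendar. The Java sketch below illustrates the kind of conversion these methods perform. It is a hypothetical approximation, not the actual `DateTimeUtils` code, and the class and method names are made up. Dates go through `LocalDate`, which preserves the year/month/day fields in the JVM default time zone. Timestamps are converted as instants with no rebasing.

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.time.LocalDate;

// Hypothetical sketch of date/timestamp conversions of the kind the benchmark
// exercises. Not Spark's DateTimeUtils; names and behavior are illustrative.
class JavaDateTimeConversions {
    private static final long MICROS_PER_SECOND = 1_000_000L;
    private static final long MILLIS_PER_SECOND = 1_000L;
    private static final long NANOS_PER_MICRO = 1_000L;

    // java.sql.Date -> days since 1970-01-01 (proleptic Gregorian).
    // toLocalDate() reads year/month/day in the JVM default time zone, so the
    // calendar fields are preserved rather than the physical instant.
    static int daysFromJavaDate(Date date) {
        return Math.toIntExact(date.toLocalDate().toEpochDay());
    }

    // Days since 1970-01-01 -> java.sql.Date at local midnight with the same fields.
    static Date javaDateFromDays(int days) {
        return Date.valueOf(LocalDate.ofEpochDay(days));
    }

    // java.sql.Timestamp -> microseconds since the epoch (no rebasing).
    // getTime() already includes the whole milliseconds of the nanos field,
    // so only the sub-millisecond microseconds are added from getNanos().
    static long microsFromJavaTimestamp(Timestamp ts) {
        return ts.getTime() * 1000L + (ts.getNanos() / 1000) % 1000;
    }

    // Microseconds since the epoch -> java.sql.Timestamp (no rebasing).
    // floorDiv keeps the fractional part non-negative for pre-1970 values.
    static Timestamp javaTimestampFromMicros(long micros) {
        long seconds = Math.floorDiv(micros, MICROS_PER_SECOND);
        Timestamp ts = new Timestamp(seconds * MILLIS_PER_SECOND);
        long nanos = (micros - seconds * MICROS_PER_SECOND) * NANOS_PER_MICRO;
        ts.setNanos((int) nanos);
        return ts;
    }
}
```

For ordinary dates on either side of the cutover, a round trip such as `daysFromJavaDate(javaDateFromDays(d))` returns `d`. The exception is the days 1582-10-05 to 1582-10-14, which do not exist in the hybrid calendar.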

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By running the updated benchmark `DateTimeRebaseBenchmark` via the command:
```
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.DateTimeRebaseBenchmark"
```
in the environment:

| Item | Description |
| ---- | ----|
| Region | us-west-2 (Oregon) |
| Instance | r3.xlarge |
| AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) |
| Java | OpenJDK 1.8.0_242-8u242/11.0.6+10 |
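
In each results table below, Rate(M/s), Per Row(ns) and Relative are derived from Best Time(ms). Relative is the best time of the first case in the table divided by the best time of the row, so values below 1.0X are slower than the first case. The published numbers are consistent with 100,000,000 rows per case; for example, 9691 ms / 10^8 rows ≈ 96.9 ns per row. That row count is inferred from the tables rather than read from the benchmark source. Under that assumption, the Java sketch below recomputes the derived columns of the "Save dates to parquet" table. `DerivedColumns` and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Recomputes the derived columns of a benchmark table from Best Time(ms).
// Assumption: every case processes ROWS rows; the value is inferred from the
// published numbers, not read from the benchmark source.
class DerivedColumns {
    static final long ROWS = 100_000_000L;

    // Millions of rows per second.
    static double rateMPerSec(long bestMs) {
        return ROWS / (bestMs * 1000.0);
    }

    // Nanoseconds per row.
    static double perRowNs(long bestMs) {
        return bestMs * 1_000_000.0 / ROWS;
    }

    // Speed relative to the first case in the same table (higher is faster).
    static double relative(long firstBestMs, long bestMs) {
        return (double) firstBestMs / bestMs;
    }

    static String row(String name, long firstBestMs, long bestMs) {
        return String.format(Locale.ROOT, "%-24s %8.1f %8.1f %6.1fX",
            name, rateMPerSec(bestMs), perRowNs(bestMs), relative(firstBestMs, bestMs));
    }

    public static void main(String[] args) {
        // Best Time(ms) values copied from "Save dates to parquet".
        Map<String, Long> best = new LinkedHashMap<>();
        best.put("after 1582, noop", 9691L);
        best.put("before 1582, noop", 9024L);
        best.put("after 1582, rebase off", 21195L);
        best.put("after 1582, rebase on", 20045L);
        best.put("before 1582, rebase off", 20039L);
        best.put("before 1582, rebase on", 20451L);
        long first = best.values().iterator().next();
        best.forEach((name, ms) -> System.out.println(row(name, first, ms)));
    }
}
```

Rounded to one decimal place, the recomputed values match the printed ones, for example 10.3 M/s, 96.9 ns and 1.0X for the first row, and 0.5X for "after 1582, rebase off".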

Closes #28076 from MaxGekk/rebase-benchmark-orc.

Lead-authored-by: Max Gekk <max.gekk@gmail.com>
Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-04-01 07:02:26 +00:00

================================================================================================
Rebasing dates/timestamps in Parquet datasource
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Save dates to parquet:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, noop                                   9691           9691           0         10.3          96.9       1.0X
before 1582, noop                                  9024           9024           0         11.1          90.2       1.1X
after 1582, rebase off                            21195          21195           0          4.7         211.9       0.5X
after 1582, rebase on                             20045          20045           0          5.0         200.4       0.5X
before 1582, rebase off                           20039          20039           0          5.0         200.4       0.5X
before 1582, rebase on                            20451          20451           0          4.9         204.5       0.5X

OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Load dates from parquet:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, vec off, rebase off                   13207          13339         116          7.6         132.1       1.0X
after 1582, vec off, rebase on                    13408          13446          57          7.5         134.1       1.0X
after 1582, vec on, rebase off                     3680           3712          39         27.2          36.8       3.6X
after 1582, vec on, rebase on                      5229           5261          29         19.1          52.3       2.5X
before 1582, vec off, rebase off                  13135          13164          25          7.6         131.4       1.0X
before 1582, vec off, rebase on                   13946          14033          94          7.2         139.5       0.9X
before 1582, vec on, rebase off                    3689           3726          49         27.1          36.9       3.6X
before 1582, vec on, rebase on                     5679           5687           9         17.6          56.8       2.3X

OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Save timestamps to parquet:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, noop                                   2720           2720           0         36.8          27.2       1.0X
before 1582, noop                                  2712           2712           0         36.9          27.1       1.0X
after 1582, rebase off                            16626          16626           0          6.0         166.3       0.2X
after 1582, rebase on                             85136          85136           0          1.2         851.4       0.0X
before 1582, rebase off                           16855          16855           0          5.9         168.6       0.2X
before 1582, rebase on                           106121         106121           0          0.9        1061.2       0.0X

OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Load timestamps from parquet:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, vec off, rebase off                   15198          15301          90          6.6         152.0       1.0X
after 1582, vec off, rebase on                    55210          55370         140          1.8         552.1       0.3X
after 1582, vec on, rebase off                     4859           4880          19         20.6          48.6       3.1X
after 1582, vec on, rebase on                     44758          44824          85          2.2         447.6       0.3X
before 1582, vec off, rebase off                  15206          15316         112          6.6         152.1       1.0X
before 1582, vec off, rebase on                   60452          60588         222          1.7         604.5       0.3X
before 1582, vec on, rebase off                    4892           4933          36         20.4          48.9       3.1X
before 1582, vec on, rebase on                    46871          46950          82          2.1         468.7       0.3X

================================================================================================
Rebasing dates/timestamps in ORC datasource
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Save dates to ORC:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, noop                                   9102           9102           0         11.0          91.0       1.0X
before 1582, noop                                  9099           9099           0         11.0          91.0       1.0X
after 1582                                        17652          17652           0          5.7         176.5       0.5X
before 1582                                       18284          18284           0          5.5         182.8       0.5X

OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Load dates from ORC:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, vec off                               25169          25215          48          4.0         251.7       1.0X
after 1582, vec on                                 3701           3717          16         27.0          37.0       6.8X
before 1582, vec off                              26919          27045         182          3.7         269.2       0.9X
before 1582, vec on                                4169           4192          31         24.0          41.7       6.0X

OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Save timestamps to ORC:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, noop                                   2906           2906           0         34.4          29.1       1.0X
before 1582, noop                                  2863           2863           0         34.9          28.6       1.0X
after 1582                                        48858          48858           0          2.0         488.6       0.1X
before 1582                                       50945          50945           0          2.0         509.5       0.1X

OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Load timestamps from ORC:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, vec off                               40925          40955          26          2.4         409.2       1.0X
after 1582, vec on                                31246          31404         164          3.2         312.5       1.3X
before 1582, vec off                              44634          44680          40          2.2         446.3       0.9X
before 1582, vec on                               35578          35834         282          2.8         355.8       1.2X