[SPARK-32043][SQL] Replace Decimal by Int op in `make_interval` and `make_timestamp`
### What changes were proposed in this pull request?
Replace `Decimal` operations with integer operations in the `MakeInterval` and `MakeTimestamp` expressions. For instance, `(secs * Decimal(MICROS_PER_SECOND)).toLong` can be replaced by the unscaled long value of `secs`. Because `secs` has six fractional digits, its unscaled value is already a number of microseconds.
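The sketch below shows why the shortcut works. It assumes, as the description implies, that the seconds argument is a decimal with exactly six fractional digits. It uses `java.math.BigDecimal` rather than Spark's internal `Decimal` class, and the object and method names are hypothetical:

```scala
import java.math.{BigDecimal => JBigDecimal}

// Hypothetical helper: converts a seconds value with scale 6 (six fractional
// digits) into microseconds in two ways. Uses java.math.BigDecimal,
// not Spark's internal Decimal class.
object MicrosFromScaledSeconds {
  val MicrosPerSecond: Long = 1000000L

  // Decimal route: multiply by 10^6 as a decimal, then convert to Long.
  def viaMultiply(secs: JBigDecimal): Long =
    secs.multiply(JBigDecimal.valueOf(MicrosPerSecond)).longValueExact()

  // Integer route: with scale 6 the unscaled value already counts microseconds,
  // so no decimal multiplication is needed.
  def viaUnscaled(secs: JBigDecimal): Long = {
    require(secs.scale() == 6, s"expected scale 6, got ${secs.scale()}")
    secs.unscaledValue().longValueExact()
  }
}

val secs = new JBigDecimal("50.123456") // scale 6, unscaled value 50123456
val micros = MicrosFromScaledSeconds.viaUnscaled(secs)
// Once the value is a Long, splitting it needs only integer division and remainder.
val wholeSeconds = micros / MicrosFromScaledSeconds.MicrosPerSecond
val fractionMicros = micros % MicrosFromScaledSeconds.MicrosPerSecond
println(s"micros=$micros seconds=$wholeSeconds fraction=$fractionMicros")
```

The `require` makes the assumption explicit. For a value with a different scale, such as `60.00`, the unscaled value would not be a count of microseconds.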
### Why are the changes needed?
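To improve performance. For example, in `MakeDateTimeBenchmark` the best time of `make_timestamp(2019, 1, 2, 3, 4, 50.123456)` drops from 94 ms to 76 ms.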
Before:
```
make_timestamp(): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
...
make_timestamp(2019, 1, 2, 3, 4, 50.123456) 94 99 4 10.7 93.8 38.8X
```
After:
```
make_timestamp(2019, 1, 2, 3, 4, 50.123456) 76 92 15 13.1 76.5 48.1X
```
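The Before/After rows can be compared with a little arithmetic. This sketch only reuses the numbers printed above, and `SpeedupSketch` is a hypothetical helper. These numbers come from a single run, and the standard deviation is larger after the change (15 ms versus 4 ms). Treat the result as a rough estimate rather than a precise speedup:

```scala
// Hypothetical helper; the numbers are copied from the Before/After rows for
// make_timestamp(2019, 1, 2, 3, 4, 50.123456) above.
object SpeedupSketch {
  final case class Row(bestMs: Double, perRowNs: Double)

  val before: Row = Row(bestMs = 94.0, perRowNs = 93.8)
  val after: Row = Row(bestMs = 76.0, perRowNs = 76.5)

  // Speedup factor: old per-row time divided by new per-row time.
  def speedup(b: Row, a: Row): Double = b.perRowNs / a.perRowNs

  // Reduction of the per-row time, in percent of the old time.
  def reductionPct(b: Row, a: Row): Double =
    100.0 * (b.perRowNs - a.perRowNs) / b.perRowNs
}

val factor = SpeedupSketch.speedup(SpeedupSketch.before, SpeedupSketch.after)
val saved = SpeedupSketch.reductionPct(SpeedupSketch.before, SpeedupSketch.after)
println(f"speedup $factor%.2fx, per-row time reduced by $saved%.1f%%")
```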
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- By existing test suites such as `IntervalExpressionsSuite` and `DateExpressionsSuite`.
- By re-generating the results of `MakeDateTimeBenchmark` in the following environment:
| Item | Description |
| ---- | ----|
| Region | us-west-2 (Oregon) |
| Instance | r3.xlarge |
| AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) |
| Java | OpenJDK 64-Bit Server VM 1.8.0_252 and OpenJDK 64-Bit Server VM 11.0.7+10 |
Closes #28886 from MaxGekk/make_interval-opt-decimal.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-06-23 07:45:12 -04:00
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
2019-10-03 11:58:25 -04:00
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
make_date(): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-32072][CORE][TESTS] Fix table formatting with benchmark results
### What changes were proposed in this pull request?
Set the width of the column with benchmark names to the maximum of:
1. 40 (the fixed width before this PR),
2. the length of the benchmark name, and
3. the maximum length of the case names.
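The width rule above can be sketched as follows. `NameColumnWidth` is a hypothetical helper, not the code changed in the PR. The case names come from the `make_timestamp()` table, where the longest names are 43 characters, so the column grows from 40 to 43:

```scala
// Hypothetical helper illustrating the width rule; not the code from the PR.
object NameColumnWidth {
  // Fixed width of the name column before this change.
  val MinWidth: Int = 40

  // Maximum of 40, the benchmark name length and the longest case name.
  def width(benchmarkName: String, caseNames: Seq[String]): Int =
    (MinWidth +: benchmarkName.length +: caseNames.map(_.length)).max

  // Left-align a name by padding it with spaces up to the column width.
  def pad(name: String, w: Int): String = name.padTo(w, ' ')
}

// Case names from the make_timestamp() table.
val timestampCases = Seq(
  "prepare make_timestamp()",
  "make_timestamp(2019, 1, 2, 3, 4, 50.123456)",
  "make_timestamp(2019, 1, 2, 3, 4, 60.000000)",
  "make_timestamp(2019, 12, 31, 23, 59, 60.00)",
  "make_timestamp(*, *, *, 3, 4, 50.123456)")

val w = NameColumnWidth.width("make_timestamp()", timestampCases)
timestampCases.foreach(name => println(NameColumnWidth.pad(name, w) + "|"))
```

For `make_date()`, every case name is shorter than 40 characters, so the column keeps its old width of 40.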
### Why are the changes needed?
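To improve the readability of benchmark results whose case names are longer than 40 characters. For example, in `MakeDateTimeBenchmark`, case names such as `make_timestamp(2019, 12, 31, 23, 59, 60.00)` are 43 characters long and overflow the fixed-width column.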
Before:
```
make_timestamp(): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
prepare make_timestamp() 3636 3673 38 0.3 3635.7 1.0X
make_timestamp(2019, 1, 2, 3, 4, 50.123456) 94 99 4 10.7 93.8 38.8X
make_timestamp(2019, 1, 2, 3, 4, 60.000000) 68 80 13 14.6 68.3 53.2X
make_timestamp(2019, 12, 31, 23, 59, 60.00) 65 79 19 15.3 65.3 55.7X
make_timestamp(*, *, *, 3, 4, 50.123456) 271 280 14 3.7 270.7 13.4X
```
After:
```
make_timestamp(): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
---------------------------------------------------------------------------------------------------------------------------
prepare make_timestamp() 3694 3745 82 0.3 3694.0 1.0X
make_timestamp(2019, 1, 2, 3, 4, 50.123456) 82 90 9 12.2 82.3 44.9X
make_timestamp(2019, 1, 2, 3, 4, 60.000000) 72 77 5 13.9 71.9 51.4X
make_timestamp(2019, 12, 31, 23, 59, 60.00) 67 71 5 15.0 66.8 55.3X
make_timestamp(*, *, *, 3, 4, 50.123456) 273 289 14 3.7 273.2 13.5X
```
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By re-generating benchmark results for `MakeDateTimeBenchmark`:
```
$ SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.MakeDateTimeBenchmark"
```
in the environment:
| Item | Description |
| ---- | ----|
| Region | us-west-2 (Oregon) |
| Instance | r3.xlarge |
| AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) |
| Java | OpenJDK 64-Bit Server VM 1.8.0_252 and OpenJDK 64-Bit Server VM 11.0.7+10 |
Closes #28906 from MaxGekk/benchmark-table-formatting.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-06-24 00:43:53 -04:00
prepare make_date() 3214 3344 209 31.1 32.1 1.0X
make_date(2019, 9, 16) 2342 2348 6 42.7 23.4 1.4X
make_date(*, *, *) 4485 4533 56 22.3 44.8 0.7X
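The derived columns in this table appear to follow simple relations. These are inferred here from the printed numbers, not from the benchmark framework's source:

- `Rate(M/s)` is 1000 divided by `Per Row(ns)`.
- `Relative` is the first (baseline) case's time divided by the current case's time.
- Dividing `Best Time(ms)` by `Per Row(ns)` suggests about 10^8 rows per case for `make_date()`.

Because the printed inputs are rounded, recomputed values can differ in the last digit. A sketch with hypothetical helper names:

```scala
// Hypothetical helpers; the relations are inferred from the printed numbers.
object ColumnRelations {
  // Rate in millions of rows per second from the per-row time in nanoseconds.
  def rateMps(perRowNs: Double): Double = 1000.0 / perRowNs

  // Speed relative to the baseline (first) case: baseline time / case time.
  def relative(baselinePerRowNs: Double, casePerRowNs: Double): Double =
    baselinePerRowNs / casePerRowNs

  // Approximate number of rows: best time (ms -> ns) divided by time per row.
  def impliedRows(bestMs: Double, perRowNs: Double): Double =
    bestMs * 1e6 / perRowNs
}

// Per Row(ns) values from the make_date() table.
val prepareNs = 32.1 // prepare make_date()
val literalNs = 23.4 // make_date(2019, 9, 16)
val columnsNs = 44.8 // make_date(*, *, *)

println(f"rate ${ColumnRelations.rateMps(literalNs)}%.1f M/s")
println(f"relative ${ColumnRelations.relative(prepareNs, columnsNs)}%.1fX")
println(f"rows ~${ColumnRelations.impliedRows(3214, prepareNs)}%.3g")
```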
OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
make_timestamp(): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
---------------------------------------------------------------------------------------------------------------------------
prepare make_timestamp() 3744 3775 35 0.3 3744.1 1.0X
make_timestamp(2019, 1, 2, 3, 4, 50.123456) 82 91 9 12.2 82.3 45.5X
make_timestamp(2019, 1, 2, 3, 4, 60.000000) 81 89 7 12.4 81.0 46.2X
make_timestamp(2019, 12, 31, 23, 59, 60.00) 70 80 9 14.3 69.9 53.5X
make_timestamp(*, *, *, 3, 4, 50.123456) 308 314 7 3.2 308.1 12.2X
make_timestamp(*, *, *, *, *, 0) 302 316 14 3.3 301.9 12.4X
make_timestamp(*, *, *, *, *, 60.0) 290 296 6 3.4 290.4 12.9X
make_timestamp(2019, 1, 2, *, *, *) 3888 3902 15 0.3 3888.1 1.0X
make_timestamp(*, *, *, *, *, *) 3902 3908 8 0.3 3901.6 1.0X
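The rows above are shown with their column padding collapsed into single spaces. As a result, a case name can no longer be separated from the numbers by character position. The sketch below parses such a line, assuming the last six tokens are always the numeric columns. `ResultRowParser` is hypothetical and not part of Spark's benchmark tooling:

```scala
// Hypothetical parser for result rows whose padding has been collapsed.
object ResultRowParser {
  final case class Row(
      name: String,
      bestMs: Long,
      avgMs: Long,
      stdevMs: Long,
      rateMps: Double,
      perRowNs: Double,
      relative: Double)

  // The last six tokens are the numeric columns; everything before them is
  // the case name, which may itself contain spaces.
  def parse(line: String): Row = {
    val tokens = line.trim.split("\\s+")
    require(tokens.length > 6, s"not a result row: $line")
    val (nameTokens, nums) = tokens.splitAt(tokens.length - 6)
    Row(
      name = nameTokens.mkString(" "),
      bestMs = nums(0).toLong,
      avgMs = nums(1).toLong,
      stdevMs = nums(2).toLong,
      rateMps = nums(3).toDouble,
      perRowNs = nums(4).toDouble,
      relative = nums(5).stripSuffix("X").toDouble)
  }
}

val baseline = ResultRowParser.parse("prepare make_timestamp() 3744 3775 35 0.3 3744.1 1.0X")
val literal = ResultRowParser.parse(
  "make_timestamp(2019, 1, 2, 3, 4, 50.123456) 82 91 9 12.2 82.3 45.5X")

// Recompute Relative from the per-row times and compare it with the printed value.
val recomputed = baseline.perRowNs / literal.perRowNs
println(f"${literal.name}: printed ${literal.relative}%.1fX, recomputed $recomputed%.1fX")
```

Recomputing `Relative` from the rounded per-row times can differ from the printed value in the last digit. For example, 3744.1 / 69.9 is about 53.56, while the table shows 53.5X, so any comparison should allow a small tolerance.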