[SPARK-32043][SQL] Replace Decimal by Int op in `make_interval` and `make_timestamp`
### What changes were proposed in this pull request?
Replace Decimal operations with integer operations in the `MakeInterval` and `MakeTimestamp` expressions. For instance, `(secs * Decimal(MICROS_PER_SECOND)).toLong` can be replaced by the unscaled long of `secs`, because the unscaled value already contains microseconds.
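To illustrate why the two computations agree, here is a minimal sketch. It uses `java.math.BigDecimal` as a stand-in for Spark's `Decimal` and assumes the seconds value has exactly 6 fractional digits; the object and method names are hypothetical and are not taken from the patch.

```scala
import java.math.{BigDecimal => JBigDecimal}

object MicrosSketch {
  // Redefined locally so the sketch is self-contained (assumed equal to MICROS_PER_SECOND).
  val MicrosPerSecond: Long = 1000000L

  // Decimal-based approach: multiply the seconds by 10^6, then truncate to a long.
  def viaMultiplication(secs: JBigDecimal): Long =
    secs.multiply(JBigDecimal.valueOf(MicrosPerSecond)).longValue()

  // Integer-based approach: with scale 6, the unscaled value is already
  // the number of microseconds, so no decimal multiplication is needed.
  def viaUnscaled(secs: JBigDecimal): Long = {
    require(secs.scale() == 6, "sketch assumes exactly 6 fractional digits")
    secs.unscaledValue().longValueExact()
  }

  def main(args: Array[String]): Unit = {
    val secs = new JBigDecimal("50.123456")
    println(viaMultiplication(secs)) // 50123456
    println(viaUnscaled(secs))       // 50123456
  }
}
```

The second method skips allocating an intermediate decimal, which is the kind of saving the benchmark numbers in this description are meant to show.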
### Why are the changes needed?
To improve performance.
Before:
```
make_timestamp(): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
...
make_timestamp(2019, 1, 2, 3, 4, 50.123456) 94 99 4 10.7 93.8 38.8X
```
After:
```
make_timestamp(2019, 1, 2, 3, 4, 50.123456) 76 92 15 13.1 76.5 48.1X
```
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- By existing test suites such as `IntervalExpressionsSuite` and `DateExpressionsSuite`.
- Re-generate results of `MakeDateTimeBenchmark` in the environment:
| Item | Description |
| ---- | ----|
| Region | us-west-2 (Oregon) |
| Instance | r3.xlarge |
| AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) |
| Java | OpenJDK 64-Bit Server VM 1.8.0_252 and OpenJDK 64-Bit Server VM 11.0.7+10 |
Closes #28886 from MaxGekk/make_interval-opt-decimal.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-06-23 07:45:12 -04:00
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
2019-10-03 11:58:25 -04:00
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
2019-09-17 18:09:16 -04:00
make_date(): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
[SPARK-32072][CORE][TESTS] Fix table formatting with benchmark results
### What changes were proposed in this pull request?
Set the width of the column with benchmark names to the maximum of:
1. 40 (the fixed width before this PR),
2. the length of the benchmark name, or
3. the maximum length of the case names.
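The rule can be sketched as follows. This is only an illustration of the width calculation described here, not the code from the patch; `NameColumnWidthSketch`, `nameColumnWidth`, and `DefaultWidth` are hypothetical names.

```scala
object NameColumnWidthSketch {
  // Fixed width used before the change, per the description.
  val DefaultWidth = 40

  // Widest of: the default width, the benchmark name, and every case name.
  def nameColumnWidth(benchmarkName: String, caseNames: Seq[String]): Int =
    (Seq(DefaultWidth, benchmarkName.length) ++ caseNames.map(_.length)).max

  def main(args: Array[String]): Unit = {
    val cases = Seq(
      "prepare make_timestamp()",
      "make_timestamp(2019, 1, 2, 3, 4, 50.123456)")
    val width = nameColumnWidth("make_timestamp():", cases)
    // Left-justify each case name in a column of the computed width.
    val fmt = "%-" + width + "s|"
    cases.foreach(c => println(fmt.format(c)))
  }
}
```

With the case names above, the longest name has 43 characters, so the column grows from 40 to 43 and no name runs into the numeric columns.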
### Why are the changes needed?
To improve readability of benchmark results. For example, `MakeDateTimeBenchmark`.
Before:
```
make_timestamp(): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
prepare make_timestamp() 3636 3673 38 0.3 3635.7 1.0X
make_timestamp(2019, 1, 2, 3, 4, 50.123456) 94 99 4 10.7 93.8 38.8X
make_timestamp(2019, 1, 2, 3, 4, 60.000000) 68 80 13 14.6 68.3 53.2X
make_timestamp(2019, 12, 31, 23, 59, 60.00) 65 79 19 15.3 65.3 55.7X
make_timestamp(*, *, *, 3, 4, 50.123456) 271 280 14 3.7 270.7 13.4X
```
After:
```
make_timestamp(): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
---------------------------------------------------------------------------------------------------------------------------
prepare make_timestamp() 3694 3745 82 0.3 3694.0 1.0X
make_timestamp(2019, 1, 2, 3, 4, 50.123456) 82 90 9 12.2 82.3 44.9X
make_timestamp(2019, 1, 2, 3, 4, 60.000000) 72 77 5 13.9 71.9 51.4X
make_timestamp(2019, 12, 31, 23, 59, 60.00) 67 71 5 15.0 66.8 55.3X
make_timestamp(*, *, *, 3, 4, 50.123456) 273 289 14 3.7 273.2 13.5X
```
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By re-generating benchmark results for `MakeDateTimeBenchmark`:
```
$ SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.MakeDateTimeBenchmark"
```
in the environment:
| Item | Description |
| ---- | ----|
| Region | us-west-2 (Oregon) |
| Instance | r3.xlarge |
| AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) |
| Java | OpenJDK 64-Bit Server VM 1.8.0_252 and OpenJDK 64-Bit Server VM 11.0.7+10 |
Closes #28906 from MaxGekk/benchmark-table-formatting.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-06-24 00:43:53 -04:00
prepare make_date() 3309 3429 110 30.2 33.1 1.0X
make_date(2019, 9, 16) 2336 2359 23 42.8 23.4 1.4X
make_date(*, *, *) 4588 4618 27 21.8 45.9 0.7X
OpenJDK 64-Bit Server VM 1.8.0_252-8u252-b09-1~18.04-b09 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
make_timestamp(): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
---------------------------------------------------------------------------------------------------------------------------
prepare make_timestamp() 3651 3697 58 0.3 3651.4 1.0X
make_timestamp(2019, 1, 2, 3, 4, 50.123456) 89 99 10 11.3 88.6 41.2X
make_timestamp(2019, 1, 2, 3, 4, 60.000000) 72 73 1 13.9 72.1 50.6X
make_timestamp(2019, 12, 31, 23, 59, 60.00) 66 68 3 15.2 65.8 55.5X
make_timestamp(*, *, *, 3, 4, 50.123456) 265 272 6 3.8 265.1 13.8X
make_timestamp(*, *, *, *, *, 0) 259 266 6 3.9 259.1 14.1X
make_timestamp(*, *, *, *, *, 60.0) 271 278 9 3.7 271.2 13.5X
make_timestamp(2019, 1, 2, *, *, *) 3838 3850 12 0.3 3837.7 1.0X
make_timestamp(*, *, *, *, *, *) 3854 3877 20 0.3 3853.8 0.9X