### What changes were proposed in this pull request?
With the latest string to literal optimization https://github.com/apache/spark/pull/26256, some interval strings can not be cast when there are some spaces between signs and unit values. After state `PARSE_SIGN`, it directly goes to `PARSE_UNIT_VALUE` when takes a space character as the end. So when there are some white spaces come before the real unit value, it fails to parse, we should add a new state like `TRIM_VALUE` to trim all these spaces.
How to re-produce, which aim the revisions since https://github.com/apache/spark/pull/26256 is merged
```sql
select cast(v as interval) from values ('+ 1 second') t(v);
select cast(v as interval) from values ('- 1 second') t(v);
```
### Why are the changes needed?
bug fix
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
1. ut
2. new benchmark test
Closes#26449 from yaooqinn/SPARK-29605.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
In the PR, I propose new function `stringToInterval()` in `IntervalUtils` for converting `UTF8String` to `CalendarInterval`. The function is used in casting a `STRING` column to an `INTERVAL` column.
### Why are the changes needed?
The proposed implementation is ~10 times faster. For example, parsing 9 interval units on JDK 8:
Before:
```
9 units w/ interval 14004 14125 116 0.1 14003.6 0.0X
9 units w/o interval 13785 14056 290 0.1 13784.9 0.0X
```
After:
```
9 units w/ interval 1343 1344 1 0.7 1343.0 0.3X
9 units w/o interval 1345 1349 8 0.7 1344.6 0.3X
```
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
- By new tests for `stringToInterval` in `IntervalUtilsSuite`
- By existing tests
Closes#26256 from MaxGekk/string-to-interval.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Only use antlr4 to parse the interval string, and remove the duplicated parsing logic from `CalendarInterval`.
### Why are the changes needed?
Simplify the code and fix inconsistent behaviors.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Pass the Jenkins with the updated test cases.
Closes#26190 from cloud-fan/parser.
Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This is a follow-up of https://github.com/apache/spark/pull/26189 to regenerate the result on EC2.
### Why are the changes needed?
This will be used for the other PR reviews.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
N/A.
Closes#26233 from dongjoon-hyun/SPARK-29533.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
### What changes were proposed in this pull request?
Added new benchmark `IntervalBenchmark` to measure performance of interval related functions. In the PR, I added benchmarks for casting strings to interval. In particular, interval strings with `interval` prefix and without it because there is special code for this da576a737c/common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java (L100-L103) . And also I added benchmarks for different number of units in interval strings, for example 1 unit is `interval 10 years`, 2 units w/o interval is `10 years 5 months`, and etc.
### Why are the changes needed?
- To find out current performance issues in casting to intervals
- The benchmark can be used while refactoring/re-implementing `CalendarInterval.fromString()` or `CalendarInterval.fromCaseInsensitiveString()`.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
By running the benchmark via the command:
```shell
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.IntervalBenchmark"
```
Closes#26189 from MaxGekk/interval-from-string-benchmark.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>