Commit graph

158 commits

Author SHA1 Message Date
Gengliang Wang 1bad04d028 Preparing development version 3.2.1-SNAPSHOT 2021-08-31 17:04:14 +00:00
Gengliang Wang 03f5d23e96 Preparing Spark release v3.2.0-rc2 2021-08-31 17:04:08 +00:00
Gengliang Wang 69be513c5e Preparing development version 3.2.1-SNAPSHOT 2021-08-20 12:40:47 +00:00
Gengliang Wang 6bb3523d8e Preparing Spark release v3.2.0-rc1 2021-08-20 12:40:40 +00:00
Gengliang Wang fafdc1482b Revert "Preparing Spark release v3.2.0-rc1"
This reverts commit 8e58fafb05.
2021-08-20 20:07:02 +08:00
Gengliang Wang c829ed53ff Revert "Preparing development version 3.2.1-SNAPSHOT"
This reverts commit 4f1d21571d.
2021-08-20 20:07:01 +08:00
Gengliang Wang 4f1d21571d Preparing development version 3.2.1-SNAPSHOT 2021-08-19 14:08:32 +00:00
Gengliang Wang 8e58fafb05 Preparing Spark release v3.2.0-rc1 2021-08-19 14:08:26 +00:00
Kousuke Saruta 1f594d902a [SPARK-36081][SPARK-36066][SQL] Update the document about the behavior change of trimming characters for cast
### What changes were proposed in this pull request?

This PR modifies comment for `UTF8String.trimAll` and`sql-migration-guide.mld`.
The comment for `UTF8String.trimAll` says like as follows.
```
Trims whitespaces ({literal <=} ASCII 32) from both ends of this string.
```
Similarly, `sql-migration-guide.md` mentions about the behavior of `cast` like as follows.
```
In Spark 3.0, when casting string value to integral types(tinyint, smallint, int and bigint),
datetime types(date, timestamp and interval) and boolean type,
the leading and trailing whitespaces (<= ASCII 32) will be trimmed before converted to these type values,
for example, `cast(' 1\t' as int)` results `1`, `cast(' 1\t' as boolean)` results `true`,
`cast('2019-10-10\t as date)` results the date value `2019-10-10`.
In Spark version 2.4 and below, when casting string to integrals and booleans,
it does not trim the whitespaces from both ends; the foregoing results is `null`,
while to datetimes, only the trailing spaces (= ASCII 32) are removed.
```

But SPARK-32559 (#29375) changed the behavior and only whitespace ASCII characters will be trimmed since Spark 3.0.1.

### Why are the changes needed?

To follow the previous change.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Confirmed the document built by the following command.
```
SKIP_API=1 bundle exec jekyll build
```

Closes #33287 from sarutak/fix-utf8string-trim-issue.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 57a4f310df)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-07-13 20:29:05 +08:00
ulysses-you e70b0f81b3 [SPARK-35041][SQL] Revise the overflow in UTF8String
### What changes were proposed in this pull request?

Add overflow check before do `new byte[]`.

### Why are the changes needed?

Avoid overflow in extreme case.

### Does this PR introduce _any_ user-facing change?

Maybe yes, the error msg changed if overflow.

### How was this patch tested?

Pass CI.

Closes #32142 from ulysses-you/SPARK-35041.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-13 14:34:28 +03:00
ulysses-you 1be1012497 [SPARK-35005][SQL] Improve error msg if UTF8String concatWs length overflow
### What changes were proposed in this pull request?

Add check if the byte length over `int`.

### Why are the changes needed?

We encounter a very extreme case with expression `concat_ws`, and the error msg is
```
Caused by: java.lang.NegativeArraySizeException
	at org.apache.spark.unsafe.types.UTF8String.concatWs
```
Seems the `UTF8String.concat` has already done the length check at [#21064](https://github.com/apache/spark/pull/21064), so it's better to add in `concatWs`.

### Does this PR introduce _any_ user-facing change?

Yes

### How was this patch tested?

It's too heavy to add the test.

Closes #32106 from ulysses-you/SPARK-35005.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
2021-04-12 14:32:15 +03:00
Kent Yao d1177b5230 [SPARK-34192][SQL] Move char padding to write side and remove length check on read side too
### What changes were proposed in this pull request?

On the read-side, the char length check and padding bring issues to CBO and predicate pushdown and other issues to the catalyst.

This PR reverts 6da5cdf1db  that added read side length check) so that we only do length check for the write side, and data sources/vendors are responsible to enforce the char/varchar constraints for data import operations like ADD PARTITION. It doesn't make sense for Spark to report errors on the read-side if the data is already dirty.

This PR also moves the char padding to the write-side, so that it 1) avoids read side issues like CBO and filter pushdown. 2) the data source can preserve char type semantic better even if it's read by systems other than Spark.

### Why are the changes needed?

fix perf regression when tables have char/varchar type columns

closes #31278
### Does this PR introduce _any_ user-facing change?

yes, spark will not raise error for oversized char/varchar values in read side
### How was this patch tested?

modified ut

the dropped read side benchmark
```
================================================================================================
Char Varchar Read Side Perf w/o Tailing Spaces
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Read with length 20:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 20                         1564           1573           9         63.9          15.6       1.0X
read char with length 20                           1532           1551          18         65.3          15.3       1.0X
read varchar with length 20                        1520           1531          13         65.8          15.2       1.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Read with length 40:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 40                         1573           1613          41         63.6          15.7       1.0X
read char with length 40                           1575           1577           2         63.5          15.7       1.0X
read varchar with length 40                        1568           1576          11         63.8          15.7       1.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Read with length 60:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 60                         1526           1540          23         65.5          15.3       1.0X
read char with length 60                           1514           1539          23         66.0          15.1       1.0X
read varchar with length 60                        1486           1497          10         67.3          14.9       1.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Read with length 80:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 80                         1531           1542          19         65.3          15.3       1.0X
read char with length 80                           1514           1529          15         66.0          15.1       1.0X
read varchar with length 80                        1524           1565          42         65.6          15.2       1.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Read with length 100:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 100                        1597           1623          25         62.6          16.0       1.0X
read char with length 100                          1499           1512          16         66.7          15.0       1.1X
read varchar with length 100                       1517           1524           8         65.9          15.2       1.1X

================================================================================================
Char Varchar Read Side Perf w/ Tailing Spaces
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Read with length 20:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 20                         1524           1526           1         65.6          15.2       1.0X
read char with length 20                           1532           1537           9         65.3          15.3       1.0X
read varchar with length 20                        1520           1532          15         65.8          15.2       1.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Read with length 40:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 40                         1556           1580          32         64.3          15.6       1.0X
read char with length 40                           1600           1611          17         62.5          16.0       1.0X
read varchar with length 40                        1648           1716          88         60.7          16.5       0.9X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Read with length 60:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 60                         1504           1524          20         66.5          15.0       1.0X
read char with length 60                           1509           1512           3         66.2          15.1       1.0X
read varchar with length 60                        1519           1535          21         65.8          15.2       1.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Read with length 80:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 80                         1640           1652          17         61.0          16.4       1.0X
read char with length 80                           1625           1666          35         61.5          16.3       1.0X
read varchar with length 80                        1590           1605          13         62.9          15.9       1.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Read with length 100:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 100                        1622           1628           5         61.6          16.2       1.0X
read char with length 100                          1614           1646          30         62.0          16.1       1.0X
read varchar with length 100                       1594           1606          11         62.7          15.9       1.0X
```

Closes #31281 from yaooqinn/SPARK-34192.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-01-26 02:08:35 +08:00
Kousuke Saruta 116f4cab6b [SPARK-34094][SQL] Extends StringTranslate to support unicode characters whose code point >= U+10000
### What changes were proposed in this pull request?

This PR extends `StringTranslate` to support unicode characters whose code point >= `U+10000`.

### Why are the changes needed?

To make it work with wide variety of characters.

### Does this PR introduce _any_ user-facing change?

Yes. Users can use `StringTranslate` with unicode characters whose code point >= `U+10000`.

### How was this patch tested?

New assertion added to the existing test.

Closes #31164 from sarutak/extends-translate.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2021-01-21 08:15:55 -06:00
Kent Yao d640631e36 [SPARK-34164][SQL] Improve write side varchar check to visit only last few tailing spaces
### What changes were proposed in this pull request?

For varchar(N), we currently trim all spaces first to check whether the remained length exceeds, it not necessary to visit them all but at most to those after N.

### Why are the changes needed?

improve varchar performance for write side
### Does this PR introduce _any_ user-facing change?

no
### How was this patch tested?

benchmark and existing ut

Closes #31253 from yaooqinn/SPARK-34164.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-01-21 05:30:57 +00:00
Dongjoon Hyun de9818f043
[SPARK-33662][BUILD] Setting version to 3.2.0-SNAPSHOT
### What changes were proposed in this pull request?

This PR aims to update `master` branch version to 3.2.0-SNAPSHOT.

### Why are the changes needed?

Start to prepare Apache Spark 3.2.0.

### Does this PR introduce _any_ user-facing change?

N/A.

### How was this patch tested?

Pass the CIs.

Closes #30606 from dongjoon-hyun/SPARK-3.2.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-12-04 14:10:42 -08:00
Josh Soref 13fd272cd3 Spelling r common dev mlib external project streaming resource managers python
### What changes were proposed in this pull request?

This PR intends to fix typos in the sub-modules:
* `R`
* `common`
* `dev`
* `mlib`
* `external`
* `project`
* `streaming`
* `resource-managers`
* `python`

Split per srowen https://github.com/apache/spark/pull/30323#issuecomment-728981618

NOTE: The misspellings have been reported at 706a726f87 (commitcomment-44064356)

### Why are the changes needed?

Misspelled words make it harder to read / understand content.

### Does this PR introduce _any_ user-facing change?

There are various fixes to documentation, etc...

### How was this patch tested?

No testing was performed

Closes #30402 from jsoref/spelling-R_common_dev_mlib_external_project_streaming_resource-managers_python.

Authored-by: Josh Soref <jsoref@users.noreply.github.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-11-27 10:22:45 -06:00
Denis Pyshev 1b0875b692 [SPARK-33115][BUILD][DOCS] Fix javadoc errors in kvstore and unsafe modules
### What changes were proposed in this pull request?

Fix Javadoc generation errors in `kvstore` and `unsafe` modules according to error message hints.

### Why are the changes needed?

Fixes `doc` task failures which prevented other tasks successful executions (eg `publishLocal` task depends on `doc` task).

### Does this PR introduce _any_ user-facing change?

No.
Meaning of text in Javadoc is stayed the same.

### How was this patch tested?

Run `build/sbt kvstore/Compile/doc`, `build/sbt unsafe/Compile/doc` and `build/sbt doc` without errors.

Closes #30007 from gemelen/feature/doc-task-fix.

Authored-by: Denis Pyshev <git@gemelen.net>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-10-13 21:37:26 +09:00
Michael Munday 383bb4af00 [SPARK-32892][CORE][SQL] Fix hash functions on big-endian platforms
MurmurHash3 and xxHash64 interpret sequences of bytes as integers
encoded in little-endian byte order. This requires a byte reversal
on big endian platforms.

I've left the hashInt and hashLong functions as-is for now. My
interpretation of these functions is that they perform the hash on
the integer value as if it were serialized in little-endian byte
order. Therefore no byte reversal is necessary.

### What changes were proposed in this pull request?
Modify hash functions to produce correct results on big-endian platforms.

### Why are the changes needed?
Hash functions produce incorrect results on big-endian platforms which, amongst other potential issues, causes test failures.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests run on the IBM Z (s390x) platform which uses a big-endian byte order.

Closes #29762 from mundaym/fix-hashes.

Authored-by: Michael Munday <mike.munday@ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-09-23 12:36:46 -05:00
wangguangxin.cn 9a35b93c8a [SPARK-32559][SQL] Fix the trim logic in UTF8String.toInt/toLong did't handle non-ASCII characters correctly
### What changes were proposed in this pull request?
The trim logic in Cast expression introduced in https://github.com/apache/spark/pull/26622 trim non-ASCII characters unexpectly.

Before this patch
![image](https://user-images.githubusercontent.com/1312321/89513154-caad9b80-d806-11ea-9ebe-17c9e7d1b5b3.png)

After this patch
![image](https://user-images.githubusercontent.com/1312321/89513196-d731f400-d806-11ea-959c-6a7dc29dcd49.png)

### Why are the changes needed?
The behavior described above doesn't make sense, and also doesn't consistent with the behavior when cast a string to double/float, as well as doesn't consistent with the behavior of Hive

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
Added more UT

Closes #29375 from WangGuangxin/cast-bugfix.

Authored-by: wangguangxin.cn <wangguangxin.cn@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-08-07 05:00:33 +00:00
Sean Owen be2eca22e9 [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
### What changes were proposed in this pull request?

Updates to scalatest 3.2.0. Though it looks large, it is 99% changes to the new location of scalatest classes.

### Why are the changes needed?

3.2.0+ has a fix that is required for Scala 2.13.3+ compatibility.

### Does this PR introduce _any_ user-facing change?

No, only affects tests.

### How was this patch tested?

Existing tests.

Closes #29196 from srowen/SPARK-32398.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-07-23 16:20:17 -07:00
Yuanjian Li 6484c14c57 [SPARK-32115][SQL] Fix SUBSTRING to handle integer overflows
### What changes were proposed in this pull request?
Bug fix for overflow case in `UTF8String.substringSQL`.

### Why are the changes needed?
SQL query `SELECT SUBSTRING("abc", -1207959552, -1207959552)` incorrectly returns` "abc"` against expected output of `""`. For query `SUBSTRING("abc", -100, -100)`, we'll get the right output of `""`.

### Does this PR introduce _any_ user-facing change?
Yes, bug fix for the overflow case.

### How was this patch tested?
New UT.

Closes #28937 from xuanyuanking/SPARK-32115.

Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-06-28 12:22:44 -07:00
Kent Yao 697083c051 [SPARK-31469][SQL] Make extract interval field ANSI compliance
### What changes were proposed in this pull request?

Currently, we can extract `millennium/century/decade/year/quarter/month/week/day/hour/minute/second(with fractions)//millisecond/microseconds` and `epoch` from interval values

While getting the `millennium/century/decade/year`, it means how many the interval `months` part can be converted to that unit-value. The content of `millennium/century/decade` will overlap `year` and each other.

While getting `month/day` and so on, it means the integral remainder of the previous unit. Here all the units including `year` are individual.

So while extracting `year`, `month`, `day`, `hour`, `minute`, `second`, which are ANSI primary datetime units, the semantic is `extracting`, but others might refer to `transforming`.

While getting epoch we have treat month as 30 days which varies the natural Calendar rules we use.

To avoid ambiguity, I suggest we should only support those extract field defined ANSI with their abbreviations.

### Why are the changes needed?

Extracting `millennium`, `century` etc does not obey the meaning of extracting, and they are not so useful and worth maintaining.

The `extract` is ANSI standard expression and `date_part` is its pg-specific alias function. The current support extract-fields are fully bought from PostgreSQL.

With a look at other systems like Presto/Hive, they don't support those ambiguous fields too.

e.g. Hive 2.2.x also take it from PostgreSQL but without introducing those ambiguous fields https://issues.apache.org/jira/secure/attachment/12828349/HIVE-14579

e.g. presto

```sql
presto> select extract(quater from interval '10-0' year to month);
Query 20200417_094723_00020_m8xq4 failed: line 1:8: Invalid EXTRACT field: quater
select extract(quater from interval '10-0' year to month)

presto> select extract(decade from interval '10-0' year to month);
Query 20200417_094737_00021_m8xq4 failed: line 1:8: Invalid EXTRACT field: decade
select extract(decade from interval '10-0' year to month)

```

### Does this PR introduce any user-facing change?

Yes, as we already have previews versions, this PR will remove support for extracting `millennium/century/decade/quarter/week/millisecond/microseconds` and `epoch` from intervals with `date_part` function

### How was this patch tested?

rm some used tests

Closes #28242 from yaooqinn/SPARK-31469.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-04-17 13:59:02 +00:00
yi.wu 40f9dbb628 [SPARK-31425][SQL][CORE] UnsafeKVExternalSorter/VariableLengthRowBasedKeyValueBatch should also respect UnsafeAlignedOffset
### What changes were proposed in this pull request?

Make `UnsafeKVExternalSorter` / `VariableLengthRowBasedKeyValueBatch ` also respect `UnsafeAlignedOffset` when reading the record and update some out of date comemnts.

### Why are the changes needed?

Since `BytesToBytesMap` respects `UnsafeAlignedOffset` when writing the record, `UnsafeKVExternalSorter` should also respect `UnsafeAlignedOffset` when reading the record from `BytesToBytesMap` otherwise it will causes data correctness issue.

Unlike `UnsafeKVExternalSorter` may reading records from `BytesToBytesMap`, `VariableLengthRowBasedKeyValueBatch` writes and reads records by itself. Thus, similar to #22053 and [comment](https://github.com/apache/spark/pull/22053#issuecomment-411975239) there, fix for `VariableLengthRowBasedKeyValueBatch` more likely an improvement for the support of SPARC platform.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually tested `HashAggregationQueryWithControlledFallbackSuite` with `UAO_SIZE=8`  to simulate SPARC platform. And tests only pass with this fix.

Closes #28195 from Ngone51/fix_uao.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-04-17 04:48:27 +00:00
Wenchen Fan ac262cb272 [SPARK-30292][SQL][FOLLOWUP] ansi cast from strings to integral numbers (byte/short/int/long) should fail with fraction
### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/26933

Fraction string like "1.23" is definitely not a valid integral format and we should fail to do the cast under the ANSI mode.

### Why are the changes needed?

correct the ANSI cast behavior from string to integral

### Does this PR introduce any user-facing change?

Yes under ANSI mode, but ANSI mode is off by default.

### How was this patch tested?

new test

Closes #27957 from cloud-fan/ansi.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-03-20 00:52:09 +09:00
gatorsmile 28b8713036 [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT
### What changes were proposed in this pull request?
This patch is to bump the master branch version to 3.1.0-SNAPSHOT.

### Why are the changes needed?
N/A

### Does this PR introduce any user-facing change?
N/A

### How was this patch tested?
N/A

Closes #27698 from gatorsmile/updateVersion.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-25 19:44:31 -08:00
HyukjinKwon 6f4703e22e [SPARK-30690][DOCS][BUILD] Add CalendarInterval into API documentation
### What changes were proposed in this pull request?

We should also expose it in documentation as we marked it as unstable API as of SPARK-30547
Note that, seems Javadoc -> Scaladoc doesn't work but this PR does not target to fix.

### Why are the changes needed?

To show the documentation of API.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Manually built the docs via `jykill serve` under `docs` directory:

![Screen Shot 2020-01-31 at 4 04 15 PM](https://user-images.githubusercontent.com/6477701/73519315-12143300-4444-11ea-9260-070c9f672dde.png)

Closes #27412 from HyukjinKwon/SPARK-30547.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-01-31 22:50:01 +09:00
Kent Yao af705421db [SPARK-30593][SQL] Revert interval ISO/ANSI SQL Standard output since we decide not to follow ANSI and no round trip
### What changes were proposed in this pull request?

This revert https://github.com/apache/spark/pull/26418, file a new ticket under  https://issues.apache.org/jira/browse/SPARK-30546 for better tracking interval behavior
### Why are the changes needed?

Revert interval ISO/ANSI SQL Standard output since we decide not to follow ANSI and there is no round trip

### Does this PR introduce any user-facing change?

no, not released yet

### How was this patch tested?

existing uts

Closes #27304 from yaooqinn/SPARK-30593.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-01-21 20:51:10 +08:00
Kent Yao 730388b369 [SPARK-30547][SQL][FOLLOWUP] Update since anotation for CalendarInterval class
### What changes were proposed in this pull request?
Mark `CalendarInterval` class with `since 3.0.0`.
### Why are the changes needed?

https://www.oracle.com/technetwork/java/javase/documentation/index-137868.html#since

This class is the first time going to the public, the annotation is the first time to add, and we don't want people to get confused and try to use it 2.4.x.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

no

Closes #27299 from yaooqinn/SPARK-30547-F.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-01-21 20:35:47 +08:00
Kent Yao 4806cc5bd1 [SPARK-30547][SQL] Add unstable annotation to the CalendarInterval class
### What changes were proposed in this pull request?

`CalendarInterval` is maintained as a private class but might be used in a public way by users
e.g.

```scala
scala> spark.udf.register("getIntervalMonth", (_:org.apache.spark.unsafe.types.CalendarInterval).months)

scala> sql("select interval 2 month 1 day a").selectExpr("getIntervalMonth(a)").show
+-------------------+
|getIntervalMonth(a)|
+-------------------+
|                  2|
+-------------------+
```

And it exists since 1.5.0, now we go to the 3.x era,may be it's time to make it public

### Why are the changes needed?

make the interval more future-proofing

### Does this PR introduce any user-facing change?

doc change

### How was this patch tested?

add ut.

Closes #27258 from yaooqinn/SPARK-30547.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-01-20 12:17:37 +08:00
Kent Yao 17857f9b8b [SPARK-30551][SQL] Disable comparison for interval type
### What changes were proposed in this pull request?

As we are not going to follow ANSI to implement year-month and day-time interval types, it is weird to compare the year-month part to the day-time part for our current implementation of interval type now.

Additionally, the current ordering logic comes from PostgreSQL where the implementation of the interval is messy. And we are not aiming PostgreSQL compliance at all.

THIS PR will revert https://github.com/apache/spark/pull/26681 and https://github.com/apache/spark/pull/26337

### Why are the changes needed?

make interval type more future-proofing

### Does this PR introduce any user-facing change?

there are new in 3.0, so no

### How was this patch tested?

existing uts shall work

Closes #27262 from yaooqinn/SPARK-30551.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-01-19 15:27:51 +08:00
root1 e0efd213eb [SPARK-30292][SQL] Throw Exception when invalid string is cast to numeric type in ANSI mode
### What changes were proposed in this pull request?
If spark.sql.ansi.enabled is set,
throw exception when cast to any numeric type do not follow the ANSI SQL standards.

### Why are the changes needed?
ANSI SQL standards do not allow invalid strings to get casted into numeric types and throw exception for that. Currently spark sql gives NULL in such cases.

Before:
`select cast('str' as decimal)  => NULL`

After :
`select cast('str' as decimal) => invalid input syntax for type numeric: str`

These results are after setting `spark.sql.ansi.enabled=true`

### Does this PR introduce any user-facing change?
Yes. Now when ansi mode is on users will get arithmetic exception for invalid strings.

### How was this patch tested?
Unit Tests Added.

Closes #26933 from iRakson/castDecimalANSI.

Lead-authored-by: root1 <raksonrakesh@gmail.com>
Co-authored-by: iRakson <raksonrakesh@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-01-14 17:03:10 +08:00
Yuming Wang 696288f623 [INFRA] Reverts commit 56dcd79 and c216ef1
### What changes were proposed in this pull request?
1. Revert "Preparing development version 3.0.1-SNAPSHOT": 56dcd79

2. Revert "Preparing Spark release v3.0.0-preview2-rc2": c216ef1

### Why are the changes needed?
Shouldn't change master.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
manual test:
https://github.com/apache/spark/compare/5de5e46..wangyum:revert-master

Closes #26915 from wangyum/revert-master.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
2019-12-16 19:57:44 -07:00
Yuming Wang 56dcd79992 Preparing development version 3.0.1-SNAPSHOT 2019-12-17 01:57:27 +00:00
Yuming Wang c216ef1d03 Preparing Spark release v3.0.0-preview2-rc2 2019-12-17 01:57:21 +00:00
Kent Yao d3ec8b1735 [SPARK-30066][SQL] Support columnar execution on interval types
### What changes were proposed in this pull request?

Columnar execution support for interval types

### Why are the changes needed?

support cache tables with interval columns
improve performance too

### Does this PR introduce any user-facing change?

Yes cache table with accept interval columns

### How was this patch tested?

add ut

Closes #26699 from yaooqinn/SPARK-30066.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-12-14 13:10:46 -08:00
Liu,Linhong f22177c957 [SPARK-29486][SQL][FOLLOWUP] Document the reason to add days field
### What changes were proposed in this pull request?
Follow up of #26134 to document the reason to add days filed and explain how do we use it

### Why are the changes needed?
only comment

### Does this PR introduce any user-facing change?
no

### How was this patch tested?
no need test

Closes #26701 from LinhongLiu/spark-29486-followup.

Authored-by: Liu,Linhong <liulinhong@baidu.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-11-30 08:43:34 -06:00
Kent Yao de21f28f8a [SPARK-29986][SQL] casting string to date/timestamp/interval should trim all whitespaces
### What changes were proposed in this pull request?

A java like string trim method trims all whitespaces that less or equal than 0x20. currently, our UTF8String handle the space =0x20 ONLY. This is not suitable for many cases in Spark, like trim for interval strings, date, timestamps, PostgreSQL like cast string to boolean.

### Why are the changes needed?

improve the white spaces handling in UTF8String, also with some bugs fixed

### Does this PR introduce any user-facing change?

yes,
string with `control character` at either end can be convert to date/timestamp and interval now

### How was this patch tested?

add ut

Closes #26626 from yaooqinn/SPARK-29986.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-25 14:37:04 +08:00
Kent Yao 2dd6807e42 [SPARK-28023][SQL] Add trim logic in UTF8String's toInt/toLong to make it consistent with other string-numeric casting
### What changes were proposed in this pull request?

Modify `UTF8String.toInt/toLong` to support trim spaces for both sides before converting it to byte/short/int/long.

With this kind of "cheap" trim can help improve performance for casting string to integrals. The idea is from https://github.com/apache/spark/pull/24872#issuecomment-556917834

### Why are the changes needed?

make the behavior consistent.

### Does this PR introduce any user-facing change?
yes, cast string to an integral type, and binary comparison between string and integrals will trim spaces first. their behavior will be consistent with float and double.
### How was this patch tested?
1. add ut.
2. benchmark tests
 the benchmark is modified based on https://github.com/apache/spark/pull/24872#issuecomment-503827016

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.execution.benchmark

import org.apache.spark.benchmark.Benchmark

/**
 * Benchmark trim the string when casting string type to Boolean/Numeric types.
 * To run this benchmark:
 * {{{
 *   1. without sbt:
 *      bin/spark-submit --class <this class> --jars <spark core test jar> <spark sql test jar>
 *   2. build/sbt "sql/test:runMain <this class>"
 *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
 *      Results will be written to "benchmarks/CastBenchmark-results.txt".
 * }}}
 */
object CastBenchmark extends SqlBasedBenchmark {
This conversation was marked as resolved by yaooqinn

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    val title = "Cast String to Integral"
    runBenchmark(title) {
      withTempPath { dir =>
        val N = 500L << 14
        val df = spark.range(N)
        val types = Seq("int", "long")
        (1 to 5).by(2).foreach { i =>
          df.selectExpr(s"concat(id, '${" " * i}') as str")
            .write.mode("overwrite").parquet(dir + i.toString)
        }

        val benchmark = new Benchmark(title, N, minNumIters = 5, output = output)
        Seq(true, false).foreach { trim =>
          types.foreach { t =>
            val str = if (trim) "trim(str)" else "str"
            val expr = s"cast($str as $t) as c_$t"
            (1 to 5).by(2).foreach { i =>
              benchmark.addCase(expr + s" - with $i spaces") { _ =>
                spark.read.parquet(dir + i.toString).selectExpr(expr).collect()
              }
            }
          }
        }
        benchmark.run()
      }
    }
  }
}
```
#### benchmark result.
normal trim v.s. trim in toInt/toLong
```java
================================================================================================
Cast String to Integral
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.1
Intel(R) Core(TM) i5-5287U CPU  2.90GHz
Cast String to Integral:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
cast(trim(str) as int) as c_int - with 1 spaces          10220          12994        1337          0.8        1247.5       1.0X
cast(trim(str) as int) as c_int - with 3 spaces           4763           8356         357          1.7         581.4       2.1X
cast(trim(str) as int) as c_int - with 5 spaces           4791           8042         NaN          1.7         584.9       2.1X
cast(trim(str) as long) as c_long - with 1 spaces           4014           6755         NaN          2.0         490.0       2.5X
cast(trim(str) as long) as c_long - with 3 spaces           4737           6938         NaN          1.7         578.2       2.2X
cast(trim(str) as long) as c_long - with 5 spaces           4478           6919        1404          1.8         546.6       2.3X
cast(str as int) as c_int - with 1 spaces           4443           6222         NaN          1.8         542.3       2.3X
cast(str as int) as c_int - with 3 spaces           3659           3842         170          2.2         446.7       2.8X
cast(str as int) as c_int - with 5 spaces           4372           7996         NaN          1.9         533.7       2.3X
cast(str as long) as c_long - with 1 spaces           3866           5838         NaN          2.1         471.9       2.6X
cast(str as long) as c_long - with 3 spaces           3793           5449         NaN          2.2         463.0       2.7X
cast(str as long) as c_long - with 5 spaces           4947           5961        1198          1.7         603.9       2.1X
```

Closes #26622 from yaooqinn/cheapstringtrim.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-22 19:32:27 +08:00
HyukjinKwon 882f54b0a3 [SPARK-29870][SQL][FOLLOW-UP] Keep CalendarInterval's toString
### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/26418. This PR removed `CalendarInterval`'s `toString` with an unfinished changes.

### Why are the changes needed?

1. Ideally we should make each PR isolated and separate targeting one issue without touching unrelated codes.

2. There are some other places where the string formats were exposed to users. For example:

    ```scala
    scala> sql("select interval 1 days as a").selectExpr("to_csv(struct(a))").show()
    ```
    ```
    +--------------------------+
    |to_csv(named_struct(a, a))|
    +--------------------------+
    |      "CalendarInterval...|
    +--------------------------+
    ```

3.  Such fixes:

    ```diff
     private def writeMapData(
        map: MapData, mapType: MapType, fieldWriter: ValueWriter): Unit = {
      val keyArray = map.keyArray()
    + val keyString = mapType.keyType match {
    +   case CalendarIntervalType =>
    +    (i: Int) => IntervalUtils.toMultiUnitsString(keyArray.getInterval(i))
    +   case _ => (i: Int) => keyArray.get(i, mapType.keyType).toString
    + }
    ```

    can cause performance regression due to type dispatch for each map.

### Does this PR introduce any user-facing change?

Yes, see 2. case above.

### How was this patch tested?

Manually tested.

Closes #26572 from HyukjinKwon/SPARK-29783.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-11-19 09:11:41 +09:00
Kent Yao 5cebe587c7 [SPARK-29783][SQL] Support SQL Standard/ISO_8601 output style for interval type
### What changes were proposed in this pull request?

Add 3 interval output types which are named as `SQL_STANDARD`, `ISO_8601`, `MULTI_UNITS`. And we add a new conf `spark.sql.dialect.intervalOutputStyle` for this. The `MULTI_UNITS` style displays the interval values in the former behavior and it is the default. The newly added `SQL_STANDARD`, `ISO_8601` styles can be found in the following table.

Style | conf | Year-Month Interval | Day-Time Interval | Mixed Interval
-- | -- | -- | -- | --
Format With Time Unit Designators | MULTI_UNITS | 1 year 2 mons | 1 days 2 hours 3 minutes 4.123456 seconds | interval 1 days 2 hours 3 minutes 4.123456 seconds
SQL STANDARD  | SQL_STANDARD | 1-2 | 3 4:05:06 | -1-2 3 -4:05:06
ISO8601 Basic Format| ISO_8601| P1Y2M| P3DT4H5M6S|P-1Y-2M3D-4H-5M-6S

### Why are the changes needed?

for ANSI SQL support
### Does this PR introduce any user-facing change?

yes,interval out now has 3 output styles
### How was this patch tested?

add new unit tests

cc cloud-fan maropu MaxGekk HyukjinKwon thanks.

Closes #26418 from yaooqinn/SPARK-29783.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-18 15:42:22 +08:00
Kent Yao e026412d9c [SPARK-29679][SQL] Make interval type comparable and orderable
### What changes were proposed in this pull request?

interval type support >, >=, <, <=, =, <=>, order by, min,max..

### Why are the changes needed?

Part of SPARK-27764 Feature Parity between PostgreSQL and Spark
### Does this PR introduce any user-facing change?

yes, we now support compare intervals

### How was this patch tested?

add ut

Closes #26337 from yaooqinn/SPARK-29679.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-08 22:45:11 +08:00
Kent Yao 0a03839366 [SPARK-29787][SQL] Move methods add/subtract/negate from CalendarInterval to IntervalUtils
### What changes were proposed in this pull request?

Move method add/subtract/negate from CalendarInterval to IntervalUtils

### Why are the changes needed?

https://github.com/apache/spark/pull/26410#discussion_r343125468 suggested here
### Does this PR introduce any user-facing change?

no
### How was this patch tested?

add uts and move some

Closes #26423 from yaooqinn/SPARK-29787.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-08 10:28:58 +08:00
Kent Yao 9562b26914 [SPARK-29757][SQL] Move calendar interval constants together
### What changes were proposed in this pull request?
```java
  public static final int YEARS_PER_DECADE = 10;
  public static final int YEARS_PER_CENTURY = 100;
  public static final int YEARS_PER_MILLENNIUM = 1000;

  public static final byte MONTHS_PER_QUARTER = 3;
  public static final int MONTHS_PER_YEAR = 12;

  public static final byte DAYS_PER_WEEK = 7;
  public static final long DAYS_PER_MONTH = 30L;

  public static final long HOURS_PER_DAY = 24L;

  public static final long MINUTES_PER_HOUR = 60L;

  public static final long SECONDS_PER_MINUTE = 60L;
  public static final long SECONDS_PER_HOUR = MINUTES_PER_HOUR * SECONDS_PER_MINUTE;
  public static final long SECONDS_PER_DAY = HOURS_PER_DAY * SECONDS_PER_HOUR;

  public static final long MILLIS_PER_SECOND = 1000L;
  public static final long MILLIS_PER_MINUTE = SECONDS_PER_MINUTE * MILLIS_PER_SECOND;
  public static final long MILLIS_PER_HOUR = MINUTES_PER_HOUR * MILLIS_PER_MINUTE;
  public static final long MILLIS_PER_DAY = HOURS_PER_DAY * MILLIS_PER_HOUR;

  public static final long MICROS_PER_MILLIS = 1000L;
  public static final long MICROS_PER_SECOND = MILLIS_PER_SECOND * MICROS_PER_MILLIS;
  public static final long MICROS_PER_MINUTE = SECONDS_PER_MINUTE * MICROS_PER_SECOND;
  public static final long MICROS_PER_HOUR = MINUTES_PER_HOUR * MICROS_PER_MINUTE;
  public static final long MICROS_PER_DAY = HOURS_PER_DAY * MICROS_PER_HOUR;
  public static final long MICROS_PER_MONTH = DAYS_PER_MONTH * MICROS_PER_DAY;
  /* 365.25 days per year assumes leap year every four years */
  public static final long MICROS_PER_YEAR = (36525L * MICROS_PER_DAY) / 100;

  public static final long NANOS_PER_MICROS = 1000L;
  public static final long NANOS_PER_MILLIS = MICROS_PER_MILLIS * NANOS_PER_MICROS;
  public static final long NANOS_PER_SECOND = MILLIS_PER_SECOND * NANOS_PER_MILLIS;
```
The above parameters are defined in IntervalUtils, DateTimeUtils, and CalendarInterval, some of them are redundant, some of them are cross-referenced.

### Why are the changes needed?
To simplify code, enhance consistency and reduce risks

### Does this PR introduce any user-facing change?

no
### How was this patch tested?

modified uts

Closes #26399 from yaooqinn/SPARK-29757.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-07 19:48:19 +08:00
Wenchen Fan 9b61f90987 [SPARK-29761][SQL] do not output leading 'interval' in CalendarInterval.toString
### What changes were proposed in this pull request?

remove the leading "interval" in `CalendarInterval.toString`.

### Why are the changes needed?

Although it's allowed to have "interval" prefix when casting string to int, it's not recommended.

This is also consistent with pgsql:
```
cloud0fan=# select interval '1' day;
 interval
----------
 1 day
(1 row)
```

### Does this PR introduce any user-facing change?

yes, when display a dataframe with interval type column, the result is different.

### How was this patch tested?

updated tests.

Closes #26401 from cloud-fan/interval.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-07 15:44:50 +08:00
Maxim Gekk 29dc59ac29 [SPARK-29605][SQL] Optimize string to interval casting
### What changes were proposed in this pull request?
In the PR, I propose new function `stringToInterval()` in `IntervalUtils` for converting `UTF8String` to `CalendarInterval`. The function is used in casting a `STRING` column to an `INTERVAL` column.

### Why are the changes needed?
The proposed implementation is ~10 times faster. For example, parsing 9 interval units on JDK 8:
Before:
```
9 units w/ interval                               14004          14125         116          0.1       14003.6       0.0X
9 units w/o interval                              13785          14056         290          0.1       13784.9       0.0X
```
After:
```
9 units w/ interval                                1343           1344           1          0.7        1343.0       0.3X
9 units w/o interval                               1345           1349           8          0.7        1344.6       0.3X
```

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
- By new tests for `stringToInterval` in `IntervalUtilsSuite`
- By existing tests

Closes #26256 from MaxGekk/string-to-interval.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-07 12:39:52 +08:00
Maxim Gekk 441d4c953e [SPARK-29723][SQL] Get date and time parts of an interval as java classes
### What changes were proposed in this pull request?
I propose 2 new methods for `CalendarInterval`:
- `extractAsPeriod()` returns the date part of an interval as an instance of `java.time.Period`
- `extractAsDuration()` returns the time part of an interval as an instance of `java.time.Duration`

For example:
```scala
scala> import org.apache.spark.unsafe.types.CalendarInterval
scala> import java.time._
scala> val i = spark.sql("select interval 1 year 3 months 4 days 10 hours 30 seconds").collect()(0).getAs[CalendarInterval](0)
scala> LocalDate.of(2019, 11, 1).plus(i.period())
res8: java.time.LocalDate = 2021-02-05
scala> ZonedDateTime.parse("2019-11-01T12:13:14Z").plus(i.extractAsPeriod()).plus(i.extractAsDuration())
res9: java.time.ZonedDateTime = 2021-02-05T22:13:44Z
```

### Why are the changes needed?
Taking into account that `CalendarInterval` has been already partially exposed to users via the collect operation, and probably it will be fully exposed in the future, it could be convenient for users to get the date and time parts of intervals as java classes:
- to avoid unnecessary dependency from Spark's classes in user code
- to easily use external libraries that accept standard Java classes.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By new test in `CalendarIntervalSuite`.

Closes #26368 from MaxGekk/interval-java-period-duration.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-04 11:07:54 -08:00
Maxim Gekk fb60c2a170 [SPARK-29671][SQL] Simplify string representation of intervals
### What changes were proposed in this pull request?
In the PR, I propose to changed `CalendarInterval.toString`:
- to skip the `week` unit
- to convert `milliseconds` and `microseconds` as the fractional part of the `seconds` unit.

### Why are the changes needed?
To improve readability.

### Does this PR introduce any user-facing change?
Yes

### How was this patch tested?
- By `CalendarIntervalSuite` and `IntervalUtilsSuite`
- `literals.sql`, `datetime.sql` and `interval.sql`

Closes #26367 from MaxGekk/interval-to-string-format.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-03 22:56:59 -08:00
Maxim Gekk 80a89873b2 [SPARK-29733][TESTS] Fix wrong order of parameters passed to assertEquals
### What changes were proposed in this pull request?
The `assertEquals` method of JUnit Assert requires the first parameter to be the expected value. In this PR, I propose to change the order of parameters when the expected value is passed as the second parameter.

### Why are the changes needed?
Wrong order of assert parameters confuses when the assert fails and the parameters have special string representation. For example:
```java
assertEquals(input1.add(input2), new CalendarInterval(5, 5, 367200000000L));
```
```
java.lang.AssertionError:
Expected :interval 5 months 5 days 101 hours
Actual   :interval 5 months 5 days 102 hours
```

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By existing tests.

Closes #26377 from MaxGekk/fix-order-in-assert-equals.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-11-03 11:21:28 -08:00
Liu,Linhong a4382f7fe1 [SPARK-29486][SQL] CalendarInterval should have 3 fields: months, days and microseconds
### What changes were proposed in this pull request?
Current CalendarInterval has 2 fields: months and microseconds. This PR try to change it
to 3 fields: months, days and microseconds. This is because one logical day interval may
have different number of microseconds (daylight saving).

### Why are the changes needed?
One logical day interval may have different number of microseconds (daylight saving).
For example, in PST timezone, there will be 25 hours from 2019-11-2 12:00:00 to
2019-11-3 12:00:00

### Does this PR introduce any user-facing change?
no

### How was this patch tested?
unit test and new added test cases

Closes #26134 from LinhongLiu/calendarinterval.

Authored-by: Liu,Linhong <liulinhong@baidu.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-11-01 18:12:33 +08:00
Xingbo Jiang 8207c835b4 Revert "Prepare Spark release v3.0.0-preview-rc2"
This reverts commit 007c873ae3.
2019-10-30 17:45:44 -07:00