### What changes were proposed in this pull request?
Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid, when ANSI mode is enabled. This patch updates GetTimestamp, UnixTimestamp, ToUnixTimestamp and Cast.
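A minimal sketch of the new behavior (the exact exception type and message may vary):
```sql
SET spark.sql.ansi.enabled=true;
-- The input cannot be parsed with the given pattern, so this now fails at
-- runtime; with ANSI mode off, it returns NULL.
SELECT to_timestamp('invalid', 'yyyy-MM-dd');
```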
### Why are the changes needed?
For ANSI mode.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Added UTs and ran existing UTs.
Closes #30442 from leanken/leanken-SPARK-33498.
Authored-by: xuewei.linxuewei <xuewei.linxuewei@alibaba-inc.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
With `ParseUrl`, instead of returning null, we throw an exception if the input string is not a valid URL.
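A minimal sketch of the new behavior (the exact exception type and message may vary):
```sql
SET spark.sql.ansi.enabled=true;
-- The input is not a valid URL, so this now throws a runtime exception;
-- with ANSI mode off, it returns NULL.
SELECT parse_url('not a valid url', 'HOST');
```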
### Why are the changes needed?
For ANSI mode.
### Does this PR introduce _any_ user-facing change?
Yes, users will get an exception if `spark.sql.ansi.enabled` is set to `true`.
### How was this patch tested?
Added a test.
Closes #30399 from ulysses-you/SPARK-33468.
Lead-authored-by: ulysses <youxiduo@weidian.com>
Co-authored-by: ulysses-you <youxiduo@weidian.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
In section 6.13 of the ANSI SQL standard, there are syntax rules for valid combinations of the source and target data types.
![image](https://user-images.githubusercontent.com/1097932/98212874-17356f80-1ef9-11eb-8f2b-385f32db404a.png)
Comparing the ANSI CAST syntax rules with the current default behavior of Spark:
![image](https://user-images.githubusercontent.com/1097932/98789831-b7870a80-23b7-11eb-9b5f-469a42e0ee4a.png)
To make Spark's ANSI mode more ANSI SQL compatible, I propose to disallow the following casts in ANSI mode:
```
TimeStamp <=> Boolean
Date <=> Boolean
Numeric <=> Timestamp
Numeric <=> Date
Numeric <=> Binary
String <=> Array
String <=> Map
String <=> Struct
```
The following casts are considered invalid in the ANSI SQL standard, but they are quite straightforward, so let's allow them for now:
```
Numeric <=> Boolean
String <=> Binary
```
### Why are the changes needed?
Better ANSI SQL compliance
### Does this PR introduce _any_ user-facing change?
Yes, the following casts will no longer be allowed in ANSI mode:
```
TimeStamp <=> Boolean
Date <=> Boolean
Numeric <=> Timestamp
Numeric <=> Date
Numeric <=> Binary
String <=> Array
String <=> Map
String <=> Struct
```
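For example, a cast that the new rules reject (sketch; the exact error message may vary):
```sql
SET spark.sql.ansi.enabled=true;
-- Numeric <=> Timestamp casts are disallowed under ANSI mode, so this
-- fails during analysis instead of returning a timestamp.
SELECT CAST(1 AS TIMESTAMP);
```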
### How was this patch tested?
Unit test
The ANSI Compliance doc preview:
![image](https://user-images.githubusercontent.com/1097932/98946017-2cd20880-24a8-11eb-8161-65749bfdd03a.png)
Closes #30260 from gengliangwang/ansiCanCast.
Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
### What changes were proposed in this pull request?
Instead of returning NULL, this throws a runtime NoSuchElementException when accessing an invalid key in map-like functions, such as element_at and GetMapValue, when ANSI mode is on.
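A minimal sketch of the new behavior:
```sql
SET spark.sql.ansi.enabled=true;
-- The key 3 does not exist in the map, so this now throws
-- NoSuchElementException; with ANSI mode off, it returns NULL.
SELECT element_at(map(1, 'a', 2, 'b'), 3);
```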
### Why are the changes needed?
For ANSI mode.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Added UTs and ran existing UTs.
Closes #30386 from leanken/leanken-SPARK-33460.
Authored-by: xuewei.linxuewei <xuewei.linxuewei@alibaba-inc.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Instead of returning NULL, this throws a runtime ArrayIndexOutOfBoundsException when ANSI mode is enabled for the `element_at`, `elt`, and `GetArrayItem` functions.
### Why are the changes needed?
For ANSI mode.
### Does this PR introduce any user-facing change?
When `spark.sql.ansi.enabled` is `true`, Spark will throw an `ArrayIndexOutOfBoundsException` when accessing array elements with an out-of-range index.
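A minimal sketch:
```sql
SET spark.sql.ansi.enabled=true;
-- Index 5 is out of range for a 3-element array, so this now throws
-- ArrayIndexOutOfBoundsException; with ANSI mode off, it returns NULL.
SELECT element_at(array(1, 2, 3), 5);
```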
### How was this patch tested?
Added UTs and ran existing UTs.
Closes #30297 from leanken/leanken-SPARK-33386.
Authored-by: xuewei.linxuewei <xuewei.linxuewei@alibaba-inc.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Remove the YEAR, MONTH, DAY, HOUR, MINUTE, SECOND keywords. They are not useful in the parser, as we need to support plurals like YEARS, so the parser has to accept general identifiers as interval units anyway.
### Why are the changes needed?
These keywords are reserved in ANSI. If Spark has these keywords, they become reserved under ANSI mode. This prevents Spark from running the TPC-DS queries, as they use YEAR as an alias name.
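For example, a TPC-DS-style query (a sketch using the TPC-DS `date_dim` table) that must keep parsing under ANSI mode:
```sql
-- YEAR is used as a column alias here; if YEAR were a reserved keyword
-- under ANSI mode, this query would fail to parse.
SELECT d_year AS year
FROM date_dim
GROUP BY d_year;
```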
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added `TPCDSQueryANSISuite` to make sure Spark with ANSI mode can run the TPC-DS queries.
Closes #29560 from cloud-fan/keyword.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
Improve the `SQLKeywordSuite` so that:
1. it checks keywords under default mode as well
2. it checks if there are typos in the doc (found one and fixed in this PR)
### Why are the changes needed?
better test coverage
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
N/A
Closes #29200 from cloud-fan/test.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This PR adds the SQL standard command `SET TIME ZONE`, which sets the default time zone displacement for the current SQL session. It is equivalent to the existing `SET spark.sql.session.timeZone=xxx`.
All in all, this PR adds the following syntax:
```
SET TIME ZONE LOCAL;
SET TIME ZONE 'valid time zone'; -- zone offset or region
SET TIME ZONE INTERVAL XXXX; -- the interval must be within [-18, +18] hours; this range is wider than the ANSI range of [-14, +14]
```
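A usage sketch of the new syntax (zone names and offsets are illustrative):
```sql
SET TIME ZONE LOCAL;                 -- use the JVM's local time zone
SET TIME ZONE 'America/Los_Angeles'; -- region-based zone ID
SET TIME ZONE '+08:00';              -- fixed zone offset
SET TIME ZONE INTERVAL '10' HOUR;    -- offset expressed as an interval
```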
### Why are the changes needed?
ANSI compliance, and to give pure SQL users a way to set the session time zone.
### Does this PR introduce _any_ user-facing change?
Yes, it adds new syntax.
### How was this patch tested?
Added unit tests, and locally verified the reference doc:
![image](https://user-images.githubusercontent.com/8326978/87510244-c8dc3680-c6a5-11ea-954c-b098be84afee.png)
Closes #29064 from yaooqinn/SPARK-32272.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This PR improves the test to make sure all the SQL keywords are documented correctly. It fixes several issues:
1. some keywords are not documented
2. some keywords are not ANSI SQL keywords but are documented as reserved/non-reserved.
### Why are the changes needed?
To make sure the implementation matches the doc.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
new test
Closes #29055 from cloud-fan/keyword.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR intends to move keywords `ANTI`, `SEMI`, and `MINUS` from reserved to non-reserved.
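A sketch of what this enables (hypothetical table and column names):
```sql
-- With ANTI, SEMI, and MINUS non-reserved, they can be used as ordinary
-- identifiers even when ANSI mode is on.
CREATE TABLE semi (anti INT) USING parquet;
SELECT anti FROM semi;
```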
### Why are the changes needed?
To comply with the ANSI/SQL standard.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Added tests.
Closes #28807 from maropu/SPARK-26905-2.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
### What changes were proposed in this pull request?
We should use dataType.catalogString to unify the data type mismatch message.
Before:
```sql
spark-sql> create table SPARK_31834(a int) using parquet;
spark-sql> insert into SPARK_31834 select '1';
Error in query: Cannot write incompatible data to table '`default`.`spark_31834`':
- Cannot safely cast 'a': StringType to IntegerType;
```
After:
```sql
spark-sql> create table SPARK_31834(a int) using parquet;
spark-sql> insert into SPARK_31834 select '1';
Error in query: Cannot write incompatible data to table '`default`.`spark_31834`':
- Cannot safely cast 'a': string to int;
```
### How was this patch tested?
UT.
Closes #28654 from lipzhu/SPARK-31834.
Authored-by: lipzhu <lipzhu@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
Remove the unneeded embedded inline HTML markup by using basic Markdown syntax.
Please see #28414
### Why are the changes needed?
Make the doc cleaner and easily editable by MD editors.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manually build and check
Closes #28451 from huaxingao/html_cleanup.
Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This PR intends to clean up the SQL documents in `doc/sql-ref*`.
Main changes are as follows:
- Fixes wrong syntax and capitalizes sub-titles
- Adds some DDL queries in `Examples` so that users can run the examples there
- Makes query output in `Examples` follow the `Dataset.showString` (right-aligned) format
- Adds/removes spaces, indents, or blank lines to follow the format below:
```
---
license...
---
### Description
Describes what the syntax is.
### Syntax
{% highlight sql %}
SELECT...
WHERE... // 4 indents after the second line
...
{% endhighlight %}
### Parameters
<dl>
<dt><code><em>Param Name</em></code></dt>
<dd>
Param Description
</dd>
...
</dl>
### Examples
{% highlight sql %}
-- It is better that users are able to execute example queries here.
-- So, we prepare test data in the first section if possible.
CREATE TABLE t (key STRING, value DOUBLE);
INSERT INTO t VALUES
('a', 1.0), ('a', 2.0), ('b', 3.0), ('c', 4.0);
-- query output has 2 indents and it follows the `Dataset.showString`
-- format (right-aligned).
SELECT * FROM t;
+---+-----+
|key|value|
+---+-----+
| a| 1.0|
| a| 2.0|
| b| 3.0|
| c| 4.0|
+---+-----+
-- Query statements after the second line have 4 indents.
SELECT key, SUM(value)
FROM t
GROUP BY key;
+---+----------+
|key|sum(value)|
+---+----------+
| c| 4.0|
| b| 3.0|
| a| 3.0|
+---+----------+
...
{% endhighlight %}
### Related Statements
* [XXX](xxx.html)
* ...
```
### Why are the changes needed?
Most of the changes in this PR are pretty minor, but I think consistent formats/rules for writing documents are important for the long-term maintenance of our community.
### Does this PR introduce any user-facing change?
Yes.
### How was this patch tested?
Manually checked.
Closes #28151 from maropu/MakeRightAligned.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
Previously, users could issue `SHOW TABLES` to get info on both tables and views.
This PR (SPARK-31113) implements the `SHOW VIEWS` SQL command, similar to Hive's, to get views only (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowViews).
**Hive** -- Only shows view names
```
hive> SHOW VIEWS;
OK
view_1
view_2
...
```
**Spark (Hive-compatible)** -- Only shows view names; used in tests and `SparkSQLDriver` for CLI applications
```
SHOW VIEWS IN showdb;
view_1
view_2
...
```
**Spark** -- Shows more information: database/viewName/isTemporary
```
spark-sql> SHOW VIEWS;
userdb view_1 false
userdb view_2 false
...
```
### Why are the changes needed?
The `SHOW VIEWS` command provides better granularity by returning information about views only.
### Does this PR introduce any user-facing change?
Yes, it adds the new `SHOW VIEWS` SQL command.
### How was this patch tested?
Added the new test `show-views.sql` and passed existing tests.
Closes #27897 from Eric5553/ShowViews.
Authored-by: Eric Wu <492960551@qq.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
A follow-up of https://github.com/apache/spark/pull/27936 to update the document.
### Why are the changes needed?
To correct the document.
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
N/A
Closes #27950 from cloud-fan/null.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
### What changes were proposed in this pull request?
This PR intends to fix typos and phrases in the `/docs` directory. To find them, I ran the IntelliJ typo checker.
### Why are the changes needed?
For better documents.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
N/A
Closes #27819 from maropu/TypoFix-20200306.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
### What changes were proposed in this pull request?
This is a follow-up of https://github.com/apache/spark/pull/27489.
It declares the ANSI SQL compliance options as experimental in the documentation.
### Why are the changes needed?
The options are experimental. There can be new features/behaviors in future releases.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Generating doc
Closes #27590 from gengliangwang/ExperimentalAnsi.
Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>