spark-instrumented-optimizer/docs/sql-ref-ansi-compliance.md

464 lines
24 KiB
Markdown
Raw Normal View History

[SPARK-26215][SQL] Define reserved/non-reserved keywords based on the ANSI SQL standard ## What changes were proposed in this pull request? This pr targeted to define reserved/non-reserved keywords for Spark SQL based on the ANSI SQL standards and the other database-like systems (e.g., PostgreSQL). We assume that they basically follow the ANSI SQL-2011 standard, but it is slightly different between each other. Therefore, this pr documented all the keywords in `docs/sql-reserved-and-non-reserved-key-words.md`. NOTE: This pr only added a small set of keywords as reserved ones and these keywords are reserved in all the ANSI SQL standards (SQL-92, SQL-99, SQL-2003, SQL-2008, SQL-2011, and SQL-2016) and PostgreSQL. This is because there is room to discuss which keyword should be reserved or not, .e.g., interval units (day, hour, minute, second, ...) are reserved in the ANSI SQL standards though, they are not reserved in PostgreSQL. Therefore, we need more researches about the other database-like systems (e.g., Oracle Databases, DB2, SQL server) in follow-up activities. References: - The reserved/non-reserved SQL keywords in the ANSI SQL standards: https://developer.mimer.com/wp-content/uploads/2018/05/Standard-SQL-Reserved-Words-Summary.pdf - SQL Key Words in PostgreSQL: https://www.postgresql.org/docs/current/sql-keywords-appendix.html ## How was this patch tested? Added tests in `TableIdentifierParserSuite`. Closes #23259 from maropu/SPARK-26215-WIP. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2019-02-22 18:38:47 -05:00
---
layout: global
title: ANSI Compliance
displayTitle: ANSI Compliance
license: |
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
[SPARK-26215][SQL] Define reserved/non-reserved keywords based on the ANSI SQL standard ## What changes were proposed in this pull request? This pr targeted to define reserved/non-reserved keywords for Spark SQL based on the ANSI SQL standards and the other database-like systems (e.g., PostgreSQL). We assume that they basically follow the ANSI SQL-2011 standard, but it is slightly different between each other. Therefore, this pr documented all the keywords in `docs/sql-reserved-and-non-reserved-key-words.md`. NOTE: This pr only added a small set of keywords as reserved ones and these keywords are reserved in all the ANSI SQL standards (SQL-92, SQL-99, SQL-2003, SQL-2008, SQL-2011, and SQL-2016) and PostgreSQL. This is because there is room to discuss which keyword should be reserved or not, .e.g., interval units (day, hour, minute, second, ...) are reserved in the ANSI SQL standards though, they are not reserved in PostgreSQL. Therefore, we need more researches about the other database-like systems (e.g., Oracle Databases, DB2, SQL server) in follow-up activities. References: - The reserved/non-reserved SQL keywords in the ANSI SQL standards: https://developer.mimer.com/wp-content/uploads/2018/05/Standard-SQL-Reserved-Words-Summary.pdf - SQL Key Words in PostgreSQL: https://www.postgresql.org/docs/current/sql-keywords-appendix.html ## How was this patch tested? Added tests in `TableIdentifierParserSuite`. Closes #23259 from maropu/SPARK-26215-WIP. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2019-02-22 18:38:47 -05:00
---
Since Spark 3.0, Spark SQL introduces two experimental options to comply with the SQL standard: `spark.sql.ansi.enabled` and `spark.sql.storeAssignmentPolicy` (See a table below for details).
When `spark.sql.ansi.enabled` is set to `true`, Spark SQL uses an ANSI compliant dialect instead of being Hive compliant. For example, Spark will throw an exception at runtime instead of returning null results if the inputs to a SQL operator/function are invalid. Some ANSI dialect features may be not from the ANSI SQL standard directly, but their behaviors align with ANSI SQL's style.
Moreover, Spark SQL has an independent option to control implicit casting behaviours when inserting rows in a table.
The casting behaviours are defined as store assignment rules in the standard.
When `spark.sql.storeAssignmentPolicy` is set to `ANSI`, Spark SQL complies with the ANSI store assignment rules. This is a separate configuration because its default value is `ANSI`, while the configuration `spark.sql.ansi.enabled` is disabled by default.
[SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in remaining pages in sql reference. The first one was done by gatorsmile in [28415](https://github.com/apache/spark/pull/28415) - Replace HTML table by MD table - **sql-ref-ansi-compliance.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 36 35 PM" src="https://user-images.githubusercontent.com/14225158/80848981-1cbca080-8bca-11ea-8a5d-63174b31c800.png"> - **sql-ref-datatypes.md (Scala)** <img width="967" alt="Screen Shot 2020-05-01 at 4 37 30 PM" src="https://user-images.githubusercontent.com/14225158/80849057-6a390d80-8bca-11ea-8866-ab08bab31432.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 39 18 PM" src="https://user-images.githubusercontent.com/14225158/80849061-6c9b6780-8bca-11ea-834c-eb93d3ab47ae.png"> - **sql-ref-datatypes.md (Java)** <img width="967" alt="Screen Shot 2020-05-01 at 4 41 24 PM" src="https://user-images.githubusercontent.com/14225158/80849138-b3895d00-8bca-11ea-9d3b-555acad2086c.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 41 39 PM" src="https://user-images.githubusercontent.com/14225158/80849140-b6844d80-8bca-11ea-9ca9-1812b6a76c02.png"> - **sql-ref-datatypes.md (Python)** <img width="967" alt="Screen Shot 2020-05-01 at 4 43 36 PM" src="https://user-images.githubusercontent.com/14225158/80849202-0400ba80-8bcb-11ea-96a5-7caecbf9dbbf.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 43 54 PM" src="https://user-images.githubusercontent.com/14225158/80849205-06fbab00-8bcb-11ea-8f00-6df52b151684.png"> - **sql-ref-datatypes.md (R)** <img width="967" alt="Screen Shot 2020-05-01 at 4 45 16 PM" src="https://user-images.githubusercontent.com/14225158/80849288-5fcb4380-8bcb-11ea-8277-8589b5bb31bc.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 45 36 PM" src="https://user-images.githubusercontent.com/14225158/80849294-62c63400-8bcb-11ea-9438-b4f1193bc757.png"> - **sql-ref-datatypes.md (SQL)** <img width="967" alt="Screen Shot 2020-05-01 at 4 48 02 PM" src="https://user-images.githubusercontent.com/14225158/80849336-986b1d00-8bcb-11ea-9736-5fb40496b681.png"> - **sql-ref-syntax-qry-select-tvf.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 49 32 PM" src="https://user-images.githubusercontent.com/14225158/80849399-d10af680-8bcb-11ea-8dc2-e3e750e21a59.png"> ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually using jekyll serve Closes #28433 from dilipbiswal/sql-doc-table-cleanup. Authored-by: Dilip Biswal <dkbiswal@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-05-05 02:21:14 -04:00
|Property Name|Default|Meaning|Since Version|
|-------------|-------|-------|-------------|
|`spark.sql.ansi.enabled`|false|(Experimental) When true, Spark tries to conform to the ANSI SQL specification: <br/> 1. Spark will throw a runtime exception if an overflow occurs in any operation on integral/decimal field. <br/> 2. Spark will forbid using the reserved keywords of ANSI SQL as identifiers in the SQL parser.|3.0.0|
|`spark.sql.storeAssignmentPolicy`|ANSI|(Experimental) When inserting a value into a column with different data type, Spark will perform type coercion. Currently, we support 3 policies for the type coercion rules: ANSI, legacy and strict. With ANSI policy, Spark performs the type coercion as per ANSI SQL. In practice, the behavior is mostly the same as PostgreSQL. It disallows certain unreasonable type conversions such as converting string to int or double to boolean. With legacy policy, Spark allows the type coercion as long as it is a valid Cast, which is very loose. e.g. converting string to int or double to boolean is allowed. It is also the only behavior in Spark 2.x and it is compatible with Hive. With strict policy, Spark doesn't allow any possible precision loss or data truncation in type coercion, e.g. converting double to int or decimal to double is not allowed.|3.0.0|
The following subsections present behaviour changes in arithmetic operations, type conversions, and SQL parsing when the ANSI mode enabled.
### Arithmetic Operations
In Spark SQL, arithmetic operations performed on numeric types (with the exception of decimal) are not checked for overflows by default.
This means that in case an operation causes overflows, the result is the same with the corresponding operation in a Java/Scala program (e.g., if the sum of 2 integers is higher than the maximum value representable, the result is a negative number).
On the other hand, Spark SQL returns null for decimal overflows.
When `spark.sql.ansi.enabled` is set to `true` and an overflow occurs in numeric and interval arithmetic operations, it throws an arithmetic exception at runtime.
```sql
-- `spark.sql.ansi.enabled=true`
SELECT 2147483647 + 1;
java.lang.ArithmeticException: integer overflow
SELECT abs(-2147483648);
java.lang.ArithmeticException: integer overflow
-- `spark.sql.ansi.enabled=false`
SELECT 2147483647 + 1;
+----------------+
|(2147483647 + 1)|
+----------------+
| -2147483648|
+----------------+
SELECT abs(-2147483648);
+----------------+
|abs(-2147483648)|
+----------------+
| -2147483648|
+----------------+
```
### Type Conversion
Spark SQL has three kinds of type conversions: explicit casting, type coercion, and store assignment casting.
When `spark.sql.ansi.enabled` is set to `true`, explicit casting by `CAST` syntax throws a runtime exception for illegal cast patterns defined in the standard, e.g. casts from a string to an integer.
On the other hand, `INSERT INTO` syntax throws an analysis exception when the ANSI mode enabled via `spark.sql.storeAssignmentPolicy=ANSI`.
The type conversion of Spark ANSI mode follows the syntax rules of section 6.13 "cast specification" in [ISO/IEC 9075-2:2011 Information technology — Database languages - SQL — Part 2: Foundation (SQL/Foundation)](https://www.iso.org/standard/53682.html), except it specially allows the following
[SPARK-33354][SQL] New explicit cast syntax rules in ANSI mode ### What changes were proposed in this pull request? In section 6.13 of the ANSI SQL standard, there are syntax rules for valid combinations of the source and target data types. ![image](https://user-images.githubusercontent.com/1097932/98212874-17356f80-1ef9-11eb-8f2b-385f32db404a.png) Comparing the ANSI CAST syntax rules with the current default behavior of Spark: ![image](https://user-images.githubusercontent.com/1097932/98789831-b7870a80-23b7-11eb-9b5f-469a42e0ee4a.png) To make Spark's ANSI mode more ANSI SQL Compatible,I propose to disallow the following casting in ANSI mode: ``` TimeStamp <=> Boolean Date <=> Boolean Numeric <=> Timestamp Numeric <=> Date Numeric <=> Binary String <=> Array String <=> Map String <=> Struct ``` The following castings are considered invalid in ANSI SQL standard, but they are quite straight forward. Let's Allow them for now ``` Numeric <=> Boolean String <=> Binary ``` ### Why are the changes needed? Better ANSI SQL compliance ### Does this PR introduce _any_ user-facing change? Yes, the following castings will not be allowed in ANSI mode: ``` TimeStamp <=> Boolean Date <=> Boolean Numeric <=> Timestamp Numeric <=> Date Numeric <=> Binary String <=> Array String <=> Map String <=> Struct ``` ### How was this patch tested? Unit test The ANSI Compliance doc preview: ![image](https://user-images.githubusercontent.com/1097932/98946017-2cd20880-24a8-11eb-8161-65749bfdd03a.png) Closes #30260 from gengliangwang/ansiCanCast. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-11-18 19:23:36 -05:00
straightforward type conversions which are disallowed as per the ANSI standard:
* NumericType <=> BooleanType
* StringType <=> BinaryType
* ArrayType => String
* MapType => String
* StructType => String
[SPARK-33354][SQL] New explicit cast syntax rules in ANSI mode ### What changes were proposed in this pull request? In section 6.13 of the ANSI SQL standard, there are syntax rules for valid combinations of the source and target data types. ![image](https://user-images.githubusercontent.com/1097932/98212874-17356f80-1ef9-11eb-8f2b-385f32db404a.png) Comparing the ANSI CAST syntax rules with the current default behavior of Spark: ![image](https://user-images.githubusercontent.com/1097932/98789831-b7870a80-23b7-11eb-9b5f-469a42e0ee4a.png) To make Spark's ANSI mode more ANSI SQL Compatible,I propose to disallow the following casting in ANSI mode: ``` TimeStamp <=> Boolean Date <=> Boolean Numeric <=> Timestamp Numeric <=> Date Numeric <=> Binary String <=> Array String <=> Map String <=> Struct ``` The following castings are considered invalid in ANSI SQL standard, but they are quite straight forward. Let's Allow them for now ``` Numeric <=> Boolean String <=> Binary ``` ### Why are the changes needed? Better ANSI SQL compliance ### Does this PR introduce _any_ user-facing change? Yes, the following castings will not be allowed in ANSI mode: ``` TimeStamp <=> Boolean Date <=> Boolean Numeric <=> Timestamp Numeric <=> Date Numeric <=> Binary String <=> Array String <=> Map String <=> Struct ``` ### How was this patch tested? Unit test The ANSI Compliance doc preview: ![image](https://user-images.githubusercontent.com/1097932/98946017-2cd20880-24a8-11eb-8161-65749bfdd03a.png) Closes #30260 from gengliangwang/ansiCanCast. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-11-18 19:23:36 -05:00
The valid combinations of target data type and source data type in a `CAST` expression are given by the following table.
“Y” indicates that the combination is syntactically valid without restriction and “N” indicates that the combination is not valid.
| Source\Target | Numeric | String | Date | Timestamp | Interval | Boolean | Binary | Array | Map | Struct |
[SPARK-33354][SQL] New explicit cast syntax rules in ANSI mode ### What changes were proposed in this pull request? In section 6.13 of the ANSI SQL standard, there are syntax rules for valid combinations of the source and target data types. ![image](https://user-images.githubusercontent.com/1097932/98212874-17356f80-1ef9-11eb-8f2b-385f32db404a.png) Comparing the ANSI CAST syntax rules with the current default behavior of Spark: ![image](https://user-images.githubusercontent.com/1097932/98789831-b7870a80-23b7-11eb-9b5f-469a42e0ee4a.png) To make Spark's ANSI mode more ANSI SQL Compatible,I propose to disallow the following casting in ANSI mode: ``` TimeStamp <=> Boolean Date <=> Boolean Numeric <=> Timestamp Numeric <=> Date Numeric <=> Binary String <=> Array String <=> Map String <=> Struct ``` The following castings are considered invalid in ANSI SQL standard, but they are quite straight forward. Let's Allow them for now ``` Numeric <=> Boolean String <=> Binary ``` ### Why are the changes needed? Better ANSI SQL compliance ### Does this PR introduce _any_ user-facing change? Yes, the following castings will not be allowed in ANSI mode: ``` TimeStamp <=> Boolean Date <=> Boolean Numeric <=> Timestamp Numeric <=> Date Numeric <=> Binary String <=> Array String <=> Map String <=> Struct ``` ### How was this patch tested? Unit test The ANSI Compliance doc preview: ![image](https://user-images.githubusercontent.com/1097932/98946017-2cd20880-24a8-11eb-8161-65749bfdd03a.png) Closes #30260 from gengliangwang/ansiCanCast. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-11-18 19:23:36 -05:00
|-----------|---------|--------|------|-----------|----------|---------|--------|-------|-----|--------|
| Numeric | <span style="color:red">**Y**</span> | Y | N | N | N | Y | N | N | N | N |
| String | <span style="color:red">**Y**</span> | Y | <span style="color:red">**Y**</span> | <span style="color:red">**Y**</span> | <span style="color:red">**Y**</span> | <span style="color:red">**Y**</span> | Y | N | N | N |
| Date | N | Y | Y | Y | N | N | N | N | N | N |
| Timestamp | N | Y | Y | Y | N | N | N | N | N | N |
| Interval | N | Y | N | N | Y | N | N | N | N | N |
| Boolean | Y | Y | N | N | N | Y | N | N | N | N |
| Binary | N | Y | N | N | N | N | Y | N | N | N |
| Array | N | Y | N | N | N | N | N | <span style="color:red">**Y**</span> | N | N |
| Map | N | Y | N | N | N | N | N | N | <span style="color:red">**Y**</span> | N |
| Struct | N | Y | N | N | N | N | N | N | N | <span style="color:red">**Y**</span> |
In the table above, all the `CAST`s that can cause runtime exceptions are marked as red <span style="color:red">**Y**</span>:
* CAST(Numeric AS Numeric): raise an overflow exception if the value is out of the target data type's range.
* CAST(String AS (Numeric/Date/Timestamp/Interval/Boolean)): raise a runtime exception if the value can't be parsed as the target data type.
* CAST(Array AS Array): raise an exception if there is any on the conversion of the elements.
* CAST(Map AS Map): raise an exception if there is any on the conversion of the keys and the values.
* CAST(Struct AS Struct): raise an exception if there is any on the conversion of the struct fields.
[SPARK-33354][SQL] New explicit cast syntax rules in ANSI mode ### What changes were proposed in this pull request? In section 6.13 of the ANSI SQL standard, there are syntax rules for valid combinations of the source and target data types. ![image](https://user-images.githubusercontent.com/1097932/98212874-17356f80-1ef9-11eb-8f2b-385f32db404a.png) Comparing the ANSI CAST syntax rules with the current default behavior of Spark: ![image](https://user-images.githubusercontent.com/1097932/98789831-b7870a80-23b7-11eb-9b5f-469a42e0ee4a.png) To make Spark's ANSI mode more ANSI SQL Compatible,I propose to disallow the following casting in ANSI mode: ``` TimeStamp <=> Boolean Date <=> Boolean Numeric <=> Timestamp Numeric <=> Date Numeric <=> Binary String <=> Array String <=> Map String <=> Struct ``` The following castings are considered invalid in ANSI SQL standard, but they are quite straight forward. Let's Allow them for now ``` Numeric <=> Boolean String <=> Binary ``` ### Why are the changes needed? Better ANSI SQL compliance ### Does this PR introduce _any_ user-facing change? Yes, the following castings will not be allowed in ANSI mode: ``` TimeStamp <=> Boolean Date <=> Boolean Numeric <=> Timestamp Numeric <=> Date Numeric <=> Binary String <=> Array String <=> Map String <=> Struct ``` ### How was this patch tested? Unit test The ANSI Compliance doc preview: ![image](https://user-images.githubusercontent.com/1097932/98946017-2cd20880-24a8-11eb-8161-65749bfdd03a.png) Closes #30260 from gengliangwang/ansiCanCast. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-11-18 19:23:36 -05:00
Currently, the ANSI mode affects explicit casting and assignment casting only.
In future releases, the behaviour of type coercion might change along with the other two type conversion rules.
```sql
-- Examples of explicit casting
-- `spark.sql.ansi.enabled=true`
SELECT CAST('a' AS INT);
java.lang.NumberFormatException: invalid input syntax for type numeric: a
SELECT CAST(2147483648L AS INT);
java.lang.ArithmeticException: Casting 2147483648 to int causes overflow
SELECT CAST(DATE'2020-01-01' AS INT)
org.apache.spark.sql.AnalysisException: cannot resolve 'CAST(DATE '2020-01-01' AS INT)' due to data type mismatch: cannot cast date to int.
To convert values from date to int, you can use function UNIX_DATE instead.
-- `spark.sql.ansi.enabled=false` (This is a default behaviour)
SELECT CAST('a' AS INT);
+--------------+
|CAST(a AS INT)|
+--------------+
| null|
+--------------+
SELECT CAST(2147483648L AS INT);
+-----------------------+
|CAST(2147483648 AS INT)|
+-----------------------+
| -2147483648|
+-----------------------+
SELECT CAST(DATE'2020-01-01' AS INT)
+------------------------------+
|CAST(DATE '2020-01-01' AS INT)|
+------------------------------+
| null|
+------------------------------+
-- Examples of store assignment rules
CREATE TABLE t (v INT);
-- `spark.sql.storeAssignmentPolicy=ANSI`
INSERT INTO t VALUES ('1');
org.apache.spark.sql.AnalysisException: Cannot write incompatible data to table '`default`.`t`':
- Cannot safely cast 'v': string to int;
-- `spark.sql.storeAssignmentPolicy=LEGACY` (This is a legacy behaviour until Spark 2.x)
INSERT INTO t VALUES ('1');
SELECT * FROM t;
+---+
| v|
+---+
| 1|
+---+
```
### SQL Functions
The behavior of some SQL functions can be different under ANSI mode (`spark.sql.ansi.enabled=true`).
- `size`: This function returns null for null input.
- `element_at`:
- This function throws `ArrayIndexOutOfBoundsException` if using invalid indices.
- This function throws `NoSuchElementException` if key does not exist in map.
- `elt`: This function throws `ArrayIndexOutOfBoundsException` if using invalid indices.
- `parse_url`: This function throws `IllegalArgumentException` if an input string is not a valid url.
- `to_date`: This function should fail with an exception if the input string can't be parsed, or the pattern string is invalid.
- `to_timestamp`: This function should fail with an exception if the input string can't be parsed, or the pattern string is invalid.
- `unix_timestamp`: This function should fail with an exception if the input string can't be parsed, or the pattern string is invalid.
- `to_unix_timestamp`: This function should fail with an exception if the input string can't be parsed, or the pattern string is invalid.
- `make_date`: This function should fail with an exception if the result date is invalid.
- `make_timestamp`: This function should fail with an exception if the result timestamp is invalid.
- `make_interval`: This function should fail with an exception if the result interval is invalid.
- `next_day`: This function throws `IllegalArgumentException` if input is not a valid day of week.
### SQL Operators
The behavior of some SQL operators can be different under ANSI mode (`spark.sql.ansi.enabled=true`).
- `array_col[index]`: This operator throws `ArrayIndexOutOfBoundsException` if using invalid indices.
- `map_col[key]`: This operator throws `NoSuchElementException` if key does not exist in map.
- `GROUP BY`: aliases in a select list can not be used in GROUP BY clauses. Each column referenced in a GROUP BY clause shall unambiguously reference a column of the table resulting from the FROM clause.
### SQL Keywords
[SPARK-30125][SQL] Remove PostgreSQL dialect ### What changes were proposed in this pull request? Reprocess all PostgreSQL dialect related PRs, listing in order: - #25158: PostgreSQL integral division support [revert] - #25170: UT changes for the integral division support [revert] - #25458: Accept "true", "yes", "1", "false", "no", "0", and unique prefixes as input and trim input for the boolean data type. [revert] - #25697: Combine below 2 feature tags into "spark.sql.dialect" [revert] - #26112: Date substraction support [keep the ANSI-compliant part] - #26444: Rename config "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled" [revert] - #26463: Cast to boolean support for PostgreSQL dialect [revert] - #26584: Make the behavior of Postgre dialect independent of ansi mode config [keep the ANSI-compliant part] ### Why are the changes needed? As the discussion in http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-PostgreSQL-dialect-td28417.html, we need to remove PostgreSQL dialect form code base for several reasons: 1. The current approach makes the codebase complicated and hard to maintain. 2. Fully migrating PostgreSQL workloads to Spark SQL is not our focus for now. ### Does this PR introduce any user-facing change? Yes, the config `spark.sql.dialect` will be removed. ### How was this patch tested? Existing UT. Closes #26763 from xuanyuanking/SPARK-30125. Lead-authored-by: Yuanjian Li <xyliyuanjian@gmail.com> Co-authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-10 12:22:34 -05:00
When `spark.sql.ansi.enabled` is true, Spark SQL will use the ANSI mode parser.
In this mode, Spark SQL has two kinds of keywords:
* Reserved keywords: Keywords that are reserved and can't be used as identifiers for table, view, column, function, alias, etc.
* Non-reserved keywords: Keywords that have a special meaning only in particular contexts and can be used as identifiers in other contexts. For example, `EXPLAIN SELECT ...` is a command, but EXPLAIN can be used as identifiers in other places.
[SPARK-26215][SQL] Define reserved/non-reserved keywords based on the ANSI SQL standard ## What changes were proposed in this pull request? This pr targeted to define reserved/non-reserved keywords for Spark SQL based on the ANSI SQL standards and the other database-like systems (e.g., PostgreSQL). We assume that they basically follow the ANSI SQL-2011 standard, but it is slightly different between each other. Therefore, this pr documented all the keywords in `docs/sql-reserved-and-non-reserved-key-words.md`. NOTE: This pr only added a small set of keywords as reserved ones and these keywords are reserved in all the ANSI SQL standards (SQL-92, SQL-99, SQL-2003, SQL-2008, SQL-2011, and SQL-2016) and PostgreSQL. This is because there is room to discuss which keyword should be reserved or not, .e.g., interval units (day, hour, minute, second, ...) are reserved in the ANSI SQL standards though, they are not reserved in PostgreSQL. Therefore, we need more researches about the other database-like systems (e.g., Oracle Databases, DB2, SQL server) in follow-up activities. References: - The reserved/non-reserved SQL keywords in the ANSI SQL standards: https://developer.mimer.com/wp-content/uploads/2018/05/Standard-SQL-Reserved-Words-Summary.pdf - SQL Key Words in PostgreSQL: https://www.postgresql.org/docs/current/sql-keywords-appendix.html ## How was this patch tested? Added tests in `TableIdentifierParserSuite`. Closes #23259 from maropu/SPARK-26215-WIP. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2019-02-22 18:38:47 -05:00
When the ANSI mode is disabled, Spark SQL has two kinds of keywords:
* Non-reserved keywords: Same definition as the one when the ANSI mode enabled.
* Strict-non-reserved keywords: A strict version of non-reserved keywords, which can not be used as table alias.
[SPARK-30125][SQL] Remove PostgreSQL dialect ### What changes were proposed in this pull request? Reprocess all PostgreSQL dialect related PRs, listing in order: - #25158: PostgreSQL integral division support [revert] - #25170: UT changes for the integral division support [revert] - #25458: Accept "true", "yes", "1", "false", "no", "0", and unique prefixes as input and trim input for the boolean data type. [revert] - #25697: Combine below 2 feature tags into "spark.sql.dialect" [revert] - #26112: Date substraction support [keep the ANSI-compliant part] - #26444: Rename config "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled" [revert] - #26463: Cast to boolean support for PostgreSQL dialect [revert] - #26584: Make the behavior of Postgre dialect independent of ansi mode config [keep the ANSI-compliant part] ### Why are the changes needed? As the discussion in http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-PostgreSQL-dialect-td28417.html, we need to remove PostgreSQL dialect form code base for several reasons: 1. The current approach makes the codebase complicated and hard to maintain. 2. Fully migrating PostgreSQL workloads to Spark SQL is not our focus for now. ### Does this PR introduce any user-facing change? Yes, the config `spark.sql.dialect` will be removed. ### How was this patch tested? Existing UT. Closes #26763 from xuanyuanking/SPARK-30125. Lead-authored-by: Yuanjian Li <xyliyuanjian@gmail.com> Co-authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-12-10 12:22:34 -05:00
By default `spark.sql.ansi.enabled` is false.
Below is a list of all the keywords in Spark SQL.
[SPARK-26215][SQL] Define reserved/non-reserved keywords based on the ANSI SQL standard ## What changes were proposed in this pull request? This pr targeted to define reserved/non-reserved keywords for Spark SQL based on the ANSI SQL standards and the other database-like systems (e.g., PostgreSQL). We assume that they basically follow the ANSI SQL-2011 standard, but it is slightly different between each other. Therefore, this pr documented all the keywords in `docs/sql-reserved-and-non-reserved-key-words.md`. NOTE: This pr only added a small set of keywords as reserved ones and these keywords are reserved in all the ANSI SQL standards (SQL-92, SQL-99, SQL-2003, SQL-2008, SQL-2011, and SQL-2016) and PostgreSQL. This is because there is room to discuss which keyword should be reserved or not, .e.g., interval units (day, hour, minute, second, ...) are reserved in the ANSI SQL standards though, they are not reserved in PostgreSQL. Therefore, we need more researches about the other database-like systems (e.g., Oracle Databases, DB2, SQL server) in follow-up activities. References: - The reserved/non-reserved SQL keywords in the ANSI SQL standards: https://developer.mimer.com/wp-content/uploads/2018/05/Standard-SQL-Reserved-Words-Summary.pdf - SQL Key Words in PostgreSQL: https://www.postgresql.org/docs/current/sql-keywords-appendix.html ## How was this patch tested? Added tests in `TableIdentifierParserSuite`. Closes #23259 from maropu/SPARK-26215-WIP. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2019-02-22 18:38:47 -05:00
|Keyword|Spark SQL<br/>ANSI Mode|Spark SQL<br/>Default Mode|SQL-2016|
[SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in remaining pages in sql reference. The first one was done by gatorsmile in [28415](https://github.com/apache/spark/pull/28415) - Replace HTML table by MD table - **sql-ref-ansi-compliance.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 36 35 PM" src="https://user-images.githubusercontent.com/14225158/80848981-1cbca080-8bca-11ea-8a5d-63174b31c800.png"> - **sql-ref-datatypes.md (Scala)** <img width="967" alt="Screen Shot 2020-05-01 at 4 37 30 PM" src="https://user-images.githubusercontent.com/14225158/80849057-6a390d80-8bca-11ea-8866-ab08bab31432.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 39 18 PM" src="https://user-images.githubusercontent.com/14225158/80849061-6c9b6780-8bca-11ea-834c-eb93d3ab47ae.png"> - **sql-ref-datatypes.md (Java)** <img width="967" alt="Screen Shot 2020-05-01 at 4 41 24 PM" src="https://user-images.githubusercontent.com/14225158/80849138-b3895d00-8bca-11ea-9d3b-555acad2086c.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 41 39 PM" src="https://user-images.githubusercontent.com/14225158/80849140-b6844d80-8bca-11ea-9ca9-1812b6a76c02.png"> - **sql-ref-datatypes.md (Python)** <img width="967" alt="Screen Shot 2020-05-01 at 4 43 36 PM" src="https://user-images.githubusercontent.com/14225158/80849202-0400ba80-8bcb-11ea-96a5-7caecbf9dbbf.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 43 54 PM" src="https://user-images.githubusercontent.com/14225158/80849205-06fbab00-8bcb-11ea-8f00-6df52b151684.png"> - **sql-ref-datatypes.md (R)** <img width="967" alt="Screen Shot 2020-05-01 at 4 45 16 PM" src="https://user-images.githubusercontent.com/14225158/80849288-5fcb4380-8bcb-11ea-8277-8589b5bb31bc.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 45 36 PM" src="https://user-images.githubusercontent.com/14225158/80849294-62c63400-8bcb-11ea-9438-b4f1193bc757.png"> - **sql-ref-datatypes.md (SQL)** <img width="967" alt="Screen Shot 2020-05-01 at 4 48 02 PM" src="https://user-images.githubusercontent.com/14225158/80849336-986b1d00-8bcb-11ea-9736-5fb40496b681.png"> - **sql-ref-syntax-qry-select-tvf.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 49 32 PM" src="https://user-images.githubusercontent.com/14225158/80849399-d10af680-8bcb-11ea-8dc2-e3e750e21a59.png"> ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually using jekyll serve Closes #28433 from dilipbiswal/sql-doc-table-cleanup. Authored-by: Dilip Biswal <dkbiswal@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-05-05 02:21:14 -04:00
|-------|----------------------|-------------------------|--------|
|ADD|non-reserved|non-reserved|non-reserved|
|AFTER|non-reserved|non-reserved|non-reserved|
|ALL|reserved|non-reserved|reserved|
|ALTER|non-reserved|non-reserved|reserved|
|ANALYZE|non-reserved|non-reserved|non-reserved|
|AND|reserved|non-reserved|reserved|
|ANTI|non-reserved|strict-non-reserved|non-reserved|
[SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in remaining pages in sql reference. The first one was done by gatorsmile in [28415](https://github.com/apache/spark/pull/28415) - Replace HTML table by MD table - **sql-ref-ansi-compliance.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 36 35 PM" src="https://user-images.githubusercontent.com/14225158/80848981-1cbca080-8bca-11ea-8a5d-63174b31c800.png"> - **sql-ref-datatypes.md (Scala)** <img width="967" alt="Screen Shot 2020-05-01 at 4 37 30 PM" src="https://user-images.githubusercontent.com/14225158/80849057-6a390d80-8bca-11ea-8866-ab08bab31432.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 39 18 PM" src="https://user-images.githubusercontent.com/14225158/80849061-6c9b6780-8bca-11ea-834c-eb93d3ab47ae.png"> - **sql-ref-datatypes.md (Java)** <img width="967" alt="Screen Shot 2020-05-01 at 4 41 24 PM" src="https://user-images.githubusercontent.com/14225158/80849138-b3895d00-8bca-11ea-9d3b-555acad2086c.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 41 39 PM" src="https://user-images.githubusercontent.com/14225158/80849140-b6844d80-8bca-11ea-9ca9-1812b6a76c02.png"> - **sql-ref-datatypes.md (Python)** <img width="967" alt="Screen Shot 2020-05-01 at 4 43 36 PM" src="https://user-images.githubusercontent.com/14225158/80849202-0400ba80-8bcb-11ea-96a5-7caecbf9dbbf.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 43 54 PM" src="https://user-images.githubusercontent.com/14225158/80849205-06fbab00-8bcb-11ea-8f00-6df52b151684.png"> - **sql-ref-datatypes.md (R)** <img width="967" alt="Screen Shot 2020-05-01 at 4 45 16 PM" src="https://user-images.githubusercontent.com/14225158/80849288-5fcb4380-8bcb-11ea-8277-8589b5bb31bc.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 45 36 PM" src="https://user-images.githubusercontent.com/14225158/80849294-62c63400-8bcb-11ea-9438-b4f1193bc757.png"> - **sql-ref-datatypes.md (SQL)** <img width="967" alt="Screen Shot 2020-05-01 at 4 48 02 PM" src="https://user-images.githubusercontent.com/14225158/80849336-986b1d00-8bcb-11ea-9736-5fb40496b681.png"> - **sql-ref-syntax-qry-select-tvf.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 49 32 PM" src="https://user-images.githubusercontent.com/14225158/80849399-d10af680-8bcb-11ea-8dc2-e3e750e21a59.png"> ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually using jekyll serve Closes #28433 from dilipbiswal/sql-doc-table-cleanup. Authored-by: Dilip Biswal <dkbiswal@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-05-05 02:21:14 -04:00
|ANY|reserved|non-reserved|reserved|
|ARCHIVE|non-reserved|non-reserved|non-reserved|
|ARRAY|non-reserved|non-reserved|reserved|
|AS|reserved|non-reserved|reserved|
|ASC|non-reserved|non-reserved|non-reserved|
|AT|non-reserved|non-reserved|reserved|
|AUTHORIZATION|reserved|non-reserved|reserved|
|BETWEEN|non-reserved|non-reserved|reserved|
|BOTH|reserved|non-reserved|reserved|
|BUCKET|non-reserved|non-reserved|non-reserved|
|BUCKETS|non-reserved|non-reserved|non-reserved|
|BY|non-reserved|non-reserved|reserved|
|CACHE|non-reserved|non-reserved|non-reserved|
|CASCADE|non-reserved|non-reserved|non-reserved|
[SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in remaining pages in sql reference. The first one was done by gatorsmile in [28415](https://github.com/apache/spark/pull/28415) - Replace HTML table by MD table - **sql-ref-ansi-compliance.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 36 35 PM" src="https://user-images.githubusercontent.com/14225158/80848981-1cbca080-8bca-11ea-8a5d-63174b31c800.png"> - **sql-ref-datatypes.md (Scala)** <img width="967" alt="Screen Shot 2020-05-01 at 4 37 30 PM" src="https://user-images.githubusercontent.com/14225158/80849057-6a390d80-8bca-11ea-8866-ab08bab31432.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 39 18 PM" src="https://user-images.githubusercontent.com/14225158/80849061-6c9b6780-8bca-11ea-834c-eb93d3ab47ae.png"> - **sql-ref-datatypes.md (Java)** <img width="967" alt="Screen Shot 2020-05-01 at 4 41 24 PM" src="https://user-images.githubusercontent.com/14225158/80849138-b3895d00-8bca-11ea-9d3b-555acad2086c.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 41 39 PM" src="https://user-images.githubusercontent.com/14225158/80849140-b6844d80-8bca-11ea-9ca9-1812b6a76c02.png"> - **sql-ref-datatypes.md (Python)** <img width="967" alt="Screen Shot 2020-05-01 at 4 43 36 PM" src="https://user-images.githubusercontent.com/14225158/80849202-0400ba80-8bcb-11ea-96a5-7caecbf9dbbf.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 43 54 PM" src="https://user-images.githubusercontent.com/14225158/80849205-06fbab00-8bcb-11ea-8f00-6df52b151684.png"> - **sql-ref-datatypes.md (R)** <img width="967" alt="Screen Shot 2020-05-01 at 4 45 16 PM" src="https://user-images.githubusercontent.com/14225158/80849288-5fcb4380-8bcb-11ea-8277-8589b5bb31bc.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 45 36 PM" src="https://user-images.githubusercontent.com/14225158/80849294-62c63400-8bcb-11ea-9438-b4f1193bc757.png"> - **sql-ref-datatypes.md (SQL)** <img width="967" alt="Screen Shot 2020-05-01 at 4 48 02 PM" src="https://user-images.githubusercontent.com/14225158/80849336-986b1d00-8bcb-11ea-9736-5fb40496b681.png"> - **sql-ref-syntax-qry-select-tvf.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 49 32 PM" src="https://user-images.githubusercontent.com/14225158/80849399-d10af680-8bcb-11ea-8dc2-e3e750e21a59.png"> ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually using jekyll serve Closes #28433 from dilipbiswal/sql-doc-table-cleanup. Authored-by: Dilip Biswal <dkbiswal@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-05-05 02:21:14 -04:00
|CASE|reserved|non-reserved|reserved|
|CAST|reserved|non-reserved|reserved|
|CHANGE|non-reserved|non-reserved|non-reserved|
|CHECK|reserved|non-reserved|reserved|
|CLEAR|non-reserved|non-reserved|non-reserved|
|CLUSTER|non-reserved|non-reserved|non-reserved|
|CLUSTERED|non-reserved|non-reserved|non-reserved|
|CODEGEN|non-reserved|non-reserved|non-reserved|
|COLLATE|reserved|non-reserved|reserved|
|COLLECTION|non-reserved|non-reserved|non-reserved|
|COLUMN|reserved|non-reserved|reserved|
|COLUMNS|non-reserved|non-reserved|non-reserved|
|COMMENT|non-reserved|non-reserved|non-reserved|
|COMMIT|non-reserved|non-reserved|reserved|
|COMPACT|non-reserved|non-reserved|non-reserved|
|COMPACTIONS|non-reserved|non-reserved|non-reserved|
|COMPUTE|non-reserved|non-reserved|non-reserved|
|CONCATENATE|non-reserved|non-reserved|non-reserved|
|CONSTRAINT|reserved|non-reserved|reserved|
|COST|non-reserved|non-reserved|non-reserved|
|CREATE|reserved|non-reserved|reserved|
|CROSS|reserved|strict-non-reserved|reserved|
|CUBE|non-reserved|non-reserved|reserved|
|CURRENT|non-reserved|non-reserved|reserved|
|CURRENT_DATE|reserved|non-reserved|reserved|
|CURRENT_TIME|reserved|non-reserved|reserved|
|CURRENT_TIMESTAMP|reserved|non-reserved|reserved|
|CURRENT_USER|reserved|non-reserved|reserved|
|DATA|non-reserved|non-reserved|non-reserved|
|DATABASE|non-reserved|non-reserved|non-reserved|
|DATABASES|non-reserved|non-reserved|non-reserved|
[SPARK-35285][SQL] Parse ANSI interval types in SQL schema ### What changes were proposed in this pull request? 1. Extend Spark SQL parser to support parsing of: - `INTERVAL YEAR TO MONTH` to `YearMonthIntervalType` - `INTERVAL DAY TO SECOND` to `DayTimeIntervalType` 2. Assign new names to the ANSI interval types according to the SQL standard to be able to parse the names back by Spark SQL parser. Override the `typeName()` name of `YearMonthIntervalType`/`DayTimeIntervalType`. ### Why are the changes needed? To be able to use new ANSI interval types in SQL. The SQL standard requires the types to be defined according to the rules: ``` <interval type> ::= INTERVAL <interval qualifier> <interval qualifier> ::= <start field> TO <end field> | <single datetime field> <start field> ::= <non-second primary datetime field> [ <left paren> <interval leading field precision> <right paren> ] <end field> ::= <non-second primary datetime field> | SECOND [ <left paren> <interval fractional seconds precision> <right paren> ] <primary datetime field> ::= <non-second primary datetime field | SECOND <non-second primary datetime field> ::= YEAR | MONTH | DAY | HOUR | MINUTE <interval fractional seconds precision> ::= <unsigned integer> <interval leading field precision> ::= <unsigned integer> ``` Currently, Spark SQL supports only `YEAR TO MONTH` and `DAY TO SECOND` as `<interval qualifier>`. ### Does this PR introduce _any_ user-facing change? Should not since the types has not been released yet. ### How was this patch tested? By running the affected tests such as: ``` $ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z interval.sql" $ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z datetime.sql" $ build/sbt "test:testOnly *ExpressionTypeCheckingSuite" $ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z windowFrameCoercion.sql" $ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z literals.sql" ``` Closes #32409 from MaxGekk/parse-ansi-interval-types. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2021-05-03 00:50:35 -04:00
|DAY|non-reserved|non-reserved|non-reserved|
[SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in remaining pages in sql reference. The first one was done by gatorsmile in [28415](https://github.com/apache/spark/pull/28415) - Replace HTML table by MD table - **sql-ref-ansi-compliance.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 36 35 PM" src="https://user-images.githubusercontent.com/14225158/80848981-1cbca080-8bca-11ea-8a5d-63174b31c800.png"> - **sql-ref-datatypes.md (Scala)** <img width="967" alt="Screen Shot 2020-05-01 at 4 37 30 PM" src="https://user-images.githubusercontent.com/14225158/80849057-6a390d80-8bca-11ea-8866-ab08bab31432.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 39 18 PM" src="https://user-images.githubusercontent.com/14225158/80849061-6c9b6780-8bca-11ea-834c-eb93d3ab47ae.png"> - **sql-ref-datatypes.md (Java)** <img width="967" alt="Screen Shot 2020-05-01 at 4 41 24 PM" src="https://user-images.githubusercontent.com/14225158/80849138-b3895d00-8bca-11ea-9d3b-555acad2086c.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 41 39 PM" src="https://user-images.githubusercontent.com/14225158/80849140-b6844d80-8bca-11ea-9ca9-1812b6a76c02.png"> - **sql-ref-datatypes.md (Python)** <img width="967" alt="Screen Shot 2020-05-01 at 4 43 36 PM" src="https://user-images.githubusercontent.com/14225158/80849202-0400ba80-8bcb-11ea-96a5-7caecbf9dbbf.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 43 54 PM" src="https://user-images.githubusercontent.com/14225158/80849205-06fbab00-8bcb-11ea-8f00-6df52b151684.png"> - **sql-ref-datatypes.md (R)** <img width="967" alt="Screen Shot 2020-05-01 at 4 45 16 PM" src="https://user-images.githubusercontent.com/14225158/80849288-5fcb4380-8bcb-11ea-8277-8589b5bb31bc.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 45 36 PM" src="https://user-images.githubusercontent.com/14225158/80849294-62c63400-8bcb-11ea-9438-b4f1193bc757.png"> - **sql-ref-datatypes.md (SQL)** <img width="967" alt="Screen Shot 2020-05-01 at 4 48 02 PM" src="https://user-images.githubusercontent.com/14225158/80849336-986b1d00-8bcb-11ea-9736-5fb40496b681.png"> - **sql-ref-syntax-qry-select-tvf.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 49 32 PM" src="https://user-images.githubusercontent.com/14225158/80849399-d10af680-8bcb-11ea-8dc2-e3e750e21a59.png"> ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually using jekyll serve Closes #28433 from dilipbiswal/sql-doc-table-cleanup. Authored-by: Dilip Biswal <dkbiswal@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-05-05 02:21:14 -04:00
|DBPROPERTIES|non-reserved|non-reserved|non-reserved|
|DEFINED|non-reserved|non-reserved|non-reserved|
|DELETE|non-reserved|non-reserved|reserved|
|DELIMITED|non-reserved|non-reserved|non-reserved|
|DESC|non-reserved|non-reserved|non-reserved|
|DESCRIBE|non-reserved|non-reserved|reserved|
|DFS|non-reserved|non-reserved|non-reserved|
|DIRECTORIES|non-reserved|non-reserved|non-reserved|
|DIRECTORY|non-reserved|non-reserved|non-reserved|
|DISTINCT|reserved|non-reserved|reserved|
|DISTRIBUTE|non-reserved|non-reserved|non-reserved|
|DIV|non-reserved|non-reserved|not a keyword|
[SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in remaining pages in sql reference. The first one was done by gatorsmile in [28415](https://github.com/apache/spark/pull/28415) - Replace HTML table by MD table - **sql-ref-ansi-compliance.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 36 35 PM" src="https://user-images.githubusercontent.com/14225158/80848981-1cbca080-8bca-11ea-8a5d-63174b31c800.png"> - **sql-ref-datatypes.md (Scala)** <img width="967" alt="Screen Shot 2020-05-01 at 4 37 30 PM" src="https://user-images.githubusercontent.com/14225158/80849057-6a390d80-8bca-11ea-8866-ab08bab31432.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 39 18 PM" src="https://user-images.githubusercontent.com/14225158/80849061-6c9b6780-8bca-11ea-834c-eb93d3ab47ae.png"> - **sql-ref-datatypes.md (Java)** <img width="967" alt="Screen Shot 2020-05-01 at 4 41 24 PM" src="https://user-images.githubusercontent.com/14225158/80849138-b3895d00-8bca-11ea-9d3b-555acad2086c.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 41 39 PM" src="https://user-images.githubusercontent.com/14225158/80849140-b6844d80-8bca-11ea-9ca9-1812b6a76c02.png"> - **sql-ref-datatypes.md (Python)** <img width="967" alt="Screen Shot 2020-05-01 at 4 43 36 PM" src="https://user-images.githubusercontent.com/14225158/80849202-0400ba80-8bcb-11ea-96a5-7caecbf9dbbf.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 43 54 PM" src="https://user-images.githubusercontent.com/14225158/80849205-06fbab00-8bcb-11ea-8f00-6df52b151684.png"> - **sql-ref-datatypes.md (R)** <img width="967" alt="Screen Shot 2020-05-01 at 4 45 16 PM" src="https://user-images.githubusercontent.com/14225158/80849288-5fcb4380-8bcb-11ea-8277-8589b5bb31bc.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 45 36 PM" src="https://user-images.githubusercontent.com/14225158/80849294-62c63400-8bcb-11ea-9438-b4f1193bc757.png"> - **sql-ref-datatypes.md (SQL)** <img width="967" alt="Screen Shot 2020-05-01 at 4 48 02 PM" src="https://user-images.githubusercontent.com/14225158/80849336-986b1d00-8bcb-11ea-9736-5fb40496b681.png"> - **sql-ref-syntax-qry-select-tvf.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 49 32 PM" src="https://user-images.githubusercontent.com/14225158/80849399-d10af680-8bcb-11ea-8dc2-e3e750e21a59.png"> ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually using jekyll serve Closes #28433 from dilipbiswal/sql-doc-table-cleanup. Authored-by: Dilip Biswal <dkbiswal@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-05-05 02:21:14 -04:00
|DROP|non-reserved|non-reserved|reserved|
|ELSE|reserved|non-reserved|reserved|
|END|reserved|non-reserved|reserved|
|ESCAPE|reserved|non-reserved|reserved|
|ESCAPED|non-reserved|non-reserved|non-reserved|
|EXCEPT|reserved|strict-non-reserved|reserved|
|EXCHANGE|non-reserved|non-reserved|non-reserved|
|EXISTS|non-reserved|non-reserved|reserved|
|EXPLAIN|non-reserved|non-reserved|non-reserved|
|EXPORT|non-reserved|non-reserved|non-reserved|
|EXTENDED|non-reserved|non-reserved|non-reserved|
|EXTERNAL|non-reserved|non-reserved|reserved|
|EXTRACT|non-reserved|non-reserved|reserved|
|FALSE|reserved|non-reserved|reserved|
|FETCH|reserved|non-reserved|reserved|
|FIELDS|non-reserved|non-reserved|non-reserved|
|FILTER|reserved|non-reserved|reserved|
|FILEFORMAT|non-reserved|non-reserved|non-reserved|
|FIRST|non-reserved|non-reserved|non-reserved|
|FOLLOWING|non-reserved|non-reserved|non-reserved|
|FOR|reserved|non-reserved|reserved|
|FOREIGN|reserved|non-reserved|reserved|
|FORMAT|non-reserved|non-reserved|non-reserved|
|FORMATTED|non-reserved|non-reserved|non-reserved|
|FROM|reserved|non-reserved|reserved|
|FULL|reserved|strict-non-reserved|reserved|
|FUNCTION|non-reserved|non-reserved|reserved|
|FUNCTIONS|non-reserved|non-reserved|non-reserved|
|GLOBAL|non-reserved|non-reserved|reserved|
|GRANT|reserved|non-reserved|reserved|
|GROUP|reserved|non-reserved|reserved|
|GROUPING|non-reserved|non-reserved|reserved|
|HAVING|reserved|non-reserved|reserved|
|IF|non-reserved|non-reserved|not a keyword|
[SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in remaining pages in sql reference. The first one was done by gatorsmile in [28415](https://github.com/apache/spark/pull/28415) - Replace HTML table by MD table - **sql-ref-ansi-compliance.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 36 35 PM" src="https://user-images.githubusercontent.com/14225158/80848981-1cbca080-8bca-11ea-8a5d-63174b31c800.png"> - **sql-ref-datatypes.md (Scala)** <img width="967" alt="Screen Shot 2020-05-01 at 4 37 30 PM" src="https://user-images.githubusercontent.com/14225158/80849057-6a390d80-8bca-11ea-8866-ab08bab31432.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 39 18 PM" src="https://user-images.githubusercontent.com/14225158/80849061-6c9b6780-8bca-11ea-834c-eb93d3ab47ae.png"> - **sql-ref-datatypes.md (Java)** <img width="967" alt="Screen Shot 2020-05-01 at 4 41 24 PM" src="https://user-images.githubusercontent.com/14225158/80849138-b3895d00-8bca-11ea-9d3b-555acad2086c.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 41 39 PM" src="https://user-images.githubusercontent.com/14225158/80849140-b6844d80-8bca-11ea-9ca9-1812b6a76c02.png"> - **sql-ref-datatypes.md (Python)** <img width="967" alt="Screen Shot 2020-05-01 at 4 43 36 PM" src="https://user-images.githubusercontent.com/14225158/80849202-0400ba80-8bcb-11ea-96a5-7caecbf9dbbf.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 43 54 PM" src="https://user-images.githubusercontent.com/14225158/80849205-06fbab00-8bcb-11ea-8f00-6df52b151684.png"> - **sql-ref-datatypes.md (R)** <img width="967" alt="Screen Shot 2020-05-01 at 4 45 16 PM" src="https://user-images.githubusercontent.com/14225158/80849288-5fcb4380-8bcb-11ea-8277-8589b5bb31bc.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 45 36 PM" src="https://user-images.githubusercontent.com/14225158/80849294-62c63400-8bcb-11ea-9438-b4f1193bc757.png"> - **sql-ref-datatypes.md (SQL)** <img width="967" alt="Screen Shot 2020-05-01 at 4 48 02 PM" src="https://user-images.githubusercontent.com/14225158/80849336-986b1d00-8bcb-11ea-9736-5fb40496b681.png"> - **sql-ref-syntax-qry-select-tvf.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 49 32 PM" src="https://user-images.githubusercontent.com/14225158/80849399-d10af680-8bcb-11ea-8dc2-e3e750e21a59.png"> ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually using jekyll serve Closes #28433 from dilipbiswal/sql-doc-table-cleanup. Authored-by: Dilip Biswal <dkbiswal@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-05-05 02:21:14 -04:00
|IGNORE|non-reserved|non-reserved|non-reserved|
|IMPORT|non-reserved|non-reserved|non-reserved|
|IN|reserved|non-reserved|reserved|
|INDEX|non-reserved|non-reserved|non-reserved|
|INDEXES|non-reserved|non-reserved|non-reserved|
|INNER|reserved|strict-non-reserved|reserved|
|INPATH|non-reserved|non-reserved|non-reserved|
|INPUTFORMAT|non-reserved|non-reserved|non-reserved|
|INSERT|non-reserved|non-reserved|reserved|
|INTERSECT|reserved|strict-non-reserved|reserved|
|INTERVAL|non-reserved|non-reserved|reserved|
|INTO|reserved|non-reserved|reserved|
|IS|reserved|non-reserved|reserved|
|ITEMS|non-reserved|non-reserved|non-reserved|
|JOIN|reserved|strict-non-reserved|reserved|
|KEYS|non-reserved|non-reserved|non-reserved|
|LAST|non-reserved|non-reserved|non-reserved|
|LATERAL|non-reserved|non-reserved|reserved|
|LAZY|non-reserved|non-reserved|non-reserved|
|LEADING|reserved|non-reserved|reserved|
|LEFT|reserved|strict-non-reserved|reserved|
|LIKE|non-reserved|non-reserved|reserved|
|LIMIT|non-reserved|non-reserved|non-reserved|
|LINES|non-reserved|non-reserved|non-reserved|
|LIST|non-reserved|non-reserved|non-reserved|
|LOAD|non-reserved|non-reserved|non-reserved|
|LOCAL|non-reserved|non-reserved|reserved|
|LOCATION|non-reserved|non-reserved|non-reserved|
|LOCK|non-reserved|non-reserved|non-reserved|
|LOCKS|non-reserved|non-reserved|non-reserved|
|LOGICAL|non-reserved|non-reserved|non-reserved|
|MACRO|non-reserved|non-reserved|non-reserved|
|MAP|non-reserved|non-reserved|non-reserved|
|MATCHED|non-reserved|non-reserved|non-reserved|
|MERGE|non-reserved|non-reserved|non-reserved|
|MINUS|non-reserved|strict-non-reserved|non-reserved|
[SPARK-35285][SQL] Parse ANSI interval types in SQL schema ### What changes were proposed in this pull request? 1. Extend Spark SQL parser to support parsing of: - `INTERVAL YEAR TO MONTH` to `YearMonthIntervalType` - `INTERVAL DAY TO SECOND` to `DayTimeIntervalType` 2. Assign new names to the ANSI interval types according to the SQL standard to be able to parse the names back by Spark SQL parser. Override the `typeName()` name of `YearMonthIntervalType`/`DayTimeIntervalType`. ### Why are the changes needed? To be able to use new ANSI interval types in SQL. The SQL standard requires the types to be defined according to the rules: ``` <interval type> ::= INTERVAL <interval qualifier> <interval qualifier> ::= <start field> TO <end field> | <single datetime field> <start field> ::= <non-second primary datetime field> [ <left paren> <interval leading field precision> <right paren> ] <end field> ::= <non-second primary datetime field> | SECOND [ <left paren> <interval fractional seconds precision> <right paren> ] <primary datetime field> ::= <non-second primary datetime field | SECOND <non-second primary datetime field> ::= YEAR | MONTH | DAY | HOUR | MINUTE <interval fractional seconds precision> ::= <unsigned integer> <interval leading field precision> ::= <unsigned integer> ``` Currently, Spark SQL supports only `YEAR TO MONTH` and `DAY TO SECOND` as `<interval qualifier>`. ### Does this PR introduce _any_ user-facing change? Should not since the types has not been released yet. ### How was this patch tested? By running the affected tests such as: ``` $ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z interval.sql" $ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z datetime.sql" $ build/sbt "test:testOnly *ExpressionTypeCheckingSuite" $ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z windowFrameCoercion.sql" $ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z literals.sql" ``` Closes #32409 from MaxGekk/parse-ansi-interval-types. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2021-05-03 00:50:35 -04:00
|MONTH|non-reserved|non-reserved|non-reserved|
[SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in remaining pages in sql reference. The first one was done by gatorsmile in [28415](https://github.com/apache/spark/pull/28415) - Replace HTML table by MD table - **sql-ref-ansi-compliance.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 36 35 PM" src="https://user-images.githubusercontent.com/14225158/80848981-1cbca080-8bca-11ea-8a5d-63174b31c800.png"> - **sql-ref-datatypes.md (Scala)** <img width="967" alt="Screen Shot 2020-05-01 at 4 37 30 PM" src="https://user-images.githubusercontent.com/14225158/80849057-6a390d80-8bca-11ea-8866-ab08bab31432.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 39 18 PM" src="https://user-images.githubusercontent.com/14225158/80849061-6c9b6780-8bca-11ea-834c-eb93d3ab47ae.png"> - **sql-ref-datatypes.md (Java)** <img width="967" alt="Screen Shot 2020-05-01 at 4 41 24 PM" src="https://user-images.githubusercontent.com/14225158/80849138-b3895d00-8bca-11ea-9d3b-555acad2086c.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 41 39 PM" src="https://user-images.githubusercontent.com/14225158/80849140-b6844d80-8bca-11ea-9ca9-1812b6a76c02.png"> - **sql-ref-datatypes.md (Python)** <img width="967" alt="Screen Shot 2020-05-01 at 4 43 36 PM" src="https://user-images.githubusercontent.com/14225158/80849202-0400ba80-8bcb-11ea-96a5-7caecbf9dbbf.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 43 54 PM" src="https://user-images.githubusercontent.com/14225158/80849205-06fbab00-8bcb-11ea-8f00-6df52b151684.png"> - **sql-ref-datatypes.md (R)** <img width="967" alt="Screen Shot 2020-05-01 at 4 45 16 PM" src="https://user-images.githubusercontent.com/14225158/80849288-5fcb4380-8bcb-11ea-8277-8589b5bb31bc.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 45 36 PM" src="https://user-images.githubusercontent.com/14225158/80849294-62c63400-8bcb-11ea-9438-b4f1193bc757.png"> - **sql-ref-datatypes.md (SQL)** <img width="967" alt="Screen Shot 2020-05-01 at 4 48 02 PM" src="https://user-images.githubusercontent.com/14225158/80849336-986b1d00-8bcb-11ea-9736-5fb40496b681.png"> - **sql-ref-syntax-qry-select-tvf.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 49 32 PM" src="https://user-images.githubusercontent.com/14225158/80849399-d10af680-8bcb-11ea-8dc2-e3e750e21a59.png"> ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually using jekyll serve Closes #28433 from dilipbiswal/sql-doc-table-cleanup. Authored-by: Dilip Biswal <dkbiswal@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-05-05 02:21:14 -04:00
|MSCK|non-reserved|non-reserved|non-reserved|
|NAMESPACE|non-reserved|non-reserved|non-reserved|
|NAMESPACES|non-reserved|non-reserved|non-reserved|
|NATURAL|reserved|strict-non-reserved|reserved|
|NO|non-reserved|non-reserved|reserved|
|NOT|reserved|non-reserved|reserved|
|NULL|reserved|non-reserved|reserved|
|NULLS|non-reserved|non-reserved|non-reserved|
|OF|non-reserved|non-reserved|reserved|
|ON|reserved|strict-non-reserved|reserved|
|ONLY|reserved|non-reserved|reserved|
|OPTION|non-reserved|non-reserved|non-reserved|
|OPTIONS|non-reserved|non-reserved|non-reserved|
|OR|reserved|non-reserved|reserved|
|ORDER|reserved|non-reserved|reserved|
|OUT|non-reserved|non-reserved|reserved|
|OUTER|reserved|non-reserved|reserved|
|OUTPUTFORMAT|non-reserved|non-reserved|non-reserved|
|OVER|non-reserved|non-reserved|non-reserved|
|OVERLAPS|reserved|non-reserved|reserved|
|OVERLAY|non-reserved|non-reserved|non-reserved|
|OVERWRITE|non-reserved|non-reserved|non-reserved|
|PARTITION|non-reserved|non-reserved|reserved|
|PARTITIONED|non-reserved|non-reserved|non-reserved|
|PARTITIONS|non-reserved|non-reserved|non-reserved|
|PERCENT|non-reserved|non-reserved|non-reserved|
|PIVOT|non-reserved|non-reserved|non-reserved|
|PLACING|non-reserved|non-reserved|non-reserved|
|POSITION|non-reserved|non-reserved|reserved|
|PRECEDING|non-reserved|non-reserved|non-reserved|
|PRIMARY|reserved|non-reserved|reserved|
|PRINCIPALS|non-reserved|non-reserved|non-reserved|
|PROPERTIES|non-reserved|non-reserved|non-reserved|
|PURGE|non-reserved|non-reserved|non-reserved|
|QUERY|non-reserved|non-reserved|non-reserved|
|RANGE|non-reserved|non-reserved|reserved|
[SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in remaining pages in sql reference. The first one was done by gatorsmile in [28415](https://github.com/apache/spark/pull/28415) - Replace HTML table by MD table - **sql-ref-ansi-compliance.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 36 35 PM" src="https://user-images.githubusercontent.com/14225158/80848981-1cbca080-8bca-11ea-8a5d-63174b31c800.png"> - **sql-ref-datatypes.md (Scala)** <img width="967" alt="Screen Shot 2020-05-01 at 4 37 30 PM" src="https://user-images.githubusercontent.com/14225158/80849057-6a390d80-8bca-11ea-8866-ab08bab31432.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 39 18 PM" src="https://user-images.githubusercontent.com/14225158/80849061-6c9b6780-8bca-11ea-834c-eb93d3ab47ae.png"> - **sql-ref-datatypes.md (Java)** <img width="967" alt="Screen Shot 2020-05-01 at 4 41 24 PM" src="https://user-images.githubusercontent.com/14225158/80849138-b3895d00-8bca-11ea-9d3b-555acad2086c.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 41 39 PM" src="https://user-images.githubusercontent.com/14225158/80849140-b6844d80-8bca-11ea-9ca9-1812b6a76c02.png"> - **sql-ref-datatypes.md (Python)** <img width="967" alt="Screen Shot 2020-05-01 at 4 43 36 PM" src="https://user-images.githubusercontent.com/14225158/80849202-0400ba80-8bcb-11ea-96a5-7caecbf9dbbf.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 43 54 PM" src="https://user-images.githubusercontent.com/14225158/80849205-06fbab00-8bcb-11ea-8f00-6df52b151684.png"> - **sql-ref-datatypes.md (R)** <img width="967" alt="Screen Shot 2020-05-01 at 4 45 16 PM" src="https://user-images.githubusercontent.com/14225158/80849288-5fcb4380-8bcb-11ea-8277-8589b5bb31bc.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 45 36 PM" src="https://user-images.githubusercontent.com/14225158/80849294-62c63400-8bcb-11ea-9438-b4f1193bc757.png"> - **sql-ref-datatypes.md (SQL)** <img width="967" alt="Screen Shot 2020-05-01 at 4 48 02 PM" src="https://user-images.githubusercontent.com/14225158/80849336-986b1d00-8bcb-11ea-9736-5fb40496b681.png"> - **sql-ref-syntax-qry-select-tvf.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 49 32 PM" src="https://user-images.githubusercontent.com/14225158/80849399-d10af680-8bcb-11ea-8dc2-e3e750e21a59.png"> ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually using jekyll serve Closes #28433 from dilipbiswal/sql-doc-table-cleanup. Authored-by: Dilip Biswal <dkbiswal@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-05-05 02:21:14 -04:00
|RECORDREADER|non-reserved|non-reserved|non-reserved|
|RECORDWRITER|non-reserved|non-reserved|non-reserved|
|RECOVER|non-reserved|non-reserved|non-reserved|
|REDUCE|non-reserved|non-reserved|non-reserved|
|REFERENCES|reserved|non-reserved|reserved|
|REFRESH|non-reserved|non-reserved|non-reserved|
|REGEXP|non-reserved|non-reserved|not a keyword|
[SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in remaining pages in sql reference. The first one was done by gatorsmile in [28415](https://github.com/apache/spark/pull/28415) - Replace HTML table by MD table - **sql-ref-ansi-compliance.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 36 35 PM" src="https://user-images.githubusercontent.com/14225158/80848981-1cbca080-8bca-11ea-8a5d-63174b31c800.png"> - **sql-ref-datatypes.md (Scala)** <img width="967" alt="Screen Shot 2020-05-01 at 4 37 30 PM" src="https://user-images.githubusercontent.com/14225158/80849057-6a390d80-8bca-11ea-8866-ab08bab31432.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 39 18 PM" src="https://user-images.githubusercontent.com/14225158/80849061-6c9b6780-8bca-11ea-834c-eb93d3ab47ae.png"> - **sql-ref-datatypes.md (Java)** <img width="967" alt="Screen Shot 2020-05-01 at 4 41 24 PM" src="https://user-images.githubusercontent.com/14225158/80849138-b3895d00-8bca-11ea-9d3b-555acad2086c.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 41 39 PM" src="https://user-images.githubusercontent.com/14225158/80849140-b6844d80-8bca-11ea-9ca9-1812b6a76c02.png"> - **sql-ref-datatypes.md (Python)** <img width="967" alt="Screen Shot 2020-05-01 at 4 43 36 PM" src="https://user-images.githubusercontent.com/14225158/80849202-0400ba80-8bcb-11ea-96a5-7caecbf9dbbf.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 43 54 PM" src="https://user-images.githubusercontent.com/14225158/80849205-06fbab00-8bcb-11ea-8f00-6df52b151684.png"> - **sql-ref-datatypes.md (R)** <img width="967" alt="Screen Shot 2020-05-01 at 4 45 16 PM" src="https://user-images.githubusercontent.com/14225158/80849288-5fcb4380-8bcb-11ea-8277-8589b5bb31bc.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 45 36 PM" src="https://user-images.githubusercontent.com/14225158/80849294-62c63400-8bcb-11ea-9438-b4f1193bc757.png"> - **sql-ref-datatypes.md (SQL)** <img width="967" alt="Screen Shot 2020-05-01 at 4 48 02 PM" src="https://user-images.githubusercontent.com/14225158/80849336-986b1d00-8bcb-11ea-9736-5fb40496b681.png"> - **sql-ref-syntax-qry-select-tvf.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 49 32 PM" src="https://user-images.githubusercontent.com/14225158/80849399-d10af680-8bcb-11ea-8dc2-e3e750e21a59.png"> ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually using jekyll serve Closes #28433 from dilipbiswal/sql-doc-table-cleanup. Authored-by: Dilip Biswal <dkbiswal@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-05-05 02:21:14 -04:00
|RENAME|non-reserved|non-reserved|non-reserved|
|REPAIR|non-reserved|non-reserved|non-reserved|
|REPLACE|non-reserved|non-reserved|non-reserved|
|RESET|non-reserved|non-reserved|non-reserved|
[SPARK-30789][SQL] Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE ### What changes were proposed in this pull request? All of `LEAD`/`LAG`/`NTH_VALUE`/`FIRST_VALUE`/`LAST_VALUE` should support IGNORE NULLS | RESPECT NULLS. For example: ``` LEAD (value_expr [, offset ]) [ IGNORE NULLS | RESPECT NULLS ] OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ) ``` ``` LAG (value_expr [, offset ]) [ IGNORE NULLS | RESPECT NULLS ] OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ) ``` ``` NTH_VALUE (expr, offset) [ IGNORE NULLS | RESPECT NULLS ] OVER ( [ PARTITION BY window_partition ] [ ORDER BY window_ordering frame_clause ] ) ``` The mainstream database or engine supports this syntax contains: **Oracle** https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/NTH_VALUE.html#GUID-F8A0E88C-67E5-4AA6-9515-95D03A7F9EA0 **Redshift** https://docs.aws.amazon.com/redshift/latest/dg/r_WF_NTH.html **Presto** https://prestodb.io/docs/current/functions/window.html **DB2** https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1513.htm **Teradata** https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/GjCT6l7trjkIEjt~7Dhx4w **Snowflake** https://docs.snowflake.com/en/sql-reference/functions/lead.html https://docs.snowflake.com/en/sql-reference/functions/lag.html https://docs.snowflake.com/en/sql-reference/functions/nth_value.html https://docs.snowflake.com/en/sql-reference/functions/first_value.html https://docs.snowflake.com/en/sql-reference/functions/last_value.html **Exasol** https://docs.exasol.com/sql_references/functions/alphabeticallistfunctions/lead.htm https://docs.exasol.com/sql_references/functions/alphabeticallistfunctions/lag.htm https://docs.exasol.com/sql_references/functions/alphabeticallistfunctions/nth_value.htm https://docs.exasol.com/sql_references/functions/alphabeticallistfunctions/first_value.htm https://docs.exasol.com/sql_references/functions/alphabeticallistfunctions/last_value.htm ### Why are the changes needed? Support `(IGNORE | RESPECT) NULLS` for `LEAD`/`LAG`/`NTH_VALUE`/`FIRST_VALUE`/`LAST_VALUE `is very useful. ### Does this PR introduce _any_ user-facing change? Yes. ### How was this patch tested? Jenkins test Closes #30943 from beliefer/SPARK-30789. Lead-authored-by: gengjiaan <gengjiaan@360.cn> Co-authored-by: beliefer <beliefer@163.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-12-30 08:14:31 -05:00
|RESPECT|non-reserved|non-reserved|non-reserved|
[SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in remaining pages in sql reference. The first one was done by gatorsmile in [28415](https://github.com/apache/spark/pull/28415) - Replace HTML table by MD table - **sql-ref-ansi-compliance.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 36 35 PM" src="https://user-images.githubusercontent.com/14225158/80848981-1cbca080-8bca-11ea-8a5d-63174b31c800.png"> - **sql-ref-datatypes.md (Scala)** <img width="967" alt="Screen Shot 2020-05-01 at 4 37 30 PM" src="https://user-images.githubusercontent.com/14225158/80849057-6a390d80-8bca-11ea-8866-ab08bab31432.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 39 18 PM" src="https://user-images.githubusercontent.com/14225158/80849061-6c9b6780-8bca-11ea-834c-eb93d3ab47ae.png"> - **sql-ref-datatypes.md (Java)** <img width="967" alt="Screen Shot 2020-05-01 at 4 41 24 PM" src="https://user-images.githubusercontent.com/14225158/80849138-b3895d00-8bca-11ea-9d3b-555acad2086c.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 41 39 PM" src="https://user-images.githubusercontent.com/14225158/80849140-b6844d80-8bca-11ea-9ca9-1812b6a76c02.png"> - **sql-ref-datatypes.md (Python)** <img width="967" alt="Screen Shot 2020-05-01 at 4 43 36 PM" src="https://user-images.githubusercontent.com/14225158/80849202-0400ba80-8bcb-11ea-96a5-7caecbf9dbbf.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 43 54 PM" src="https://user-images.githubusercontent.com/14225158/80849205-06fbab00-8bcb-11ea-8f00-6df52b151684.png"> - **sql-ref-datatypes.md (R)** <img width="967" alt="Screen Shot 2020-05-01 at 4 45 16 PM" src="https://user-images.githubusercontent.com/14225158/80849288-5fcb4380-8bcb-11ea-8277-8589b5bb31bc.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 45 36 PM" src="https://user-images.githubusercontent.com/14225158/80849294-62c63400-8bcb-11ea-9438-b4f1193bc757.png"> - **sql-ref-datatypes.md (SQL)** <img width="967" alt="Screen Shot 2020-05-01 at 4 48 02 PM" src="https://user-images.githubusercontent.com/14225158/80849336-986b1d00-8bcb-11ea-9736-5fb40496b681.png"> - **sql-ref-syntax-qry-select-tvf.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 49 32 PM" src="https://user-images.githubusercontent.com/14225158/80849399-d10af680-8bcb-11ea-8dc2-e3e750e21a59.png"> ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually using jekyll serve Closes #28433 from dilipbiswal/sql-doc-table-cleanup. Authored-by: Dilip Biswal <dkbiswal@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-05-05 02:21:14 -04:00
|RESTRICT|non-reserved|non-reserved|non-reserved|
|REVOKE|non-reserved|non-reserved|reserved|
|RIGHT|reserved|strict-non-reserved|reserved|
|RLIKE|non-reserved|non-reserved|non-reserved|
|ROLE|non-reserved|non-reserved|non-reserved|
|ROLES|non-reserved|non-reserved|non-reserved|
|ROLLBACK|non-reserved|non-reserved|reserved|
|ROLLUP|non-reserved|non-reserved|reserved|
|ROW|non-reserved|non-reserved|reserved|
|ROWS|non-reserved|non-reserved|reserved|
|SCHEMA|non-reserved|non-reserved|non-reserved|
|SCHEMAS|non-reserved|non-reserved|not a keyword|
[SPARK-35285][SQL] Parse ANSI interval types in SQL schema ### What changes were proposed in this pull request? 1. Extend Spark SQL parser to support parsing of: - `INTERVAL YEAR TO MONTH` to `YearMonthIntervalType` - `INTERVAL DAY TO SECOND` to `DayTimeIntervalType` 2. Assign new names to the ANSI interval types according to the SQL standard to be able to parse the names back by Spark SQL parser. Override the `typeName()` name of `YearMonthIntervalType`/`DayTimeIntervalType`. ### Why are the changes needed? To be able to use new ANSI interval types in SQL. The SQL standard requires the types to be defined according to the rules: ``` <interval type> ::= INTERVAL <interval qualifier> <interval qualifier> ::= <start field> TO <end field> | <single datetime field> <start field> ::= <non-second primary datetime field> [ <left paren> <interval leading field precision> <right paren> ] <end field> ::= <non-second primary datetime field> | SECOND [ <left paren> <interval fractional seconds precision> <right paren> ] <primary datetime field> ::= <non-second primary datetime field | SECOND <non-second primary datetime field> ::= YEAR | MONTH | DAY | HOUR | MINUTE <interval fractional seconds precision> ::= <unsigned integer> <interval leading field precision> ::= <unsigned integer> ``` Currently, Spark SQL supports only `YEAR TO MONTH` and `DAY TO SECOND` as `<interval qualifier>`. ### Does this PR introduce _any_ user-facing change? Should not since the types has not been released yet. ### How was this patch tested? By running the affected tests such as: ``` $ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z interval.sql" $ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z datetime.sql" $ build/sbt "test:testOnly *ExpressionTypeCheckingSuite" $ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z windowFrameCoercion.sql" $ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z literals.sql" ``` Closes #32409 from MaxGekk/parse-ansi-interval-types. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2021-05-03 00:50:35 -04:00
|SECOND|non-reserved|non-reserved|non-reserved|
[SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in remaining pages in sql reference. The first one was done by gatorsmile in [28415](https://github.com/apache/spark/pull/28415) - Replace HTML table by MD table - **sql-ref-ansi-compliance.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 36 35 PM" src="https://user-images.githubusercontent.com/14225158/80848981-1cbca080-8bca-11ea-8a5d-63174b31c800.png"> - **sql-ref-datatypes.md (Scala)** <img width="967" alt="Screen Shot 2020-05-01 at 4 37 30 PM" src="https://user-images.githubusercontent.com/14225158/80849057-6a390d80-8bca-11ea-8866-ab08bab31432.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 39 18 PM" src="https://user-images.githubusercontent.com/14225158/80849061-6c9b6780-8bca-11ea-834c-eb93d3ab47ae.png"> - **sql-ref-datatypes.md (Java)** <img width="967" alt="Screen Shot 2020-05-01 at 4 41 24 PM" src="https://user-images.githubusercontent.com/14225158/80849138-b3895d00-8bca-11ea-9d3b-555acad2086c.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 41 39 PM" src="https://user-images.githubusercontent.com/14225158/80849140-b6844d80-8bca-11ea-9ca9-1812b6a76c02.png"> - **sql-ref-datatypes.md (Python)** <img width="967" alt="Screen Shot 2020-05-01 at 4 43 36 PM" src="https://user-images.githubusercontent.com/14225158/80849202-0400ba80-8bcb-11ea-96a5-7caecbf9dbbf.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 43 54 PM" src="https://user-images.githubusercontent.com/14225158/80849205-06fbab00-8bcb-11ea-8f00-6df52b151684.png"> - **sql-ref-datatypes.md (R)** <img width="967" alt="Screen Shot 2020-05-01 at 4 45 16 PM" src="https://user-images.githubusercontent.com/14225158/80849288-5fcb4380-8bcb-11ea-8277-8589b5bb31bc.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 45 36 PM" src="https://user-images.githubusercontent.com/14225158/80849294-62c63400-8bcb-11ea-9438-b4f1193bc757.png"> - **sql-ref-datatypes.md (SQL)** <img width="967" alt="Screen Shot 2020-05-01 at 4 48 02 PM" src="https://user-images.githubusercontent.com/14225158/80849336-986b1d00-8bcb-11ea-9736-5fb40496b681.png"> - **sql-ref-syntax-qry-select-tvf.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 49 32 PM" src="https://user-images.githubusercontent.com/14225158/80849399-d10af680-8bcb-11ea-8dc2-e3e750e21a59.png"> ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually using jekyll serve Closes #28433 from dilipbiswal/sql-doc-table-cleanup. Authored-by: Dilip Biswal <dkbiswal@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-05-05 02:21:14 -04:00
|SELECT|reserved|non-reserved|reserved|
|SEMI|non-reserved|strict-non-reserved|non-reserved|
[SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in remaining pages in sql reference. The first one was done by gatorsmile in [28415](https://github.com/apache/spark/pull/28415) - Replace HTML table by MD table - **sql-ref-ansi-compliance.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 36 35 PM" src="https://user-images.githubusercontent.com/14225158/80848981-1cbca080-8bca-11ea-8a5d-63174b31c800.png"> - **sql-ref-datatypes.md (Scala)** <img width="967" alt="Screen Shot 2020-05-01 at 4 37 30 PM" src="https://user-images.githubusercontent.com/14225158/80849057-6a390d80-8bca-11ea-8866-ab08bab31432.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 39 18 PM" src="https://user-images.githubusercontent.com/14225158/80849061-6c9b6780-8bca-11ea-834c-eb93d3ab47ae.png"> - **sql-ref-datatypes.md (Java)** <img width="967" alt="Screen Shot 2020-05-01 at 4 41 24 PM" src="https://user-images.githubusercontent.com/14225158/80849138-b3895d00-8bca-11ea-9d3b-555acad2086c.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 41 39 PM" src="https://user-images.githubusercontent.com/14225158/80849140-b6844d80-8bca-11ea-9ca9-1812b6a76c02.png"> - **sql-ref-datatypes.md (Python)** <img width="967" alt="Screen Shot 2020-05-01 at 4 43 36 PM" src="https://user-images.githubusercontent.com/14225158/80849202-0400ba80-8bcb-11ea-96a5-7caecbf9dbbf.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 43 54 PM" src="https://user-images.githubusercontent.com/14225158/80849205-06fbab00-8bcb-11ea-8f00-6df52b151684.png"> - **sql-ref-datatypes.md (R)** <img width="967" alt="Screen Shot 2020-05-01 at 4 45 16 PM" src="https://user-images.githubusercontent.com/14225158/80849288-5fcb4380-8bcb-11ea-8277-8589b5bb31bc.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 45 36 PM" src="https://user-images.githubusercontent.com/14225158/80849294-62c63400-8bcb-11ea-9438-b4f1193bc757.png"> - **sql-ref-datatypes.md (SQL)** <img width="967" alt="Screen Shot 2020-05-01 at 4 48 02 PM" src="https://user-images.githubusercontent.com/14225158/80849336-986b1d00-8bcb-11ea-9736-5fb40496b681.png"> - **sql-ref-syntax-qry-select-tvf.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 49 32 PM" src="https://user-images.githubusercontent.com/14225158/80849399-d10af680-8bcb-11ea-8dc2-e3e750e21a59.png"> ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually using jekyll serve Closes #28433 from dilipbiswal/sql-doc-table-cleanup. Authored-by: Dilip Biswal <dkbiswal@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-05-05 02:21:14 -04:00
|SEPARATED|non-reserved|non-reserved|non-reserved|
|SERDE|non-reserved|non-reserved|non-reserved|
|SERDEPROPERTIES|non-reserved|non-reserved|non-reserved|
|SESSION_USER|reserved|non-reserved|reserved|
|SET|non-reserved|non-reserved|reserved|
|SETS|non-reserved|non-reserved|non-reserved|
|SHOW|non-reserved|non-reserved|non-reserved|
|SKEWED|non-reserved|non-reserved|non-reserved|
|SOME|reserved|non-reserved|reserved|
|SORT|non-reserved|non-reserved|non-reserved|
|SORTED|non-reserved|non-reserved|non-reserved|
|START|non-reserved|non-reserved|reserved|
|STATISTICS|non-reserved|non-reserved|non-reserved|
|STORED|non-reserved|non-reserved|non-reserved|
|STRATIFY|non-reserved|non-reserved|non-reserved|
|STRUCT|non-reserved|non-reserved|non-reserved|
|SUBSTR|non-reserved|non-reserved|non-reserved|
|SUBSTRING|non-reserved|non-reserved|non-reserved|
[SPARK-31891][SQL] Support `MSCK REPAIR TABLE .. [{ADD|DROP|SYNC} PARTITIONS]` ### What changes were proposed in this pull request? In the PR, I propose to extend the `MSCK REPAIR TABLE` command, and support new options `{ADD|DROP|SYNC} PARTITIONS`. In particular: 1. Extend the logical node `RepairTable`, and add two new flags `enableAddPartitions` and `enableDropPartitions`. 2. Add similar flags to the v1 execution node `AlterTableRecoverPartitionsCommand` 3. Add new method `dropPartitions()` to `AlterTableRecoverPartitionsCommand` which drops partitions from the catalog if their locations in the file system don't exist. 4. Updated public docs about the `MSCK REPAIR TABLE` command: <img width="1037" alt="Screenshot 2021-02-16 at 13 46 39" src="https://user-images.githubusercontent.com/1580697/108052607-7446d280-705d-11eb-8e25-7398254787a4.png"> Closes #31097 ### Why are the changes needed? - The changes allow to recover tables with removed partitions. The example below portraits the problem: ```sql spark-sql> create table tbl2 (col int, part int) partitioned by (part); spark-sql> insert into tbl2 partition (part=1) select 1; spark-sql> insert into tbl2 partition (part=0) select 0; spark-sql> show table extended like 'tbl2' partition (part = 0); default tbl2 false Partition Values: [part=0] Location: file:/Users/maximgekk/proj/apache-spark/spark-warehouse/tbl2/part=0 ... ``` Remove the partition (part = 0) from the filesystem: ``` $ rm -rf /Users/maximgekk/proj/apache-spark/spark-warehouse/tbl2/part=0 ``` Even after recovering, we cannot query the table: ```sql spark-sql> msck repair table tbl2; spark-sql> select * from tbl2; 21/01/08 22:49:13 ERROR SparkSQLDriver: Failed in [select * from tbl2] org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/Users/maximgekk/proj/apache-spark/spark-warehouse/tbl2/part=0 ``` - To have feature parity with Hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE) ### Does this PR introduce _any_ user-facing change? Yes. After the changes, we can query recovered table: ```sql spark-sql> msck repair table tbl2 sync partitions; spark-sql> select * from tbl2; 1 1 spark-sql> show partitions tbl2; part=1 ``` ### How was this patch tested? - By running the modified test suite: ``` $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *MsckRepairTableParserSuite" $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *PlanResolutionSuite" $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableRecoverPartitionsSuite" $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableRecoverPartitionsParallelSuite" ``` - Added unified v1 and v2 tests for `MSCK REPAIR TABLE`: ``` $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *MsckRepairTableSuite" ``` Closes #31499 from MaxGekk/repair-table-drop-partitions. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-02-23 16:45:15 -05:00
|SYNC|non-reserved|non-reserved|non-reserved|
[SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in remaining pages in sql reference. The first one was done by gatorsmile in [28415](https://github.com/apache/spark/pull/28415) - Replace HTML table by MD table - **sql-ref-ansi-compliance.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 36 35 PM" src="https://user-images.githubusercontent.com/14225158/80848981-1cbca080-8bca-11ea-8a5d-63174b31c800.png"> - **sql-ref-datatypes.md (Scala)** <img width="967" alt="Screen Shot 2020-05-01 at 4 37 30 PM" src="https://user-images.githubusercontent.com/14225158/80849057-6a390d80-8bca-11ea-8866-ab08bab31432.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 39 18 PM" src="https://user-images.githubusercontent.com/14225158/80849061-6c9b6780-8bca-11ea-834c-eb93d3ab47ae.png"> - **sql-ref-datatypes.md (Java)** <img width="967" alt="Screen Shot 2020-05-01 at 4 41 24 PM" src="https://user-images.githubusercontent.com/14225158/80849138-b3895d00-8bca-11ea-9d3b-555acad2086c.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 41 39 PM" src="https://user-images.githubusercontent.com/14225158/80849140-b6844d80-8bca-11ea-9ca9-1812b6a76c02.png"> - **sql-ref-datatypes.md (Python)** <img width="967" alt="Screen Shot 2020-05-01 at 4 43 36 PM" src="https://user-images.githubusercontent.com/14225158/80849202-0400ba80-8bcb-11ea-96a5-7caecbf9dbbf.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 43 54 PM" src="https://user-images.githubusercontent.com/14225158/80849205-06fbab00-8bcb-11ea-8f00-6df52b151684.png"> - **sql-ref-datatypes.md (R)** <img width="967" alt="Screen Shot 2020-05-01 at 4 45 16 PM" src="https://user-images.githubusercontent.com/14225158/80849288-5fcb4380-8bcb-11ea-8277-8589b5bb31bc.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 45 36 PM" src="https://user-images.githubusercontent.com/14225158/80849294-62c63400-8bcb-11ea-9438-b4f1193bc757.png"> - **sql-ref-datatypes.md (SQL)** <img width="967" alt="Screen Shot 2020-05-01 at 4 48 02 PM" src="https://user-images.githubusercontent.com/14225158/80849336-986b1d00-8bcb-11ea-9736-5fb40496b681.png"> - **sql-ref-syntax-qry-select-tvf.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 49 32 PM" src="https://user-images.githubusercontent.com/14225158/80849399-d10af680-8bcb-11ea-8dc2-e3e750e21a59.png"> ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually using jekyll serve Closes #28433 from dilipbiswal/sql-doc-table-cleanup. Authored-by: Dilip Biswal <dkbiswal@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-05-05 02:21:14 -04:00
|TABLE|reserved|non-reserved|reserved|
|TABLES|non-reserved|non-reserved|non-reserved|
|TABLESAMPLE|non-reserved|non-reserved|reserved|
|TBLPROPERTIES|non-reserved|non-reserved|non-reserved|
|TEMP|non-reserved|non-reserved|not a keyword|
[SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in remaining pages in sql reference. The first one was done by gatorsmile in [28415](https://github.com/apache/spark/pull/28415) - Replace HTML table by MD table - **sql-ref-ansi-compliance.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 36 35 PM" src="https://user-images.githubusercontent.com/14225158/80848981-1cbca080-8bca-11ea-8a5d-63174b31c800.png"> - **sql-ref-datatypes.md (Scala)** <img width="967" alt="Screen Shot 2020-05-01 at 4 37 30 PM" src="https://user-images.githubusercontent.com/14225158/80849057-6a390d80-8bca-11ea-8866-ab08bab31432.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 39 18 PM" src="https://user-images.githubusercontent.com/14225158/80849061-6c9b6780-8bca-11ea-834c-eb93d3ab47ae.png"> - **sql-ref-datatypes.md (Java)** <img width="967" alt="Screen Shot 2020-05-01 at 4 41 24 PM" src="https://user-images.githubusercontent.com/14225158/80849138-b3895d00-8bca-11ea-9d3b-555acad2086c.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 41 39 PM" src="https://user-images.githubusercontent.com/14225158/80849140-b6844d80-8bca-11ea-9ca9-1812b6a76c02.png"> - **sql-ref-datatypes.md (Python)** <img width="967" alt="Screen Shot 2020-05-01 at 4 43 36 PM" src="https://user-images.githubusercontent.com/14225158/80849202-0400ba80-8bcb-11ea-96a5-7caecbf9dbbf.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 43 54 PM" src="https://user-images.githubusercontent.com/14225158/80849205-06fbab00-8bcb-11ea-8f00-6df52b151684.png"> - **sql-ref-datatypes.md (R)** <img width="967" alt="Screen Shot 2020-05-01 at 4 45 16 PM" src="https://user-images.githubusercontent.com/14225158/80849288-5fcb4380-8bcb-11ea-8277-8589b5bb31bc.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 45 36 PM" src="https://user-images.githubusercontent.com/14225158/80849294-62c63400-8bcb-11ea-9438-b4f1193bc757.png"> - **sql-ref-datatypes.md (SQL)** <img width="967" alt="Screen Shot 2020-05-01 at 4 48 02 PM" src="https://user-images.githubusercontent.com/14225158/80849336-986b1d00-8bcb-11ea-9736-5fb40496b681.png"> - **sql-ref-syntax-qry-select-tvf.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 49 32 PM" src="https://user-images.githubusercontent.com/14225158/80849399-d10af680-8bcb-11ea-8dc2-e3e750e21a59.png"> ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually using jekyll serve Closes #28433 from dilipbiswal/sql-doc-table-cleanup. Authored-by: Dilip Biswal <dkbiswal@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-05-05 02:21:14 -04:00
|TEMPORARY|non-reserved|non-reserved|non-reserved|
|TERMINATED|non-reserved|non-reserved|non-reserved|
|THEN|reserved|non-reserved|reserved|
|TIME|reserved|non-reserved|reserved|
[SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in remaining pages in sql reference. The first one was done by gatorsmile in [28415](https://github.com/apache/spark/pull/28415) - Replace HTML table by MD table - **sql-ref-ansi-compliance.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 36 35 PM" src="https://user-images.githubusercontent.com/14225158/80848981-1cbca080-8bca-11ea-8a5d-63174b31c800.png"> - **sql-ref-datatypes.md (Scala)** <img width="967" alt="Screen Shot 2020-05-01 at 4 37 30 PM" src="https://user-images.githubusercontent.com/14225158/80849057-6a390d80-8bca-11ea-8866-ab08bab31432.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 39 18 PM" src="https://user-images.githubusercontent.com/14225158/80849061-6c9b6780-8bca-11ea-834c-eb93d3ab47ae.png"> - **sql-ref-datatypes.md (Java)** <img width="967" alt="Screen Shot 2020-05-01 at 4 41 24 PM" src="https://user-images.githubusercontent.com/14225158/80849138-b3895d00-8bca-11ea-9d3b-555acad2086c.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 41 39 PM" src="https://user-images.githubusercontent.com/14225158/80849140-b6844d80-8bca-11ea-9ca9-1812b6a76c02.png"> - **sql-ref-datatypes.md (Python)** <img width="967" alt="Screen Shot 2020-05-01 at 4 43 36 PM" src="https://user-images.githubusercontent.com/14225158/80849202-0400ba80-8bcb-11ea-96a5-7caecbf9dbbf.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 43 54 PM" src="https://user-images.githubusercontent.com/14225158/80849205-06fbab00-8bcb-11ea-8f00-6df52b151684.png"> - **sql-ref-datatypes.md (R)** <img width="967" alt="Screen Shot 2020-05-01 at 4 45 16 PM" src="https://user-images.githubusercontent.com/14225158/80849288-5fcb4380-8bcb-11ea-8277-8589b5bb31bc.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 45 36 PM" src="https://user-images.githubusercontent.com/14225158/80849294-62c63400-8bcb-11ea-9438-b4f1193bc757.png"> - **sql-ref-datatypes.md (SQL)** <img width="967" alt="Screen Shot 2020-05-01 at 4 48 02 PM" src="https://user-images.githubusercontent.com/14225158/80849336-986b1d00-8bcb-11ea-9736-5fb40496b681.png"> - **sql-ref-syntax-qry-select-tvf.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 49 32 PM" src="https://user-images.githubusercontent.com/14225158/80849399-d10af680-8bcb-11ea-8dc2-e3e750e21a59.png"> ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually using jekyll serve Closes #28433 from dilipbiswal/sql-doc-table-cleanup. Authored-by: Dilip Biswal <dkbiswal@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-05-05 02:21:14 -04:00
|TO|reserved|non-reserved|reserved|
|TOUCH|non-reserved|non-reserved|non-reserved|
|TRAILING|reserved|non-reserved|reserved|
|TRANSACTION|non-reserved|non-reserved|non-reserved|
|TRANSACTIONS|non-reserved|non-reserved|non-reserved|
|TRANSFORM|non-reserved|non-reserved|non-reserved|
|TRIM|non-reserved|non-reserved|non-reserved|
|TRUE|non-reserved|non-reserved|reserved|
|TRUNCATE|non-reserved|non-reserved|reserved|
[SPARK-34881][SQL] New SQL Function: TRY_CAST ### What changes were proposed in this pull request? Add a new SQL function `try_cast`. `try_cast` is identical to `AnsiCast` (or `Cast` when `spark.sql.ansi.enabled` is true), except it returns NULL instead of raising an error. This expression has one major difference from `cast` with `spark.sql.ansi.enabled` as true: when the source value can't be stored in the target integral(Byte/Short/Int/Long) type, `try_cast` returns null instead of returning the low order bytes of the source value. Note that the result of `try_cast` is not affected by the configuration `spark.sql.ansi.enabled`. This is learned from Google BigQuery and Snowflake: https://docs.snowflake.com/en/sql-reference/functions/try_cast.html https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#safe_casting ### Why are the changes needed? This is an useful for the following scenarios: 1. When ANSI mode is on, users can choose `try_cast` an alternative way to run SQL without errors for certain operations. 2. When ANSI mode is off, users can use `try_cast` to get a more reasonable result for casting a value to an integral type: when an overflow error happens, `try_cast` returns null while `cast` returns the low order bytes of the source value. ### Does this PR introduce _any_ user-facing change? Yes, adding a new function `try_cast` ### How was this patch tested? Unit tests. Closes #31982 from gengliangwang/tryCast. Authored-by: Gengliang Wang <ltnwgl@gmail.com> Signed-off-by: Gengliang Wang <ltnwgl@gmail.com>
2021-03-31 08:47:04 -04:00
|TRY_CAST|non-reserved|non-reserved|non-reserved|
|TYPE|non-reserved|non-reserved|non-reserved|
[SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in remaining pages in sql reference. The first one was done by gatorsmile in [28415](https://github.com/apache/spark/pull/28415) - Replace HTML table by MD table - **sql-ref-ansi-compliance.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 36 35 PM" src="https://user-images.githubusercontent.com/14225158/80848981-1cbca080-8bca-11ea-8a5d-63174b31c800.png"> - **sql-ref-datatypes.md (Scala)** <img width="967" alt="Screen Shot 2020-05-01 at 4 37 30 PM" src="https://user-images.githubusercontent.com/14225158/80849057-6a390d80-8bca-11ea-8866-ab08bab31432.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 39 18 PM" src="https://user-images.githubusercontent.com/14225158/80849061-6c9b6780-8bca-11ea-834c-eb93d3ab47ae.png"> - **sql-ref-datatypes.md (Java)** <img width="967" alt="Screen Shot 2020-05-01 at 4 41 24 PM" src="https://user-images.githubusercontent.com/14225158/80849138-b3895d00-8bca-11ea-9d3b-555acad2086c.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 41 39 PM" src="https://user-images.githubusercontent.com/14225158/80849140-b6844d80-8bca-11ea-9ca9-1812b6a76c02.png"> - **sql-ref-datatypes.md (Python)** <img width="967" alt="Screen Shot 2020-05-01 at 4 43 36 PM" src="https://user-images.githubusercontent.com/14225158/80849202-0400ba80-8bcb-11ea-96a5-7caecbf9dbbf.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 43 54 PM" src="https://user-images.githubusercontent.com/14225158/80849205-06fbab00-8bcb-11ea-8f00-6df52b151684.png"> - **sql-ref-datatypes.md (R)** <img width="967" alt="Screen Shot 2020-05-01 at 4 45 16 PM" src="https://user-images.githubusercontent.com/14225158/80849288-5fcb4380-8bcb-11ea-8277-8589b5bb31bc.png"> <img width="967" alt="Screen Shot 2020-05-01 at 4 45 36 PM" src="https://user-images.githubusercontent.com/14225158/80849294-62c63400-8bcb-11ea-9438-b4f1193bc757.png"> - **sql-ref-datatypes.md (SQL)** <img width="967" alt="Screen Shot 2020-05-01 at 4 48 02 PM" src="https://user-images.githubusercontent.com/14225158/80849336-986b1d00-8bcb-11ea-9736-5fb40496b681.png"> - **sql-ref-syntax-qry-select-tvf.md** <img width="967" alt="Screen Shot 2020-05-01 at 4 49 32 PM" src="https://user-images.githubusercontent.com/14225158/80849399-d10af680-8bcb-11ea-8dc2-e3e750e21a59.png"> ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually using jekyll serve Closes #28433 from dilipbiswal/sql-doc-table-cleanup. Authored-by: Dilip Biswal <dkbiswal@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-05-05 02:21:14 -04:00
|UNARCHIVE|non-reserved|non-reserved|non-reserved|
|UNBOUNDED|non-reserved|non-reserved|non-reserved|
|UNCACHE|non-reserved|non-reserved|non-reserved|
|UNION|reserved|strict-non-reserved|reserved|
|UNIQUE|reserved|non-reserved|reserved|
|UNKNOWN|reserved|non-reserved|reserved|
|UNLOCK|non-reserved|non-reserved|non-reserved|
|UNSET|non-reserved|non-reserved|non-reserved|
|UPDATE|non-reserved|non-reserved|reserved|
|USE|non-reserved|non-reserved|non-reserved|
|USER|reserved|non-reserved|reserved|
|USING|reserved|strict-non-reserved|reserved|
|VALUES|non-reserved|non-reserved|reserved|
|VIEW|non-reserved|non-reserved|non-reserved|
|VIEWS|non-reserved|non-reserved|non-reserved|
|WHEN|reserved|non-reserved|reserved|
|WHERE|reserved|non-reserved|reserved|
|WINDOW|non-reserved|non-reserved|reserved|
|WITH|reserved|non-reserved|reserved|
[SPARK-35285][SQL] Parse ANSI interval types in SQL schema ### What changes were proposed in this pull request? 1. Extend Spark SQL parser to support parsing of: - `INTERVAL YEAR TO MONTH` to `YearMonthIntervalType` - `INTERVAL DAY TO SECOND` to `DayTimeIntervalType` 2. Assign new names to the ANSI interval types according to the SQL standard to be able to parse the names back by Spark SQL parser. Override the `typeName()` name of `YearMonthIntervalType`/`DayTimeIntervalType`. ### Why are the changes needed? To be able to use new ANSI interval types in SQL. The SQL standard requires the types to be defined according to the rules: ``` <interval type> ::= INTERVAL <interval qualifier> <interval qualifier> ::= <start field> TO <end field> | <single datetime field> <start field> ::= <non-second primary datetime field> [ <left paren> <interval leading field precision> <right paren> ] <end field> ::= <non-second primary datetime field> | SECOND [ <left paren> <interval fractional seconds precision> <right paren> ] <primary datetime field> ::= <non-second primary datetime field | SECOND <non-second primary datetime field> ::= YEAR | MONTH | DAY | HOUR | MINUTE <interval fractional seconds precision> ::= <unsigned integer> <interval leading field precision> ::= <unsigned integer> ``` Currently, Spark SQL supports only `YEAR TO MONTH` and `DAY TO SECOND` as `<interval qualifier>`. ### Does this PR introduce _any_ user-facing change? Should not since the types has not been released yet. ### How was this patch tested? By running the affected tests such as: ``` $ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z interval.sql" $ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z datetime.sql" $ build/sbt "test:testOnly *ExpressionTypeCheckingSuite" $ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z windowFrameCoercion.sql" $ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z literals.sql" ``` Closes #32409 from MaxGekk/parse-ansi-interval-types. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2021-05-03 00:50:35 -04:00
|YEAR|non-reserved|non-reserved|non-reserved|
|ZONE|non-reserved|non-reserved|non-reserved|