ODIn/spark-instrumented-optimizer


Commit graph

Author	SHA1	Message	Date
Takeshi Yamamuro	bacffb8810	[SPARK-23264][SQL] Make INTERVAL keyword optional in INTERVAL clauses when ANSI mode enabled	2019-03-14 10:45:29 +09:00
Dilip Biswal	4a486d6716	[SPARK-26982][SQL] Enhance describe framework to describe the output of a query.	2019-03-02 11:21:23 +08:00
Takeshi Yamamuro	967e4cb011	[SPARK-26215][SQL] Define reserved/non-reserved keywords based on the ANSI SQL standard	2019-02-23 08:38:47 +09:00

Takeshi Yamamuro

bacffb8810

[SPARK-23264][SQL] Make INTERVAL keyword optional in INTERVAL clauses when ANSI mode enabled

## What changes were proposed in this pull request?
This PR updates the parsing rules in `SqlBase.g4` so that the following query can be parsed when ANSI mode is enabled:
```
SELECT CAST('2017-08-04' AS DATE) + 1 days;
```
The current master cannot parse it, although other DBMS-like systems (e.g., Hive and MySQL) support this syntax. The syntax is also frequently used in the official TPC-DS queries.

This PR adds the following new tokens:
```
YEAR | YEARS | MONTH | MONTHS | WEEK | WEEKS | DAY | DAYS | HOUR | HOURS | MINUTE |
MINUTES | SECOND | SECONDS | MILLISECOND | MILLISECONDS | MICROSECOND | MICROSECONDS
```
It then registers the following keywords as ANSI reserved keywords (following SQL-2011):
```
DAY | HOUR | MINUTE | MONTH | SECOND | YEAR
```
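The following is a toy Python model of the rule described above. It is not the ANTLR grammar in `SqlBase.g4`. It shows two things: a leading `INTERVAL` keyword may be omitted only when ANSI mode is on, and the singular and plural unit tokens map to one canonical unit. The names `UNIT_TOKENS`, `parse_interval`, and `add_to_date` are hypothetical. The sketch handles only a single `<n> <unit>` pair and only day/week arithmetic.

```python
from datetime import date, timedelta
from typing import Tuple

# Hypothetical unit table modelled on the tokens listed above: singular and
# plural spellings map to the same canonical unit.
UNIT_TOKENS = {
    "YEAR": "year", "YEARS": "year",
    "MONTH": "month", "MONTHS": "month",
    "WEEK": "week", "WEEKS": "week",
    "DAY": "day", "DAYS": "day",
    "HOUR": "hour", "HOURS": "hour",
    "MINUTE": "minute", "MINUTES": "minute",
    "SECOND": "second", "SECONDS": "second",
    "MILLISECOND": "millisecond", "MILLISECONDS": "millisecond",
    "MICROSECOND": "microsecond", "MICROSECONDS": "microsecond",
}


def parse_interval(text: str, ansi_enabled: bool) -> Tuple[int, str]:
    """Parse '[INTERVAL] <n> <unit>' into (n, canonical_unit).

    The leading INTERVAL keyword may be omitted only when ansi_enabled is True.
    """
    tokens = text.split()
    if tokens and tokens[0].upper() == "INTERVAL":
        tokens = tokens[1:]  # explicit INTERVAL keyword: accepted in either mode
    elif not ansi_enabled:
        raise ValueError("INTERVAL keyword is required when ANSI mode is disabled")
    if len(tokens) != 2 or tokens[1].upper() not in UNIT_TOKENS:
        raise ValueError(f"not an interval literal: {text!r}")
    return int(tokens[0]), UNIT_TOKENS[tokens[1].upper()]


def add_to_date(d: date, amount: int, unit: str) -> date:
    """Add a day- or week-based interval to a date (other units omitted here)."""
    if unit == "day":
        return d + timedelta(days=amount)
    if unit == "week":
        return d + timedelta(weeks=amount)
    raise NotImplementedError(f"unit not handled in this sketch: {unit}")
```

For example, `add_to_date(date(2017, 8, 4), *parse_interval("1 days", ansi_enabled=True))` evaluates to `date(2017, 8, 5)`, which mirrors the query above. The same call with `ansi_enabled=False` raises `ValueError` because the `INTERVAL` keyword is missing.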

## How was this patch tested?
Added tests in `SQLQuerySuite`, `ExpressionParserSuite`, and `TableIdentifierParserSuite`.

Closes #20433 from maropu/SPARK-23264.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>

2019-03-14 10:45:29 +09:00

Dilip Biswal

4a486d6716

[SPARK-26982][SQL] Enhance describe framework to describe the output of a query.

## What changes were proposed in this pull request?
Currently, `df.printSchema` can be used to discover the schema information of a query. We should also have a way to describe the output schema of a query through the SQL interface.

The output schema of a query can be described with either of these forms: `DESCRIBE SELECT * FROM desc_table` or `DESCRIBE QUERY SELECT * FROM desc_table`. For example:
```SQL
spark-sql> create table desc_table (c1 int comment 'c1-comment', c2 decimal comment 'c2-comment', c3 string);

spark-sql> desc select * from desc_table;
c1	int	        c1-comment
c2	decimal(10,0)	c2-comment
c3	string	        NULL

```
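The result has one row per output column of the query, giving the column name, its resolved type, and its comment. A missing comment is printed as `NULL` by the CLI. The small Python model below reproduces that shape for `desc_table`. `Column`, `describe_rows`, and `render` are hypothetical helpers, not Spark APIs.

```python
from typing import List, NamedTuple, Optional, Tuple


class Column(NamedTuple):
    """One output column of a query: name, resolved type, optional comment."""
    name: str
    data_type: str
    comment: Optional[str] = None


def describe_rows(schema: List[Column]) -> List[Tuple[str, str, Optional[str]]]:
    """One (name, type, comment) row per output column, in schema order."""
    return [(col.name, col.data_type, col.comment) for col in schema]


def render(rows: List[Tuple[str, str, Optional[str]]]) -> str:
    """Tab-separate each row, printing a missing comment as NULL."""
    return "\n".join(
        "\t".join(value if value is not None else "NULL" for value in row)
        for row in rows
    )


# Hypothetical schema matching desc_table above; `decimal` is listed with the
# resolved type shown in the session, decimal(10,0).
DESC_TABLE = [
    Column("c1", "int", "c1-comment"),
    Column("c2", "decimal(10,0)", "c2-comment"),
    Column("c3", "string"),
]
```

`print(render(describe_rows(DESC_TABLE)))` prints the same three rows as the `spark-sql` session, with tabs in place of the CLI's column padding.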
## How was this patch tested?
Added new tests in `SQLQueryTestSuite` and `SparkSqlParserSuite`.

Closes #23883 from dilipbiswal/dkb_describe_query.

Authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

2019-03-02 11:21:23 +08:00

Takeshi Yamamuro

967e4cb011

[SPARK-26215][SQL] Define reserved/non-reserved keywords based on the ANSI SQL standard

## What changes were proposed in this pull request?
This PR defines reserved and non-reserved keywords for Spark SQL based on the ANSI SQL standards and other database-like systems (e.g., PostgreSQL). We assume these systems basically follow the ANSI SQL-2011 standard, although they differ slightly from one another. Therefore, this PR documents all the keywords in `docs/sql-reserved-and-non-reserved-key-words.md`.

NOTE: This PR only adds a small set of reserved keywords, namely those that are reserved in all the ANSI SQL standards (SQL-92, SQL-99, SQL-2003, SQL-2008, SQL-2011, and SQL-2016) and in PostgreSQL. This is because which keywords should be reserved is still open to discussion. For example, interval units (day, hour, minute, second, ...) are reserved in the ANSI SQL standards but not in PostgreSQL. More research into other database-like systems (e.g., Oracle Database, DB2, SQL Server) is therefore needed in follow-up work.

References:
 - The reserved/non-reserved SQL keywords in the ANSI SQL standards: https://developer.mimer.com/wp-content/uploads/2018/05/Standard-SQL-Reserved-Words-Summary.pdf
 - SQL Key Words in PostgreSQL: https://www.postgresql.org/docs/current/sql-keywords-appendix.html
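The selection rule in the note says to reserve a keyword only if every ANSI standard and PostgreSQL reserve it. That amounts to a set intersection. The Python sketch below illustrates it with small, hand-picked keyword subsets. The sets and the helper names are hypothetical and are not Spark code. The identifier check reflects the general SQL meaning of "reserved" and is not Spark's parser logic.

```python
from functools import reduce
from typing import Dict, Set

# Illustrative subsets only; the complete lists are in the references above.
RESERVED_BY_SOURCE: Dict[str, Set[str]] = {
    "SQL-92": {"SELECT", "FROM", "WHERE", "DAY", "HOUR"},
    "SQL-2011": {"SELECT", "FROM", "WHERE", "DAY", "HOUR"},
    "PostgreSQL": {"SELECT", "FROM", "WHERE"},  # interval units are non-reserved here
}


def reserved_everywhere(sources: Dict[str, Set[str]]) -> Set[str]:
    """Keywords reserved by every source: the intersection of all the sets."""
    return reduce(set.intersection, sources.values())


def usable_as_unquoted_identifier(name: str, reserved: Set[str]) -> bool:
    """A reserved keyword cannot appear as an unquoted identifier."""
    return name.upper() not in reserved
```

With these subsets, `DAY` stays non-reserved because PostgreSQL does not reserve it, which matches the interval-unit example in the note.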

## How was this patch tested?
Added tests in `TableIdentifierParserSuite`.

Closes #23259 from maropu/SPARK-26215-WIP.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>

2019-02-23 08:38:47 +09:00

3 commits