[SPARK-31753][SQL][DOCS][FOLLOW-UP] Add missing keywords in the SQL docs

### What changes were proposed in this pull request?
Update the SQL reference docs; this PR adds documentation for the following keywords:

CLUSTERED BY
SORTED BY
INTO num_buckets BUCKETS
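
For context, a minimal sketch of how the three clauses fit together in a bucketed table definition (table and column names below are hypothetical):

```sql
-- Hypothetical example: bucket rows by user_id into 8 buckets,
-- keeping each bucket sorted by event_time
CREATE TABLE events (user_id INT, event_time TIMESTAMP)
    CLUSTERED BY (user_id)
    SORTED BY (event_time ASC)
    INTO 8 BUCKETS
    STORED AS PARQUET;
```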

### Why are the changes needed?
Let more users know how to use these SQL keywords.

### Does this PR introduce _any_ user-facing change?
No
![image](https://user-images.githubusercontent.com/46367746/94428281-0a6b8080-01c3-11eb-9ff3-899f8da602ca.png)
![image](https://user-images.githubusercontent.com/46367746/94428285-0d667100-01c3-11eb-8a54-90e7641d917b.png)
![image](https://user-images.githubusercontent.com/46367746/94428288-0f303480-01c3-11eb-9e1d-023538aa6e2d.png)

### How was this patch tested?
Generated the HTML docs and checked the rendered pages manually.

Closes #29883 from GuoPhilipse/add-sql-missing-keywords.

Lead-authored-by: GuoPhilipse <46367746+GuoPhilipse@users.noreply.github.com>
Co-authored-by: GuoPhilipse <guofei_ok@126.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
GuoPhilipse 2020-10-01 08:15:53 +09:00 committed by Takeshi Yamamuro
parent ece8d8e22c
commit 3bdbb5546d
2 changed files with 38 additions and 1 deletion


@@ -67,7 +67,12 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI
* **SORTED BY**
Determines the order in which the data is stored in buckets. Default is Ascending order.
Specifies an ordering of bucket columns. Optionally, one can use ASC for an ascending order or DESC for a descending order after any column names in the SORTED BY clause.
If not specified, ASC is assumed by default.
* **INTO num_buckets BUCKETS**
Specifies the number of buckets, which is used in the `CLUSTERED BY` clause.
* **LOCATION**
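
These clauses also apply to tables created with a data source (`USING`) definition; a minimal sketch, with hypothetical table, column, and path names:

```sql
-- Hypothetical example: bucketed, sorted data source table
CREATE TABLE sales (id INT, amount DOUBLE)
    USING PARQUET
    CLUSTERED BY (id)
    SORTED BY (id ASC)
    INTO 4 BUCKETS
    LOCATION '/tmp/sales';
```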


@@ -31,6 +31,9 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier
[ COMMENT table_comment ]
[ PARTITIONED BY ( col_name2[:] col_type2 [ COMMENT col_comment2 ], ... )
| ( col_name1, col_name2, ... ) ]
[ CLUSTERED BY ( col_name1, col_name2, ...)
[ SORTED BY ( col_name1 [ ASC | DESC ], col_name2 [ ASC | DESC ], ... ) ]
INTO num_buckets BUCKETS ]
[ ROW FORMAT row_format ]
[ STORED AS file_format ]
[ LOCATION path ]
@@ -65,6 +68,21 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI
Partitions are created on the table, based on the columns specified.
* **CLUSTERED BY**
Partitions created on the table will be bucketed into fixed buckets based on the column specified for bucketing.
**NOTE:** Bucketing is an optimization technique that uses buckets (and bucketing columns) to determine data partitioning and avoid data shuffle.
* **SORTED BY**
Specifies an ordering of bucket columns. Optionally, one can use ASC for an ascending order or DESC for a descending order after any column names in the SORTED BY clause.
If not specified, ASC is assumed by default.
* **INTO num_buckets BUCKETS**
Specifies the number of buckets, which is used in the `CLUSTERED BY` clause.
* **row_format**
Use the `SERDE` clause to specify a custom SerDe for one table. Otherwise, use the `DELIMITED` clause to use the native SerDe and specify the delimiter, escape character, null character and so on.
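
As a quick illustration of the `DELIMITED` form described above, a minimal sketch (hypothetical table and column names; the delimiter and escape characters are arbitrary):

```sql
-- Hypothetical example: native SerDe with an explicit field delimiter and escape character
CREATE TABLE csv_people (id INT, name STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\\'
    STORED AS TEXTFILE;
```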
@@ -203,6 +221,20 @@ CREATE EXTERNAL TABLE family (id INT, name STRING)
STORED AS INPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleInputFormat'
OUTPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleOutputFormat'
LOCATION '/tmp/family/';
--Use `CLUSTERED BY` clause to create a bucketed table without `SORTED BY`
CREATE TABLE clustered_by_test1 (ID INT, AGE STRING)
    CLUSTERED BY (ID)
    INTO 4 BUCKETS
    STORED AS ORC;

--Use `CLUSTERED BY` clause to create a bucketed table with `SORTED BY`
CREATE TABLE clustered_by_test2 (ID INT, NAME STRING)
    PARTITIONED BY (YEAR STRING)
    CLUSTERED BY (ID, NAME)
    SORTED BY (ID ASC)
    INTO 3 BUCKETS
    STORED AS PARQUET;
```
### Related Statements