[SPARK-31753][SQL][DOCS][FOLLOW-UP] Add missing keywords in the SQL docs
### What changes were proposed in this pull request?

Update the sql-ref docs. The following keywords are added in this PR:

- CLUSTERED BY
- SORTED BY
- INTO num_buckets BUCKETS

### Why are the changes needed?

To let more users know how to use these SQL keywords.

### Does this PR introduce _any_ user-facing change?

No

![image](https://user-images.githubusercontent.com/46367746/94428281-0a6b8080-01c3-11eb-9ff3-899f8da602ca.png)
![image](https://user-images.githubusercontent.com/46367746/94428285-0d667100-01c3-11eb-8a54-90e7641d917b.png)
![image](https://user-images.githubusercontent.com/46367746/94428288-0f303480-01c3-11eb-9e1d-023538aa6e2d.png)

### How was this patch tested?

Tested by generating the HTML docs.

Closes #29883 from GuoPhilipse/add-sql-missing-keywords.

Lead-authored-by: GuoPhilipse <46367746+GuoPhilipse@users.noreply.github.com>
Co-authored-by: GuoPhilipse <guofei_ok@126.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
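The keyword descriptions added in this commit say that rows are bucketed into a fixed number of buckets (`INTO num_buckets BUCKETS`) based on the `CLUSTERED BY` columns, and that `SORTED BY` orders the bucket columns, ascending unless DESC is given. A common way to picture this is hash-then-modulo bucket assignment followed by a sort inside each bucket. The Python below is only a minimal sketch of that idea under stated assumptions: `bucket_id`, `bucketize` and the sample rows are hypothetical, CRC-32 stands in for the engine's real hash (Spark's native bucketing uses a Murmur3-based hash, so actual bucket numbers differ), and only ascending order is modeled.

```python
import zlib
from typing import Dict, List, Sequence

# Hypothetical row type: column name -> value, e.g. {"ID": 1, "NAME": "a"}.
Row = Dict[str, object]


def bucket_id(values: Sequence[object], num_buckets: int) -> int:
    """Map one row's bucketing-column values to a bucket number in [0, num_buckets).

    CRC-32 is only a stand-in hash that keeps the sketch deterministic; it is NOT
    the hash Spark uses, so real bucket numbers will differ.
    """
    key = "\x00".join(str(v) for v in values).encode("utf-8")
    # zlib.crc32 returns an unsigned value, so the modulo is never negative.
    return zlib.crc32(key) % num_buckets


def bucketize(
    rows: Sequence[Row],
    cluster_cols: Sequence[str],
    sort_cols: Sequence[str],
    num_buckets: int,
) -> Dict[int, List[Row]]:
    """Model CLUSTERED BY (cluster_cols) SORTED BY (sort_cols ASC) INTO num_buckets BUCKETS."""
    buckets: Dict[int, List[Row]] = {b: [] for b in range(num_buckets)}
    for row in rows:
        # CLUSTERED BY ... INTO num_buckets BUCKETS: the hashed key picks the bucket.
        buckets[bucket_id([row[c] for c in cluster_cols], num_buckets)].append(row)
    for contents in buckets.values():
        # SORTED BY ... ASC (the documented default): order rows inside each bucket.
        contents.sort(key=lambda r: tuple(r[c] for c in sort_cols))
    return buckets


if __name__ == "__main__":
    # Hypothetical sample rows shaped like clustered_by_test2 (ID INT, NAME STRING).
    sample = [{"ID": i, "NAME": n} for i, n in [(5, "e"), (1, "a"), (4, "d"), (2, "b"), (3, "c")]]
    for b, contents in bucketize(sample, ("ID", "NAME"), ("ID",), num_buckets=3).items():
        print(b, [r["ID"] for r in contents])
```

Because the bucket is a pure function of the `CLUSTERED BY` values, two tables bucketed on the same key with the same bucket count and hash place matching rows in matching bucket numbers, which is why bucketing can let an engine skip a shuffle for joins or aggregations on that key.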
Parent: ece8d8e22c
Commit: 3bdbb5546d
@@ -67,7 +67,12 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI
 
 * **SORTED BY**
 
-Determines the order in which the data is stored in buckets. Default is Ascending order.
+Specifies an ordering of bucket columns. Optionally, one can use ASC for an ascending order or DESC for a descending order after any column names in the SORTED BY clause.
+If not specified, ASC is assumed by default.
+
+* **INTO num_buckets BUCKETS**
+
+Specifies buckets numbers, which is used in `CLUSTERED BY` clause.
 
 * **LOCATION**
 
@@ -31,6 +31,9 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier
 [ COMMENT table_comment ]
 [ PARTITIONED BY ( col_name2[:] col_type2 [ COMMENT col_comment2 ], ... )
 | ( col_name1, col_name2, ... ) ]
+[ CLUSTERED BY ( col_name1, col_name2, ...)
+[ SORTED BY ( col_name1 [ ASC | DESC ], col_name2 [ ASC | DESC ], ... ) ]
+INTO num_buckets BUCKETS ]
 [ ROW FORMAT row_format ]
 [ STORED AS file_format ]
 [ LOCATION path ]
@@ -65,6 +68,21 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI
 
 Partitions are created on the table, based on the columns specified.
 
+* **CLUSTERED BY**
+
+Partitions created on the table will be bucketed into fixed buckets based on the column specified for bucketing.
+
+**NOTE:** Bucketing is an optimization technique that uses buckets (and bucketing columns) to determine data partitioning and avoid data shuffle.
+
+* **SORTED BY**
+
+Specifies an ordering of bucket columns. Optionally, one can use ASC for an ascending order or DESC for a descending order after any column names in the SORTED BY clause.
+If not specified, ASC is assumed by default.
+
+* **INTO num_buckets BUCKETS**
+
+Specifies buckets numbers, which is used in `CLUSTERED BY` clause.
+
 * **row_format**
 
 Use the `SERDE` clause to specify a custom SerDe for one table. Otherwise, use the `DELIMITED` clause to use the native SerDe and specify the delimiter, escape character, null character and so on.
@@ -203,6 +221,20 @@ CREATE EXTERNAL TABLE family (id INT, name STRING)
 STORED AS INPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleInputFormat'
 OUTPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleOutputFormat'
 LOCATION '/tmp/family/';
+
+--Use `CLUSTERED BY` clause to create bucket table without `SORTED BY`
+CREATE TABLE clustered_by_test1 (ID INT, AGE STRING)
+CLUSTERED BY (ID)
+INTO 4 BUCKETS
+STORED AS ORC
+
+--Use `CLUSTERED BY` clause to create bucket table with `SORTED BY`
+CREATE TABLE clustered_by_test2 (ID INT, NAME STRING)
+PARTITIONED BY (YEAR STRING)
+CLUSTERED BY (ID, NAME)
+SORTED BY (ID ASC)
+INTO 3 BUCKETS
+STORED AS PARQUET
 ```
 
 ### Related Statements