[SPARK-35159][SQL][DOCS] Extract hive format doc

### What changes were proposed in this pull request?
Extract the common documentation about the Hive row format into a separate page so that `sql-ref-syntax-ddl-create-table-hiveformat.md` and `sql-ref-syntax-qry-select-transform.md` can both refer to it.

![image](https://user-images.githubusercontent.com/46485123/115802193-04641800-a411-11eb-827d-d92544881842.png)

### Why are the changes needed?
To improve the documentation.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not needed.

Closes #32264 from AngersZhuuuu/SPARK-35159.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Angerszhuuuu 2021-04-23 05:47:48 +00:00 committed by Wenchen Fan
parent 7582dc86bc
commit 20d68dc2f4
3 changed files with 77 additions and 96 deletions


@ -39,14 +39,6 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier
[ LOCATION path ]
[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]
[ AS select_statement ]
row_format:
: SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ]
| DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ]
[ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ]
[ MAP KEYS TERMINATED BY map_key_terminated_char ]
[ LINES TERMINATED BY row_terminated_char ]
[ NULL DEFINED AS null_char ]
```
Note that, the clauses between the columns definition clause and the AS SELECT clause can come in
@ -82,50 +74,10 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI
* **INTO num_buckets BUCKETS**
Specifies buckets numbers, which is used in `CLUSTERED BY` clause.
* **row_format**
Use the `SERDE` clause to specify a custom SerDe for one table. Otherwise, use the `DELIMITED` clause to use the native SerDe and specify the delimiter, escape character, null character and so on.
* **SERDE**
* **row_format**
Specifies a custom SerDe for one table.
* **serde_class**
Specifies a fully-qualified class name of a custom SerDe.
* **SERDEPROPERTIES**
A list of key-value pairs that is used to tag the SerDe definition.
* **DELIMITED**
The `DELIMITED` clause can be used to specify the native SerDe and state the delimiter, escape character, null character and so on.
* **FIELDS TERMINATED BY**
Used to define a column separator.
* **COLLECTION ITEMS TERMINATED BY**
Used to define a collection item separator.
* **MAP KEYS TERMINATED BY**
Used to define a map key separator.
* **LINES TERMINATED BY**
Used to define a row separator.
* **NULL DEFINED AS**
Used to define the specific value for NULL.
* **ESCAPED BY**
Used for escape mechanism.
Specifies the row format for input and output. See [HIVE FORMAT](sql-ref-syntax-hive-format.html) for more syntax details.
* **STORED AS**


@ -0,0 +1,73 @@
---
layout: global
title: Hive Row Format
displayTitle: Hive Row Format
license: |
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
---
### Description
Spark supports a Hive row format in the `CREATE TABLE` and `TRANSFORM` clauses to specify a SerDe or text delimiters.
There are two ways to define a row format in `row_format` of `CREATE TABLE` and `TRANSFORM` clauses.
1. `SERDE` clause to specify a custom SerDe class.
2. `DELIMITED` clause to specify a delimiter, an escape character, a null character, and so on for the native SerDe.
### Syntax
```sql
row_format:
SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ]
| DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ]
[ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ]
[ MAP KEYS TERMINATED BY map_key_terminated_char ]
[ LINES TERMINATED BY row_terminated_char ]
[ NULL DEFINED AS null_char ]
```
### Parameters
* **SERDE serde_class**
Specifies a fully-qualified class name of a custom SerDe.
* **SERDEPROPERTIES**
A list of key-value pairs that is used to tag the SerDe definition.
* **FIELDS TERMINATED BY**
Used to define a column separator.
* **COLLECTION ITEMS TERMINATED BY**
Used to define a collection item separator.
* **MAP KEYS TERMINATED BY**
Used to define a map key separator.
* **LINES TERMINATED BY**
Used to define a row separator.
* **NULL DEFINED AS**
Used to define the specific value for NULL.
* **ESCAPED BY**
Used for escape mechanism.
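
As an illustration of the two `row_format` forms described above, here is a minimal sketch in Spark SQL. It is not part of this commit. The table and column names (`student_serde`, `student_delimited`, `id`, `name`, `scores`, `attrs`) are hypothetical, and the statements assume a Spark session with Hive support enabled.

```sql
-- Hypothetical tables and columns, for illustration only.

-- 1. SERDE clause: name a SerDe class and pass properties to it.
CREATE TABLE student_serde (id INT, name STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    WITH SERDEPROPERTIES ('field.delim' = ',')
    STORED AS TEXTFILE;

-- 2. DELIMITED clause: use the native SerDe with explicit delimiters.
CREATE TABLE student_delimited (
    id INT,
    name STRING,
    scores ARRAY<INT>,
    attrs MAP<STRING, STRING>
)
    ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ',' ESCAPED BY '\\'
        COLLECTION ITEMS TERMINATED BY '_'
        MAP KEYS TERMINATED BY ':'
        LINES TERMINATED BY '\n'
        NULL DEFINED AS 'NULL'
    STORED AS TEXTFILE;

-- The same row_format can be used with TRANSFORM: the first ROW FORMAT
-- applies to the script's input, the second to its output.
SELECT TRANSFORM (id, name)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    USING 'cat' AS (id STRING, name STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
FROM student_delimited;
```

The `DELIMITED` form is usually enough for plain text data. The `SERDE` form is the one to reach for when a specific SerDe class or its properties are required.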


@ -33,14 +33,6 @@ SELECT TRANSFORM ( expression [ , ... ] )
USING command_or_script [ AS ( [ col_name [ col_type ] ] [ , ... ] ) ]
[ ROW FORMAT row_format ]
[ RECORDREADER record_reader_class ]
row_format:
SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ]
| DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ]
[ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ]
[ MAP KEYS TERMINATED BY map_key_terminated_char ]
[ LINES TERMINATED BY row_terminated_char ]
[ NULL DEFINED AS null_char ]
```
### Parameters
@ -49,45 +41,9 @@ row_format:
Specifies a combination of one or more values, operators and SQL functions that results in a value.
* **row_format**
* **row_format**
Otherwise, uses the `DELIMITED` clause to specify the native SerDe and state the delimiter, escape character, null character and so on.
* **SERDE**
Specifies a custom SerDe for one table.
* **serde_class**
Specifies a fully-qualified class name of a custom SerDe.
* **DELIMITED**
The `DELIMITED` clause can be used to specify the native SerDe and state the delimiter, escape character, null character and so on.
* **FIELDS TERMINATED BY**
Used to define a column separator.
* **COLLECTION ITEMS TERMINATED BY**
Used to define a collection item separator.
* **MAP KEYS TERMINATED BY**
Used to define a map key separator.
* **LINES TERMINATED BY**
Used to define a row separator.
* **NULL DEFINED AS**
Used to define the specific value for NULL.
* **ESCAPED BY**
Used for escape mechanism.
Specifies the row format for input and output. See [HIVE FORMAT](sql-ref-syntax-hive-format.html) for more syntax details.
* **RECORDWRITER**