[SPARK-35159][SQL][DOCS] Extract hive format doc
### What changes were proposed in this pull request?
Extract common doc about hive format for `sql-ref-syntax-ddl-create-table-hiveformat.md` and `sql-ref-syntax-qry-select-transform.md` to refer.

![image](https://user-images.githubusercontent.com/46485123/115802193-04641800-a411-11eb-827d-d92544881842.png)

### Why are the changes needed?
Improve doc

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not need

Closes #32264 from AngersZhuuuu/SPARK-35159.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
parent 7582dc86bc
commit 20d68dc2f4
@@ -39,14 +39,6 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier
 [ LOCATION path ]
 [ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]
 [ AS select_statement ]
-
-row_format:
-: SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ]
-| DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ]
-[ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ]
-[ MAP KEYS TERMINATED BY map_key_terminated_char ]
-[ LINES TERMINATED BY row_terminated_char ]
-[ NULL DEFINED AS null_char ]
 ```
 
 Note that, the clauses between the columns definition clause and the AS SELECT clause can come in
@@ -82,50 +74,10 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI
 * **INTO num_buckets BUCKETS**
 
 Specifies buckets numbers, which is used in `CLUSTERED BY` clause.
 
-* **row_format**
-
-Use the `SERDE` clause to specify a custom SerDe for one table. Otherwise, use the `DELIMITED` clause to use the native SerDe and specify the delimiter, escape character, null character and so on.
+* **row_format**
 
-* **SERDE**
-
-Specifies a custom SerDe for one table.
-
-* **serde_class**
-
-Specifies a fully-qualified class name of a custom SerDe.
-
-* **SERDEPROPERTIES**
-
-A list of key-value pairs that is used to tag the SerDe definition.
-
-* **DELIMITED**
-
-The `DELIMITED` clause can be used to specify the native SerDe and state the delimiter, escape character, null character and so on.
-
-* **FIELDS TERMINATED BY**
-
-Used to define a column separator.
-
-* **COLLECTION ITEMS TERMINATED BY**
-
-Used to define a collection item separator.
-
-* **MAP KEYS TERMINATED BY**
-
-Used to define a map key separator.
-
-* **LINES TERMINATED BY**
-
-Used to define a row separator.
-
-* **NULL DEFINED AS**
-
-Used to define the specific value for NULL.
-
-* **ESCAPED BY**
-
-Used for escape mechanism.
+Specifies the row format for input and output. See [HIVE FORMAT](sql-ref-syntax-hive-format.html) for more syntax details.
 
 * **STORED AS**
 
73
docs/sql-ref-syntax-hive-format.md
Normal file
@@ -0,0 +1,73 @@
+---
+layout: global
+title: Hive Row Format
+displayTitle: Hive Row Format
+license: |
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+---
+
+### Description
+
+Spark supports a Hive row format in `CREATE TABLE` and `TRANSFORM` clause to specify serde or text delimiter.
+There are two ways to define a row format in `row_format` of `CREATE TABLE` and `TRANSFORM` clauses.
+1. `SERDE` clause to specify a custom SerDe class.
+2. `DELIMITED` clause to specify a delimiter, an escape character, a null character, and so on for the native SerDe.
+
+### Syntax
+
+```sql
+row_format:
+SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ]
+| DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ]
+[ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ]
+[ MAP KEYS TERMINATED BY map_key_terminated_char ]
+[ LINES TERMINATED BY row_terminated_char ]
+[ NULL DEFINED AS null_char ]
+```
+
+### Parameters
+
+* **SERDE serde_class**
+
+Specifies a fully-qualified class name of custom SerDe.
+
+* **SERDEPROPERTIES**
+
+A list of key-value pairs that is used to tag the SerDe definition.
+
+* **FIELDS TERMINATED BY**
+
+Used to define a column separator.
+
+* **COLLECTION ITEMS TERMINATED BY**
+
+Used to define a collection item separator.
+
+* **MAP KEYS TERMINATED BY**
+
+Used to define a map key separator.
+
+* **LINES TERMINATED BY**
+
+Used to define a row separator.
+
+* **NULL DEFINED AS**
+
+Used to define the specific value for NULL.
+
+* **ESCAPED BY**
+
+Used for escape mechanism.
@@ -33,14 +33,6 @@ SELECT TRANSFORM ( expression [ , ... ] )
 USING command_or_script [ AS ( [ col_name [ col_type ] ] [ , ... ] ) ]
 [ ROW FORMAT row_format ]
 [ RECORDREADER record_reader_class ]
-
-row_format:
-SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ]
-| DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ]
-[ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ]
-[ MAP KEYS TERMINATED BY map_key_terminated_char ]
-[ LINES TERMINATED BY row_terminated_char ]
-[ NULL DEFINED AS null_char ]
 ```
 
 ### Parameters
@@ -49,45 +41,9 @@ row_format:
 
 Specifies a combination of one or more values, operators and SQL functions that results in a value.
 
 * **row_format**
 
-Otherwise, uses the `DELIMITED` clause to specify the native SerDe and state the delimiter, escape character, null character and so on.
-
-* **SERDE**
-
-Specifies a custom SerDe for one table.
-
-* **serde_class**
-
-Specifies a fully-qualified class name of a custom SerDe.
-
-* **DELIMITED**
-
-The `DELIMITED` clause can be used to specify the native SerDe and state the delimiter, escape character, null character and so on.
-
-* **FIELDS TERMINATED BY**
-
-Used to define a column separator.
-
-* **COLLECTION ITEMS TERMINATED BY**
-
-Used to define a collection item separator.
-
-* **MAP KEYS TERMINATED BY**
-
-Used to define a map key separator.
-
-* **LINES TERMINATED BY**
-
-Used to define a row separator.
-
-* **NULL DEFINED AS**
-
-Used to define the specific value for NULL.
-
-* **ESCAPED BY**
-
-Used for escape mechanism.
+Specifies the row format for input and output. See [HIVE FORMAT](sql-ref-syntax-hive-format.html) for more syntax details.
 
 * **RECORDWRITER**
 