From 20d68dc2f47ea592ecd9ef708167a69bb1c5c47b Mon Sep 17 00:00:00 2001
From: Angerszhuuuu
Date: Fri, 23 Apr 2021 05:47:48 +0000
Subject: [PATCH] [SPARK-35159][SQL][DOCS] Extract hive format doc

### What changes were proposed in this pull request?
Extract common doc about hive format for `sql-ref-syntax-ddl-create-table-hiveformat.md` and `sql-ref-syntax-qry-select-transform.md` to refer.

![image](https://user-images.githubusercontent.com/46485123/115802193-04641800-a411-11eb-827d-d92544881842.png)

### Why are the changes needed?
Improve doc

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not need

Closes #32264 from AngersZhuuuu/SPARK-35159.

Authored-by: Angerszhuuuu
Signed-off-by: Wenchen Fan
---
 ...-ref-syntax-ddl-create-table-hiveformat.md | 52 +------------
 docs/sql-ref-syntax-hive-format.md            | 73 +++++++++++++++++++
 docs/sql-ref-syntax-qry-select-transform.md   | 48 +-----------
 3 files changed, 77 insertions(+), 96 deletions(-)
 create mode 100644 docs/sql-ref-syntax-hive-format.md

diff --git a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md
index 11ec2f1d9e..b2f5957416 100644
--- a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md
+++ b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md
@@ -39,14 +39,6 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier
     [ LOCATION path ]
     [ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]
     [ AS select_statement ]
-
-row_format:
-    : SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ]
-    | DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ]
-        [ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ]
-        [ MAP KEYS TERMINATED BY map_key_terminated_char ]
-        [ LINES TERMINATED BY row_terminated_char ]
-        [ NULL DEFINED AS null_char ]
 ```
 
 Note that, the clauses between the columns definition clause and the AS SELECT clause can come in
@@ -82,50 +74,10 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI
 * **INTO num_buckets BUCKETS**
 
     Specifies buckets numbers, which is used in `CLUSTERED BY` clause.
-
-* **row_format**
 
-    Use the `SERDE` clause to specify a custom SerDe for one table. Otherwise, use the `DELIMITED` clause to use the native SerDe and specify the delimiter, escape character, null character and so on.
-
-* **SERDE**
+* **row_format**
 
-    Specifies a custom SerDe for one table.
-
-* **serde_class**
-
-    Specifies a fully-qualified class name of a custom SerDe.
-
-* **SERDEPROPERTIES**
-
-    A list of key-value pairs that is used to tag the SerDe definition.
-
-* **DELIMITED**
-
-    The `DELIMITED` clause can be used to specify the native SerDe and state the delimiter, escape character, null character and so on.
-
-* **FIELDS TERMINATED BY**
-
-    Used to define a column separator.
-
-* **COLLECTION ITEMS TERMINATED BY**
-
-    Used to define a collection item separator.
-
-* **MAP KEYS TERMINATED BY**
-
-    Used to define a map key separator.
-
-* **LINES TERMINATED BY**
-
-    Used to define a row separator.
-
-* **NULL DEFINED AS**
-
-    Used to define the specific value for NULL.
-
-* **ESCAPED BY**
-
-    Used for escape mechanism.
+    Specifies the row format for input and output. See [HIVE FORMAT](sql-ref-syntax-hive-format.html) for more syntax details.
 
 * **STORED AS**
 
diff --git a/docs/sql-ref-syntax-hive-format.md b/docs/sql-ref-syntax-hive-format.md
new file mode 100644
index 0000000000..8092e582d9
--- /dev/null
+++ b/docs/sql-ref-syntax-hive-format.md
@@ -0,0 +1,73 @@
+---
+layout: global
+title: Hive Row Format
+displayTitle: Hive Row Format
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements. See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License. You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+### Description
+
+Spark supports a Hive row format in `CREATE TABLE` and `TRANSFORM` clause to specify serde or text delimiter.
+There are two ways to define a row format in `row_format` of `CREATE TABLE` and `TRANSFORM` clauses.
+  1. `SERDE` clause to specify a custom SerDe class.
+  2. `DELIMITED` clause to specify a delimiter, an escape character, a null character, and so on for the native SerDe.
+
+### Syntax
+
+```sql
+row_format:
+    SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ]
+    | DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ]
+        [ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ]
+        [ MAP KEYS TERMINATED BY map_key_terminated_char ]
+        [ LINES TERMINATED BY row_terminated_char ]
+        [ NULL DEFINED AS null_char ]
+```
+
+### Parameters
+
+* **SERDE serde_class**
+
+    Specifies a fully-qualified class name of custom SerDe.
+
+* **SERDEPROPERTIES**
+
+    A list of key-value pairs that is used to tag the SerDe definition.
+
+* **FIELDS TERMINATED BY**
+
+    Used to define a column separator.
+
+* **COLLECTION ITEMS TERMINATED BY**
+
+    Used to define a collection item separator.
+
+* **MAP KEYS TERMINATED BY**
+
+    Used to define a map key separator.
+
+* **LINES TERMINATED BY**
+
+    Used to define a row separator.
+
+* **NULL DEFINED AS**
+
+    Used to define the specific value for NULL.
+
+* **ESCAPED BY**
+
+    Used for escape mechanism.
diff --git a/docs/sql-ref-syntax-qry-select-transform.md b/docs/sql-ref-syntax-qry-select-transform.md
index 814bd01ec2..21966f2e1c 100644
--- a/docs/sql-ref-syntax-qry-select-transform.md
+++ b/docs/sql-ref-syntax-qry-select-transform.md
@@ -33,14 +33,6 @@ SELECT TRANSFORM ( expression [ , ... ] )
     USING command_or_script [ AS ( [ col_name [ col_type ] ] [ , ... ] ) ]
     [ ROW FORMAT row_format ]
     [ RECORDREADER record_reader_class ]
-
-row_format:
-    SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ]
-    | DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ]
-        [ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ]
-        [ MAP KEYS TERMINATED BY map_key_terminated_char ]
-        [ LINES TERMINATED BY row_terminated_char ]
-        [ NULL DEFINED AS null_char ]
 ```
 
 ### Parameters
@@ -49,45 +41,9 @@ row_format:
 
     Specifies a combination of one or more values, operators and SQL functions that results in a value.
 
-* **row_format**
+* **row_format**
 
-    Otherwise, uses the `DELIMITED` clause to specify the native SerDe and state the delimiter, escape character, null character and so on.
-
-* **SERDE**
-
-    Specifies a custom SerDe for one table.
-
-* **serde_class**
-
-    Specifies a fully-qualified class name of a custom SerDe.
-
-* **DELIMITED**
-
-    The `DELIMITED` clause can be used to specify the native SerDe and state the delimiter, escape character, null character and so on.
-
-* **FIELDS TERMINATED BY**
-
-    Used to define a column separator.
-
-* **COLLECTION ITEMS TERMINATED BY**
-
-    Used to define a collection item separator.
-
-* **MAP KEYS TERMINATED BY**
-
-    Used to define a map key separator.
-
-* **LINES TERMINATED BY**
-
-    Used to define a row separator.
-
-* **NULL DEFINED AS**
-
-    Used to define the specific value for NULL.
-
-* **ESCAPED BY**
-
-    Used for escape mechanism.
+    Specifies the row format for input and output. See [HIVE FORMAT](sql-ref-syntax-hive-format.html) for more syntax details.
 
 * **RECORDWRITER**
 
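
The `DELIMITED` parameters collected into the new Hive Row Format page are each described in one line, so it may help to see how they combine when a text row is split into columns. The following is a minimal Python sketch under stated assumptions, not Spark or Hive code: `DelimitedRowFormat` and `parse_rows` are hypothetical names, `ESCAPED BY` is not modeled, and the defaults assume Hive's text SerDe conventions (`\x01` between fields, `\x02` between collection items, `\x03` between a map key and its value, `\N` for NULL).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

# A parsed column value: NULL, a scalar, an array, or a map (all kept as strings).
Value = Union[None, str, List[Optional[str]], Dict[str, Optional[str]]]


@dataclass(frozen=True)
class DelimitedRowFormat:
    """Hypothetical model of ROW FORMAT DELIMITED (not a Spark or Hive API).

    Defaults are assumed to follow Hive's text SerDe conventions.
    """
    fields_terminated_by: str = "\x01"            # FIELDS TERMINATED BY
    collection_items_terminated_by: str = "\x02"  # COLLECTION ITEMS TERMINATED BY
    map_keys_terminated_by: str = "\x03"          # MAP KEYS TERMINATED BY
    lines_terminated_by: str = "\n"               # LINES TERMINATED BY
    null_defined_as: str = "\\N"                  # NULL DEFINED AS (backslash followed by N)


def parse_rows(text: str, kinds: List[str], fmt: DelimitedRowFormat) -> List[List[Value]]:
    """Split text into rows, then each row into columns described by `kinds`.

    Each entry of `kinds` is "scalar", "array" or "map". ESCAPED BY is not
    modeled, so values containing a delimiter are not supported here.
    """

    def null_or(raw: str) -> Optional[str]:
        return None if raw == fmt.null_defined_as else raw

    rows: List[List[Value]] = []
    for line in text.split(fmt.lines_terminated_by):
        if not line:
            continue  # e.g. the empty piece after a trailing line terminator
        fields = line.split(fmt.fields_terminated_by)
        row: List[Value] = []
        for i, kind in enumerate(kinds):
            # In this sketch, missing trailing fields become NULL and extra fields are ignored.
            raw = fields[i] if i < len(fields) else fmt.null_defined_as
            if raw == fmt.null_defined_as:
                row.append(None)
            elif kind == "array":
                items = raw.split(fmt.collection_items_terminated_by)
                row.append([null_or(item) for item in items])
            elif kind == "map":
                # Map entries are separated by the collection-item delimiter;
                # each entry is split once on the map-key delimiter.
                entries: Dict[str, Optional[str]] = {}
                for entry in raw.split(fmt.collection_items_terminated_by):
                    key, _, value = entry.partition(fmt.map_keys_terminated_by)
                    entries[key] = null_or(value)
                row.append(entries)
            else:
                row.append(raw)
        rows.append(row)
    return rows


# Models: ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
#   COLLECTION ITEMS TERMINATED BY '|' MAP KEYS TERMINATED BY ':' NULL DEFINED AS 'NULL'
csv_like = DelimitedRowFormat(
    fields_terminated_by=",",
    collection_items_terminated_by="|",
    map_keys_terminated_by=":",
    null_defined_as="NULL",
)
print(parse_rows("1,a|b,k1:v1|k2:NULL\n2,NULL,k:v\n", ["scalar", "array", "map"], csv_like))
# prints [['1', ['a', 'b'], {'k1': 'v1', 'k2': None}], ['2', None, {'k': 'v'}]]
```

In this model the collection-item delimiter separates both array elements and map entries, which is why a map column only parses unambiguously when `COLLECTION ITEMS TERMINATED BY` and `MAP KEYS TERMINATED BY` use different characters.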