[SPARK-33976][SQL][DOCS] Add a SQL doc page for a TRANSFORM clause
### What changes were proposed in this pull request?
Add a SQL reference page for the `TRANSFORM` clause and its related syntax.

![image](https://user-images.githubusercontent.com/46485123/114332579-1627fe80-9b79-11eb-8fa7-131f0a20f72f.png)

### Why are the changes needed?
The SQL reference did not document the `TRANSFORM` clause.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not needed

Closes #31010 from AngersZhuuuu/SPARK-33976.

Lead-authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Co-authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Commit 9c956abb1d (parent b219e37af3)
@@ -192,6 +192,8 @@
         url: sql-ref-syntax-qry-select-lateral-view.html
     - text: PIVOT Clause
       url: sql-ref-syntax-qry-select-pivot.html
+    - text: TRANSFORM Clause
+      url: sql-ref-syntax-qry-select-transform.html
     - text: EXPLAIN
       url: sql-ref-syntax-qry-explain.html
     - text: Auxiliary Statements
docs/sql-ref-syntax-qry-select-transform.md (new file, 235 lines)
@@ -0,0 +1,235 @@
---
layout: global
title: TRANSFORM
displayTitle: TRANSFORM
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---
### Description

The `TRANSFORM` clause is used to specify a Hive-style transform query specification
to transform the inputs by running a user-specified command or script.
### Syntax

```sql
SELECT TRANSFORM ( expression [ , ... ] )
    [ ROW FORMAT row_format ]
    [ RECORDWRITER record_writer_class ]
    USING command_or_script [ AS ( [ col_name [ col_type ] ] [ , ... ] ) ]
    [ ROW FORMAT row_format ]
    [ RECORDREADER record_reader_class ]

row_format:
    SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ]
    | DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ]
        [ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ]
        [ MAP KEYS TERMINATED BY map_key_terminated_char ]
        [ LINES TERMINATED BY row_terminated_char ]
        [ NULL DEFINED AS null_char ]
```
### Parameters

* **expression**

    Specifies a combination of one or more values, operators and SQL functions that results in a value.

* **row_format**

    Specifies the row format for input and output. Use the `SERDE` clause to specify a custom SerDe, or use the `DELIMITED` clause to use the native SerDe and state the delimiter, escape character, null character and so on.

* **SERDE**

    Specifies a custom SerDe for one table.

* **serde_class**

    Specifies a fully-qualified class name of a custom SerDe.

* **DELIMITED**

    The `DELIMITED` clause can be used to specify the native SerDe and state the delimiter, escape character, null character and so on.

* **FIELDS TERMINATED BY**

    Used to define a column separator.

* **COLLECTION ITEMS TERMINATED BY**

    Used to define a collection item separator.

* **MAP KEYS TERMINATED BY**

    Used to define a map key separator.

* **LINES TERMINATED BY**

    Used to define a row separator.

* **NULL DEFINED AS**

    Used to define the specific value for NULL.

* **ESCAPED BY**

    Used to define an escape mechanism.

* **RECORDWRITER**

    Specifies a fully-qualified class name of a custom RecordWriter. The default value is `org.apache.hadoop.hive.ql.exec.TextRecordWriter`.

* **RECORDREADER**

    Specifies a fully-qualified class name of a custom RecordReader. The default value is `org.apache.hadoop.hive.ql.exec.TextRecordReader`.

* **command_or_script**

    Specifies a command or a path to a script that processes the data.
### SerDe behavior

Spark uses the Hive SerDe `org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe` by default, so columns are cast
to `STRING` and combined by tabs before being fed to the user script. All `NULL` values are converted
to the literal string `"\N"` in order to differentiate `NULL` values from empty strings. The standard output of the
user script is treated as tab-separated `STRING` columns, any cell containing only `"\N"` is re-interpreted
as a `NULL` value, and then the resulting `STRING` column is cast to the data type specified in `col_type`. If the actual
number of output columns is less than the number of specified output columns, the missing output columns are
filled with `NULL`. If the actual number of output columns is more than the number of specified output columns,
only the corresponding leading columns are kept and the remainder is discarded.

If there is no `AS` clause after `USING my_script`, the output schema is `key: STRING, value: STRING`.
The `key` column contains all the characters before the first tab and the `value` column contains the remaining characters after the first tab.
If there is no tab, Spark returns a `NULL` value. These defaults can be overridden with `ROW FORMAT SERDE` or `ROW FORMAT DELIMITED`.
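The default framing described above can be sketched in a few lines of Python (a simplified illustration of the rules stated in this section, not Spark's actual implementation):

```python
# Simplified sketch of the default LazySimpleSerDe framing -- an
# illustration of the documented behavior, not Spark's code.

def serialize_row(values):
    # Columns are cast to STRING and joined by tabs; NULL becomes "\N".
    return "\t".join("\\N" if v is None else str(v) for v in values)


def deserialize_row(line, num_cols):
    # Script output is split on tabs; cells containing only "\N" become
    # NULL; missing columns are padded with NULL, extras are discarded.
    cells = line.rstrip("\n").split("\t")
    cells = cells[:num_cols] + [None] * (num_cols - len(cells))
    return [None if c == "\\N" else c for c in cells]


def schemaless_row(line):
    # No AS clause: everything before the first tab is `key`, the rest
    # is `value`; with no tab at all, `value` is NULL.
    parts = line.rstrip("\n").split("\t", 1)
    return (parts[0], parts[1] if len(parts) == 2 else None)
```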
### Examples

```sql
CREATE TABLE person (zip_code INT, name STRING, age INT);
INSERT INTO person VALUES
    (94588, 'Zen Hui', 50),
    (94588, 'Dan Li', 18),
    (94588, 'Anil K', 27),
    (94588, 'John V', NULL),
    (94511, 'David K', 42),
    (94511, 'Aryan B.', 18),
    (94511, 'Lalit B.', NULL);

-- With specified output without data type
SELECT TRANSFORM(zip_code, name, age)
    USING 'cat' AS (a, b, c)
FROM person
WHERE zip_code > 94511;
+-------+---------+-----+
|      a|        b|    c|
+-------+---------+-----+
|  94588|   Anil K|   27|
|  94588|   John V| NULL|
|  94588|  Zen Hui|   50|
|  94588|   Dan Li|   18|
+-------+---------+-----+

-- With specified output with data type
SELECT TRANSFORM(zip_code, name, age)
    USING 'cat' AS (a STRING, b STRING, c STRING)
FROM person
WHERE zip_code > 94511;
+-------+---------+-----+
|      a|        b|    c|
+-------+---------+-----+
|  94588|   Anil K|   27|
|  94588|   John V| NULL|
|  94588|  Zen Hui|   50|
|  94588|   Dan Li|   18|
+-------+---------+-----+

-- Using ROW FORMAT DELIMITED
SELECT TRANSFORM(name, age)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    NULL DEFINED AS 'NULL'
    USING 'cat' AS (name_age string)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '@'
    LINES TERMINATED BY '\n'
    NULL DEFINED AS 'NULL'
FROM person;
+---------------+
|       name_age|
+---------------+
|      Anil K,27|
|    John V,null|
|    Aryan B.,18|
|     David K,42|
|     Zen Hui,50|
|      Dan Li,18|
|  Lalit B.,null|
+---------------+

-- Using Hive Serde
SELECT TRANSFORM(zip_code, name, age)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    WITH SERDEPROPERTIES (
      'field.delim' = '\t'
    )
    USING 'cat' AS (a STRING, b STRING, c STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    WITH SERDEPROPERTIES (
      'field.delim' = '\t'
    )
FROM person
WHERE zip_code > 94511;
+-------+---------+-----+
|      a|        b|    c|
+-------+---------+-----+
|  94588|   Anil K|   27|
|  94588|   John V| NULL|
|  94588|  Zen Hui|   50|
|  94588|   Dan Li|   18|
+-------+---------+-----+

-- Schema-less mode
SELECT TRANSFORM(zip_code, name, age)
    USING 'cat'
FROM person
WHERE zip_code > 94500;
+-------+---------------------+
|    key|                value|
+-------+---------------------+
|  94588|            Anil K 27|
|  94588|            John V \N|
|  94511|          Aryan B. 18|
|  94511|           David K 42|
|  94588|           Zen Hui 50|
|  94588|            Dan Li 18|
|  94511|          Lalit B. \N|
+-------+---------------------+
```
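The input side of the `ROW FORMAT DELIMITED` example above can be sketched as follows (an illustration under the stated `FIELDS TERMINATED BY ','` and `NULL DEFINED AS 'NULL'` settings, not Spark's code):

```python
def delimited_input_line(name, age):
    # With FIELDS TERMINATED BY ',' and NULL DEFINED AS 'NULL', each
    # input row reaches the user script as comma-separated fields, with
    # NULL spelled out as the configured null character sequence.
    return ",".join("NULL" if v is None else str(v) for v in (name, age))
```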
### Related Statements

* [SELECT Main](sql-ref-syntax-qry-select.html)
* [WHERE Clause](sql-ref-syntax-qry-select-where.html)
* [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
* [HAVING Clause](sql-ref-syntax-qry-select-having.html)
* [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
* [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
* [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
* [CASE Clause](sql-ref-syntax-qry-select-case.html)
* [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
* [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
@@ -41,7 +41,7 @@ select_statement [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select_stat
 While `select_statement` is defined as
 ```sql
-SELECT [ hints , ... ] [ ALL | DISTINCT ] { [ named_expression | regex_column_names ] [ , ... ] }
+SELECT [ hints , ... ] [ ALL | DISTINCT ] { [ [ named_expression | regex_column_names ] [ , ... ] | TRANSFORM (...) ] }
     FROM { from_item [ , ... ] }
     [ PIVOT clause ]
     [ LATERAL VIEW clause ] [ ... ]
@@ -164,6 +164,10 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { [ named_expression | regex_column_na
     )
 ```
+
+* **TRANSFORM**
+
+    Specifies a Hive-style transform query specification to transform the input by forking and running a user-specified command or script.
+
 ### Related Statements

 * [WHERE Clause](sql-ref-syntax-qry-select-where.html)
@@ -187,3 +191,4 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { [ named_expression | regex_column_na
 * [CASE Clause](sql-ref-syntax-qry-select-case.html)
 * [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
 * [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
+* [TRANSFORM Clause](sql-ref-syntax-qry-select-transform.html)
@@ -49,4 +49,5 @@ ability to generate logical and physical plan for a given query using
 * [CASE Clause](sql-ref-syntax-qry-select-case.html)
 * [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
 * [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
+* [TRANSFORM Clause](sql-ref-syntax-qry-select-transform.html)
 * [EXPLAIN Statement](sql-ref-syntax-qry-explain.html)
@@ -70,6 +70,7 @@ Spark SQL is Apache Spark's module for working with structured data. The SQL Syn
 * [CASE Clause](sql-ref-syntax-qry-select-case.html)
 * [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
 * [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
+* [TRANSFORM Clause](sql-ref-syntax-qry-select-transform.html)
 * [EXPLAIN](sql-ref-syntax-qry-explain.html)

 ### Auxiliary Statements