92c1b24617
### What changes were proposed in this pull request? Document Common Table Expression in SQL Reference ### Why are the changes needed? Make SQL Reference complete ### Does this PR introduce any user-facing change? Yes <img width="1050" alt="Screen Shot 2020-04-13 at 12 06 35 AM" src="https://user-images.githubusercontent.com/13592258/79100257-f61def00-7d1a-11ea-8402-17017059232e.png"> <img width="1050" alt="Screen Shot 2020-04-13 at 12 07 09 AM" src="https://user-images.githubusercontent.com/13592258/79100260-f7e7b280-7d1a-11ea-9408-058c0851f0b6.png"> <img width="1050" alt="Screen Shot 2020-04-13 at 12 07 35 AM" src="https://user-images.githubusercontent.com/13592258/79100262-fa4a0c80-7d1a-11ea-8862-eb1d8960296b.png"> Also link to Select page <img width="1045" alt="Screen Shot 2020-04-12 at 4 14 30 PM" src="https://user-images.githubusercontent.com/13592258/79082246-217fea00-7cd9-11ea-8d96-1a69769d1e19.png"> ### How was this patch tested? Manually build and check Closes #28196 from huaxingao/cte. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
163 lines
6.9 KiB
Markdown
163 lines
6.9 KiB
Markdown
---
|
|
layout: global
|
|
title: SELECT
|
|
displayTitle: SELECT
|
|
license: |
|
|
Licensed to the Apache Software Foundation (ASF) under one or more
|
|
contributor license agreements. See the NOTICE file distributed with
|
|
this work for additional information regarding copyright ownership.
|
|
The ASF licenses this file to You under the Apache License, Version 2.0
|
|
(the "License"); you may not use this file except in compliance with
|
|
the License. You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing, software
|
|
distributed under the License is distributed on an "AS IS" BASIS,
|
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
See the License for the specific language governing permissions and
|
|
limitations under the License.
|
|
---
|
|
|
|
### Description
|
|
|
|
Spark supports a `SELECT` statement and conforms to the ANSI SQL standard. Queries are
|
|
used to retrieve result sets from one or more tables. The following section
|
|
describes the overall query syntax and the sub-sections cover different constructs
|
|
of a query along with examples.
|
|
|
|
### Syntax
|
|
|
|
{% highlight sql %}
|
|
[ WITH with_query [ , ... ] ]
|
|
select_statement [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select_statement, ... ]
|
|
[ ORDER BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } ]
|
|
[ SORT BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } ]
|
|
[ CLUSTER BY { expression [ , ...] } ]
|
|
[ DISTRIBUTE BY { expression [, ...] } ]
|
|
[ WINDOW { named_window [ , WINDOW named_window, ... ] } ]
|
|
[ LIMIT { ALL | expression } ]
|
|
{% endhighlight %}
|
|
|
|
While `select_statement` is defined as
|
|
{% highlight sql %}
|
|
SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] }
|
|
FROM { from_item [ , ...] }
|
|
[ WHERE boolean_expression ]
|
|
[ GROUP BY expression [ , ...] ]
|
|
[ HAVING boolean_expression ]
|
|
{% endhighlight %}
|
|
|
|
### Parameters
|
|
|
|
<dl>
|
|
<dt><code><em>with_query</em></code></dt>
|
|
<dd>
|
|
Specifies the <a href="sql-ref-syntax-qry-select-cte.html">common table expressions (CTEs)</a> before the main query block.
|
|
These table expressions are allowed to be referenced later in the FROM clause. This is useful to abstract
|
|
out repeated subquery blocks in the FROM clause and improves readability of the query.
|
|
</dd>
|
|
<dt><code><em>hints</em></code></dt>
|
|
<dd>
|
|
Hints can be specified to help spark optimizer make better planning decisions. Currently spark supports hints
|
|
that influence selection of join strategies and repartitioning of the data.
|
|
</dd>
|
|
<dt><code><em>ALL</em></code></dt>
|
|
<dd>
|
|
Select all matching rows from the relation and is enabled by default.
|
|
</dd>
|
|
<dt><code><em>DISTINCT</em></code></dt>
|
|
<dd>
|
|
Select all matching rows from the relation after removing duplicates in results.
|
|
</dd>
|
|
<dt><code><em>named_expression</em></code></dt>
|
|
<dd>
|
|
An expression with an assigned name. In general, it denotes a column expression.<br><br>
|
|
<b>Syntax:</b>
|
|
<code>
|
|
expression [AS] [alias]
|
|
</code>
|
|
</dd>
|
|
<dt><code><em>from_item</em></code></dt>
|
|
<dd>
|
|
Specifies a source of input for the query. It can be one of the following:
|
|
<ol>
|
|
<li>Table relation</li>
|
|
<li><a href="sql-ref-syntax-qry-select-join.html">Join relation</a></li>
|
|
<li><a href="sql-ref-syntax-qry-select-tvf.html">Table-value function</a></li>
|
|
<li><a href="sql-ref-syntax-qry-select-inline-table.html">Inline table</a></li>
|
|
<li>Subquery</li>
|
|
</ol>
|
|
</dd>
|
|
<dt><code><em>WHERE</em></code></dt>
|
|
<dd>
|
|
Filters the result of the FROM clause based on the supplied predicates.
|
|
</dd>
|
|
<dt><code><em>GROUP BY</em></code></dt>
|
|
<dd>
|
|
Specifies the expressions that are used to group the rows. This is used in conjunction with aggregate functions
|
|
(MIN, MAX, COUNT, SUM, AVG, etc.) to group rows based on the grouping expressions and aggregate values in each group.
|
|
When a FILTER clause is attached to an aggregate function, only the matching rows are passed to that function.
|
|
</dd>
|
|
<dt><code><em>HAVING</em></code></dt>
|
|
<dd>
|
|
Specifies the predicates by which the rows produced by GROUP BY are filtered. The HAVING clause is used to
|
|
filter rows after the grouping is performed. If HAVING is specified without GROUP BY, it indicates a GROUP BY
|
|
without grouping expressions (global aggregate).
|
|
</dd>
|
|
<dt><code><em>ORDER BY</em></code></dt>
|
|
<dd>
|
|
Specifies an ordering of the rows of the complete result set of the query. The output rows are ordered
|
|
across the partitions. This parameter is mutually exclusive with <code>SORT BY</code>,
|
|
<code>CLUSTER BY</code> and <code>DISTRIBUTE BY</code> and can not be specified together.
|
|
</dd>
|
|
<dt><code><em>SORT BY</em></code></dt>
|
|
<dd>
|
|
Specifies an ordering by which the rows are ordered within each partition. This parameter is mutually
|
|
exclusive with <code>ORDER BY</code> and <code>CLUSTER BY</code> and can not be specified together.
|
|
</dd>
|
|
<dt><code><em>CLUSTER BY</em></code></dt>
|
|
<dd>
|
|
Specifies a set of expressions that is used to repartition and sort the rows. Using this clause has
|
|
the same effect of using <code>DISTRIBUTE BY</code> and <code>SORT BY</code> together.
|
|
</dd>
|
|
<dt><code><em>DISTRIBUTE BY</em></code></dt>
|
|
<dd>
|
|
Specifies a set of expressions by which the result rows are repartitioned. This parameter is mutually
|
|
exclusive with <code>ORDER BY</code> and <code>CLUSTER BY</code> and can not be specified together.
|
|
</dd>
|
|
<dt><code><em>LIMIT</em></code></dt>
|
|
<dd>
|
|
Specifies the maximum number of rows that can be returned by a statement or subquery. This clause
|
|
is mostly used in the conjunction with <code>ORDER BY</code> to produce a deterministic result.
|
|
</dd>
|
|
<dt><code><em>boolean_expression</em></code></dt>
|
|
<dd>
|
|
Specifies an expression with a return type of boolean.
|
|
</dd>
|
|
<dt><code><em>expression</em></code></dt>
|
|
<dd>
|
|
Specifies a combination of one or more values, operators, and SQL functions that evaluates to a value.
|
|
</dd>
|
|
<dt><code><em>named_window</em></code></dt>
|
|
<dd>
|
|
Specifies aliases for one or more source window specifications. The source window specifications can
|
|
be referenced in the widow definitions in the query.
|
|
</dd>
|
|
</dl>
|
|
|
|
### Related Statements
|
|
|
|
* [WHERE Clause](sql-ref-syntax-qry-select-where.html)
|
|
* [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
|
|
* [HAVING Clause](sql-ref-syntax-qry-select-having.html)
|
|
* [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
|
|
* [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
|
|
* [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
|
|
* [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
|
|
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
|
|
* [TABLESAMPLE](sql-ref-syntax-qry-sampling.html)
|
|
* [JOIN](sql-ref-syntax-qry-select-join.html)
|
|
* [SET Operators](sql-ref-syntax-qry-select-setops.html)
|
|
* [Common Table Expression](sql-ref-syntax-qry-select-cte.html)
|