spark-instrumented-optimizer/docs/sql-ref-syntax-qry-select.md
Huaxin Gao 92c1b24617 [SPARK-31428][SQL][DOCS] Document Common Table Expression in SQL Reference
### What changes were proposed in this pull request?
Document Common Table Expression in SQL Reference

### Why are the changes needed?
Make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes
<img width="1050" alt="Screen Shot 2020-04-13 at 12 06 35 AM" src="https://user-images.githubusercontent.com/13592258/79100257-f61def00-7d1a-11ea-8402-17017059232e.png">

<img width="1050" alt="Screen Shot 2020-04-13 at 12 07 09 AM" src="https://user-images.githubusercontent.com/13592258/79100260-f7e7b280-7d1a-11ea-9408-058c0851f0b6.png">

<img width="1050" alt="Screen Shot 2020-04-13 at 12 07 35 AM" src="https://user-images.githubusercontent.com/13592258/79100262-fa4a0c80-7d1a-11ea-8862-eb1d8960296b.png">

Also link to Select page

<img width="1045" alt="Screen Shot 2020-04-12 at 4 14 30 PM" src="https://user-images.githubusercontent.com/13592258/79082246-217fea00-7cd9-11ea-8d96-1a69769d1e19.png">

### How was this patch tested?
Manually build and check

Closes #28196 from huaxingao/cte.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
2020-04-16 08:34:26 +09:00

163 lines
6.9 KiB
Markdown

---
layout: global
title: SELECT
displayTitle: SELECT
license: |
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
---
### Description
Spark supports a `SELECT` statement and conforms to the ANSI SQL standard. Queries are
used to retrieve result sets from one or more tables. The following section
describes the overall query syntax and the sub-sections cover different constructs
of a query along with examples.
### Syntax
{% highlight sql %}
[ WITH with_query [ , ... ] ]
select_statement [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select_statement, ... ]
[ ORDER BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } ]
[ SORT BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } ]
[ CLUSTER BY { expression [ , ...] } ]
[ DISTRIBUTE BY { expression [, ...] } ]
[ WINDOW { named_window [ , WINDOW named_window, ... ] } ]
[ LIMIT { ALL | expression } ]
{% endhighlight %}
While `select_statement` is defined as
{% highlight sql %}
SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] }
FROM { from_item [ , ...] }
[ WHERE boolean_expression ]
[ GROUP BY expression [ , ...] ]
[ HAVING boolean_expression ]
{% endhighlight %}
### Parameters
<dl>
<dt><code><em>with_query</em></code></dt>
<dd>
Specifies the <a href="sql-ref-syntax-qry-select-cte.html">common table expressions (CTEs)</a> before the main query block.
These table expressions are allowed to be referenced later in the FROM clause. This is useful to abstract
out repeated subquery blocks in the FROM clause and improves readability of the query.
</dd>
<dt><code><em>hints</em></code></dt>
<dd>
Hints can be specified to help spark optimizer make better planning decisions. Currently spark supports hints
that influence selection of join strategies and repartitioning of the data.
</dd>
<dt><code><em>ALL</em></code></dt>
<dd>
Select all matching rows from the relation and is enabled by default.
</dd>
<dt><code><em>DISTINCT</em></code></dt>
<dd>
Select all matching rows from the relation after removing duplicates in results.
</dd>
<dt><code><em>named_expression</em></code></dt>
<dd>
An expression with an assigned name. In general, it denotes a column expression.<br><br>
<b>Syntax:</b>
<code>
expression [AS] [alias]
</code>
</dd>
<dt><code><em>from_item</em></code></dt>
<dd>
Specifies a source of input for the query. It can be one of the following:
<ol>
<li>Table relation</li>
<li><a href="sql-ref-syntax-qry-select-join.html">Join relation</a></li>
<li><a href="sql-ref-syntax-qry-select-tvf.html">Table-value function</a></li>
<li><a href="sql-ref-syntax-qry-select-inline-table.html">Inline table</a></li>
<li>Subquery</li>
</ol>
</dd>
<dt><code><em>WHERE</em></code></dt>
<dd>
Filters the result of the FROM clause based on the supplied predicates.
</dd>
<dt><code><em>GROUP BY</em></code></dt>
<dd>
Specifies the expressions that are used to group the rows. This is used in conjunction with aggregate functions
(MIN, MAX, COUNT, SUM, AVG, etc.) to group rows based on the grouping expressions and aggregate values in each group.
When a FILTER clause is attached to an aggregate function, only the matching rows are passed to that function.
</dd>
<dt><code><em>HAVING</em></code></dt>
<dd>
Specifies the predicates by which the rows produced by GROUP BY are filtered. The HAVING clause is used to
filter rows after the grouping is performed. If HAVING is specified without GROUP BY, it indicates a GROUP BY
without grouping expressions (global aggregate).
</dd>
<dt><code><em>ORDER BY</em></code></dt>
<dd>
Specifies an ordering of the rows of the complete result set of the query. The output rows are ordered
across the partitions. This parameter is mutually exclusive with <code>SORT BY</code>,
<code>CLUSTER BY</code> and <code>DISTRIBUTE BY</code> and can not be specified together.
</dd>
<dt><code><em>SORT BY</em></code></dt>
<dd>
Specifies an ordering by which the rows are ordered within each partition. This parameter is mutually
exclusive with <code>ORDER BY</code> and <code>CLUSTER BY</code> and can not be specified together.
</dd>
<dt><code><em>CLUSTER BY</em></code></dt>
<dd>
Specifies a set of expressions that is used to repartition and sort the rows. Using this clause has
the same effect of using <code>DISTRIBUTE BY</code> and <code>SORT BY</code> together.
</dd>
<dt><code><em>DISTRIBUTE BY</em></code></dt>
<dd>
Specifies a set of expressions by which the result rows are repartitioned. This parameter is mutually
exclusive with <code>ORDER BY</code> and <code>CLUSTER BY</code> and can not be specified together.
</dd>
<dt><code><em>LIMIT</em></code></dt>
<dd>
Specifies the maximum number of rows that can be returned by a statement or subquery. This clause
is mostly used in the conjunction with <code>ORDER BY</code> to produce a deterministic result.
</dd>
<dt><code><em>boolean_expression</em></code></dt>
<dd>
Specifies an expression with a return type of boolean.
</dd>
<dt><code><em>expression</em></code></dt>
<dd>
Specifies a combination of one or more values, operators, and SQL functions that evaluates to a value.
</dd>
<dt><code><em>named_window</em></code></dt>
<dd>
Specifies aliases for one or more source window specifications. The source window specifications can
be referenced in the widow definitions in the query.
</dd>
</dl>
### Related Statements
* [WHERE Clause](sql-ref-syntax-qry-select-where.html)
* [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
* [HAVING Clause](sql-ref-syntax-qry-select-having.html)
* [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
* [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
* [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
* [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
* [TABLESAMPLE](sql-ref-syntax-qry-sampling.html)
* [JOIN](sql-ref-syntax-qry-select-join.html)
* [SET Operators](sql-ref-syntax-qry-select-setops.html)
* [Common Table Expression](sql-ref-syntax-qry-select-cte.html)