179289f0bf
### What changes were proposed in this pull request? This PR intends to clean up the SQL documents in `doc/sql-ref*`. Main changes are as follows; - Fixes wrong syntaxes and capitalize sub-titles - Adds some DDL queries in `Examples` so that users can run examples there - Makes query output in `Examples` follows the `Dataset.showString` (right-aligned) format - Adds/Removes spaces, Indents, or blank lines to follow the format below; ``` --- license... --- ### Description Writes what's the syntax is. ### Syntax {% highlight sql %} SELECT... WHERE... // 4 indents after the second line ... {% endhighlight %} ### Parameters <dl> <dt><code><em>Param Name</em></code></dt> <dd> Param Description </dd> ... </dl> ### Examples {% highlight sql %} -- It is better that users are able to execute example queries here. -- So, we prepare test data in the first section if possible. CREATE TABLE t (key STRING, value DOUBLE); INSERT INTO t VALUES ('a', 1.0), ('a', 2.0), ('b', 3.0), ('c', 4.0); -- query output has 2 indents and it follows the `Dataset.showString` -- format (right-aligned). SELECT * FROM t; +---+-----+ |key|value| +---+-----+ | a| 1.0| | a| 2.0| | b| 3.0| | c| 4.0| +---+-----+ -- Query statements after the second line have 4 indents. SELECT key, SUM(value) FROM t GROUP BY key; +---+----------+ |key|sum(value)| +---+----------+ | c| 4.0| | b| 3.0| | a| 3.0| +---+----------+ ... {% endhighlight %} ### Related Statements * [XXX](xxx.html) * ... ``` ### Why are the changes needed? The most changes of this PR are pretty minor, but I think the consistent formats/rules to write documents are important for long-term maintenance in our community ### Does this PR introduce any user-facing change? Yes. ### How was this patch tested? Manually checked. Closes #28151 from maropu/MakeRightAligned. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Sean Owen <srowen@gmail.com>
162 lines
6.8 KiB
Markdown
162 lines
6.8 KiB
Markdown
---
|
|
layout: global
|
|
title: SELECT
|
|
displayTitle: SELECT
|
|
license: |
|
|
Licensed to the Apache Software Foundation (ASF) under one or more
|
|
contributor license agreements. See the NOTICE file distributed with
|
|
this work for additional information regarding copyright ownership.
|
|
The ASF licenses this file to You under the Apache License, Version 2.0
|
|
(the "License"); you may not use this file except in compliance with
|
|
the License. You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing, software
|
|
distributed under the License is distributed on an "AS IS" BASIS,
|
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
See the License for the specific language governing permissions and
|
|
limitations under the License.
|
|
---
|
|
|
|
### Description
|
|
|
|
Spark supports a `SELECT` statement and conforms to the ANSI SQL standard. Queries are
|
|
used to retrieve result sets from one or more tables. The following section
|
|
describes the overall query syntax and the sub-sections cover different constructs
|
|
of a query along with examples.
|
|
|
|
### Syntax
|
|
|
|
{% highlight sql %}
|
|
[ WITH with_query [ , ... ] ]
|
|
select_statement [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select_statement, ... ]
|
|
[ ORDER BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } ]
|
|
[ SORT BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } ]
|
|
[ CLUSTER BY { expression [ , ...] } ]
|
|
[ DISTRIBUTE BY { expression [, ...] } ]
|
|
[ WINDOW { named_window [ , WINDOW named_window, ... ] } ]
|
|
[ LIMIT { ALL | expression } ]
|
|
{% endhighlight %}
|
|
|
|
While `select_statement` is defined as
|
|
{% highlight sql %}
|
|
SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] }
|
|
FROM { from_item [ , ...] }
|
|
[ WHERE boolean_expression ]
|
|
[ GROUP BY expression [ , ...] ]
|
|
[ HAVING boolean_expression ]
|
|
{% endhighlight %}
|
|
|
|
### Parameters
|
|
|
|
<dl>
|
|
<dt><code><em>with_query</em></code></dt>
|
|
<dd>
|
|
Specifies the common table expressions (CTEs) before the main query block.
|
|
These table expressions are allowed to be referenced later in the FROM clause. This is useful to abstract
|
|
out repeated subquery blocks in the FROM clause and improves readability of the query.
|
|
</dd>
|
|
<dt><code><em>hints</em></code></dt>
|
|
<dd>
|
|
Hints can be specified to help spark optimizer make better planning decisions. Currently spark supports hints
|
|
that influence selection of join strategies and repartitioning of the data.
|
|
</dd>
|
|
<dt><code><em>ALL</em></code></dt>
|
|
<dd>
|
|
Select all matching rows from the relation and is enabled by default.
|
|
</dd>
|
|
<dt><code><em>DISTINCT</em></code></dt>
|
|
<dd>
|
|
Select all matching rows from the relation after removing duplicates in results.
|
|
</dd>
|
|
<dt><code><em>named_expression</em></code></dt>
|
|
<dd>
|
|
An expression with an assigned name. In general, it denotes a column expression.<br><br>
|
|
<b>Syntax:</b>
|
|
<code>
|
|
expression [AS] [alias]
|
|
</code>
|
|
</dd>
|
|
<dt><code><em>from_item</em></code></dt>
|
|
<dd>
|
|
Specifies a source of input for the query. It can be one of the following:
|
|
<ol>
|
|
<li>Table relation</li>
|
|
<li><a href="sql-ref-syntax-qry-select-join.html">Join relation</a></li>
|
|
<li><a href="sql-ref-syntax-qry-select-tvf.html">Table-value function</a></li>
|
|
<li><a href="sql-ref-syntax-qry-select-inline-table.html">Inline table</a></li>
|
|
<li>Subquery</li>
|
|
</ol>
|
|
</dd>
|
|
<dt><code><em>WHERE</em></code></dt>
|
|
<dd>
|
|
Filters the result of the FROM clause based on the supplied predicates.
|
|
</dd>
|
|
<dt><code><em>GROUP BY</em></code></dt>
|
|
<dd>
|
|
Specifies the expressions that are used to group the rows. This is used in conjunction with aggregate functions
|
|
(MIN, MAX, COUNT, SUM, AVG, etc.) to group rows based on the grouping expressions and aggregate values in each group.
|
|
When a FILTER clause is attached to an aggregate function, only the matching rows are passed to that function.
|
|
</dd>
|
|
<dt><code><em>HAVING</em></code></dt>
|
|
<dd>
|
|
Specifies the predicates by which the rows produced by GROUP BY are filtered. The HAVING clause is used to
|
|
filter rows after the grouping is performed. If HAVING is specified without GROUP BY, it indicates a GROUP BY
|
|
without grouping expressions (global aggregate).
|
|
</dd>
|
|
<dt><code><em>ORDER BY</em></code></dt>
|
|
<dd>
|
|
Specifies an ordering of the rows of the complete result set of the query. The output rows are ordered
|
|
across the partitions. This parameter is mutually exclusive with <code>SORT BY</code>,
|
|
<code>CLUSTER BY</code> and <code>DISTRIBUTE BY</code> and can not be specified together.
|
|
</dd>
|
|
<dt><code><em>SORT BY</em></code></dt>
|
|
<dd>
|
|
Specifies an ordering by which the rows are ordered within each partition. This parameter is mutually
|
|
exclusive with <code>ORDER BY</code> and <code>CLUSTER BY</code> and can not be specified together.
|
|
</dd>
|
|
<dt><code><em>CLUSTER BY</em></code></dt>
|
|
<dd>
|
|
Specifies a set of expressions that is used to repartition and sort the rows. Using this clause has
|
|
the same effect of using <code>DISTRIBUTE BY</code> and <code>SORT BY</code> together.
|
|
</dd>
|
|
<dt><code><em>DISTRIBUTE BY</em></code></dt>
|
|
<dd>
|
|
Specifies a set of expressions by which the result rows are repartitioned. This parameter is mutually
|
|
exclusive with <code>ORDER BY</code> and <code>CLUSTER BY</code> and can not be specified together.
|
|
</dd>
|
|
<dt><code><em>LIMIT</em></code></dt>
|
|
<dd>
|
|
Specifies the maximum number of rows that can be returned by a statement or subquery. This clause
|
|
is mostly used in the conjunction with <code>ORDER BY</code> to produce a deterministic result.
|
|
</dd>
|
|
<dt><code><em>boolean_expression</em></code></dt>
|
|
<dd>
|
|
Specifies an expression with a return type of boolean.
|
|
</dd>
|
|
<dt><code><em>expression</em></code></dt>
|
|
<dd>
|
|
Specifies a combination of one or more values, operators, and SQL functions that evaluates to a value.
|
|
</dd>
|
|
<dt><code><em>named_window</em></code></dt>
|
|
<dd>
|
|
Specifies aliases for one or more source window specifications. The source window specifications can
|
|
be referenced in the widow definitions in the query.
|
|
</dd>
|
|
</dl>
|
|
|
|
### Related Statements
|
|
|
|
* [WHERE Clause](sql-ref-syntax-qry-select-where.html)
|
|
* [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
|
|
* [HAVING Clause](sql-ref-syntax-qry-select-having.html)
|
|
* [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
|
|
* [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
|
|
* [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
|
|
* [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
|
|
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
|
|
* [TABLESAMPLE](sql-ref-syntax-qry-sampling.html)
|
|
* [JOIN](sql-ref-syntax-qry-select-join.html)
|
|
* [SET Operators](sql-ref-syntax-qry-select-setops.html)
|