a75dc80a76
### What changes were proposed in this pull request? Remove the unneeded embedded inline HTML markup by using the basic markdown syntax. Please see #28414 ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manually build and check Closes #28451 from huaxingao/html_cleanup. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Sean Owen <srowen@gmail.com>
162 lines
6.3 KiB
Markdown
162 lines
6.3 KiB
Markdown
---
|
|
layout: global
|
|
title: SELECT
|
|
displayTitle: SELECT
|
|
license: |
|
|
Licensed to the Apache Software Foundation (ASF) under one or more
|
|
contributor license agreements. See the NOTICE file distributed with
|
|
this work for additional information regarding copyright ownership.
|
|
The ASF licenses this file to You under the Apache License, Version 2.0
|
|
(the "License"); you may not use this file except in compliance with
|
|
the License. You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing, software
|
|
distributed under the License is distributed on an "AS IS" BASIS,
|
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
See the License for the specific language governing permissions and
|
|
limitations under the License.
|
|
---
|
|
|
|
### Description
|
|
|
|
Spark supports a `SELECT` statement and conforms to the ANSI SQL standard. Queries are
|
|
used to retrieve result sets from one or more tables. The following section
|
|
describes the overall query syntax and the sub-sections cover different constructs
|
|
of a query along with examples.
|
|
|
|
### Syntax
|
|
|
|
```sql
|
|
[ WITH with_query [ , ... ] ]
|
|
select_statement [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select_statement, ... ]
|
|
[ ORDER BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } ]
|
|
[ SORT BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } ]
|
|
[ CLUSTER BY { expression [ , ...] } ]
|
|
[ DISTRIBUTE BY { expression [, ...] } ]
|
|
[ WINDOW { named_window [ , WINDOW named_window, ... ] } ]
|
|
[ LIMIT { ALL | expression } ]
|
|
```
|
|
|
|
While `select_statement` is defined as
|
|
```sql
|
|
SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] }
|
|
FROM { from_item [ , ...] }
|
|
[ WHERE boolean_expression ]
|
|
[ GROUP BY expression [ , ...] ]
|
|
[ HAVING boolean_expression ]
|
|
```
|
|
|
|
### Parameters
|
|
|
|
* **with_query**
|
|
|
|
Specifies the [common table expressions (CTEs)](sql-ref-syntax-qry-select-cte.html) before the main query block.
|
|
These table expressions are allowed to be referenced later in the FROM clause. This is useful to abstract
|
|
out repeated subquery blocks in the FROM clause and improves readability of the query.
|
|
|
|
* **hints**
|
|
|
|
Hints can be specified to help spark optimizer make better planning decisions. Currently spark supports hints
|
|
that influence selection of join strategies and repartitioning of the data.
|
|
|
|
* **ALL**
|
|
|
|
Select all matching rows from the relation and is enabled by default.
|
|
|
|
* **DISTINCT**
|
|
|
|
Select all matching rows from the relation after removing duplicates in results.
|
|
|
|
* **named_expression**
|
|
|
|
An expression with an assigned name. In general, it denotes a column expression.
|
|
|
|
**Syntax:** `expression [AS] [alias]`
|
|
|
|
* **from_item**
|
|
|
|
Specifies a source of input for the query. It can be one of the following:
|
|
* Table relation
|
|
* [Join relation](sql-ref-syntax-qry-select-join.html)
|
|
* [Table-value function](sql-ref-syntax-qry-select-tvf.html)
|
|
* [Inline table](sql-ref-syntax-qry-select-inline-table.html)
|
|
* Subquery
|
|
|
|
|
|
* **WHERE**
|
|
|
|
Filters the result of the FROM clause based on the supplied predicates.
|
|
|
|
* **GROUP BY**
|
|
|
|
Specifies the expressions that are used to group the rows. This is used in conjunction with aggregate functions
|
|
(MIN, MAX, COUNT, SUM, AVG, etc.) to group rows based on the grouping expressions and aggregate values in each group.
|
|
When a FILTER clause is attached to an aggregate function, only the matching rows are passed to that function.
|
|
|
|
* **HAVING**
|
|
|
|
Specifies the predicates by which the rows produced by GROUP BY are filtered. The HAVING clause is used to
|
|
filter rows after the grouping is performed. If HAVING is specified without GROUP BY, it indicates a GROUP BY
|
|
without grouping expressions (global aggregate).
|
|
|
|
* **ORDER BY**
|
|
|
|
Specifies an ordering of the rows of the complete result set of the query. The output rows are ordered
|
|
across the partitions. This parameter is mutually exclusive with `SORT BY`,
|
|
`CLUSTER BY` and `DISTRIBUTE BY` and can not be specified together.
|
|
|
|
* **SORT BY**
|
|
|
|
Specifies an ordering by which the rows are ordered within each partition. This parameter is mutually
|
|
exclusive with `ORDER BY` and `CLUSTER BY` and can not be specified together.
|
|
|
|
* **CLUSTER BY**
|
|
|
|
Specifies a set of expressions that is used to repartition and sort the rows. Using this clause has
|
|
the same effect of using `DISTRIBUTE BY` and `SORT BY` together.
|
|
|
|
* **DISTRIBUTE BY**
|
|
|
|
Specifies a set of expressions by which the result rows are repartitioned. This parameter is mutually
|
|
exclusive with `ORDER BY` and `CLUSTER BY` and can not be specified together.
|
|
|
|
* **LIMIT**
|
|
|
|
Specifies the maximum number of rows that can be returned by a statement or subquery. This clause
|
|
is mostly used in the conjunction with `ORDER BY` to produce a deterministic result.
|
|
|
|
* **boolean_expression**
|
|
|
|
Specifies an expression with a return type of boolean.
|
|
|
|
* **expression**
|
|
|
|
Specifies a combination of one or more values, operators, and SQL functions that evaluates to a value.
|
|
|
|
* **named_window**
|
|
|
|
Specifies aliases for one or more source window specifications. The source window specifications can
|
|
be referenced in the widow definitions in the query.
|
|
|
|
### Related Statements
|
|
|
|
* [WHERE Clause](sql-ref-syntax-qry-select-where.html)
|
|
* [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
|
|
* [HAVING Clause](sql-ref-syntax-qry-select-having.html)
|
|
* [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
|
|
* [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
|
|
* [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html)
|
|
* [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
|
|
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
|
|
* [Common Table Expression](sql-ref-syntax-qry-select-cte.html)
|
|
* [Inline Table](sql-ref-syntax-qry-select-inline-table.html)
|
|
* [JOIN](sql-ref-syntax-qry-select-join.html)
|
|
* [Join Hints](sql-ref-syntax-qry-select-hints.html)
|
|
* [LIKE Predicate](sql-ref-syntax-qry-select-like.html)
|
|
* [Set Operators](sql-ref-syntax-qry-select-setops.html)
|
|
* [TABLESAMPLE](sql-ref-syntax-qry-sampling.html)
|
|
* [Table-valued Function](sql-ref-syntax-qry-select-tvf.html)
|
|
* [Window Function](sql-ref-syntax-qry-window.html)
|