1b780f364b
### What changes were proposed in this pull request? Add Coalesce/Repartition/Repartition_By_Range Hints to SQL Reference ### Why are the changes needed? To make SQL reference complete ### Does this PR introduce _any_ user-facing change? <img width="1100" alt="Screen Shot 2020-05-29 at 6 46 38 PM" src="https://user-images.githubusercontent.com/13592258/83316782-d6fcf300-a1dc-11ea-87f6-e357b9c739fd.png"> <img width="1099" alt="Screen Shot 2020-05-29 at 6 43 30 PM" src="https://user-images.githubusercontent.com/13592258/83316784-d8c6b680-a1dc-11ea-95ea-10a1f75dcef9.png"> Only the the above pages are changed. The following two pages are the same as before. <img width="1100" alt="Screen Shot 2020-05-28 at 10 05 27 PM" src="https://user-images.githubusercontent.com/13592258/83223474-bfb3fc00-a12f-11ea-807a-824a618afa0b.png"> <img width="1099" alt="Screen Shot 2020-05-28 at 10 05 08 PM" src="https://user-images.githubusercontent.com/13592258/83223478-c2165600-a12f-11ea-806e-a1e57dc35ef4.png"> ### How was this patch tested? Manually build and check Closes #28672 from huaxingao/coalesce_hint. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Sean Owen <srowen@gmail.com>
192 lines
6.8 KiB
Markdown
192 lines
6.8 KiB
Markdown
---
|
|
layout: global
|
|
title: Window Functions
|
|
displayTitle: Window Functions
|
|
license: |
|
|
Licensed to the Apache Software Foundation (ASF) under one or more
|
|
contributor license agreements. See the NOTICE file distributed with
|
|
this work for additional information regarding copyright ownership.
|
|
The ASF licenses this file to You under the Apache License, Version 2.0
|
|
(the "License"); you may not use this file except in compliance with
|
|
the License. You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing, software
|
|
distributed under the License is distributed on an "AS IS" BASIS,
|
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
See the License for the specific language governing permissions and
|
|
limitations under the License.
|
|
---
|
|
|
|
### Description
|
|
|
|
Window functions operate on a group of rows, referred to as a window, and calculate a return value for each row based on the group of rows. Window functions are useful for processing tasks such as calculating a moving average, computing a cumulative statistic, or accessing the value of rows given the relative position of the current row.
|
|
|
|
### Syntax
|
|
|
|
```sql
|
|
window_function OVER
|
|
( [ { PARTITION | DISTRIBUTE } BY partition_col_name = partition_col_val ( [ , ... ] ) ]
|
|
{ ORDER | SORT } BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ... ]
|
|
[ window_frame ] )
|
|
```
|
|
|
|
### Parameters
|
|
|
|
* **window_function**
|
|
|
|
* Ranking Functions
|
|
|
|
**Syntax:** `RANK | DENSE_RANK | PERCENT_RANK | NTILE | ROW_NUMBER`
|
|
|
|
* Analytic Functions
|
|
|
|
**Syntax:** `CUME_DIST | LAG | LEAD`
|
|
|
|
* Aggregate Functions
|
|
|
|
**Syntax:** `MAX | MIN | COUNT | SUM | AVG | ...`
|
|
|
|
Please refer to the [Built-in Aggregation Functions](sql-ref-functions-builtin.html#aggregate-functions) document for a complete list of Spark aggregate functions.
|
|
|
|
* **window_frame**
|
|
|
|
Specifies which row to start the window on and where to end it.
|
|
|
|
**Syntax:**
|
|
|
|
`{ RANGE | ROWS } { frame_start | BETWEEN frame_start AND frame_end }`
|
|
|
|
* `frame_start` and `frame_end` have the following syntax:
|
|
|
|
**Syntax:**
|
|
|
|
`UNBOUNDED PRECEDING | offset PRECEDING | CURRENT ROW | offset FOLLOWING | UNBOUNDED FOLLOWING`
|
|
|
|
`offset:` specifies the `offset` from the position of the current row.
|
|
|
|
**Note:** If `frame_end` is omitted it defaults to `CURRENT ROW`.
|
|
|
|
### Examples
|
|
|
|
```sql
|
|
CREATE TABLE employees (name STRING, dept STRING, salary INT, age INT);
|
|
|
|
INSERT INTO employees VALUES ("Lisa", "Sales", 10000, 35);
|
|
INSERT INTO employees VALUES ("Evan", "Sales", 32000, 38);
|
|
INSERT INTO employees VALUES ("Fred", "Engineering", 21000, 28);
|
|
INSERT INTO employees VALUES ("Alex", "Sales", 30000, 33);
|
|
INSERT INTO employees VALUES ("Tom", "Engineering", 23000, 33);
|
|
INSERT INTO employees VALUES ("Jane", "Marketing", 29000, 28);
|
|
INSERT INTO employees VALUES ("Jeff", "Marketing", 35000, 38);
|
|
INSERT INTO employees VALUES ("Paul", "Engineering", 29000, 23);
|
|
INSERT INTO employees VALUES ("Chloe", "Engineering", 23000, 25);
|
|
|
|
SELECT * FROM employees;
|
|
+-----+-----------+------+-----+
|
|
| name| dept|salary| age|
|
|
+-----+-----------+------+-----+
|
|
|Chloe|Engineering| 23000| 25|
|
|
| Fred|Engineering| 21000| 28|
|
|
| Paul|Engineering| 29000| 23|
|
|
|Helen| Marketing| 29000| 40|
|
|
| Tom|Engineering| 23000| 33|
|
|
| Jane| Marketing| 29000| 28|
|
|
| Jeff| Marketing| 35000| 38|
|
|
| Evan| Sales| 32000| 38|
|
|
| Lisa| Sales| 10000| 35|
|
|
| Alex| Sales| 30000| 33|
|
|
+-----+-----------+------+-----+
|
|
|
|
SELECT name, dept, RANK() OVER (PARTITION BY dept ORDER BY salary) AS rank FROM employees;
|
|
+-----+-----------+------+----+
|
|
| name| dept|salary|rank|
|
|
+-----+-----------+------+----+
|
|
| Lisa| Sales| 10000| 1|
|
|
| Alex| Sales| 30000| 2|
|
|
| Evan| Sales| 32000| 3|
|
|
| Fred|Engineering| 21000| 1|
|
|
| Tom|Engineering| 23000| 2|
|
|
|Chloe|Engineering| 23000| 2|
|
|
| Paul|Engineering| 29000| 4|
|
|
|Helen| Marketing| 29000| 1|
|
|
| Jane| Marketing| 29000| 1|
|
|
| Jeff| Marketing| 35000| 3|
|
|
+-----+-----------+------+----+
|
|
|
|
SELECT name, dept, DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary ROWS BETWEEN
|
|
UNBOUNDED PRECEDING AND CURRENT ROW) AS dense_rank FROM employees;
|
|
+-----+-----------+------+----------+
|
|
| name| dept|salary|dense_rank|
|
|
+-----+-----------+------+----------+
|
|
| Lisa| Sales| 10000| 1|
|
|
| Alex| Sales| 30000| 2|
|
|
| Evan| Sales| 32000| 3|
|
|
| Fred|Engineering| 21000| 1|
|
|
| Tom|Engineering| 23000| 2|
|
|
|Chloe|Engineering| 23000| 2|
|
|
| Paul|Engineering| 29000| 3|
|
|
|Helen| Marketing| 29000| 1|
|
|
| Jane| Marketing| 29000| 1|
|
|
| Jeff| Marketing| 35000| 2|
|
|
+-----+-----------+------+----------+
|
|
|
|
SELECT name, dept, age, CUME_DIST() OVER (PARTITION BY dept ORDER BY age
|
|
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cume_dist FROM employees;
|
|
+-----+-----------+------+------------------+
|
|
| name| dept|age | cume_dist|
|
|
+-----+-----------+------+------------------+
|
|
| Alex| Sales| 33|0.3333333333333333|
|
|
| Lisa| Sales| 35|0.6666666666666666|
|
|
| Evan| Sales| 38| 1.0|
|
|
| Paul|Engineering| 23| 0.25|
|
|
|Chloe|Engineering| 25| 0.75|
|
|
| Fred|Engineering| 28| 0.25|
|
|
| Tom|Engineering| 33| 1.0|
|
|
| Jane| Marketing| 28|0.3333333333333333|
|
|
| Jeff| Marketing| 38|0.6666666666666666|
|
|
|Helen| Marketing| 40| 1.0|
|
|
+-----+-----------+------+------------------+
|
|
|
|
SELECT name, dept, salary, MIN(salary) OVER (PARTITION BY dept ORDER BY salary) AS min
|
|
FROM employees;
|
|
+-----+-----------+------+-----+
|
|
| name| dept|salary| min|
|
|
+-----+-----------+------+-----+
|
|
| Lisa| Sales| 10000|10000|
|
|
| Alex| Sales| 30000|10000|
|
|
| Evan| Sales| 32000|10000|
|
|
|Helen| Marketing| 29000|29000|
|
|
| Jane| Marketing| 29000|29000|
|
|
| Jeff| Marketing| 35000|29000|
|
|
| Fred|Engineering| 21000|21000|
|
|
| Tom|Engineering| 23000|21000|
|
|
|Chloe|Engineering| 23000|21000|
|
|
| Paul|Engineering| 29000|21000|
|
|
+-----+-----------+------+-----+
|
|
|
|
SELECT name, salary,
|
|
LAG(salary) OVER (PARTITION BY dept ORDER BY salary) AS lag,
|
|
LEAD(salary, 1, 0) OVER (PARTITION BY dept ORDER BY salary) AS lead
|
|
FROM employees;
|
|
+-----+-----------+------+-----+-----+
|
|
| name| dept|salary| lag| lead|
|
|
+-----+-----------+------+-----+-----+
|
|
| Lisa| Sales| 10000|NULL |30000|
|
|
| Alex| Sales| 30000|10000|32000|
|
|
| Evan| Sales| 32000|30000| 0|
|
|
| Fred|Engineering| 21000| NULL|23000|
|
|
|Chloe|Engineering| 23000|21000|23000|
|
|
| Tom|Engineering| 23000|23000|29000|
|
|
| Paul|Engineering| 29000|23000| 0|
|
|
|Helen| Marketing| 29000| NULL|29000|
|
|
| Jane| Marketing| 29000|29000|35000|
|
|
| Jeff| Marketing| 35000|29000| 0|
|
|
+-----+-----------+------+-----+-----+
|
|
```
|
|
|
|
### Related Statements
|
|
|
|
* [SELECT](sql-ref-syntax-qry-select.html)
|