[SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate functions in SQL references
### What changes were proposed in this pull request? This PR intends to improve the SQL document of `GROUP BY`; it added the description about FILTER clauses of aggregate functions. ### Why are the changes needed? To improve the SQL documents ### Does this PR introduce any user-facing change? Yes. <img src="https://user-images.githubusercontent.com/692303/78558612-e2234a80-784d-11ea-9353-b3feac4d57a7.png" width="500"> ### How was this patch tested? Manually checked. Closes #28134 from maropu/SPARK-31358. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
This commit is contained in:
parent
112caea214
commit
e24f0dcd27
|
@ -21,6 +21,7 @@ license: |
|
|||
The <code>GROUP BY</code> clause is used to group the rows based on a set of specified grouping expressions and compute aggregations on
|
||||
the group of rows based on one or more specified aggregate functions. Spark also supports advanced aggregations to do multiple
|
||||
aggregations for the same input record set via `GROUPING SETS`, `CUBE`, `ROLLUP` clauses.
|
||||
When a FILTER clause is attached to an aggregate function, only the matching rows are passed to that function.
|
||||
|
||||
### Syntax
|
||||
{% highlight sql %}
|
||||
|
@ -30,6 +31,11 @@ GROUP BY group_expression [ , group_expression [ , ... ] ]
|
|||
GROUP BY GROUPING SETS (grouping_set [ , ...])
|
||||
{% endhighlight %}
|
||||
|
||||
While aggregate functions are defined as
|
||||
{% highlight sql %}
|
||||
aggregate_name ( [ DISTINCT ] expression [ , ... ] ) [ FILTER ( WHERE boolean_expression ) ]
|
||||
{% endhighlight %}
|
||||
|
||||
### Parameters
|
||||
<dl>
|
||||
<dt><code><em>GROUPING SETS</em></code></dt>
|
||||
|
@ -70,6 +76,19 @@ GROUP BY GROUPING SETS (grouping_set [ , ...])
|
|||
((warehouse, product), (warehouse), (product), ())</code>.
|
||||
The N elements of a <code>CUBE</code> specification results in 2^N <code>GROUPING SETS</code>.
|
||||
</dd>
|
||||
<dt><code><em>aggregate_name</em></code></dt>
|
||||
<dd>
|
||||
Specifies an aggregate function name (MIN, MAX, COUNT, SUM, AVG, etc.).
|
||||
</dd>
|
||||
<dt><code><em>DISTINCT</em></code></dt>
|
||||
<dd>
|
||||
Removes duplicates in input rows before they are passed to aggregate functions.
|
||||
</dd>
|
||||
<dt><code><em>FILTER</em></code></dt>
|
||||
<dd>
|
||||
Filters the input rows for which the <code>boolean_expression</code> in the <code>WHERE</code> clause evaluates
|
||||
to true are passed to the aggregate function; other rows are discarded.
|
||||
</dd>
|
||||
</dl>
|
||||
|
||||
### Examples
|
||||
|
@ -120,6 +139,31 @@ SELECT id, sum(quantity) AS sum, max(quantity) AS max FROM dealer GROUP BY id OR
|
|||
|300|13 |8 |
|
||||
+---+---+---+
|
||||
|
||||
-- Count the number of distinct dealer cities per car_model.
|
||||
SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY car_model;
|
||||
|
||||
+------------+-----+
|
||||
| car_model|count|
|
||||
+------------+-----+
|
||||
| Honda Civic| 3|
|
||||
| Honda CRV| 2|
|
||||
|Honda Accord| 3|
|
||||
+------------+-----+
|
||||
|
||||
-- Sum of only 'Honda Civic' and 'Honda CRV' quantities per dealership.
|
||||
SELECT id, sum(quantity) FILTER (
|
||||
WHERE car_model IN ('Honda Civic', 'Honda CRV')
|
||||
) AS `sum(quantity)` FROM dealer
|
||||
GROUP BY id ORDER BY id;
|
||||
|
||||
+---+-------------+
|
||||
| id|sum(quantity)|
|
||||
+---+-------------+
|
||||
|100| 17|
|
||||
|200| 23|
|
||||
|300| 5|
|
||||
+---+-------------+
|
||||
|
||||
-- Aggregations using multiple sets of grouping columns in a single statement.
|
||||
-- Following performs aggregations based on four sets of grouping columns.
|
||||
-- 1. city, car_model
|
||||
|
|
|
@ -92,6 +92,7 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] }
|
|||
<dd>
|
||||
Specifies the expressions that are used to group the rows. This is used in conjunction with aggregate functions
|
||||
(MIN, MAX, COUNT, SUM, AVG, etc.) to group rows based on the grouping expressions and aggregate values in each group.
|
||||
When a FILTER clause is attached to an aggregate function, only the matching rows are passed to that function.
|
||||
</dd>
|
||||
<dt><code><em>HAVING</em></code></dt>
|
||||
<dd>
|
||||
|
|
Loading…
Reference in a new issue