d59e7195f6
### What changes were proposed in this pull request? The filter predicate for aggregate expression is an `ANSI SQL`. ``` <aggregate function> ::= COUNT <left paren> <asterisk> <right paren> [ <filter clause> ] | <general set function> [ <filter clause> ] | <binary set function> [ <filter clause> ] | <ordered set function> [ <filter clause> ] | <array aggregate function> [ <filter clause> ] | <row pattern count function> [ <filter clause> ] ``` There are some mainstream database support this syntax. **PostgreSQL:** https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES For example: ``` SELECT year, count(*) FILTER (WHERE gdp_per_capita >= 40000) FROM countries GROUP BY year ``` ``` SELECT year, code, gdp_per_capita, count(*) FILTER (WHERE gdp_per_capita >= 40000) OVER (PARTITION BY year) FROM countries ``` **jOOQ:** https://blog.jooq.org/2014/12/30/the-awesome-postgresql-9-4-sql2003-filter-clause-for-aggregate-functions/ **Notice:** 1.This PR only supports FILTER predicate without codegen. maropu will create another PR is related to SPARK-30027 to support codegen. 2.This PR only supports FILTER predicate without DISTINCT. I will create another PR is related to SPARK-30276 to support this. 3.This PR only supports FILTER predicate that can't reference the outer query. I created ticket SPARK-30219 to support it. 4.This PR only supports FILTER predicate that can't use IN/EXISTS predicate sub-queries. I created ticket SPARK-30220 to support it. 5.Spark SQL cannot supports a SQL with nested aggregate. I created ticket SPARK-30182 to support it. There are some show of the PR on my production environment. ``` spark-sql> desc gja_test_partition; key string NULL value string NULL other string NULL col2 int NULL # Partition Information # col_name data_type comment col2 int NULL Time taken: 0.79 s ``` ``` spark-sql> select * from gja_test_partition; a A ao 1 b B bo 1 c C co 1 d D do 1 e E eo 2 g G go 2 h H ho 2 j J jo 2 f F fo 3 k K ko 3 l L lo 4 i I io 4 Time taken: 1.75 s ``` ``` spark-sql> select count(key), sum(col2) from gja_test_partition; 12 26 Time taken: 1.848 s ``` ``` spark-sql> select count(key) filter (where col2 > 1) from gja_test_partition; 8 Time taken: 2.926 s ``` ``` spark-sql> select sum(col2) filter (where col2 > 2) from gja_test_partition; 14 Time taken: 2.087 s ``` ``` spark-sql> select count(key) filter (where col2 > 1), sum(col2) filter (where col2 > 2) from gja_test_partition; 8 14 Time taken: 2.847 s ``` ``` spark-sql> select count(key), count(key) filter (where col2 > 1), sum(col2), sum(col2) filter (where col2 > 2) from gja_test_partition; 12 8 26 14 Time taken: 1.787 s ``` ``` spark-sql> desc student; id int NULL name string NULL sex string NULL class_id int NULL Time taken: 0.206 s ``` ``` spark-sql> select * from student; 1 张三 man 1 2 李四 man 1 3 王五 man 2 4 赵六 man 2 5 钱小花 woman 1 6 赵九红 woman 2 7 郭丽丽 woman 2 Time taken: 0.786 s ``` ``` spark-sql> select class_id, count(id), sum(id) from student group by class_id; 1 3 8 2 4 20 Time taken: 18.783 s ``` ``` spark-sql> select class_id, count(id) filter (where sex = 'man'), sum(id) filter (where sex = 'woman') from student group by class_id; 1 2 5 2 2 13 Time taken: 3.887 s ``` ### Why are the changes needed? Add new SQL feature. ### Does this PR introduce any user-facing change? 'No'. ### How was this patch tested? Exists UT and new UT. Closes #26656 from beliefer/support-aggregate-clause. Lead-authored-by: gengjiaan <gengjiaan@360.cn> Co-authored-by: Jiaan Geng <beliefer@163.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
25 KiB
25 KiB
layout | title | displayTitle | license |
---|---|---|---|
global | Spark SQL Keywords | Spark SQL Keywords | Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. |
When spark.sql.ansi.enabled
is true, Spark SQL will use the ANSI mode parser.
In this mode, Spark SQL has two kinds of keywords:
- Reserved keywords: Keywords that are reserved and can't be used as identifiers for table, view, column, function, alias, etc.
- Non-reserved keywords: Keywords that have a special meaning only in particular contexts and can be used as identifiers in other contexts. For example,
SELECT 1 WEEK
is an interval literal, but WEEK can be used as identifiers in other places.
When the ANSI mode is disabled, Spark SQL has two kinds of keywords:
- Non-reserved keywords: Same definition as the one when the ANSI mode enabled.
- Strict-non-reserved keywords: A strict version of non-reserved keywords, which can not be used as table alias.
By default spark.sql.ansi.enabled
is false.
Below is a list of all the keywords in Spark SQL.
Keyword | Spark SQL | SQL-2011 | |
---|---|---|---|
ANSI mode | default mode | ||
ADD | non-reserved | non-reserved | non-reserved |
AFTER | non-reserved | non-reserved | non-reserved |
ALL | reserved | non-reserved | reserved |
ALTER | non-reserved | non-reserved | reserved |
ANALYZE | non-reserved | non-reserved | non-reserved |
AND | reserved | non-reserved | reserved |
ANTI | reserved | strict-non-reserved | non-reserved |
ANY | reserved | non-reserved | reserved |
ARCHIVE | non-reserved | non-reserved | non-reserved |
ARRAY | non-reserved | non-reserved | reserved |
AS | reserved | non-reserved | reserved |
ASC | non-reserved | non-reserved | non-reserved |
AT | non-reserved | non-reserved | reserved |
AUTHORIZATION | reserved | non-reserved | reserved |
BETWEEN | non-reserved | non-reserved | reserved |
BOTH | reserved | non-reserved | reserved |
BUCKET | non-reserved | non-reserved | non-reserved |
BUCKETS | non-reserved | non-reserved | non-reserved |
BY | non-reserved | non-reserved | reserved |
CACHE | non-reserved | non-reserved | non-reserved |
CASCADE | non-reserved | non-reserved | reserved |
CASE | reserved | non-reserved | reserved |
CAST | reserved | non-reserved | reserved |
CHANGE | non-reserved | non-reserved | non-reserved |
CHECK | reserved | non-reserved | reserved |
CLEAR | non-reserved | non-reserved | non-reserved |
CLUSTER | non-reserved | non-reserved | non-reserved |
CLUSTERED | non-reserved | non-reserved | non-reserved |
CODEGEN | non-reserved | non-reserved | non-reserved |
COLLATE | reserved | non-reserved | reserved |
COLLECTION | non-reserved | non-reserved | non-reserved |
COLUMN | reserved | non-reserved | reserved |
COLUMNS | non-reserved | non-reserved | non-reserved |
COMMENT | non-reserved | non-reserved | non-reserved |
COMMIT | non-reserved | non-reserved | reserved |
COMPACT | non-reserved | non-reserved | non-reserved |
COMPACTIONS | non-reserved | non-reserved | non-reserved |
COMPUTE | non-reserved | non-reserved | non-reserved |
CONCATENATE | non-reserved | non-reserved | non-reserved |
CONSTRAINT | reserved | non-reserved | reserved |
COST | non-reserved | non-reserved | non-reserved |
CREATE | reserved | non-reserved | reserved |
CROSS | reserved | strict-non-reserved | reserved |
CUBE | non-reserved | non-reserved | reserved |
CURRENT | non-reserved | non-reserved | reserved |
CURRENT_DATE | reserved | non-reserved | reserved |
CURRENT_TIME | reserved | non-reserved | reserved |
CURRENT_TIMESTAMP | reserved | non-reserved | reserved |
CURRENT_USER | reserved | non-reserved | reserved |
DATA | non-reserved | non-reserved | non-reserved |
DATABASE | non-reserved | non-reserved | non-reserved |
DATABASES | non-reserved | non-reserved | non-reserved |
DAY | reserved | non-reserved | reserved |
DAYS | non-reserved | non-reserved | non-reserved |
DBPROPERTIES | non-reserved | non-reserved | non-reserved |
DEFINED | non-reserved | non-reserved | non-reserved |
DELETE | non-reserved | non-reserved | reserved |
DELIMITED | non-reserved | non-reserved | non-reserved |
DESC | non-reserved | non-reserved | non-reserved |
DESCRIBE | non-reserved | non-reserved | reserved |
DFS | non-reserved | non-reserved | non-reserved |
DIRECTORIES | non-reserved | non-reserved | non-reserved |
DIRECTORY | non-reserved | non-reserved | non-reserved |
DISTINCT | reserved | non-reserved | reserved |
DISTRIBUTE | non-reserved | non-reserved | non-reserved |
DIV | non-reserved | non-reserved | non-reserved |
DROP | non-reserved | non-reserved | reserved |
ELSE | reserved | non-reserved | reserved |
END | reserved | non-reserved | reserved |
ESCAPE | reserved | non-reserved | reserved |
ESCAPED | non-reserved | non-reserved | non-reserved |
EXCEPT | reserved | strict-non-reserved | reserved |
EXCHANGE | non-reserved | non-reserved | non-reserved |
EXISTS | non-reserved | non-reserved | reserved |
EXPLAIN | non-reserved | non-reserved | non-reserved |
EXPORT | non-reserved | non-reserved | non-reserved |
EXTENDED | non-reserved | non-reserved | non-reserved |
EXTERNAL | non-reserved | non-reserved | reserved |
EXTRACT | non-reserved | non-reserved | reserved |
FALSE | reserved | non-reserved | reserved |
FETCH | reserved | non-reserved | reserved |
FIELDS | non-reserved | non-reserved | non-reserved |
FILTER | reserved | non-reserved | reserved |
FILEFORMAT | non-reserved | non-reserved | non-reserved |
FIRST | non-reserved | non-reserved | non-reserved |
FIRST_VALUE | reserved | non-reserved | reserved |
FOLLOWING | non-reserved | non-reserved | non-reserved |
FOR | reserved | non-reserved | reserved |
FOREIGN | reserved | non-reserved | reserved |
FORMAT | non-reserved | non-reserved | non-reserved |
FORMATTED | non-reserved | non-reserved | non-reserved |
FROM | reserved | non-reserved | reserved |
FULL | reserved | strict-non-reserved | reserved |
FUNCTION | non-reserved | non-reserved | reserved |
FUNCTIONS | non-reserved | non-reserved | non-reserved |
GLOBAL | non-reserved | non-reserved | reserved |
GRANT | reserved | non-reserved | reserved |
GROUP | reserved | non-reserved | reserved |
GROUPING | non-reserved | non-reserved | reserved |
HAVING | reserved | non-reserved | reserved |
HOUR | reserved | non-reserved | reserved |
HOURS | non-reserved | non-reserved | non-reserved |
IF | non-reserved | non-reserved | reserved |
IGNORE | non-reserved | non-reserved | non-reserved |
IMPORT | non-reserved | non-reserved | non-reserved |
IN | reserved | non-reserved | reserved |
INDEX | non-reserved | non-reserved | non-reserved |
INDEXES | non-reserved | non-reserved | non-reserved |
INNER | reserved | strict-non-reserved | reserved |
INPATH | non-reserved | non-reserved | non-reserved |
INPUTFORMAT | non-reserved | non-reserved | non-reserved |
INSERT | non-reserved | non-reserved | reserved |
INTERSECT | reserved | strict-non-reserved | reserved |
INTERVAL | non-reserved | non-reserved | reserved |
INTO | reserved | non-reserved | reserved |
IS | reserved | non-reserved | reserved |
ITEMS | non-reserved | non-reserved | non-reserved |
JOIN | reserved | strict-non-reserved | reserved |
KEYS | non-reserved | non-reserved | non-reserved |
LAST | non-reserved | non-reserved | non-reserved |
LAST_VALUE | reserved | non-reserved | reserved |
LATERAL | non-reserved | non-reserved | reserved |
LAZY | non-reserved | non-reserved | non-reserved |
LEADING | reserved | non-reserved | reserved |
LEFT | reserved | strict-non-reserved | reserved |
LIKE | non-reserved | non-reserved | reserved |
LIMIT | non-reserved | non-reserved | non-reserved |
LINES | non-reserved | non-reserved | non-reserved |
LIST | non-reserved | non-reserved | non-reserved |
LOAD | non-reserved | non-reserved | non-reserved |
LOCAL | non-reserved | non-reserved | reserved |
LOCATION | non-reserved | non-reserved | non-reserved |
LOCK | non-reserved | non-reserved | non-reserved |
LOCKS | non-reserved | non-reserved | non-reserved |
LOGICAL | non-reserved | non-reserved | non-reserved |
MACRO | non-reserved | non-reserved | non-reserved |
MAP | non-reserved | non-reserved | non-reserved |
MATCHED | non-reserved | non-reserved | non-reserved |
MERGE | non-reserved | non-reserved | non-reserved |
MICROSECOND | non-reserved | non-reserved | non-reserved |
MICROSECONDS | non-reserved | non-reserved | non-reserved |
MILLISECOND | non-reserved | non-reserved | non-reserved |
MILLISECONDS | non-reserved | non-reserved | non-reserved |
MINUS | reserved | strict-non-reserved | non-reserved |
MINUTE | reserved | non-reserved | reserved |
MINUTES | non-reserved | non-reserved | non-reserved |
MONTH | reserved | non-reserved | reserved |
MONTHS | non-reserved | non-reserved | non-reserved |
MSCK | non-reserved | non-reserved | non-reserved |
NAMESPACE | non-reserved | non-reserved | non-reserved |
NAMESPACES | non-reserved | non-reserved | non-reserved |
NATURAL | reserved | strict-non-reserved | reserved |
NO | non-reserved | non-reserved | reserved |
NOT | reserved | non-reserved | reserved |
NULL | reserved | non-reserved | reserved |
NULLS | non-reserved | non-reserved | non-reserved |
OF | non-reserved | non-reserved | reserved |
ON | reserved | strict-non-reserved | reserved |
ONLY | reserved | non-reserved | reserved |
OPTION | non-reserved | non-reserved | non-reserved |
OPTIONS | non-reserved | non-reserved | non-reserved |
OR | reserved | non-reserved | reserved |
ORDER | reserved | non-reserved | reserved |
OUT | non-reserved | non-reserved | reserved |
OUTER | reserved | non-reserved | reserved |
OUTPUTFORMAT | non-reserved | non-reserved | non-reserved |
OVER | non-reserved | non-reserved | non-reserved |
OVERLAPS | reserved | non-reserved | reserved |
OVERLAY | non-reserved | non-reserved | non-reserved |
OVERWRITE | non-reserved | non-reserved | non-reserved |
PARTITION | non-reserved | non-reserved | reserved |
PARTITIONED | non-reserved | non-reserved | non-reserved |
PARTITIONS | non-reserved | non-reserved | non-reserved |
PERCENT | non-reserved | non-reserved | non-reserved |
PIVOT | non-reserved | non-reserved | non-reserved |
PLACING | non-reserved | non-reserved | non-reserved |
POSITION | non-reserved | non-reserved | reserved |
PRECEDING | non-reserved | non-reserved | non-reserved |
PRIMARY | reserved | non-reserved | reserved |
PRINCIPALS | non-reserved | non-reserved | non-reserved |
PROPERTIES | non-reserved | non-reserved | non-reserved |
PURGE | non-reserved | non-reserved | non-reserved |
QUERY | non-reserved | non-reserved | non-reserved |
RANGE | non-reserved | non-reserved | reserved |
RECORDREADER | non-reserved | non-reserved | non-reserved |
RECORDWRITER | non-reserved | non-reserved | non-reserved |
RECOVER | non-reserved | non-reserved | non-reserved |
REDUCE | non-reserved | non-reserved | non-reserved |
REFERENCES | reserved | non-reserved | reserved |
REFRESH | non-reserved | non-reserved | non-reserved |
RENAME | non-reserved | non-reserved | non-reserved |
REPAIR | non-reserved | non-reserved | non-reserved |
REPLACE | non-reserved | non-reserved | non-reserved |
RESET | non-reserved | non-reserved | non-reserved |
RESPECT | non-reserved | non-reserved | non-reserved |
RESTRICT | non-reserved | non-reserved | non-reserved |
REVOKE | non-reserved | non-reserved | reserved |
RIGHT | reserved | strict-non-reserved | reserved |
RLIKE | non-reserved | non-reserved | non-reserved |
ROLE | non-reserved | non-reserved | non-reserved |
ROLES | non-reserved | non-reserved | non-reserved |
ROLLBACK | non-reserved | non-reserved | reserved |
ROLLUP | non-reserved | non-reserved | reserved |
ROW | non-reserved | non-reserved | reserved |
ROWS | non-reserved | non-reserved | reserved |
SCHEMA | non-reserved | non-reserved | non-reserved |
SECOND | reserved | non-reserved | reserved |
SECONDS | non-reserved | non-reserved | non-reserved |
SELECT | reserved | non-reserved | reserved |
SEMI | reserved | strict-non-reserved | non-reserved |
SEPARATED | non-reserved | non-reserved | non-reserved |
SERDE | non-reserved | non-reserved | non-reserved |
SERDEPROPERTIES | non-reserved | non-reserved | non-reserved |
SESSION_USER | reserved | non-reserved | reserved |
SET | non-reserved | non-reserved | reserved |
SETS | non-reserved | non-reserved | non-reserved |
SHOW | non-reserved | non-reserved | non-reserved |
SKEWED | non-reserved | non-reserved | non-reserved |
SOME | reserved | non-reserved | reserved |
SORT | non-reserved | non-reserved | non-reserved |
SORTED | non-reserved | non-reserved | non-reserved |
START | non-reserved | non-reserved | reserved |
STATISTICS | non-reserved | non-reserved | non-reserved |
STORED | non-reserved | non-reserved | non-reserved |
STRATIFY | non-reserved | non-reserved | non-reserved |
STRUCT | non-reserved | non-reserved | non-reserved |
SUBSTR | non-reserved | non-reserved | non-reserved |
SUBSTRING | non-reserved | non-reserved | non-reserved |
TABLE | reserved | non-reserved | reserved |
TABLES | non-reserved | non-reserved | non-reserved |
TABLESAMPLE | non-reserved | non-reserved | reserved |
TBLPROPERTIES | non-reserved | non-reserved | non-reserved |
TEMPORARY | non-reserved | non-reserved | non-reserved |
TERMINATED | non-reserved | non-reserved | non-reserved |
THEN | reserved | non-reserved | reserved |
TO | reserved | non-reserved | reserved |
TOUCH | non-reserved | non-reserved | non-reserved |
TRAILING | reserved | non-reserved | reserved |
TRANSACTION | non-reserved | non-reserved | non-reserved |
TRANSACTIONS | non-reserved | non-reserved | non-reserved |
TRANSFORM | non-reserved | non-reserved | non-reserved |
TRIM | non-reserved | non-reserved | non-reserved |
TRUE | non-reserved | non-reserved | reserved |
TRUNCATE | non-reserved | non-reserved | reserved |
UNARCHIVE | non-reserved | non-reserved | non-reserved |
UNBOUNDED | non-reserved | non-reserved | non-reserved |
UNCACHE | non-reserved | non-reserved | non-reserved |
UNION | reserved | strict-non-reserved | reserved |
UNIQUE | reserved | non-reserved | reserved |
UNKNOWN | reserved | non-reserved | reserved |
UNLOCK | non-reserved | non-reserved | non-reserved |
UNSET | non-reserved | non-reserved | non-reserved |
UPDATE | non-reserved | non-reserved | reserved |
USE | non-reserved | non-reserved | non-reserved |
USER | reserved | non-reserved | reserved |
USING | reserved | strict-non-reserved | reserved |
VALUES | non-reserved | non-reserved | reserved |
VIEW | non-reserved | non-reserved | non-reserved |
WEEK | non-reserved | non-reserved | non-reserved |
WEEKS | non-reserved | non-reserved | non-reserved |
WHEN | reserved | non-reserved | reserved |
WHERE | reserved | non-reserved | reserved |
WINDOW | non-reserved | non-reserved | reserved |
WITH | reserved | non-reserved | reserved |
YEAR | reserved | non-reserved | reserved |
YEARS | non-reserved | non-reserved | non-reserved |