[SPARK-35028][SQL] ANSI mode: disallow group by aliases

### What changes were proposed in this pull request?

Disallow group by aliases under ANSI mode.

### Why are the changes needed?

As per section 7.12 `<group by clause>` of the ANSI SQL standard:

>Each `grouping column reference` shall unambiguously reference a column of the table resulting from the `from clause`. A column referenced in a `group by clause` is a grouping column.

By forbidding group by aliases, we avoid ambiguous SQL queries like:
```
SELECT col + 1 as col FROM t GROUP BY col
```
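
For illustration, a minimal sketch of the difference, assuming a hypothetical table `t` with an integer column `col` (the table, column, and alias names are illustrative, not from this patch):

```scala
// Hypothetical table; names are illustrative only.
spark.sql("CREATE TABLE t(col INT) USING parquet")

// Default (non-ANSI) mode with spark.sql.groupByAliases=true:
// the select-list alias is visible to GROUP BY.
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("SELECT col + 1 AS plus_one FROM t GROUP BY plus_one")  // resolves via the alias

// ANSI mode: GROUP BY must reference a column of the FROM-clause table,
// so the alias is no longer resolvable and analysis fails.
spark.conf.set("spark.sql.ansi.enabled", "true")
// spark.sql("SELECT col + 1 AS plus_one FROM t GROUP BY plus_one")  // AnalysisException
spark.sql("SELECT col + 1 AS plus_one FROM t GROUP BY col + 1")      // group by the expression instead
```

In the ambiguous query above, `col` in the GROUP BY clause could plausibly mean either the base column of `t` or the alias for `col + 1`; under the ANSI rule it can only mean the base column.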

### Does this PR introduce _any_ user-facing change?

Yes, group by aliases are not allowed under ANSI mode.

### How was this patch tested?

Unit tests

Closes #32129 from gengliangwang/disallowGroupByAlias.

Authored-by: Gengliang Wang <ltnwgl@gmail.com>
Signed-off-by: Gengliang Wang <ltnwgl@gmail.com>
Commit 79e55b44f7 (parent 278203d969), Gengliang Wang, 2021-04-13 10:42:57 +08:00
5 changed files with 1089 additions and 14 deletions


```diff
@@ -183,6 +183,7 @@ The behavior of some SQL functions can be different under ANSI mode (`spark.sql.
 The behavior of some SQL operators can be different under ANSI mode (`spark.sql.ansi.enabled=true`).
 - `array_col[index]`: This operator throws `ArrayIndexOutOfBoundsException` if using invalid indices.
 - `map_col[key]`: This operator throws `NoSuchElementException` if key does not exist in map.
+- `GROUP BY`: aliases in a select list can not be used in GROUP BY clauses. Each column referenced in a GROUP BY clause shall unambiguously reference a column of the table resulting from the FROM clause.
 
 ### SQL Keywords
```
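
For context, a quick sketch of the operator behaviors this documentation section lists (the queries are illustrative, not from the patch):

```scala
spark.conf.set("spark.sql.ansi.enabled", "true")

// An invalid array index throws under ANSI mode (returns NULL when ANSI mode is off):
// spark.sql("SELECT array(1, 2)[5]").show()   // ArrayIndexOutOfBoundsException

// A missing map key likewise throws under ANSI mode:
// spark.sql("SELECT map(1, 'a')[2]").show()   // NoSuchElementException
```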


```diff
@@ -1847,9 +1847,12 @@ class Analyzer(override val catalogManager: CatalogManager)
       }}
     }
 
+    // Group by alias is not allowed in ANSI mode.
+    private def allowGroupByAlias: Boolean = conf.groupByAliases && !conf.ansiEnabled
+
     override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
       case agg @ Aggregate(groups, aggs, child)
-          if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
+          if allowGroupByAlias && child.resolved && aggs.forall(_.resolved) &&
             groups.exists(!_.resolved) =>
         agg.copy(groupingExpressions = mayResolveAttrByAggregateExprs(groups, aggs, child))
     }
```
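
The net effect is that `spark.sql.groupByAliases` becomes a no-op once ANSI mode is on. A standalone re-statement of the guard's truth table (a sketch, not Spark code):

```scala
// Mirrors the allowGroupByAlias guard added above.
def allowGroupByAlias(groupByAliases: Boolean, ansiEnabled: Boolean): Boolean =
  groupByAliases && !ansiEnabled

assert(allowGroupByAlias(groupByAliases = true, ansiEnabled = false))   // aliases usable
assert(!allowGroupByAlias(groupByAliases = true, ansiEnabled = true))   // ANSI mode wins
assert(!allowGroupByAlias(groupByAliases = false, ansiEnabled = false))
assert(!allowGroupByAlias(groupByAliases = false, ansiEnabled = true))
```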


```diff
@@ -206,6 +206,17 @@ object SQLConf {
     .intConf
     .createWithDefault(100)
 
+  val ANSI_ENABLED = buildConf("spark.sql.ansi.enabled")
+    .doc("When true, Spark SQL uses an ANSI compliant dialect instead of being Hive compliant. " +
+      "For example, Spark will throw an exception at runtime instead of returning null results " +
+      "when the inputs to a SQL operator/function are invalid." +
+      "For full details of this dialect, you can find them in the section \"ANSI Compliance\" of " +
+      "Spark's documentation. Some ANSI dialect features may be not from the ANSI SQL " +
+      "standard directly, but their behaviors align with ANSI SQL's style")
+    .version("3.0.0")
+    .booleanConf
+    .createWithDefault(false)
+
   val OPTIMIZER_EXCLUDED_RULES = buildConf("spark.sql.optimizer.excludedRules")
     .doc("Configures a list of rules to be disabled in the optimizer, in which the rules are " +
       "specified by their rule names and separated by comma. It is not guaranteed that all the " +
@@ -1092,8 +1103,9 @@ object SQLConf {
     .createWithDefault(true)
 
   val GROUP_BY_ALIASES = buildConf("spark.sql.groupByAliases")
-    .doc("When true, aliases in a select list can be used in group by clauses. When false, " +
-      "an analysis exception is thrown in the case.")
+    .doc("This configuration is only effective when ANSI mode is disabled. When it is true and " +
+      s"${ANSI_ENABLED.key} is false, aliases in a select list can be used in group by clauses. " +
+      "Otherwise, an analysis exception is thrown in the case.")
     .version("2.2.0")
     .booleanConf
     .createWithDefault(true)
@@ -2348,17 +2360,6 @@ object SQLConf {
     .checkValues(StoreAssignmentPolicy.values.map(_.toString))
     .createWithDefault(StoreAssignmentPolicy.ANSI.toString)
 
-  val ANSI_ENABLED = buildConf("spark.sql.ansi.enabled")
-    .doc("When true, Spark SQL uses an ANSI compliant dialect instead of being Hive compliant. " +
-      "For example, Spark will throw an exception at runtime instead of returning null results " +
-      "when the inputs to a SQL operator/function are invalid." +
-      "For full details of this dialect, you can find them in the section \"ANSI Compliance\" of " +
-      "Spark's documentation. Some ANSI dialect features may be not from the ANSI SQL " +
-      "standard directly, but their behaviors align with ANSI SQL's style")
-    .version("3.0.0")
-    .booleanConf
-    .createWithDefault(false)
-
   val SORT_BEFORE_REPARTITION =
     buildConf("spark.sql.execution.sortBeforeRepartition")
       .internal()
```
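
The relocation of `ANSI_ENABLED` (deleted near line 2348, re-added near line 206) is presumably required because `GROUP_BY_ALIASES` now interpolates `ANSI_ENABLED.key` into its doc string: vals in a Scala object initialize in declaration order, so a forward reference would still be null when the doc string is built. A minimal standalone sketch of the hazard (not Spark code):

```scala
object InitOrder {
  // Forward reference: `later` is not yet initialized when `early` is evaluated,
  // so it interpolates as null. On a ConfigEntry, calling .key here would NPE.
  val early: String = s"see ${later}"
  val later: String = "spark.sql.ansi.enabled"
}

// InitOrder.early == "see null"; this is why ANSI_ENABLED must be declared
// before GROUP_BY_ALIASES once the latter references it.
```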


```diff
@@ -0,0 +1 @@
+--IMPORT group-analytics.sql
```

This new one-line test input uses the `--IMPORT` directive of Spark's SQLQueryTestSuite to replay the queries of the existing `group-analytics.sql` input file, presumably under the ANSI-enabled test configuration so the same GROUP BY queries are re-run with `spark.sql.ansi.enabled=true`.

File diff suppressed because it is too large.