[SPARK-35028][SQL] ANSI mode: disallow group by aliases
### What changes were proposed in this pull request?

Disallow group by aliases under ANSI mode.

### Why are the changes needed?

As per the ANSI SQL standard, section 7.12 `<group by clause>`:

> Each `grouping column reference` shall unambiguously reference a column of the table resulting from the `from clause`. A column referenced in a `group by clause` is a grouping column.

By forbidding it, we can avoid ambiguous SQL queries like:

```
SELECT col + 1 as col FROM t GROUP BY col
```

### Does this PR introduce _any_ user-facing change?

Yes, group by aliases are not allowed under ANSI mode.

### How was this patch tested?

Unit tests

Closes #32129 from gengliangwang/disallowGroupByAlias.

Authored-by: Gengliang Wang <ltnwgl@gmail.com>
Signed-off-by: Gengliang Wang <ltnwgl@gmail.com>
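To make the ambiguity concrete, here is a minimal Python model of the resolution order (the function and names are illustrative, not Spark's API): a GROUP BY reference resolves against the FROM-clause columns first, and falls back to a select-list alias only when alias resolution is allowed, i.e. outside ANSI mode.

```python
def resolve_grouping_ref(name, input_cols, select_aliases,
                         group_by_aliases=True, ansi_enabled=False):
    """Illustrative sketch: resolve a GROUP BY reference.

    Input (FROM-clause) columns always win; select-list aliases are
    consulted only when alias resolution is enabled and ANSI mode is off.
    """
    if name in input_cols:
        # Unambiguous per the standard: the table column is the grouping column.
        return ("input_column", name)
    # The gate this PR adds: aliases only outside ANSI mode.
    allow_alias = group_by_aliases and not ansi_enabled
    if allow_alias and name in select_aliases:
        return ("select_alias", select_aliases[name])
    raise ValueError(f"cannot resolve '{name}' against columns {sorted(input_cols)}")

# SELECT col + 1 AS col FROM t GROUP BY col
# 'col' is a real column of t, so it resolves to the input column either way:
print(resolve_grouping_ref("col", {"col"}, {"col": "col + 1"}))

# SELECT col + 1 AS c FROM t GROUP BY c
# 'c' exists only as an alias: resolvable in legacy mode, an error under ANSI.
print(resolve_grouping_ref("c", {"col"}, {"c": "col + 1"}))
try:
    resolve_grouping_ref("c", {"col"}, {"c": "col + 1"}, ansi_enabled=True)
except ValueError as e:
    print("ANSI mode:", e)
```

The sketch shows why the query in the description is confusing rather than wrong: the input column always shadows the same-named alias, which readers of legacy-mode queries easily miss.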
parent 278203d969
commit 79e55b44f7
```diff
@@ -183,6 +183,7 @@ The behavior of some SQL functions can be different under ANSI mode (`spark.sql.
 The behavior of some SQL operators can be different under ANSI mode (`spark.sql.ansi.enabled=true`).
 - `array_col[index]`: This operator throws `ArrayIndexOutOfBoundsException` if using invalid indices.
 - `map_col[key]`: This operator throws `NoSuchElementException` if key does not exist in map.
+- `GROUP BY`: aliases in a select list can not be used in GROUP BY clauses. Each column referenced in a GROUP BY clause shall unambiguously reference a column of the table resulting from the FROM clause.

 ### SQL Keywords
```
```diff
@@ -1847,9 +1847,12 @@ class Analyzer(override val catalogManager: CatalogManager)
       }
     }

+    // Group by alias is not allowed in ANSI mode.
+    private def allowGroupByAlias: Boolean = conf.groupByAliases && !conf.ansiEnabled
+
     override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
       case agg @ Aggregate(groups, aggs, child)
-        if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
+        if allowGroupByAlias && child.resolved && aggs.forall(_.resolved) &&
           groups.exists(!_.resolved) =>
         agg.copy(groupingExpressions = mayResolveAttrByAggregateExprs(groups, aggs, child))
     }
```
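The new `allowGroupByAlias` predicate is a simple conjunction of two configs. A small Python model of the gate (not Spark code; the function name is illustrative) enumerates the four combinations:

```python
def allow_group_by_alias(group_by_aliases: bool, ansi_enabled: bool) -> bool:
    # Mirrors the predicate added in the patch: aliases are honored only
    # when spark.sql.groupByAliases is true AND spark.sql.ansi.enabled is false.
    return group_by_aliases and not ansi_enabled

for gba in (True, False):
    for ansi in (True, False):
        allowed = allow_group_by_alias(gba, ansi)
        print(f"groupByAliases={gba}, ansi.enabled={ansi} -> aliases resolved: {allowed}")
```

Only `groupByAliases=True, ansi.enabled=False` keeps the legacy behavior; enabling ANSI mode wins regardless of `spark.sql.groupByAliases`.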
```diff
@@ -206,6 +206,17 @@ object SQLConf {
     .intConf
     .createWithDefault(100)

+  val ANSI_ENABLED = buildConf("spark.sql.ansi.enabled")
+    .doc("When true, Spark SQL uses an ANSI compliant dialect instead of being Hive compliant. " +
+      "For example, Spark will throw an exception at runtime instead of returning null results " +
+      "when the inputs to a SQL operator/function are invalid." +
+      "For full details of this dialect, you can find them in the section \"ANSI Compliance\" of " +
+      "Spark's documentation. Some ANSI dialect features may be not from the ANSI SQL " +
+      "standard directly, but their behaviors align with ANSI SQL's style")
+    .version("3.0.0")
+    .booleanConf
+    .createWithDefault(false)
+
   val OPTIMIZER_EXCLUDED_RULES = buildConf("spark.sql.optimizer.excludedRules")
     .doc("Configures a list of rules to be disabled in the optimizer, in which the rules are " +
       "specified by their rule names and separated by comma. It is not guaranteed that all the " +
```
```diff
@@ -1092,8 +1103,9 @@ object SQLConf {
     .createWithDefault(true)

   val GROUP_BY_ALIASES = buildConf("spark.sql.groupByAliases")
-    .doc("When true, aliases in a select list can be used in group by clauses. When false, " +
-      "an analysis exception is thrown in the case.")
+    .doc("This configuration is only effective when ANSI mode is disabled. When it is true and " +
+      s"${ANSI_ENABLED.key} is false, aliases in a select list can be used in group by clauses. " +
+      "Otherwise, an analysis exception is thrown in the case.")
     .version("2.2.0")
     .booleanConf
     .createWithDefault(true)
```
```diff
@@ -2348,17 +2360,6 @@ object SQLConf {
     .checkValues(StoreAssignmentPolicy.values.map(_.toString))
     .createWithDefault(StoreAssignmentPolicy.ANSI.toString)

-  val ANSI_ENABLED = buildConf("spark.sql.ansi.enabled")
-    .doc("When true, Spark SQL uses an ANSI compliant dialect instead of being Hive compliant. " +
-      "For example, Spark will throw an exception at runtime instead of returning null results " +
-      "when the inputs to a SQL operator/function are invalid." +
-      "For full details of this dialect, you can find them in the section \"ANSI Compliance\" of " +
-      "Spark's documentation. Some ANSI dialect features may be not from the ANSI SQL " +
-      "standard directly, but their behaviors align with ANSI SQL's style")
-    .version("3.0.0")
-    .booleanConf
-    .createWithDefault(false)
-
   val SORT_BEFORE_REPARTITION =
     buildConf("spark.sql.execution.sortBeforeRepartition")
       .internal()
```
```diff
@@ -0,0 +1 @@
+--IMPORT group-analytics.sql
```
File diff suppressed because it is too large