From 21232377ba654babd0dc929b5278835beefcc6a1 Mon Sep 17 00:00:00 2001
From: Angerszhuuuu
Date: Mon, 12 Apr 2021 08:23:52 +0000
Subject: [PATCH] [SPARK-33229][SQL] Support partial grouping analytics and
 concatenated grouping analytics

### What changes were proposed in this pull request?

Support GROUP BY clauses that mix separate grouping columns with CUBE/ROLLUP/GROUPING SETS. PostgreSQL supports queries such as:

```
select a, b, c, count(1) from t group by a, b, cube (a, b, c);
select a, b, c, count(1) from t group by a, b, rollup(a, b, c);
select a, b, c, count(1) from t group by cube(a, b), rollup (a, b, c);
select a, b, c, count(1) from t group by a, b, grouping sets((a, b), (a), ());
```

In this PR, we have done two things:

1. Support partial grouping analytics such as `group by a, cube(a, b)`
2. Support mixed grouping analytics such as `group by cube(a, b), rollup(b, c)`

*Partial Groupings*

Partial groupings arise when there are both `group_expression`s and `CUBE|ROLLUP|GROUPING SETS` in the GROUP BY clause. For example:

`GROUP BY warehouse, CUBE(product, location)` is equivalent to `GROUP BY GROUPING SETS((warehouse, product, location), (warehouse, product), (warehouse, location), (warehouse))`.

`GROUP BY warehouse, ROLLUP(product, location)` is equivalent to `GROUP BY GROUPING SETS((warehouse, product, location), (warehouse, product), (warehouse))`.

`GROUP BY warehouse, GROUPING SETS((product, location), (product), ())` is equivalent to `GROUP BY GROUPING SETS((warehouse, product, location), (warehouse, product), (warehouse))`.

*Concatenated Groupings*

Concatenated groupings offer a concise way to generate useful combinations of groupings. Groupings specified with concatenated groupings yield the cross-product of the groupings from each grouping set. Even a small number of concatenated groupings can therefore generate a large number of final groups.
The concatenated groupings are specified simply by listing multiple `GROUPING SETS`, `CUBE`, and `ROLLUP` clauses, separated by commas. For example:

`GROUP BY GROUPING SETS((warehouse), (product)), GROUPING SETS((location), (size))` is equivalent to `GROUP BY GROUPING SETS((warehouse, location), (warehouse, size), (product, location), (product, size))`.

`GROUP BY CUBE((warehouse), (product)), ROLLUP((location), (size))` is equivalent to `GROUP BY GROUPING SETS((warehouse, product), (warehouse), (product), ()), GROUPING SETS((location, size), (location), ())`, which is in turn equivalent to `GROUP BY GROUPING SETS((warehouse, product, location, size), (warehouse, product, location), (warehouse, product), (warehouse, location, size), (warehouse, location), (warehouse), (product, location, size), (product, location), (product), (location, size), (location), ())`.

`GROUP BY order, CUBE((warehouse), (product)), ROLLUP((location), (size))` is equivalent to `GROUP BY order, GROUPING SETS((warehouse, product), (warehouse), (product), ()), GROUPING SETS((location, size), (location), ())`, which is in turn equivalent to `GROUP BY GROUPING SETS((order, warehouse, product, location, size), (order, warehouse, product, location), (order, warehouse, product), (order, warehouse, location, size), (order, warehouse, location), (order, warehouse), (order, product, location, size), (order, product, location), (order, product), (order, location, size), (order, location), (order))`.

### Why are the changes needed?

Support more flexible grouping analytics.

### Does this PR introduce _any_ user-facing change?

Yes. Users can now write SQL such as

```
select a, b, c, agg_expr() from table group by a, cube(b, c)
```

### How was this patch tested?

Added UT.

Closes #30144 from AngersZhuuuu/SPARK-33229.
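The expansion rules above can be sketched outside of Spark. The following minimal Python sketch is only an illustration of the semantics; the helper names (`rollup_sets`, `cube_sets`, `concat_groupings`) are hypothetical and do not exist in Spark, whose actual implementation lives in `GroupingAnalytics.unapply` in grouping.scala.

```python
from itertools import product

def rollup_sets(*cols):
    # ROLLUP(a, b, ...) expands to the N+1 prefixes: (a, b, ...), ..., (a,), ()
    return [cols[:i] for i in range(len(cols), -1, -1)]

def cube_sets(*cols):
    # CUBE(a, b, ...) expands to all 2^N subsets of the columns
    sets = [()]
    for c in cols:
        sets = [s + (c,) for s in sets] + sets
    return sets

def concat_groupings(*grouping_sets):
    # Cross-product of grouping sets. A plain group_expression is treated
    # as a single-group GROUPING SETS, e.g. [("warehouse",)].
    result = [()]
    for gs in grouping_sets:
        result = [a + b for a, b in product(result, gs)]
    return result

# GROUP BY warehouse, ROLLUP(product), CUBE(location, size)
groups = concat_groupings([("warehouse",)],
                          rollup_sets("product"),
                          cube_sets("location", "size"))
# 1 * 2 * 4 = 8 final grouping sets, each prefixed with warehouse
```

Note that the cross-product keeps duplicate columns within one combined set; deduplication of the flat group-by expressions happens separately (see `BaseGroupingSets.distinctGroupByExprs` in the diff below).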
Lead-authored-by: Angerszhuuuu Co-authored-by: angerszhu Co-authored-by: Wenchen Fan Signed-off-by: Wenchen Fan --- docs/sql-ref-syntax-qry-select-groupby.md | 25 +- .../sql/catalyst/analysis/Analyzer.scala | 36 +- .../sql/catalyst/expressions/grouping.scala | 55 +- .../sql/catalyst/parser/AstBuilder.scala | 12 - .../sql-tests/inputs/group-analytics.sql | 12 +- .../sql-tests/results/group-analytics.sql.out | 613 +++++++++++++++++- 6 files changed, 684 insertions(+), 69 deletions(-) diff --git a/docs/sql-ref-syntax-qry-select-groupby.md b/docs/sql-ref-syntax-qry-select-groupby.md index 7d15de4d61..b81a5e43d5 100644 --- a/docs/sql-ref-syntax-qry-select-groupby.md +++ b/docs/sql-ref-syntax-qry-select-groupby.md @@ -24,7 +24,9 @@ license: | The `GROUP BY` clause is used to group the rows based on a set of specified grouping expressions and compute aggregations on the group of rows based on one or more specified aggregate functions. Spark also supports advanced aggregations to do multiple aggregations for the same input record set via `GROUPING SETS`, `CUBE`, `ROLLUP` clauses. -When a FILTER clause is attached to an aggregate function, only the matching rows are passed to that function. +The grouping expressions and advanced aggregations can be mixed in the `GROUP BY` clause. +See more details in the `Mixed Grouping Analytics` section. When a FILTER clause is attached to +an aggregate function, only the matching rows are passed to that function. ### Syntax @@ -41,7 +43,7 @@ aggregate_name ( [ DISTINCT ] expression [ , ... ] ) [ FILTER ( WHERE boolean_ex ### Parameters -* **grouping_expression** +* **group_expression** Specifies the criteria based on which the rows are grouped together. The grouping of rows is performed based on result values of the grouping expressions. A grouping expression may be a column name like `GROUP BY a`, a column position like @@ -93,6 +95,25 @@ aggregate_name ( [ DISTINCT ] expression [ , ... 
] ) [ FILTER ( WHERE boolean_ex (product, warehouse, location), (warehouse), (product), (warehouse, product), ())`. The N elements of a `CUBE` specification results in 2^N `GROUPING SETS`. +* **Mixed Grouping Analytics** + + A GROUP BY clause can include multiple `group_expression`s and multiple `CUBE|ROLLUP|GROUPING SETS`s. + `CUBE|ROLLUP` is just a syntax sugar for `GROUPING SETS`, please refer to the sections above for + how to translate `CUBE|ROLLUP` to `GROUPING SETS`. `group_expression` can be treated as a single-group + `GROUPING SETS` under this context. For multiple `GROUPING SETS` in the `GROUP BY` clause, we generate + a single `GROUPING SETS` by doing a cross-product of the original `GROUPING SETS`s. For example, + `GROUP BY warehouse, GROUPING SETS((product), ()), GROUPING SETS((location, size), (location), (size), ())` + and `GROUP BY warehouse, ROLLUP(product), CUBE(location, size)` is equivalent to + `GROUP BY GROUPING SETS( + (warehouse, product, location, size), + (warehouse, product, location), + (warehouse, product, size), + (warehouse, product), + (warehouse, location, size), + (warehouse, location), + (warehouse, size), + (warehouse))`. + * **aggregate_name** Specifies an aggregate function name (MIN, MAX, COUNT, SUM, AVG, etc.). 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 238720a21c..d41a638f55 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -595,13 +595,14 @@ class Analyzer(override val catalogManager: CatalogManager) } } - private def tryResolveHavingCondition(h: UnresolvedHaving): LogicalPlan = { - val aggForResolving = h.child match { - // For CUBE/ROLLUP expressions, to avoid resolving repeatedly, here we delete them from - // groupingExpressions for condition resolving. - case a @ Aggregate(Seq(gs: BaseGroupingSets), _, _) => - a.copy(groupingExpressions = gs.groupByExprs) - } + private def tryResolveHavingCondition( + h: UnresolvedHaving, + aggregate: Aggregate, + selectedGroupByExprs: Seq[Seq[Expression]], + groupByExprs: Seq[Expression]): LogicalPlan = { + // For CUBE/ROLLUP expressions, to avoid resolving repeatedly, here we delete them from + // groupingExpressions for condition resolving. + val aggForResolving = aggregate.copy(groupingExpressions = groupByExprs) // Try resolving the condition of the filter as though it is in the aggregate clause val resolvedInfo = ResolveAggregateFunctions.resolveFilterCondInAggregate(h.havingCondition, aggForResolving) @@ -609,12 +610,8 @@ class Analyzer(override val catalogManager: CatalogManager) // Push the aggregate expressions into the aggregate (if any). 
if (resolvedInfo.nonEmpty) { val (extraAggExprs, resolvedHavingCond) = resolvedInfo.get - val newChild = h.child match { - case Aggregate(Seq(gs: BaseGroupingSets), aggregateExpressions, child) => - constructAggregate( - gs.selectedGroupByExprs, gs.groupByExprs, - aggregateExpressions ++ extraAggExprs, child) - } + val newChild = constructAggregate(selectedGroupByExprs, groupByExprs, + aggregate.aggregateExpressions ++ extraAggExprs, aggregate.child) // Since the exprId of extraAggExprs will be changed in the constructed aggregate, and the // aggregateExpressions keeps the input order. So here we build an exprMap to resolve the @@ -636,16 +633,17 @@ class Analyzer(override val catalogManager: CatalogManager) // CUBE/ROLLUP/GROUPING SETS. This also replace grouping()/grouping_id() in resolved // Filter/Sort. def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperatorsDown { - case h @ UnresolvedHaving(_, agg @ Aggregate(Seq(gs: BaseGroupingSets), aggExprs, _)) - if agg.childrenResolved && (gs.children ++ aggExprs).forall(_.resolved) => - tryResolveHavingCondition(h) + case h @ UnresolvedHaving(_, agg @ Aggregate( + GroupingAnalytics(selectedGroupByExprs, groupByExprs), aggExprs, _)) + if agg.childrenResolved && aggExprs.forall(_.resolved) => + tryResolveHavingCondition(h, agg, selectedGroupByExprs, groupByExprs) case a if !a.childrenResolved => a // be sure all of the children are resolved. // Ensure group by expressions and aggregate expressions have been resolved. - case Aggregate(Seq(gs: BaseGroupingSets), aggregateExpressions, child) - if (gs.children ++ aggregateExpressions).forall(_.resolved) => - constructAggregate(gs.selectedGroupByExprs, gs.groupByExprs, aggregateExpressions, child) + case Aggregate(GroupingAnalytics(selectedGroupByExprs, groupByExprs), aggExprs, child) + if aggExprs.forall(_.resolved) => + constructAggregate(selectedGroupByExprs, groupByExprs, aggExprs, child) // We should make sure all expressions in condition have been resolved. 
case f @ Filter(cond, child) if hasGroupingFunction(cond) && cond.resolved => diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala index 0dd82bed15..0f14203a90 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala @@ -34,16 +34,7 @@ trait BaseGroupingSets extends Expression with CodegenFallback { def groupByExprs: Seq[Expression] = { assert(children.forall(_.resolved), "Cannot call BaseGroupingSets.groupByExprs before the children expressions are all resolved.") - children.foldLeft(Seq.empty[Expression]) { (result, currentExpr) => - // Only unique expressions are included in the group by expressions and is determined - // based on their semantic equality. Example. grouping sets ((a * b), (b * a)) results - // in grouping expression (a * b) - if (result.exists(_.semanticEquals(currentExpr))) { - result - } else { - result :+ currentExpr - } - } + BaseGroupingSets.distinctGroupByExprs(children) } // this should be replaced first @@ -104,6 +95,19 @@ object BaseGroupingSets { case (gs, startOffset) => gs.indices.map(_ + startOffset) } } + + def distinctGroupByExprs(exprs: Seq[Expression]): Seq[Expression] = { + exprs.foldLeft(Seq.empty[Expression]) { (result, currentExpr) => + // Only unique expressions are included in the group by expressions and is determined + // based on their semantic equality. Example. 
grouping sets ((a * b), (b * a)) results + // in grouping expression (a * b) + if (result.exists(_.semanticEquals(currentExpr))) { + result + } else { + result :+ currentExpr + } + } + } } case class Cube( @@ -242,3 +246,34 @@ object GroupingID { if (SQLConf.get.integerGroupingIdEnabled) IntegerType else LongType } } + +object GroupingAnalytics { + def unapply(exprs: Seq[Expression]) + : Option[(Seq[Seq[Expression]], Seq[Expression])] = { + if (!exprs.exists(_.isInstanceOf[BaseGroupingSets])) { + None + } else { + val resolved = exprs.forall { + case gs: BaseGroupingSets => gs.childrenResolved + case other => other.resolved + } + if (!resolved) { + None + } else { + val groups = exprs.flatMap { + case gs: BaseGroupingSets => gs.groupByExprs + case other: Expression => other :: Nil + } + val unmergedSelectedGroupByExprs = exprs.map { + case gs: BaseGroupingSets => gs.selectedGroupByExprs + case other: Expression => Seq(Seq(other)) + } + val selectedGroupByExprs = unmergedSelectedGroupByExprs.tail + .foldLeft(unmergedSelectedGroupByExprs.head) { (x, y) => + for (a <- x; b <- y) yield a ++ b + } + Some(selectedGroupByExprs, BaseGroupingSets.distinctGroupByExprs(groups)) + } + } + } +} diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index a788f233af..faec5d9743 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -966,18 +966,6 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg expression(groupByExpr.expression) } }) - val (groupingSet, expressions) = - groupByExpressions.partition(_.isInstanceOf[BaseGroupingSets]) - if (expressions.nonEmpty && groupingSet.nonEmpty) { - throw new ParseException("Partial CUBE/ROLLUP/GROUPING SETS like " + - "`GROUP BY a, b, CUBE(a, b)` is not 
supported.", - ctx) - } - if (groupingSet.size > 1) { - throw new ParseException("Mixed CUBE/ROLLUP/GROUPING SETS like " + - "`GROUP BY CUBE(a, b), ROLLUP(a, c)` is not supported.", - ctx) - } Aggregate(groupByExpressions.toSeq, selectExpressions, query) } } diff --git a/sql/core/src/test/resources/sql-tests/inputs/group-analytics.sql b/sql/core/src/test/resources/sql-tests/inputs/group-analytics.sql index fe9cadb7fb..6dfe31e270 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/group-analytics.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/group-analytics.sql @@ -69,4 +69,14 @@ SELECT course, year FROM courseSales GROUP BY CUBE(course, year) ORDER BY groupi -- Aliases in SELECT could be used in ROLLUP/CUBE/GROUPING SETS SELECT a + b AS k1, b AS k2, SUM(a - b) FROM testData GROUP BY CUBE(k1, k2); SELECT a + b AS k, b, SUM(a - b) FROM testData GROUP BY ROLLUP(k, b); -SELECT a + b, b AS k, SUM(a - b) FROM testData GROUP BY a + b, k GROUPING SETS(k) +SELECT a + b, b AS k, SUM(a - b) FROM testData GROUP BY a + b, k GROUPING SETS(k); + +-- GROUP BY use mixed Separate columns and CUBE/ROLLUP/Gr +SELECT a, b, count(1) FROM testData GROUP BY a, b, CUBE(a, b); +SELECT a, b, count(1) FROM testData GROUP BY a, b, ROLLUP(a, b); +SELECT a, b, count(1) FROM testData GROUP BY CUBE(a, b), ROLLUP(a, b); +SELECT a, b, count(1) FROM testData GROUP BY a, CUBE(a, b), ROLLUP(b); +SELECT a, b, count(1) FROM testData GROUP BY a, GROUPING SETS((a, b), (a), ()); +SELECT a, b, count(1) FROM testData GROUP BY a, CUBE(a, b), GROUPING SETS((a, b), (a), ()); +SELECT a, b, count(1) FROM testData GROUP BY a, CUBE(a, b), ROLLUP(a, b), GROUPING SETS((a, b), (a), ()); + diff --git a/sql/core/src/test/resources/sql-tests/results/group-analytics.sql.out b/sql/core/src/test/resources/sql-tests/results/group-analytics.sql.out index 307efcf063..6dc02ead9d 100644 --- a/sql/core/src/test/resources/sql-tests/results/group-analytics.sql.out +++ 
b/sql/core/src/test/resources/sql-tests/results/group-analytics.sql.out @@ -1,5 +1,5 @@ -- Automatically generated by SQLQueryTestSuite --- Number of queries: 37 +-- Number of queries: 44 -- !query @@ -248,43 +248,155 @@ NULL 2013 78000 -- !query SELECT course, year, SUM(earnings) FROM courseSales GROUP BY course, CUBE(course, year) ORDER BY course, year -- !query schema -struct<> +struct -- !query output -org.apache.spark.sql.catalyst.parser.ParseException - -Partial CUBE/ROLLUP/GROUPING SETS like `GROUP BY a, b, CUBE(a, b)` is not supported.(line 1, pos 52) - -== SQL == -SELECT course, year, SUM(earnings) FROM courseSales GROUP BY course, CUBE(course, year) ORDER BY course, year -----------------------------------------------------^^^ +Java NULL 50000 +Java NULL 50000 +Java 2012 20000 +Java 2012 20000 +Java 2013 30000 +Java 2013 30000 +dotNET NULL 63000 +dotNET NULL 63000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2013 48000 +dotNET 2013 48000 -- !query SELECT course, year, SUM(earnings) FROM courseSales GROUP BY CUBE(course, year), ROLLUP(course, year) ORDER BY course, year -- !query schema -struct<> +struct -- !query output -org.apache.spark.sql.catalyst.parser.ParseException - -Mixed CUBE/ROLLUP/GROUPING SETS like `GROUP BY CUBE(a, b), ROLLUP(a, c)` is not supported.(line 1, pos 52) - -== SQL == -SELECT course, year, SUM(earnings) FROM courseSales GROUP BY CUBE(course, year), ROLLUP(course, year) ORDER BY course, year -----------------------------------------------------^^^ +NULL NULL 113000 +NULL 2012 35000 +NULL 2013 78000 +Java NULL 50000 +Java NULL 50000 +Java NULL 50000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +dotNET NULL 63000 +dotNET NULL 63000 +dotNET NULL 63000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2012 15000 
+dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2013 48000 +dotNET 2013 48000 +dotNET 2013 48000 +dotNET 2013 48000 +dotNET 2013 48000 +dotNET 2013 48000 +dotNET 2013 48000 -- !query SELECT course, year, SUM(earnings) FROM courseSales GROUP BY CUBE(course, year), ROLLUP(course, year), GROUPING SETS(course, year) ORDER BY course, year -- !query schema -struct<> +struct -- !query output -org.apache.spark.sql.catalyst.parser.ParseException - -Mixed CUBE/ROLLUP/GROUPING SETS like `GROUP BY CUBE(a, b), ROLLUP(a, c)` is not supported.(line 1, pos 52) - -== SQL == -SELECT course, year, SUM(earnings) FROM courseSales GROUP BY CUBE(course, year), ROLLUP(course, year), GROUPING SETS(course, year) ORDER BY course, year -----------------------------------------------------^^^ +NULL 2012 35000 +NULL 2012 35000 +NULL 2013 78000 +NULL 2013 78000 +Java NULL 50000 +Java NULL 50000 +Java NULL 50000 +Java NULL 50000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2012 20000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +Java 2013 30000 +dotNET NULL 63000 +dotNET NULL 63000 +dotNET NULL 63000 +dotNET NULL 63000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2012 15000 +dotNET 2013 48000 +dotNET 2013 48000 +dotNET 2013 48000 
+dotNET 2013 48000 +dotNET 2013 48000 +dotNET 2013 48000 +dotNET 2013 48000 +dotNET 2013 48000 +dotNET 2013 48000 +dotNET 2013 48000 +dotNET 2013 48000 +dotNET 2013 48000 +dotNET 2013 48000 +dotNET 2013 48000 +dotNET 2013 48000 +dotNET 2013 48000 +dotNET 2013 48000 +dotNET 2013 48000 -- !query @@ -524,3 +636,454 @@ struct<(a + b):int,k:int,sum((a - b)):bigint> -- !query output NULL 1 3 NULL 2 0 + + +-- !query +SELECT a, b, count(1) FROM testData GROUP BY a, b, CUBE(a, b) +-- !query schema +struct +-- !query output +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 + + +-- !query +SELECT a, b, count(1) FROM testData GROUP BY a, b, ROLLUP(a, b) +-- !query schema +struct +-- !query output +1 1 1 +1 1 1 +1 1 1 +1 2 1 +1 2 1 +1 2 1 +2 1 1 +2 1 1 +2 1 1 +2 2 1 +2 2 1 +2 2 1 +3 1 1 +3 1 1 +3 1 1 +3 2 1 +3 2 1 +3 2 1 + + +-- !query +SELECT a, b, count(1) FROM testData GROUP BY CUBE(a, b), ROLLUP(a, b) +-- !query schema +struct +-- !query output +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 NULL 2 +1 NULL 2 +1 NULL 2 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 NULL 2 +2 NULL 2 +2 NULL 2 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 NULL 2 +3 NULL 2 +3 NULL 2 +NULL 1 3 +NULL 2 3 +NULL NULL 6 + + +-- !query +SELECT a, b, count(1) FROM testData GROUP BY a, CUBE(a, b), ROLLUP(b) +-- !query schema +struct +-- !query output +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 NULL 2 +1 NULL 2 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 NULL 2 +2 NULL 2 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 NULL 2 +3 NULL 2 + + +-- !query +SELECT a, b, count(1) FROM testData GROUP BY a, GROUPING 
SETS((a, b), (a), ()) +-- !query schema +struct +-- !query output +1 1 1 +1 2 1 +1 NULL 2 +1 NULL 2 +2 1 1 +2 2 1 +2 NULL 2 +2 NULL 2 +3 1 1 +3 2 1 +3 NULL 2 +3 NULL 2 + + +-- !query +SELECT a, b, count(1) FROM testData GROUP BY a, CUBE(a, b), GROUPING SETS((a, b), (a), ()) +-- !query schema +struct +-- !query output +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 NULL 2 +1 NULL 2 +1 NULL 2 +1 NULL 2 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 NULL 2 +2 NULL 2 +2 NULL 2 +2 NULL 2 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 NULL 2 +3 NULL 2 +3 NULL 2 +3 NULL 2 + + +-- !query +SELECT a, b, count(1) FROM testData GROUP BY a, CUBE(a, b), ROLLUP(a, b), GROUPING SETS((a, b), (a), ()) +-- !query schema +struct +-- !query output +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 1 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 2 1 +1 NULL 2 +1 NULL 2 +1 NULL 2 +1 NULL 2 +1 NULL 2 +1 NULL 2 +1 NULL 2 +1 NULL 2 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 1 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 2 1 +2 NULL 2 +2 NULL 2 +2 NULL 2 +2 NULL 2 +2 NULL 2 +2 NULL 2 +2 NULL 2 +2 NULL 2 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 
1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 1 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 2 1 +3 NULL 2 +3 NULL 2 +3 NULL 2 +3 NULL 2 +3 NULL 2 +3 NULL 2 +3 NULL 2 +3 NULL 2