[SPARK-33229][SQL] Support partial grouping analytics and concatenated grouping analytics

### What changes were proposed in this pull request?
Support `GROUP BY` clauses that mix separate grouping columns with `CUBE`/`ROLLUP`/`GROUPING SETS`.

PostgreSQL supports queries such as:
```
select a, b, c, count(1) from t group by a, b, cube (a, b, c);
select a, b, c, count(1) from t group by a, b, rollup(a, b, c);
select a, b, c, count(1) from t group by cube(a, b), rollup (a, b, c);
select a, b, c, count(1) from t group by a, b, grouping sets((a, b), (a), ());
```
This PR does two things:

1. Support partial grouping analytics such as `group by a, cube(a, b)`
2. Support concatenated (mixed) grouping analytics such as `group by cube(a, b), rollup(b, c)`

*Partial Groupings*

    Partial groupings occur when both `group_expression`s and `CUBE|ROLLUP|GROUPING SETS`
    appear in the GROUP BY clause: each plain grouping expression is prepended to every
    generated grouping set (see the sketch after the examples below). For example:
    `GROUP BY warehouse, CUBE(product, location)` is equivalent to
    `GROUP BY GROUPING SETS((warehouse, product, location), (warehouse, product), (warehouse, location), (warehouse))`.
    `GROUP BY warehouse, ROLLUP(product, location)` is equivalent to
    `GROUP BY GROUPING SETS((warehouse, product, location), (warehouse, product), (warehouse))`.
    `GROUP BY warehouse, GROUPING SETS((product, location), (product), ())` is equivalent to
    `GROUP BY GROUPING SETS((warehouse, product, location), (warehouse, product), (warehouse))`.
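
A minimal sketch of this expansion in plain Scala (no Spark classes; the column names are just the hypothetical ones from the examples above):
```
// Partial grouping: plain GROUP BY columns are prepended to every
// grouping set produced by CUBE/ROLLUP/GROUPING SETS.
object PartialGroupingSketch extends App {
  val plainCols = Seq("warehouse")
  // CUBE(product, location) expands to these four grouping sets:
  val cubeSets =
    Seq(Seq("product", "location"), Seq("product"), Seq("location"), Seq.empty)

  // GROUP BY warehouse, CUBE(product, location)
  val expanded = cubeSets.map(plainCols ++ _)
  expanded.foreach(s => println(s.mkString("(", ", ", ")")))
  // (warehouse, product, location)
  // (warehouse, product)
  // (warehouse, location)
  // (warehouse)
}
```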

*Concatenated Groupings*

    Concatenated groupings offer a concise way to generate useful combinations of groupings. Groupings specified
    with concatenated groupings yield the cross-product of groupings from each grouping set. The cross-product
    operation enables even a small number of concatenated groupings to generate a large number of final groups.
    Concatenated groupings are specified simply by listing multiple `GROUPING SETS`, `CUBE`, and `ROLLUP`
    clauses separated by commas (see the sketch after the examples below). For example:
    `GROUP BY GROUPING SETS((warehouse), (product)), GROUPING SETS((location), (size))` is equivalent to
    `GROUP BY GROUPING SETS((warehouse, location), (warehouse, size), (product, location), (product, size))`.
    `GROUP BY CUBE((warehouse), (product)), ROLLUP((location), (size))` is equivalent to
    `GROUP BY GROUPING SETS((warehouse, product), (warehouse), (product), ()), GROUPING SETS((location, size), (location), ())`,
    which in turn is equivalent to
    `GROUP BY GROUPING SETS(
        (warehouse, product, location, size), (warehouse, product, location), (warehouse, product),
        (warehouse, location, size), (warehouse, location), (warehouse),
        (product, location, size), (product, location), (product),
        (location, size), (location), ())`.
    `GROUP BY order, CUBE((warehouse), (product)), ROLLUP((location), (size))` is equivalent to
    `GROUP BY order, GROUPING SETS((warehouse, product), (warehouse), (product), ()), GROUPING SETS((location, size), (location), ())`,
    which in turn is equivalent to
    `GROUP BY GROUPING SETS(
        (order, warehouse, product, location, size), (order, warehouse, product, location), (order, warehouse, product),
        (order, warehouse, location, size), (order, warehouse, location), (order, warehouse),
        (order, product, location, size), (order, product, location), (order, product),
        (order, location, size), (order, location), (order))`.
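
The cross-product itself is a simple left fold. A minimal sketch in plain Scala (no Spark classes; the names are the hypothetical ones from the examples above):
```
// Concatenated groupings are combined by a cross-product fold over the
// per-clause grouping-set lists.
object ConcatenatedGroupingSketch extends App {
  // GROUPING SETS((warehouse), (product)), GROUPING SETS((location), (size))
  val clauses: Seq[Seq[Seq[String]]] = Seq(
    Seq(Seq("warehouse"), Seq("product")),
    Seq(Seq("location"), Seq("size")))

  val crossed = clauses.tail.foldLeft(clauses.head) { (acc, next) =>
    for (a <- acc; b <- next) yield a ++ b
  }
  crossed.foreach(s => println(s.mkString("(", ", ", ")")))
  // (warehouse, location)
  // (warehouse, size)
  // (product, location)
  // (product, size)
}
```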

### Why are the changes needed?
Support more flexible grouping analytics

### Does this PR introduce _any_ user-facing change?
Yes. Users can now write SQL such as:
```
select a, b, c, agg_expr() from table group by a, cube(b, c)
```

### How was this patch tested?
Added UT

Closes #30144 from AngersZhuuuu/SPARK-33229.

Lead-authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Co-authored-by: angerszhu <angers.zhu@gmail.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Angerszhuuuu 2021-04-12 08:23:52 +00:00 committed by Wenchen Fan
parent 3db8ec258c
commit 21232377ba
6 changed files with 684 additions and 69 deletions

docs/sql-ref-syntax-qry-select-groupby.md

@@ -24,7 +24,9 @@ license: |
The `GROUP BY` clause is used to group the rows based on a set of specified grouping expressions and compute aggregations on
the group of rows based on one or more specified aggregate functions. Spark also supports advanced aggregations to do multiple
aggregations for the same input record set via `GROUPING SETS`, `CUBE`, `ROLLUP` clauses.
When a FILTER clause is attached to an aggregate function, only the matching rows are passed to that function.
The grouping expressions and advanced aggregations can be mixed in the `GROUP BY` clause.
See more details in the `Mixed Grouping Analytics` section. When a FILTER clause is attached to
an aggregate function, only the matching rows are passed to that function.
### Syntax
@@ -41,7 +43,7 @@ aggregate_name ( [ DISTINCT ] expression [ , ... ] ) [ FILTER ( WHERE boolean_expression ) ]
### Parameters
* **grouping_expression**
* **group_expression**
Specifies the criteria based on which the rows are grouped together. The grouping of rows is performed based on
result values of the grouping expressions. A grouping expression may be a column name like `GROUP BY a`, a column position like
@@ -93,6 +95,25 @@ aggregate_name ( [ DISTINCT ] expression [ , ... ] ) [ FILTER ( WHERE boolean_expression ) ]
(product, warehouse, location), (warehouse), (product), (warehouse, product), ())`.
The N elements of a `CUBE` specification results in 2^N `GROUPING SETS`.
* **Mixed Grouping Analytics**
A GROUP BY clause can include multiple `group_expression`s and multiple `CUBE|ROLLUP|GROUPING SETS`s.
`CUBE|ROLLUP` is just a syntax sugar for `GROUPING SETS`, please refer to the sections above for
how to translate `CUBE|ROLLUP` to `GROUPING SETS`. `group_expression` can be treated as a single-group
`GROUPING SETS` under this context. For multiple `GROUPING SETS` in the `GROUP BY` clause, we generate
a single `GROUPING SETS` by doing a cross-product of the original `GROUPING SETS`s. For example,
`GROUP BY warehouse, GROUPING SETS((product), ()), GROUPING SETS((location, size), (location), (size), ())`
and `GROUP BY warehouse, ROLLUP(product), CUBE(location, size)` is equivalent to
`GROUP BY GROUPING SETS(
(warehouse, product, location, size),
(warehouse, product, location),
(warehouse, product, size),
(warehouse, product),
(warehouse, location, size),
(warehouse, location),
(warehouse, size),
(warehouse))`.
* **aggregate_name**
Specifies an aggregate function name (MIN, MAX, COUNT, SUM, AVG, etc.).

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

@@ -595,13 +595,14 @@ class Analyzer(override val catalogManager: CatalogManager)
}
}
private def tryResolveHavingCondition(h: UnresolvedHaving): LogicalPlan = {
val aggForResolving = h.child match {
// For CUBE/ROLLUP expressions, to avoid resolving repeatedly, here we delete them from
// groupingExpressions for condition resolving.
case a @ Aggregate(Seq(gs: BaseGroupingSets), _, _) =>
a.copy(groupingExpressions = gs.groupByExprs)
}
private def tryResolveHavingCondition(
h: UnresolvedHaving,
aggregate: Aggregate,
selectedGroupByExprs: Seq[Seq[Expression]],
groupByExprs: Seq[Expression]): LogicalPlan = {
// For CUBE/ROLLUP expressions, to avoid resolving repeatedly, here we delete them from
// groupingExpressions for condition resolving.
val aggForResolving = aggregate.copy(groupingExpressions = groupByExprs)
// Try resolving the condition of the filter as though it is in the aggregate clause
val resolvedInfo =
ResolveAggregateFunctions.resolveFilterCondInAggregate(h.havingCondition, aggForResolving)
@@ -609,12 +610,8 @@ class Analyzer(override val catalogManager: CatalogManager)
// Push the aggregate expressions into the aggregate (if any).
if (resolvedInfo.nonEmpty) {
val (extraAggExprs, resolvedHavingCond) = resolvedInfo.get
val newChild = h.child match {
case Aggregate(Seq(gs: BaseGroupingSets), aggregateExpressions, child) =>
constructAggregate(
gs.selectedGroupByExprs, gs.groupByExprs,
aggregateExpressions ++ extraAggExprs, child)
}
val newChild = constructAggregate(selectedGroupByExprs, groupByExprs,
aggregate.aggregateExpressions ++ extraAggExprs, aggregate.child)
// Since the exprId of extraAggExprs will be changed in the constructed aggregate, and the
// aggregateExpressions keeps the input order. So here we build an exprMap to resolve the
@@ -636,16 +633,17 @@ class Analyzer(override val catalogManager: CatalogManager)
// CUBE/ROLLUP/GROUPING SETS. This also replace grouping()/grouping_id() in resolved
// Filter/Sort.
def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperatorsDown {
case h @ UnresolvedHaving(_, agg @ Aggregate(Seq(gs: BaseGroupingSets), aggExprs, _))
if agg.childrenResolved && (gs.children ++ aggExprs).forall(_.resolved) =>
tryResolveHavingCondition(h)
case h @ UnresolvedHaving(_, agg @ Aggregate(
GroupingAnalytics(selectedGroupByExprs, groupByExprs), aggExprs, _))
if agg.childrenResolved && aggExprs.forall(_.resolved) =>
tryResolveHavingCondition(h, agg, selectedGroupByExprs, groupByExprs)
case a if !a.childrenResolved => a // be sure all of the children are resolved.
// Ensure group by expressions and aggregate expressions have been resolved.
case Aggregate(Seq(gs: BaseGroupingSets), aggregateExpressions, child)
if (gs.children ++ aggregateExpressions).forall(_.resolved) =>
constructAggregate(gs.selectedGroupByExprs, gs.groupByExprs, aggregateExpressions, child)
case Aggregate(GroupingAnalytics(selectedGroupByExprs, groupByExprs), aggExprs, child)
if aggExprs.forall(_.resolved) =>
constructAggregate(selectedGroupByExprs, groupByExprs, aggExprs, child)
// We should make sure all expressions in condition have been resolved.
case f @ Filter(cond, child) if hasGroupingFunction(cond) && cond.resolved =>

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala

@@ -34,16 +34,7 @@ trait BaseGroupingSets extends Expression with CodegenFallback {
def groupByExprs: Seq[Expression] = {
assert(children.forall(_.resolved),
"Cannot call BaseGroupingSets.groupByExprs before the children expressions are all resolved.")
children.foldLeft(Seq.empty[Expression]) { (result, currentExpr) =>
// Only unique expressions are included in the group by expressions and is determined
// based on their semantic equality. Example. grouping sets ((a * b), (b * a)) results
// in grouping expression (a * b)
if (result.exists(_.semanticEquals(currentExpr))) {
result
} else {
result :+ currentExpr
}
}
BaseGroupingSets.distinctGroupByExprs(children)
}
// this should be replaced first
@@ -104,6 +95,19 @@ object BaseGroupingSets {
case (gs, startOffset) => gs.indices.map(_ + startOffset)
}
}
def distinctGroupByExprs(exprs: Seq[Expression]): Seq[Expression] = {
exprs.foldLeft(Seq.empty[Expression]) { (result, currentExpr) =>
// Only unique expressions are included in the group by expressions and is determined
// based on their semantic equality. Example. grouping sets ((a * b), (b * a)) results
// in grouping expression (a * b)
if (result.exists(_.semanticEquals(currentExpr))) {
result
} else {
result :+ currentExpr
}
}
}
}
case class Cube(
@@ -242,3 +246,34 @@ object GroupingID {
if (SQLConf.get.integerGroupingIdEnabled) IntegerType else LongType
}
}
object GroupingAnalytics {
def unapply(exprs: Seq[Expression])
: Option[(Seq[Seq[Expression]], Seq[Expression])] = {
if (!exprs.exists(_.isInstanceOf[BaseGroupingSets])) {
None
} else {
val resolved = exprs.forall {
case gs: BaseGroupingSets => gs.childrenResolved
case other => other.resolved
}
if (!resolved) {
None
} else {
val groups = exprs.flatMap {
case gs: BaseGroupingSets => gs.groupByExprs
case other: Expression => other :: Nil
}
val unmergedSelectedGroupByExprs = exprs.map {
case gs: BaseGroupingSets => gs.selectedGroupByExprs
case other: Expression => Seq(Seq(other))
}
val selectedGroupByExprs = unmergedSelectedGroupByExprs.tail
.foldLeft(unmergedSelectedGroupByExprs.head) { (x, y) =>
for (a <- x; b <- y) yield a ++ b
}
Some(selectedGroupByExprs, BaseGroupingSets.distinctGroupByExprs(groups))
}
}
}
}
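
The new `GroupingAnalytics` extractor above combines three steps: plain expressions are wrapped as single-element grouping-set lists, the per-clause lists are cross-multiplied, and the flat group-by expressions are deduplicated. A minimal sketch of that data flow in plain Scala (no Spark classes; string column names stand in for resolved expressions, and `distinct` stands in for the `semanticEquals`-based dedup of `distinctGroupByExprs`):
```
// Sketch of GroupingAnalytics.unapply's data flow for GROUP BY a, CUBE(a, b),
// using strings instead of Catalyst expressions.
object GroupingAnalyticsSketch extends App {
  // Plain expression `a` becomes a single-element grouping-set list;
  // CUBE(a, b) contributes its four expanded sets.
  val unmerged: Seq[Seq[Seq[String]]] = Seq(
    Seq(Seq("a")),                                    // group_expression a
    Seq(Seq("a", "b"), Seq("a"), Seq("b"), Seq.empty) // CUBE(a, b)
  )

  // Cross-product fold, as in unapply.
  val selectedGroupByExprs = unmerged.tail.foldLeft(unmerged.head) { (x, y) =>
    for (a <- x; b <- y) yield a ++ b
  }
  // Deduplicated flat group-by expressions.
  val groupByExprs = unmerged.flatten.flatten.distinct

  println(selectedGroupByExprs) // List(List(a, a, b), List(a, a), List(a, b), List(a))
  println(groupByExprs)         // List(a, b)
}
```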

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

@@ -966,18 +966,6 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logging {
expression(groupByExpr.expression)
}
})
val (groupingSet, expressions) =
groupByExpressions.partition(_.isInstanceOf[BaseGroupingSets])
if (expressions.nonEmpty && groupingSet.nonEmpty) {
throw new ParseException("Partial CUBE/ROLLUP/GROUPING SETS like " +
"`GROUP BY a, b, CUBE(a, b)` is not supported.",
ctx)
}
if (groupingSet.size > 1) {
throw new ParseException("Mixed CUBE/ROLLUP/GROUPING SETS like " +
"`GROUP BY CUBE(a, b), ROLLUP(a, c)` is not supported.",
ctx)
}
Aggregate(groupByExpressions.toSeq, selectExpressions, query)
}
}

sql/core/src/test/resources/sql-tests/inputs/group-analytics.sql

@@ -69,4 +69,14 @@ SELECT course, year FROM courseSales GROUP BY CUBE(course, year) ORDER BY groupi
-- Aliases in SELECT could be used in ROLLUP/CUBE/GROUPING SETS
SELECT a + b AS k1, b AS k2, SUM(a - b) FROM testData GROUP BY CUBE(k1, k2);
SELECT a + b AS k, b, SUM(a - b) FROM testData GROUP BY ROLLUP(k, b);
SELECT a + b, b AS k, SUM(a - b) FROM testData GROUP BY a + b, k GROUPING SETS(k)
SELECT a + b, b AS k, SUM(a - b) FROM testData GROUP BY a + b, k GROUPING SETS(k);
-- GROUP BY use mixed Separate columns and CUBE/ROLLUP/GROUPING SETS
SELECT a, b, count(1) FROM testData GROUP BY a, b, CUBE(a, b);
SELECT a, b, count(1) FROM testData GROUP BY a, b, ROLLUP(a, b);
SELECT a, b, count(1) FROM testData GROUP BY CUBE(a, b), ROLLUP(a, b);
SELECT a, b, count(1) FROM testData GROUP BY a, CUBE(a, b), ROLLUP(b);
SELECT a, b, count(1) FROM testData GROUP BY a, GROUPING SETS((a, b), (a), ());
SELECT a, b, count(1) FROM testData GROUP BY a, CUBE(a, b), GROUPING SETS((a, b), (a), ());
SELECT a, b, count(1) FROM testData GROUP BY a, CUBE(a, b), ROLLUP(a, b), GROUPING SETS((a, b), (a), ());

sql/core/src/test/resources/sql-tests/results/group-analytics.sql.out

@@ -1,5 +1,5 @@
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 37
-- Number of queries: 44
-- !query
@@ -248,43 +248,155 @@ NULL 2013 78000
-- !query
SELECT course, year, SUM(earnings) FROM courseSales GROUP BY course, CUBE(course, year) ORDER BY course, year
-- !query schema
struct<>
struct<course:string,year:int,sum(earnings):bigint>
-- !query output
org.apache.spark.sql.catalyst.parser.ParseException
Partial CUBE/ROLLUP/GROUPING SETS like `GROUP BY a, b, CUBE(a, b)` is not supported.(line 1, pos 52)
== SQL ==
SELECT course, year, SUM(earnings) FROM courseSales GROUP BY course, CUBE(course, year) ORDER BY course, year
----------------------------------------------------^^^
Java NULL 50000
Java NULL 50000
Java 2012 20000
Java 2012 20000
Java 2013 30000
Java 2013 30000
dotNET NULL 63000
dotNET NULL 63000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2013 48000
dotNET 2013 48000
-- !query
SELECT course, year, SUM(earnings) FROM courseSales GROUP BY CUBE(course, year), ROLLUP(course, year) ORDER BY course, year
-- !query schema
struct<>
struct<course:string,year:int,sum(earnings):bigint>
-- !query output
org.apache.spark.sql.catalyst.parser.ParseException
Mixed CUBE/ROLLUP/GROUPING SETS like `GROUP BY CUBE(a, b), ROLLUP(a, c)` is not supported.(line 1, pos 52)
== SQL ==
SELECT course, year, SUM(earnings) FROM courseSales GROUP BY CUBE(course, year), ROLLUP(course, year) ORDER BY course, year
----------------------------------------------------^^^
NULL NULL 113000
NULL 2012 35000
NULL 2013 78000
Java NULL 50000
Java NULL 50000
Java NULL 50000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2013 30000
Java 2013 30000
Java 2013 30000
Java 2013 30000
Java 2013 30000
Java 2013 30000
Java 2013 30000
dotNET NULL 63000
dotNET NULL 63000
dotNET NULL 63000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
-- !query
SELECT course, year, SUM(earnings) FROM courseSales GROUP BY CUBE(course, year), ROLLUP(course, year), GROUPING SETS(course, year) ORDER BY course, year
-- !query schema
struct<>
struct<course:string,year:int,sum(earnings):bigint>
-- !query output
org.apache.spark.sql.catalyst.parser.ParseException
Mixed CUBE/ROLLUP/GROUPING SETS like `GROUP BY CUBE(a, b), ROLLUP(a, c)` is not supported.(line 1, pos 52)
== SQL ==
SELECT course, year, SUM(earnings) FROM courseSales GROUP BY CUBE(course, year), ROLLUP(course, year), GROUPING SETS(course, year) ORDER BY course, year
----------------------------------------------------^^^
NULL 2012 35000
NULL 2012 35000
NULL 2013 78000
NULL 2013 78000
Java NULL 50000
Java NULL 50000
Java NULL 50000
Java NULL 50000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2012 20000
Java 2013 30000
Java 2013 30000
Java 2013 30000
Java 2013 30000
Java 2013 30000
Java 2013 30000
Java 2013 30000
Java 2013 30000
Java 2013 30000
Java 2013 30000
Java 2013 30000
Java 2013 30000
Java 2013 30000
Java 2013 30000
Java 2013 30000
Java 2013 30000
Java 2013 30000
Java 2013 30000
dotNET NULL 63000
dotNET NULL 63000
dotNET NULL 63000
dotNET NULL 63000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2012 15000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
dotNET 2013 48000
-- !query
@@ -524,3 +636,454 @@ struct<(a + b):int,k:int,sum((a - b)):bigint>
-- !query output
NULL 1 3
NULL 2 0
-- !query
SELECT a, b, count(1) FROM testData GROUP BY a, b, CUBE(a, b)
-- !query schema
struct<a:int,b:int,count(1):bigint>
-- !query output
1 1 1
1 1 1
1 1 1
1 1 1
1 2 1
1 2 1
1 2 1
1 2 1
2 1 1
2 1 1
2 1 1
2 1 1
2 2 1
2 2 1
2 2 1
2 2 1
3 1 1
3 1 1
3 1 1
3 1 1
3 2 1
3 2 1
3 2 1
3 2 1
-- !query
SELECT a, b, count(1) FROM testData GROUP BY a, b, ROLLUP(a, b)
-- !query schema
struct<a:int,b:int,count(1):bigint>
-- !query output
1 1 1
1 1 1
1 1 1
1 2 1
1 2 1
1 2 1
2 1 1
2 1 1
2 1 1
2 2 1
2 2 1
2 2 1
3 1 1
3 1 1
3 1 1
3 2 1
3 2 1
3 2 1
-- !query
SELECT a, b, count(1) FROM testData GROUP BY CUBE(a, b), ROLLUP(a, b)
-- !query schema
struct<a:int,b:int,count(1):bigint>
-- !query output
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 NULL 2
1 NULL 2
1 NULL 2
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 NULL 2
2 NULL 2
2 NULL 2
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 NULL 2
3 NULL 2
3 NULL 2
NULL 1 3
NULL 2 3
NULL NULL 6
-- !query
SELECT a, b, count(1) FROM testData GROUP BY a, CUBE(a, b), ROLLUP(b)
-- !query schema
struct<a:int,b:int,count(1):bigint>
-- !query output
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 NULL 2
1 NULL 2
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 NULL 2
2 NULL 2
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 NULL 2
3 NULL 2
-- !query
SELECT a, b, count(1) FROM testData GROUP BY a, GROUPING SETS((a, b), (a), ())
-- !query schema
struct<a:int,b:int,count(1):bigint>
-- !query output
1 1 1
1 2 1
1 NULL 2
1 NULL 2
2 1 1
2 2 1
2 NULL 2
2 NULL 2
3 1 1
3 2 1
3 NULL 2
3 NULL 2
-- !query
SELECT a, b, count(1) FROM testData GROUP BY a, CUBE(a, b), GROUPING SETS((a, b), (a), ())
-- !query schema
struct<a:int,b:int,count(1):bigint>
-- !query output
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 NULL 2
1 NULL 2
1 NULL 2
1 NULL 2
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 NULL 2
2 NULL 2
2 NULL 2
2 NULL 2
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 NULL 2
3 NULL 2
3 NULL 2
3 NULL 2
-- !query
SELECT a, b, count(1) FROM testData GROUP BY a, CUBE(a, b), ROLLUP(a, b), GROUPING SETS((a, b), (a), ())
-- !query schema
struct<a:int,b:int,count(1):bigint>
-- !query output
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 NULL 2
1 NULL 2
1 NULL 2
1 NULL 2
1 NULL 2
1 NULL 2
1 NULL 2
1 NULL 2
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 1 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 2 1
2 NULL 2
2 NULL 2
2 NULL 2
2 NULL 2
2 NULL 2
2 NULL 2
2 NULL 2
2 NULL 2
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 2 1
3 NULL 2
3 NULL 2
3 NULL 2
3 NULL 2
3 NULL 2
3 NULL 2
3 NULL 2
3 NULL 2