spark-instrumented-optimizer/sql/core
allisonwang-db 4a8dc5f7a3 [SPARK-36747][SQL] Do not collapse Project with Aggregate when correlated subqueries are present in the project list
### What changes were proposed in this pull request?
This PR adds a check to the optimizer rule `CollapseProject` so that it does not combine a Project with an Aggregate when the project list contains one or more correlated scalar subqueries that reference the output of the aggregate. Combining the two can produce an invalid plan after the correlated subquery rewrite: the outer references of a correlated scalar subquery are used as join conditions, and join conditions cannot host aggregate expressions.

For example, the following query yields an invalid optimized plan and fails during code generation:
```sql
select (select sum(c2) from t where c1 = cast(s as int)) from (select sum(c2) s from t)
```

```
== Optimized Logical Plan ==
Aggregate [sum(c2)#10L AS scalarsubquery(s)#11L] <--- Aggregate has neither grouping nor aggregate expressions.
+- Project [sum(c2)#10L]
   +- Join LeftOuter, (c1#2 = cast(sum(c2#3) as int))  <--- Aggregate expression in join condition
      :- LocalRelation [c2#3]
      +- Aggregate [c1#2], [sum(c2#3) AS sum(c2)#10L, c1#2]
         +- LocalRelation [c1#2, c2#3]

java.lang.UnsupportedOperationException: Cannot generate code for expression: sum(input[0, int, false])
```
Currently, a correlated scalar subquery is allowed in an Aggregate only if it also appears in the grouping expressions:
079a9c5292/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala (L661-L666)
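
The condition described above can be illustrated with a small toy model. This is only a sketch of the idea, not the code in `CollapseProject`: the classes `Attribute` and `ScalarSubquery` and the function `can_collapse` are hypothetical stand-ins written in Python, and the real rule operates on Catalyst plan nodes in Scala.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence, Union


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for a plain column reference in a project list."""
    name: str


@dataclass(frozen=True)
class ScalarSubquery:
    """Hypothetical stand-in for a scalar subquery.

    outer_refs holds the names of outer attributes the subquery is
    correlated on; an empty set means the subquery is uncorrelated.
    """
    outer_refs: FrozenSet[str]


Expr = Union[Attribute, ScalarSubquery]


def can_collapse(project_list: Sequence[Expr],
                 aggregate_output: FrozenSet[str]) -> bool:
    """Return False when merging Project into Aggregate is unsafe.

    Merging would substitute aggregate expressions for the outer
    references, and those references later become join conditions.
    """
    for expr in project_list:
        if isinstance(expr, ScalarSubquery) and expr.outer_refs & aggregate_output:
            return False
    return True


# Modeled on the query above: the inner aggregate outputs `s`, and the
# scalar subquery in the outer project list is correlated on `s`.
agg_output = frozenset({"s"})
correlated = [ScalarSubquery(outer_refs=frozenset({"s"}))]
uncorrelated = [ScalarSubquery(outer_refs=frozenset())]
plain = [Attribute("s")]

print(can_collapse(correlated, agg_output))    # False
print(can_collapse(uncorrelated, agg_output))  # True
print(can_collapse(plain, agg_output))         # True
```

In this model only the correlated case blocks the merge; a plain column reference or an uncorrelated subquery in the project list leaves the collapse allowed.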

### Why are the changes needed?
To fix an existing optimizer issue.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test.

Closes #33990 from allisonwang-db/spark-36747-collapse-agg.

Authored-by: allisonwang-db <allison.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2021-09-23 12:50:27 +08:00
| Name | Last commit | Date |
| --- | --- | --- |
| benchmarks | [SPARK-34981][SQL][FOLLOWUP] Use SpecificInternalRow in ApplyFunctionExpression | 2021-05-24 17:25:24 +09:00 |
| src | [SPARK-36747][SQL] Do not collapse Project with Aggregate when correlated subqueries are present in the project list | 2021-09-23 12:50:27 +08:00 |
| pom.xml | [SPARK-36670][FOLLOWUP][TEST] Remove brotli-codec dependency | 2021-09-21 10:57:20 -07:00 |