spark-instrumented-optimizer/sql/core/src/main
Josh Rosen 2816c89b6a [SPARK-10988] [SQL] Reduce duplication in Aggregate2's expression rewriting logic
In `aggregate/utils.scala`, there is a substantial amount of duplication in the expression-rewriting logic. As a prerequisite to supporting imperative aggregate functions in `TungstenAggregate`, this patch refactors this file so that the same expression-rewriting logic is used for both `SortAggregate` and `TungstenAggregate`.

In order to allow both operators to use the same rewriting logic, `TungstenAggregationIterator. generateResultProjection()` has been updated so that it first evaluates all declarative aggregate functions' `evaluateExpression`s and writes the results into a temporary buffer, and then uses this temporary buffer and the grouping expressions to evaluate the final resultExpressions. This matches the logic in SortAggregateIterator, where this two-pass approach is necessary in order to support imperative aggregates. If this change turns out to cause performance regressions, then we can look into re-implementing the single-pass evaluation in a cleaner way as part of a followup patch.

Since the rewriting logic is now shared across both operators, this patch also extracts that logic and places it in `SparkStrategies`. This makes the rewriting logic a bit easier to follow, I think.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #9015 from JoshRosen/SPARK-10988.
2015-10-08 14:56:27 -07:00
..
java/org/apache/spark/sql [SPARK-10474] [SQL] Aggregation fails to allocate memory for pointer array (round 2) 2015-09-23 19:34:31 -07:00
resources [SPARK-9763][SQL] Minimize exposure of internal SQL classes. 2015-08-10 13:49:23 -07:00
scala/org/apache/spark/sql [SPARK-10988] [SQL] Reduce duplication in Aggregate2's expression rewriting logic 2015-10-08 14:56:27 -07:00