spark-instrumented-optimizer

Josh Rosen 2816c89b6a	2015-10-08 14:56:27 -07:00

[SPARK-10988] [SQL] Reduce duplication in Aggregate2's expression rewriting logic

In `aggregate/utils.scala`, there is a substantial amount of duplication in the expression-rewriting logic. As a prerequisite to supporting imperative aggregate functions in `TungstenAggregate`, this patch refactors this file so that the same expression-rewriting logic is used for both `SortAggregate` and `TungstenAggregate`.

In order to allow both operators to use the same rewriting logic, `TungstenAggregationIterator.generateResultProjection()` has been updated so that it first evaluates all declarative aggregate functions' `evaluateExpression`s and writes the results into a temporary buffer, and then uses this temporary buffer and the grouping expressions to evaluate the final `resultExpressions`. This matches the logic in `SortAggregateIterator`, where this two-pass approach is necessary in order to support imperative aggregates. If this change turns out to cause performance regressions, then we can look into re-implementing the single-pass evaluation in a cleaner way as part of a followup patch.

Since the rewriting logic is now shared across both operators, this patch also extracts that logic and places it in `SparkStrategies`. This makes the rewriting logic a bit easier to follow, I think.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #9015 from JoshRosen/SPARK-10988.
..
java/org/apache/spark/sql	[SPARK-10474] [SQL] Aggregation fails to allocate memory for pointer array (round 2)	2015-09-23 19:34:31 -07:00
resources	[SPARK-9763][SQL] Minimize exposure of internal SQL classes.	2015-08-10 13:49:23 -07:00
scala/org/apache/spark/sql	[SPARK-10988] [SQL] Reduce duplication in Aggregate2's expression rewriting logic	2015-10-08 14:56:27 -07:00
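
The two-pass result evaluation described in the SPARK-10988 commit message can be illustrated with a small standalone sketch. This is not the code from the patch and uses no Spark classes: `Row`, `Agg`, `evaluateAggregates`, `project`, and `generateResult` are hypothetical names chosen for illustration, and rows are modelled as plain `Vector[Any]` values rather than Spark's internal row types. It only shows the shape of the idea: first write every aggregate's final value into a temporary buffer, then evaluate the result expressions against the grouping values plus that buffer.

```scala
// Simplified, hypothetical model of two-pass result projection.
// None of these names are Spark APIs.
object TwoPassResultProjection {
  type Row = Vector[Any]

  // An aggregate reduced to its final "evaluate" step:
  // it maps the aggregation buffer to a single output value.
  final case class Agg(name: String, evaluate: Row => Any)

  // Pass 1: evaluate every aggregate against the aggregation buffer
  // and collect the results in a temporary buffer.
  def evaluateAggregates(aggs: Seq[Agg], aggBuffer: Row): Row =
    aggs.map(agg => agg.evaluate(aggBuffer)).toVector

  // Pass 2: evaluate the result expressions against an input row made of
  // the grouping values followed by the temporary aggregate results.
  def project(resultExprs: Seq[Row => Any], groupingValues: Row, aggResults: Row): Row = {
    val input = groupingValues ++ aggResults
    resultExprs.map(expr => expr(input)).toVector
  }

  // Both passes chained together for one group.
  def generateResult(aggs: Seq[Agg], resultExprs: Seq[Row => Any])(
      groupingValues: Row, aggBuffer: Row): Row =
    project(resultExprs, groupingValues, evaluateAggregates(aggs, aggBuffer))

  // Example group: key "a", aggregation buffer holding (sum = 10.0, count = 4).
  val avg = Agg("avg", b => b(0).asInstanceOf[Double] / b(1).asInstanceOf[Long])
  val count = Agg("count", b => b(1))

  // Result expressions over (key, avg, count): the key itself and avg + 1.
  val resultExprs: Seq[Row => Any] = Seq(
    in => in(0),
    in => in(1).asInstanceOf[Double] + 1.0
  )

  val exampleResult: Row =
    generateResult(Seq(avg, count), resultExprs)(Vector("a"), Vector(10.0, 4L))
}
```

Because the second pass only sees grouping values and already-evaluated aggregate results, it does not need to know how each aggregate computed its value. That separation is what the commit message relies on when it notes that the two-pass form is needed to support imperative aggregates; the extra temporary buffer is the cost the message flags as a possible performance regression.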