spark-instrumented-optimizer/sql/catalyst/src/main
Wenchen Fan 5cb2e33609 [SPARK-14675][SQL] ClassFormatError when use Seq as Aggregator buffer type
## What changes were proposed in this pull request?

After https://github.com/apache/spark/pull/12067, we now use expressions to do the aggregation in `TypedAggregateExpression`. To implement buffer merge, we produce a new buffer deserializer expression by replacing the `AttributeReference` with the right-side buffer attribute, as other `DeclarativeAggregate`s do, and finally combine the left and right buffer deserializers with `Invoke`.
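The merge step described above can be sketched in plain Scala. The sketch relies on several assumptions. It uses a hypothetical, heavily simplified expression tree (`Expr`, `Deserialize`, and a toy `transformUp`) rather than Catalyst's real classes. Only the names `AttributeReference` and `Invoke` echo the description. The attribute names `buffer`, `buffer_right`, and `aggregator` are made up for illustration.

```scala
// Hypothetical, heavily simplified expression tree; NOT Catalyst's classes.
sealed trait Expr {
  // Rebuild the tree bottom-up, applying `rule` wherever it matches.
  def transformUp(rule: PartialFunction[Expr, Expr]): Expr = {
    val withNewChildren = this match {
      case Deserialize(child)      => Deserialize(child.transformUp(rule))
      case Invoke(target, m, args) => Invoke(target.transformUp(rule), m, args.map(_.transformUp(rule)))
      case leaf                    => leaf
    }
    rule.applyOrElse(withNewChildren, identity[Expr])
  }
}
case class AttributeReference(name: String) extends Expr
case class Deserialize(child: Expr) extends Expr
case class Invoke(target: Expr, method: String, args: Seq[Expr]) extends Expr

object BufferMergeSketch {
  // Hypothetical names for the left and right aggregation buffer attributes.
  val leftBuffer  = AttributeReference("buffer")
  val rightBuffer = AttributeReference("buffer_right")

  // Deserializer that reads the (left) aggregation buffer.
  val bufferDeserializer: Expr = Deserialize(leftBuffer)

  // Right-side deserializer: the same expression with the buffer attribute swapped.
  val rightDeserializer: Expr = bufferDeserializer.transformUp {
    case a: AttributeReference if a == leftBuffer => rightBuffer
  }

  // Combine both sides by invoking a `merge` method on a (hypothetical) aggregator object.
  val mergeExpr: Expr =
    Invoke(AttributeReference("aggregator"), "merge", Seq(bufferDeserializer, rightDeserializer))

  def main(args: Array[String]): Unit = println(mergeExpr)
}
```

In this sketch the right-side deserializer is a copy of the left one with only the attribute replaced. Everything else in the tree is duplicated verbatim, including any state the deserializer carries.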

However, after https://github.com/apache/spark/pull/12338, we add the loop variable to the class members when generating code for `MapObjects`. If the `Aggregator` buffer type is `Seq`, which is implemented by a `MapObjects` expression, the same loop variable is added to the class members twice (once by the left and once by the right buffer deserializer), which causes the `ClassFormatError`.

This PR fixes the issue by calling `distinct` before declaring the class members.
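The failure mode and the fix can be modeled with a hypothetical, simplified codegen context. `SketchCodegenContext`, `addMutableState`, `declareMutableStatesNaive`, `declareMutableStates`, and the `Object loopValue` member are illustrative stand-ins, not Spark's actual code. Each deserializer registers its loop variable as a class member. Because the right-side deserializer is derived from the left one, both register an identical entry. Emitting one field per registration produces two declarations of the same field, which the JVM rejects because a class may not contain two fields with the same name and descriptor. Deduplicating with `distinct` first produces a single declaration.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-in for a codegen context; NOT Spark's CodegenContext.
class SketchCodegenContext {
  // (javaType, variableName, initCode) triples registered by expressions.
  val mutableStates = ArrayBuffer.empty[(String, String, String)]

  def addMutableState(javaType: String, name: String, init: String): Unit =
    mutableStates += ((javaType, name, init))

  // Buggy behavior: one field declaration per registration, duplicates included.
  def declareMutableStatesNaive(): String =
    mutableStates.map { case (t, n, _) => s"private $t $n;" }.mkString("\n")

  // Fixed behavior: drop identical registrations before declaring fields.
  def declareMutableStates(): String =
    mutableStates.distinct.map { case (t, n, _) => s"private $t $n;" }.mkString("\n")
}

object DistinctMembersSketch {
  // Simulates codegen for a MapObjects-like deserializer that registers its loop variable.
  def genMapObjectsLike(ctx: SketchCodegenContext, loopVar: String): Unit =
    ctx.addMutableState("Object", loopVar, s"$loopVar = null;")

  def main(args: Array[String]): Unit = {
    val ctx = new SketchCodegenContext
    // Left and right buffer deserializers carry the same loop variable name.
    genMapObjectsLike(ctx, "loopValue")
    genMapObjectsLike(ctx, "loopValue")
    println(ctx.declareMutableStatesNaive()) // two identical field declarations
    println(ctx.declareMutableStates())      // a single declaration
  }
}
```

`distinct` removes only exact duplicates. Two registrations that share a variable name but differ in type or initialization code would still produce conflicting fields. The sketch therefore relies on the duplicated entries being identical, as they are when one deserializer is a copy of the other.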

## How was this patch tested?

A new regression test in `DatasetAggregatorSuite`.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #12468 from cloud-fan/bug.
2016-04-19 10:51:58 -07:00
antlr4/org/apache/spark/sql/catalyst/parser [SPARK-14398][SQL] Audit non-reserved keyword list in ANTLR4 parser 2016-04-19 09:09:58 +02:00
java/org/apache/spark/sql [SPARK-14426][SQL] Merge PerserUtils and ParseUtils 2016-04-06 10:57:46 -07:00
scala/org/apache/spark/sql [SPARK-14675][SQL] ClassFormatError when use Seq as Aggregator buffer type 2016-04-19 10:51:58 -07:00