spark-instrumented-optimizer

History

Wenchen Fan 5cb2e33609 [SPARK-14675][SQL] ClassFormatError when use Seq as Aggregator buffer type ## What changes were proposed in this pull request? After https://github.com/apache/spark/pull/12067, we now use expressions to do the aggregation in `TypedAggregateExpression`. To implement buffer merge, we produce a new buffer deserializer expression by replacing `AttributeReference` with right-side buffer attribute, like other `DeclarativeAggregate`s do, and finally combine the left and right buffer deserializer with `Invoke`. However, after https://github.com/apache/spark/pull/12338, we will add loop variable to class members when codegen `MapObjects`. If the `Aggregator` buffer type is `Seq`, which is implemented by `MapObjects` expression, we will add the same loop variable to class members twice(by left and right buffer deserializer), which cause the `ClassFormatError`. This PR fixes this issue by calling `distinct` before declare the class menbers. ## How was this patch tested? new regression test in `DatasetAggregatorSuite` Author: Wenchen Fan <wenchen@databricks.com> Closes #12468 from cloud-fan/bug.	2016-04-19 10:51:58 -07:00
..
src	[SPARK-14675][SQL] ClassFormatError when use Seq as Aggregator buffer type	2016-04-19 10:51:58 -07:00
pom.xml	[SPARK-14103][SQL] Parse unescaped quotes in CSV data source.	2016-04-08 00:28:59 -07:00

Wenchen Fan 5cb2e33609 [SPARK-14675][SQL] ClassFormatError when use Seq as Aggregator buffer type

## What changes were proposed in this pull request?

After https://github.com/apache/spark/pull/12067, we now use expressions to do the aggregation in `TypedAggregateExpression`. To implement buffer merge, we produce a new buffer deserializer expression by replacing `AttributeReference` with right-side buffer attribute, like other `DeclarativeAggregate`s do, and finally combine the left and right buffer deserializer with `Invoke`.

However, after https://github.com/apache/spark/pull/12338, we will add loop variable to class members when codegen `MapObjects`. If the `Aggregator` buffer type is `Seq`, which is implemented by `MapObjects` expression, we will add the same loop variable to class members twice(by left and right buffer deserializer), which cause the `ClassFormatError`.

This PR fixes this issue by calling `distinct` before declare the class menbers.

## How was this patch tested?

new regression test in `DatasetAggregatorSuite`

Author: Wenchen Fan <wenchen@databricks.com>

Closes #12468 from cloud-fan/bug.

2016-04-19 10:51:58 -07:00

src

[SPARK-14675][SQL] ClassFormatError when use Seq as Aggregator buffer type

2016-04-19 10:51:58 -07:00

pom.xml

[SPARK-14103][SQL] Parse unescaped quotes in CSV data source.

2016-04-08 00:28:59 -07:00