spark-instrumented-optimizer

History

Dilip Biswal 5e1ee28984 [SPARK-15114][SQL] Column name generated by typed aggregate is super verbose ## What changes were proposed in this pull request? Generate a shorter default alias for `AggregateExpression `, In this PR, aggregate function name along with a index is used for generating the alias name. ```SQL val ds = Seq(1, 3, 2, 5).toDS() ds.select(typed.sum((i: Int) => i), typed.avg((i: Int) => i)).show() ``` Output before change. ```SQL +-----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ \|typedsumdouble(unresolveddeserializer(upcast(input[0, int], IntegerType, - root class: "scala.Int"), value#1), upcast(value))\|typedaverage(unresolveddeserializer(upcast(input[0, int], IntegerType, - root class: "scala.Int"), value#1), newInstance(class scala.Tuple2))\| +-----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ \| 11.0\| 2.75\| +-----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ ``` Output after change: ```SQL +-----------------+---------------+ \|typedsumdouble_c1\|typedaverage_c2\| +-----------------+---------------+ \| 11.0\| 2.75\| +-----------------+---------------+ ``` Note: There is one test in ParquetSuites.scala which shows that that the system picked alias name is not usable and is rejected. [test](https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/parquetSuites.scala#L672-#L687) ## How was this patch tested? A new test was added in DataSetAggregatorSuite. Author: Dilip Biswal <dbiswal@us.ibm.com> Closes #13045 from dilipbiswal/spark-15114.	2016-05-21 08:36:08 -07:00
..
src	[SPARK-15114][SQL] Column name generated by typed aggregate is super verbose	2016-05-21 08:36:08 -07:00
pom.xml	[SPARK-15290][BUILD] Move annotations, like @Since / @DeveloperApi, into spark-tags	2016-05-17 09:55:53 +01:00

Dilip Biswal 5e1ee28984 [SPARK-15114][SQL] Column name generated by typed aggregate is super verbose

## What changes were proposed in this pull request?

Generate a shorter default alias for `AggregateExpression `, In this PR, aggregate function name along with a index is used for generating the alias name.

```SQL
val ds = Seq(1, 3, 2, 5).toDS()
ds.select(typed.sum((i: Int) => i), typed.avg((i: Int) => i)).show()
```

Output before change.
```SQL
+-----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+
|typedsumdouble(unresolveddeserializer(upcast(input[0, int], IntegerType, - root class: "scala.Int"), value#1), upcast(value))|typedaverage(unresolveddeserializer(upcast(input[0, int], IntegerType, - root class: "scala.Int"), value#1), newInstance(class scala.Tuple2))|
+-----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                         11.0|                                                                                                                                         2.75|
+-----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+
```
Output after change:
```SQL
+-----------------+---------------+
|typedsumdouble_c1|typedaverage_c2|
+-----------------+---------------+
|             11.0|           2.75|
+-----------------+---------------+
```

Note: There is one test in ParquetSuites.scala which shows that that the system picked alias
name is not usable and is rejected.  [test](https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/parquetSuites.scala#L672-#L687)
## How was this patch tested?

A new test was added in DataSetAggregatorSuite.

Author: Dilip Biswal <dbiswal@us.ibm.com>

Closes #13045 from dilipbiswal/spark-15114.

2016-05-21 08:36:08 -07:00

src

[SPARK-15114][SQL] Column name generated by typed aggregate is super verbose

2016-05-21 08:36:08 -07:00

pom.xml

[SPARK-15290][BUILD] Move annotations, like @Since / @DeveloperApi, into spark-tags

2016-05-17 09:55:53 +01:00