spark-instrumented-optimizer

History

Joseph K. Bradley 5a3371883a [SPARK-14408][CORE] Changed RDD.treeAggregate to use fold instead of reduce ## What changes were proposed in this pull request? Previously, `RDD.treeAggregate` used `reduceByKey` and `reduce` in its implementation, neither of which technically allows the `seq`/`combOps` to modify and return their first arguments. This PR uses `foldByKey` and `fold` instead and notes that `aggregate` and `treeAggregate` are semantically identical in the Scala doc. Note that this had some test failures by unknown reasons. This was actually fixed in `e3554605b3`. The root cause was, the `zeroValue` now becomes `AFTAggregator` and it compares `totalCnt` (where the value is actually 0). It starts merging one by one and it keeps returning `this` where `totalCnt` is 0. So, this looks not the bug in the current change. This is now fixed in the commit. So, this should pass the tests. ## How was this patch tested? Test case added in `RDDSuite`. Closes #12217 Author: Joseph K. Bradley <joseph@databricks.com> Author: hyukjinkwon <gurwls223@gmail.com> Closes #18198 from HyukjinKwon/SPARK-14408.	2017-06-09 08:53:18 +01:00
..
src	[SPARK-14408][CORE] Changed RDD.treeAggregate to use fold instead of reduce	2017-06-09 08:53:18 +01:00
pom.xml	[SPARK-20453] Bump master branch version to 2.3.0-SNAPSHOT	2017-04-24 21:48:04 -07:00

Joseph K. Bradley 5a3371883a [SPARK-14408][CORE] Changed RDD.treeAggregate to use fold instead of reduce

## What changes were proposed in this pull request?

Previously, `RDD.treeAggregate` used `reduceByKey` and `reduce` in its implementation, neither of which technically allows the `seq`/`combOps` to modify and return their first arguments.

This PR uses `foldByKey` and `fold` instead and notes that `aggregate` and `treeAggregate` are semantically identical in the Scala doc.

Note that this had some test failures by unknown reasons. This was actually fixed in e3554605b3.

The root cause was, the `zeroValue` now becomes `AFTAggregator` and it compares `totalCnt` (where the value is actually 0). It starts merging one by one and it keeps returning `this` where `totalCnt` is 0. So, this looks not the bug in the current change.

This is now fixed in the commit. So, this should pass the tests.

## How was this patch tested?

Test case added in `RDDSuite`.

Closes #12217

Author: Joseph K. Bradley <joseph@databricks.com>
Author: hyukjinkwon <gurwls223@gmail.com>

Closes #18198 from HyukjinKwon/SPARK-14408.

2017-06-09 08:53:18 +01:00

src [SPARK-14408][CORE] Changed RDD.treeAggregate to use fold instead of reduce 2017-06-09 08:53:18 +01:00

pom.xml [SPARK-20453] Bump master branch version to 2.3.0-SNAPSHOT 2017-04-24 21:48:04 -07:00