spark-instrumented-optimizer/mllib/src
Sean Owen b69c26833c [SPARK-35848][MLLIB] Optimize some treeAggregates in MLlib by delaying allocations
### What changes were proposed in this pull request?

Optimize some treeAggregates in MLlib by delaying the allocation of large arrays of zeroes, so that they are not sent around unnecessarily.
This uses the same idea as in https://github.com/apache/spark/pull/23600/files.

### Why are the changes needed?

Allocating huge arrays of zeroes costs additional memory and network I/O that is unnecessary in some cases, and it can cause operations that would otherwise succeed to run out of memory. Specifically, this change should spare the 'zero' value a (pointless) serializability check, which can fail when passing through the default JavaSerializer; it also avoids allocating and sending large 'zero' values for empty partitions in the aggregate.
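
As a rough illustration of the idea (not the code in this patch), the sketch below uses plain Python with no Spark dependency. It assumes the aggregate starts from an empty placeholder (`None`) instead of a pre-allocated array of zeroes and allocates the accumulator only when the first row arrives. The names `seq_op`, `comb_op`, and `aggregate_partitions` are hypothetical stand-ins for the two functions passed to an aggregate and for the aggregate itself.

```python
from functools import reduce
from typing import List, Optional, Sequence

Vector = List[float]


def seq_op(acc: Optional[Vector], row: Sequence[float]) -> Vector:
    """Fold one row into the accumulator, allocating it on first use."""
    if acc is None:
        # Allocation is delayed until a row actually arrives.
        acc = [0.0] * len(row)
    for i, value in enumerate(row):
        acc[i] += value
    return acc


def comb_op(left: Optional[Vector], right: Optional[Vector]) -> Optional[Vector]:
    """Merge two partial results; None means 'no rows were seen'."""
    if left is None:
        return right
    if right is None:
        return left
    for i, value in enumerate(right):
        left[i] += value
    return left


def aggregate_partitions(partitions: Sequence[Sequence[Sequence[float]]]) -> Optional[Vector]:
    """Hypothetical stand-in for a distributed aggregate:
    fold each partition on its own, then merge the partial results."""
    partials = [reduce(seq_op, part, None) for part in partitions]
    return reduce(comb_op, partials, None)


# The middle partition is empty, so it never allocates an accumulator.
partitions = [[[1.0, 2.0], [3.0, 4.0]], [], [[5.0, 6.0]]]
total = aggregate_partitions(partitions)
```

In this sketch the initial value is a trivially small placeholder, so nothing large needs to be serialized up front, and an empty partition contributes `None` rather than a full-size array of zeroes. The caller has to handle the case where every partition is empty and the result is still `None`.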

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #33443 from srowen/SPARK-35848.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2021-07-22 13:59:09 -05:00
main [SPARK-35848][MLLIB] Optimize some treeAggregates in MLlib by delaying allocations 2021-07-22 13:59:09 -05:00
test [SPARK-35310][MLLIB] Update to breeze 1.2 2021-07-22 13:58:01 -05:00