b69c26833c
### What changes were proposed in this pull request?

Optimize some treeAggregates in MLlib by delaying allocating (thus not sending around) large arrays of zeroes.

This uses the same idea as in https://github.com/apache/spark/pull/23600/files

### Why are the changes needed?

Allocating huge arrays of zeroes takes additional memory and network I/O which is unnecessary in some cases. It can cause operations to run out of memory that might otherwise succeed.

Specifically, this should prevent the 'zero' value from having to be (pointlessly) checked for serializability, which can fail when passing through the default JavaSerializer; it would also prevent allocating and sending large 'zero' values for an empty partition in the aggregate.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #33443 from srowen/SPARK-35848.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
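
For readers unfamiliar with the idea, here is a minimal sketch of delayed zero allocation in plain Python. It is not Spark code and not the patch itself: `seq_op`, `comb_op`, and `aggregate` are hypothetical names, the partitions are ordinary lists, and `None` stands in for the cheap placeholder 'zero' value. The point is only that the large accumulator is created when the first row arrives, so an empty partition never allocates or ships one.

```python
from functools import reduce
from typing import List, Optional, Sequence

Vector = List[float]


def seq_op(acc: Optional[Vector], row: Sequence[float]) -> Vector:
    """Fold one row into the accumulator (hypothetical helper, not a Spark API)."""
    # Allocate the accumulator only when the first row is seen,
    # instead of passing a pre-allocated array of zeroes as the 'zero' value.
    if acc is None:
        acc = [0.0] * len(row)
    for i, x in enumerate(row):
        acc[i] += x
    return acc


def comb_op(a: Optional[Vector], b: Optional[Vector]) -> Optional[Vector]:
    """Merge two partial results; either side may still be the None placeholder."""
    if a is None:
        return b
    if b is None:
        return a
    for i, x in enumerate(b):
        a[i] += x
    return a


def aggregate(partitions: Sequence[Sequence[Sequence[float]]]) -> Optional[Vector]:
    """Stand-in for an aggregate over partitions, with None as the cheap 'zero'."""
    # Each partition folds its rows starting from None; empty partitions stay None.
    partials = [reduce(seq_op, part, None) for part in partitions]
    # Partial results are merged, skipping placeholders from empty partitions.
    return reduce(comb_op, partials, None)


if __name__ == "__main__":
    data = [[[1.0, 2.0], [3.0, 4.0]], [], [[5.0, 6.0]]]
    print(aggregate(data))
```

In this sketch the caller has to handle a `None` result when every partition is empty, which is the trade-off for not allocating the array up front.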