spark-instrumented-optimizer/mllib
sethah 93ae176e89 [SPARK-19745][ML] SVCAggregator captures coefficients in its closure
## What changes were proposed in this pull request?

JIRA: [SPARK-19745](https://issues.apache.org/jira/browse/SPARK-19745)

Reorganize SVCAggregator to avoid serializing coefficients. This patch also makes the gradient array a `lazy val` which will avoid materializing a large array on the driver before shipping the class to the executors. This improvement stems from https://github.com/apache/spark/pull/16037. Actually, probably all ML aggregators can benefit from this.

We can either: a.) separate the gradient improvement into another patch b.) keep what's here _plus_ add the lazy evaluation to all other aggregators in this patch or c.) keep it as is.

## How was this patch tested?

This is an interesting question! I don't know of a reasonable way to test this right now. Ideally, we could perform an optimization and look at the shuffle write data for each task, and we could compare the size to what it we know it should be: `numCoefficients * 8 bytes`. Not sure if there is a good way to do that right now? We could discuss this here or in another JIRA, but I suspect it would be a significant undertaking.

Author: sethah <seth.hendrickson16@gmail.com>

Closes #17076 from sethah/svc_agg.
2017-03-02 19:38:25 -08:00
..
src [SPARK-19745][ML] SVCAggregator captures coefficients in its closure 2017-03-02 19:38:25 -08:00
pom.xml [SPARK-17807][CORE] split test-tags into test-JAR 2016-12-21 16:37:20 -08:00