5853e8b330
### What changes were proposed in this pull request?
1, change the scope of `ml.SummarizerBuffer` and add a method `createSummarizerBuffer` for it, so it can be used as an aggregator like `MultivariateOnlineSummarizer`;
2, in LoR/AFT/LiR/SVC, use `Summarizer` instead of `MultivariateOnlineSummarizer`

### Why are the changes needed?
The computation of the summary before the learning iterations is a bottleneck in high-dimensional cases, since `MultivariateOnlineSummarizer` computes much more than needed.
The [ticket](https://issues.apache.org/jira/browse/SPARK-29754) gives an example: with `--driver-memory=4G`, LoR always fails on the KDDA dataset. If we switch to `ml.Summarizer`, then `--driver-memory=3G` is enough to train a model.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
existing testsuites & manual test in REPL

Closes #26396 from zhengruifeng/using_SummarizerBuffer.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: zhengruifeng <ruifengz@foxmail.com>

| Name |
|---|
| benchmarks |
| src |
| pom.xml |
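
The commit message's argument is that a summarizer which computes only the requested statistics needs less memory than one that always computes everything. The following is a minimal, self-contained Scala sketch of that idea. It is not Spark's `SummarizerBuffer`: the class `SelectiveSummarizer`, its metric names, and the `allocatedDoubles` helper are hypothetical and exist only for illustration. It keeps per-feature arrays only for the metrics asked for, and updates mean and variance in one pass with Welford's method.

```scala
// Hypothetical illustration only; not Spark's SummarizerBuffer implementation.
// Per-feature state is allocated only for the metrics that were requested.
final class SelectiveSummarizer(dim: Int, requested: Set[String]) {
  // Variance needs the running mean, so "variance" implies tracking the mean too.
  private val needVar: Boolean = requested("variance")
  private val needMean: Boolean = requested("mean") || needVar
  private val needMax: Boolean = requested("max")

  private var n: Long = 0L
  private val meanArr: Array[Double] =
    if (needMean) new Array[Double](dim) else Array.empty[Double]
  // Sum of squared deviations from the running mean (Welford's M2).
  private val m2Arr: Array[Double] =
    if (needVar) new Array[Double](dim) else Array.empty[Double]
  private val maxArr: Array[Double] =
    if (needMax) Array.fill(dim)(Double.NegativeInfinity) else Array.empty[Double]

  /** Folds one dense instance into the running statistics. */
  def add(x: Array[Double]): this.type = {
    require(x.length == dim, s"expected $dim features, got ${x.length}")
    n += 1
    var i = 0
    while (i < dim) {
      if (needMean) {
        val delta = x(i) - meanArr(i)
        meanArr(i) += delta / n
        if (needVar) m2Arr(i) += delta * (x(i) - meanArr(i))
      }
      if (needMax && x(i) > maxArr(i)) maxArr(i) = x(i)
      i += 1
    }
    this
  }

  def count: Long = n

  def mean: Option[Array[Double]] =
    if (requested("mean")) Some(meanArr.clone()) else None

  /** Unbiased sample variance; all zeros when fewer than two instances were seen. */
  def variance: Option[Array[Double]] =
    if (!needVar) None
    else if (n > 1) Some(m2Arr.map(_ / (n - 1)))
    else Some(new Array[Double](dim))

  def max: Option[Array[Double]] =
    if (needMax) Some(maxArr.clone()) else None

  /** Number of doubles held as per-feature state, a rough proxy for memory use. */
  def allocatedDoubles: Int = meanArr.length + m2Arr.length + maxArr.length
}
```

In this sketch, requesting only `Set("mean", "variance")` allocates two arrays of length `dim`, while adding `"max"` allocates a third. With millions of features, each array that is not needed is memory saved on every partial aggregate, which is the kind of saving the commit message attributes to switching summarizers.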