spark-instrumented-optimizer

zhengruifeng 5853e8b330 [SPARK-29754][ML] LoR/AFT/LiR/SVC use Summarizer instead of MultivariateOnlineSummarizer	2019-11-06 18:19:39 +08:00

### What changes were proposed in this pull request?
1. Change the scope of `ml.SummarizerBuffer` and add a method `createSummarizerBuffer` for it, so it can be used as an aggregator like `MultivariateOnlineSummarizer`.
2. In LoR/AFT/LiR/SVC, use Summarizer instead of MultivariateOnlineSummarizer.

### Why are the changes needed?
Computing the summary before the learning iterations is a bottleneck in high-dimensional cases, since `MultivariateOnlineSummarizer` computes much more than is needed.
The [ticket](https://issues.apache.org/jira/browse/SPARK-29754) gives an example: with `--driver-memory=4G`, LoR always fails on the KDDA dataset. If we switch to `ml.Summarizer`, then `--driver-memory=3G` is enough to train a model.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing test suites and a manual test in the REPL.

Closes #26396 from zhengruifeng/using_SummarizerBuffer.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: zhengruifeng <ruifengz@foxmail.com>

benchmarks	[SPARK-29297][TESTS] Compare `core`/`mllib` module benchmarks in JDK8/11	2019-09-29 21:43:58 -07:00
src	[SPARK-29754][ML] LoR/AFT/LiR/SVC use Summarizer instead of MultivariateOnlineSummarizer	2019-11-06 18:19:39 +08:00
pom.xml	Revert "Prepare Spark release v3.0.0-preview-rc2"	2019-10-30 17:45:44 -07:00