spark-instrumented-optimizer/mllib
zhengruifeng e7fa778dc7 [SPARK-30699][ML][PYSPARK] GMM blockify input vectors
### What changes were proposed in this pull request?
1. Add a new param `blockSize`.
2. If `blockSize == 1`, keep the original behavior (code path `trainOnRows`).
3. If `blockSize > 1`, standardize the input vectors and stack them into blocks, as ALS and MLP do (code path `trainOnBlocks`).
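
The stacking step in item 3 can be pictured with a minimal sketch in plain Python. This is not the Spark implementation: the names `blockify` and `block_size` are hypothetical, the standardization step is omitted, and each block is modeled as a list of rows rather than a dense matrix.

```python
from itertools import islice
from typing import Iterable, Iterator, List

Vector = List[float]
Block = List[Vector]  # row-major block: one inner list per input vector


def blockify(rows: Iterable[Vector], block_size: int) -> Iterator[Block]:
    """Group consecutive vectors into blocks of at most block_size rows.

    Hypothetical helper for illustration only; it is not part of Spark's API.
    """
    if block_size < 1:
        raise ValueError("block_size must be >= 1")
    it = iter(rows)
    while True:
        # Take the next block_size vectors; the final block may be shorter.
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
blocks = list(blockify(rows, block_size=2))
single_row_blocks = list(blockify(rows, block_size=1))
```

With `block_size=2` the five vectors fall into three blocks, the last holding a single row. With `block_size=1` every block holds exactly one vector, which loosely mirrors the row-at-a-time path.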

### Why are the changes needed?
Performance gain on the dense HIGGS dataset:
1. uses about 45% less RAM;
2. runs about 3x faster with OpenBLAS.

### Does this PR introduce any user-facing change?
Yes, it adds a new expert param `blockSize`.

### How was this patch tested?
Added test suites.

Closes #27473 from zhengruifeng/blockify_gmm.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: zhengruifeng <ruifengz@foxmail.com>
2020-05-12 12:54:03 +08:00
benchmarks [SPARK-29297][TESTS] Compare core/mllib module benchmarks in JDK8/11 2019-09-29 21:43:58 -07:00
src [SPARK-30699][ML][PYSPARK] GMM blockify input vectors 2020-05-12 12:54:03 +08:00
pom.xml [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT 2020-02-25 19:44:31 -08:00