spark-instrumented-optimizer/mllib
Liang-Chi Hsieh 19f882ce1b [SPARK-28933][ML] Reduce unnecessary shuffle in ALS when initializing factors
### What changes were proposed in this pull request?

When initializing factors in ALS, we should use `mapPartitions` instead of the current `map`, so that the existing partitioning of the RDD of `InBlock` is preserved. The RDD of `InBlock` is already partitioned by src block id, and initializing factors does not change that partitioning.
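
The difference can be sketched outside ALS. The snippet below is an illustration, not the code from this patch: `blocks` is a hypothetical stand-in for the RDD of `InBlock`, and the doubling step stands in for factor initialization. It assumes a local Spark session. Note that `mapPartitions` keeps the partitioner only when `preservesPartitioning = true` is passed, since the flag defaults to `false`.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PreservePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("preserve-partitioning-sketch")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical stand-in for the RDD of InBlock: (srcBlockId, payload) pairs,
      // hash-partitioned by the block id.
      val blocks = sc.parallelize(0 until 8)
        .map(id => (id, Array.fill(3)(id.toFloat)))
        .partitionBy(new HashPartitioner(4))

      // map cannot know that keys are unchanged, so the result has no partitioner.
      val viaMap = blocks.map { case (id, values) => (id, values.map(_ * 2)) }

      // mapPartitions with preservesPartitioning = true carries the partitioner over.
      val viaMapPartitions = blocks.mapPartitions(
        iter => iter.map { case (id, values) => (id, values.map(_ * 2)) },
        preservesPartitioning = true)

      assert(viaMap.partitioner.isEmpty)
      assert(viaMapPartitions.partitioner == blocks.partitioner)
    } finally {
      sc.stop()
    }
  }
}
```

Setting `preservesPartitioning = true` is only safe when the function leaves the keys untouched, which is the case here because each element keeps its block id.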

### Why are the changes needed?

This patch avoids an unnecessary shuffle after the factors are initialized.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing tests should be unaffected. An added test verifies the shuffle dependency of the factor RDDs.
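
As a rough illustration of what a shuffle-dependency check can look like (this is not the test added in the PR), the sketch below cogroups a partitioned RDD with a derived RDD and counts the `ShuffleDependency` inputs of the underlying cogroup. The object name, the helper `shuffleInputs`, and the toy data are all hypothetical, and the helper assumes that the RDD returned by `cogroup` has the `CoGroupedRDD` as its direct parent.

```scala
import org.apache.spark.{HashPartitioner, ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ShuffleDependencySketch {
  // Hypothetical helper: counts the shuffle inputs of the RDD directly beneath
  // a cogroup result (assumed to be the CoGroupedRDD).
  def shuffleInputs(cogrouped: RDD[_]): Int =
    cogrouped.dependencies.head.rdd.dependencies
      .count(_.isInstanceOf[ShuffleDependency[_, _, _]])

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("shuffle-dependency-sketch")
    val sc = new SparkContext(conf)
    try {
      val part = new HashPartitioner(4)
      // Toy stand-in for blocks partitioned by src block id.
      val blocks = sc.parallelize(0 until 8).map(id => (id, id.toFloat)).partitionBy(part)

      // Derived with map: the partitioner is lost, so a later cogroup must shuffle it.
      val dropped = blocks.map { case (id, v) => (id, v * 2) }

      // Derived with mapPartitions and preservesPartitioning = true: no shuffle needed.
      val kept = blocks.mapPartitions(
        iter => iter.map { case (id, v) => (id, v * 2) },
        preservesPartitioning = true)

      assert(shuffleInputs(blocks.cogroup(dropped, part)) == 1)
      assert(shuffleInputs(blocks.cogroup(kept, part)) == 0)
    } finally {
      sc.stop()
    }
  }
}
```

Inspecting `dependencies` checks the lineage without running a job, so a check of this kind stays cheap even on large inputs.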

Closes #25639 from viirya/fix-als-partition.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <liangchi@uber.com>
2019-09-01 19:49:50 -07:00
..
benchmarks [SPARK-25489][ML][TEST] Refactor UDTSerializationBenchmark 2018-09-23 13:34:06 -07:00
src [SPARK-28933][ML] Reduce unnecessary shuffle in ALS when initializing factors 2019-09-01 19:49:50 -07:00
pom.xml [SPARK-26986][ML] Add JAXB reference impl to build for Java 9+ 2019-02-26 18:26:49 -06:00