spark-instrumented-optimizer/mllib
wm624@hotmail.com 036b50347c [SPARK-19110][ML][MLLIB] DistributedLDAModel returns different logPrior for original and loaded model
## What changes were proposed in this pull request?

While adding DistributedLDAModel training summary for SparkR, I found that the logPrior for original and loaded model is different.
For example, in the test("read/write DistributedLDAModel"), I add the test:
val logPrior = model.asInstanceOf[DistributedLDAModel].logPrior
val logPrior2 = model2.asInstanceOf[DistributedLDAModel].logPrior
assert(logPrior === logPrior2)
The test fails:
-4.394180878889078 did not equal -4.294290536919573

The reason is that `graph.vertices.aggregate(0.0)(seqOp, _ + _)` only returns the value of a single vertex instead of the aggregation of all vertices. Therefore, when the loaded model does the aggregation in a different order, it returns different `logPrior`.

Please refer to #16464 for details.
## How was this patch tested?
Add a new unit test for testing logPrior.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16491 from wangmiao1981/ldabug.
2017-01-07 11:07:49 -08:00
..
src [SPARK-19110][ML][MLLIB] DistributedLDAModel returns different logPrior for original and loaded model 2017-01-07 11:07:49 -08:00
pom.xml [SPARK-17807][CORE] split test-tags into test-JAR 2016-12-21 16:37:20 -08:00