spark-instrumented-optimizer/graphx
Andrew Ray 78062b8521 [SPARK-18845][GRAPHX] PageRank has incorrect initialization value that leads to slow convergence
## What changes were proposed in this pull request?

Change the initial value in all PageRank implementations to be `1.0` instead of `resetProb` (default `0.15`) and use `outerJoinVertices` instead of `joinVertices` so that source vertices get updated in each iteration.
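
To illustrate why the join type matters once the initial value is `1.0`, here is a minimal plain-Python sketch (not GraphX code). The helper names `join_update` and `outer_join_update` are hypothetical, and the sketch assumes that a `joinVertices`-style update leaves vertices without incoming messages at their old value, while an `outerJoinVertices`-style update treats a missing message sum as `0.0`.

```python
RESET_PROB = 0.15

def join_update(ranks, msg_sums, reset_prob=RESET_PROB):
    # joinVertices-like: vertices without a message keep their old rank.
    return {
        v: reset_prob + (1.0 - reset_prob) * msg_sums[v] if v in msg_sums else r
        for v, r in ranks.items()
    }

def outer_join_update(ranks, msg_sums, reset_prob=RESET_PROB):
    # outerJoinVertices-like: a missing message sum is treated as 0.0,
    # so every vertex is recomputed in each iteration.
    return {
        v: reset_prob + (1.0 - reset_prob) * msg_sums.get(v, 0.0)
        for v in ranks
    }

# Vertex 3 is a source: it has no incoming edges, so it never receives a message.
ranks = {0: 1.0, 3: 1.0}
msg_sums = {0: 2.0}

inner = join_update(ranks, msg_sums)
outer = outer_join_update(ranks, msg_sums)
```

In this sketch the source vertex stays at its initial `1.0` under the inner join but drops to `resetProb` under the outer join, which is the value a vertex with no incoming rank should have.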

The incorrect initial value seems to have been introduced a long time ago in 15a564598f (diff-b2bf3f97dcd2f19d61c921836159cda9L90)

Except for graphs with sinks (which currently give incorrect results; see SPARK-18847), this gives faster convergence because the sum of ranks is already correct from the first iteration (the sum of ranks should equal the number of vertices).
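
The sum-of-ranks argument can be checked with a small plain-Python sketch (not the GraphX implementation). It assumes the unnormalized update `rank = resetProb + (1 - resetProb) * sum(rank[src] / outDegree[src])` and a hypothetical four-vertex graph without sinks: a three-vertex loop plus one source vertex feeding into it.

```python
def pagerank(edges, num_vertices, init, reset_prob=0.15, num_iter=10):
    """Unnormalized PageRank by power iteration; assumes no sink vertices."""
    out_deg = [0] * num_vertices
    for src, _ in edges:
        out_deg[src] += 1

    ranks = [init] * num_vertices
    sums = [sum(ranks)]  # sum of ranks before the first iteration
    for _ in range(num_iter):
        incoming = [0.0] * num_vertices
        for src, dst in edges:
            incoming[dst] += ranks[src] / out_deg[src]
        ranks = [reset_prob + (1.0 - reset_prob) * c for c in incoming]
        sums.append(sum(ranks))
    return ranks, sums

# Loop 0 -> 1 -> 2 -> 0 plus a source vertex 3 -> 0 (hypothetical example graph).
edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
n = 4

ranks_one, sums_one = pagerank(edges, n, init=1.0)
ranks_reset, sums_reset = pagerank(edges, n, init=0.15)
```

With `init=1.0` the sum of ranks equals the number of vertices at every iteration. With `init=0.15` the sum starts at `0.15 * n`, and its gap to `n` shrinks only by a factor of `1 - resetProb` per iteration.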

Convergence comparison benchmark for a small graph: http://imgur.com/a/HkkZf
Code for benchmark: https://gist.github.com/aray/a7de1f3801a810f8b1fa00c271a1fefd

## How was this patch tested?

Corrected the existing unit tests and added a test that verifies the results against those of igraph and NetworkX on a loop with a source.

Author: Andrew Ray <ray.andrew@gmail.com>

Closes #16271 from aray/pagerank-initial-value.
2016-12-15 23:32:10 -08:00
src [SPARK-18845][GRAPHX] PageRank has incorrect initialization value that leads to slow convergence 2016-12-15 23:32:10 -08:00
pom.xml [SPARK-18695] Bump master branch version to 2.2.0-SNAPSHOT 2016-12-02 21:09:37 -08:00