spark-instrumented-optimizer/graphx
Andrew Ray bfdeea5c68 [SPARK-18847][GRAPHX] PageRank gives incorrect results for graphs with sinks
## What changes were proposed in this pull request?

For graphs with sinks (vertices with no outgoing edges), the computed ranks do not sum to the expected value of n (or 1 for personalized PageRank). We fix this by normalizing the ranks to the expected sum at the end of each implementation.
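
The effect of a sink on the rank sum, and the normalization step described above, can be sketched in plain Python. This is only an illustration under stated assumptions, not the GraphX code: the function names `pagerank_unnormalized` and `normalize_rank_sum`, the update rule `reset_prob + (1 - reset_prob) * incoming`, and the three-vertex example graph are all chosen here for demonstration.

```python
def pagerank_unnormalized(edges, n, reset_prob=0.15, num_iter=50):
    """Illustrative power iteration that ignores sinks (hypothetical helper)."""
    # Out-degree of every vertex; sinks keep an out-degree of 0.
    out_deg = [0] * n
    for src, _ in edges:
        out_deg[src] += 1

    ranks = [1.0] * n
    for _ in range(num_iter):
        incoming = [0.0] * n
        # Each vertex splits its rank evenly over its outgoing edges.
        # A sink has no outgoing edges, so its rank is simply lost.
        for src, dst in edges:
            incoming[dst] += ranks[src] / out_deg[src]
        ranks = [reset_prob + (1.0 - reset_prob) * c for c in incoming]
    return ranks


def normalize_rank_sum(ranks, target_sum):
    """Rescale ranks so they sum to target_sum (n, or 1 for personalized)."""
    total = sum(ranks)
    return [r * target_sum / total for r in ranks]


# Assumed example graph: the chain 0 -> 1 -> 2, where vertex 2 is a sink.
edges = [(0, 1), (1, 2)]
n = 3
raw = pagerank_unnormalized(edges, n)
fixed = normalize_rank_sum(raw, n)

print("raw sum:  ", sum(raw))    # below n because the sink leaks rank
print("fixed sum:", sum(fixed))  # rescaled to n
```

Rescaling leaves the relative ordering of the vertices unchanged; only the total is corrected.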

Additionally, this fixes the dynamic version of personalized PageRank, which gave incorrect answers that the existing unit tests did not detect.

## How was this patch tested?

Revamped the existing unit tests and added new ones, using reference values (and reproduction code) from igraph and NetworkX.

Note that for the comparison on personalized PageRank we use the arpack algorithm in igraph, because prpack (the current default) redistributes rank to all vertices uniformly instead of just to the personalization source. We could adopt the alternate convention (redistribute rank to all vertices uniformly), but that would involve more extensive changes to the algorithms (the dynamic version would no longer be able to use Pregel).
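
The two conventions can be compared with a small, self-contained Python sketch. It is not the igraph, NetworkX, or GraphX implementation. It assumes that the rank being redistributed is the rank held by sinks, and the function name `personalized_pagerank` and its `sink_to_source` flag are hypothetical names introduced only for this illustration.

```python
def personalized_pagerank(edges, n, source, sink_to_source=True,
                          damping=0.85, num_iter=100):
    """Illustrative personalized PageRank with a selectable sink convention."""
    out_deg = [0] * n
    for src, _ in edges:
        out_deg[src] += 1

    # All rank starts at the personalization source; the total stays at 1.
    ranks = [0.0] * n
    ranks[source] = 1.0

    for _ in range(num_iter):
        nxt = [0.0] * n
        # Rank that follows outgoing edges.
        for src, dst in edges:
            nxt[dst] += damping * ranks[src] / out_deg[src]
        # Teleportation always returns to the personalization source.
        nxt[source] += (1.0 - damping) * sum(ranks)
        # Rank held by sinks has no edge to follow (assumed interpretation).
        sink_mass = sum(ranks[v] for v in range(n) if out_deg[v] == 0)
        if sink_to_source:
            # Convention 1: send it only to the personalization source.
            nxt[source] += damping * sink_mass
        else:
            # Convention 2: spread it uniformly over all vertices.
            for v in range(n):
                nxt[v] += damping * sink_mass / n
        ranks = nxt
    return ranks


# Assumed example graph: the chain 0 -> 1 -> 2 with source 0 and sink 2.
edges = [(0, 1), (1, 2)]
to_source = personalized_pagerank(edges, 3, source=0, sink_to_source=True)
uniform = personalized_pagerank(edges, 3, source=0, sink_to_source=False)

print("sink rank to source: ", to_source)
print("sink rank uniformly: ", uniform)
```

Both variants keep the rank sum at 1, but they assign different ranks to individual vertices, which is why reference values have to come from an implementation that follows the same convention.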

Author: Andrew Ray <ray.andrew@gmail.com>

Closes #16483 from aray/pagerank-sink2.
2017-03-17 14:23:07 -07:00
src [SPARK-18847][GRAPHX] PageRank gives incorrect results for graphs with sinks 2017-03-17 14:23:07 -07:00
pom.xml [SPARK-17807][CORE] split test-tags into test-JAR 2016-12-21 16:37:20 -08:00