spark-instrumented-optimizer/graphx
Enzo Bonnal 402375b59e [SPARK-35357][GRAPHX] Allow to turn off the normalization applied by static PageRank utilities
### What changes were proposed in this pull request?

Overload the methods `PageRank.runWithOptions` and `PageRank.runWithOptionsWithPreviousPageRank` with a `normalized` parameter that controls "whether or not to normalize the rank sum". Overloads are added, rather than changing the existing methods, so that no user-facing signature breaks.

### Why are the changes needed?

https://issues.apache.org/jira/browse/SPARK-35357

When a graph contains a non-negligible proportion of sinks, algorithms based on incremental rank updates can get a **precision gain for free** if they are allowed to manipulate non-normalized ranks.
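
A brief way to see where the gain comes from, assuming the usual non-personalized PageRank update with reset probability $\alpha$ (the notation below is introduced here for illustration and is not taken from the patch). One iteration maps a rank vector $r$ to

$$F(r) = \alpha \mathbf{1} + (1 - \alpha) M r, \qquad (M r)_i = \sum_{j \to i} \frac{r_j}{\mathrm{outdeg}(j)}.$$

Rescaling the ranks by a factor $c$ before the next iteration gives

$$F(c\,r) = \alpha \mathbf{1} + c\,(1 - \alpha) M r = c\,F(r) + (1 - c)\,\alpha \mathbf{1},$$

which is generally not a rescaling of $F(r)$ unless $c = 1$. A sink contributes nothing to $M r$, so the rank sum drifts away from the vertex count and the normalization factor is generally different from 1. Resuming from normalized ranks therefore does not reproduce an uninterrupted run, whereas resuming from non-normalized ranks does.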

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

By adding a unit test that verifies that, even for a graph containing a sink, both of the following scenarios produce the same result:
a)
  - Run **6 iterations** of PageRank in a row using `PageRank.runWithOptions` with **normalization enabled**

b)
  - Run **2 iterations** using `PageRank.runWithOptions` with **normalization disabled**
  - Resume from the resulting graph (`preRankGraph1`) and run **2 more iterations** using `PageRank.runWithOptionsWithPreviousPageRank` with **normalization disabled**
  - Finally, resume from that run's result (`preRankGraph2`) and run **2 more iterations** using `PageRank.runWithOptionsWithPreviousPageRank` with **normalization enabled**
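
The equivalence checked by the two scenarios can be sketched without Spark. The following is a minimal plain-Scala simulation, not the actual unit test and not the GraphX implementation: the object name, the 4-vertex graph, the helper names, and the reset probability of 0.15 are all assumptions chosen for illustration. It assumes normalization is a single rescaling applied at the end of a run, so that the rank sum equals the number of vertices.

```scala
object PageRankResumeSketch {
  // Hypothetical graph: vertex 3 is a sink (it has no outgoing edge).
  val edges: Seq[(Int, Int)] = Seq(0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 2 -> 3)
  val numVertices = 4
  val resetProb = 0.15 // assumed value, for illustration only

  private val outDegree: Map[Int, Int] =
    edges.groupBy(_._1).map { case (src, es) => src -> es.size }

  /** Applies `numIter` rank updates starting from `ranks`, without normalizing. */
  def iterate(ranks: Vector[Double], numIter: Int): Vector[Double] =
    (1 to numIter).foldLeft(ranks) { (r, _) =>
      val incoming = Array.fill(numVertices)(0.0)
      // Each vertex splits its rank evenly among its out-neighbours.
      for ((src, dst) <- edges) incoming(dst) += r(src) / outDegree(src)
      Vector.tabulate(numVertices)(v => resetProb + (1.0 - resetProb) * incoming(v))
    }

  /** Rescales the ranks so that they sum to the number of vertices. */
  def normalize(ranks: Vector[Double]): Vector[Double] = {
    val factor = numVertices / ranks.sum
    ranks.map(_ * factor)
  }

  /** Largest absolute component-wise difference between two rank vectors. */
  def maxAbsDiff(x: Vector[Double], y: Vector[Double]): Double =
    x.zip(y).map { case (a, b) => math.abs(a - b) }.max

  val initial: Vector[Double] = Vector.fill(numVertices)(1.0)

  // Scenario a): 6 iterations in a row, normalized at the end.
  val scenarioA: Vector[Double] = normalize(iterate(initial, 6))

  // Scenario b): 2 + 2 + 2 iterations, normalizing only in the last run.
  val scenarioB: Vector[Double] =
    normalize(iterate(iterate(iterate(initial, 2), 2), 2))

  // For contrast: normalizing after every run before resuming.
  val normalizedEachRun: Vector[Double] =
    normalize(iterate(normalize(iterate(normalize(iterate(initial, 2)), 2)), 2))

  def main(args: Array[String]): Unit = {
    println(s"a) vs b):                  ${maxAbsDiff(scenarioA, scenarioB)}")
    println(s"a) vs normalized each run: ${maxAbsDiff(scenarioA, normalizedEachRun)}")
  }
}
```

In this sketch, scenarios a) and b) agree because the intermediate ranks are passed along untouched, while the variant that normalizes between runs drifts away from the uninterrupted result because of the sink.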

Closes #32485 from bonnal-enzo/make-pagerank-normalization-optional.

Authored-by: Enzo Bonnal <enzobonnal@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2021-05-12 08:56:22 -05:00
..
src [SPARK-35357][GRAPHX] Allow to turn off the normalization applied by static PageRank utilities 2021-05-12 08:56:22 -05:00
pom.xml [SPARK-35150][ML] Accelerate fallback BLAS with dev.ludovic.netlib 2021-04-27 14:00:59 -05:00