Add PageRank example and data

Ankur Dave 2014-01-12 13:10:53 -08:00
parent f096f4eaf1
commit 5e35d39e0f
4 changed files with 50 additions and 2 deletions


@ -470,10 +470,40 @@ things to worry about.)
# Graph Algorithms
<a name="graph_algorithms"></a>
This section should describe the various algorithms and how they are used.
GraphX includes a set of graph algorithms to simplify analytics. The algorithms are contained in the `org.apache.spark.graphx.lib` package and can be accessed directly as methods on `Graph` via an implicit conversion to [`Algorithms`][Algorithms]. This section describes the algorithms and how they are used.
[Algorithms]: api/graphx/index.html#org.apache.spark.graphx.lib.Algorithms
## PageRank
PageRank measures the importance of each vertex in a graph, assuming an edge from *u* to *v* represents an endorsement of *v*'s importance by *u*. For example, if a Twitter user is followed by many others, the user will be ranked highly.
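To make the endorsement idea concrete, here is a minimal plain-Scala sketch of the textbook, normalized PageRank iteration, run on the follower edges from `graphx/data/followers.txt`. It is an illustration only and is not claimed to match what `graph.pageRank` computes internally. The names `PageRankSketch` and `pageRank`, the default `resetProb` of 0.15, and the uniform redistribution of rank from vertices without out-edges are all assumptions of this sketch; `tol` and `resetProb` merely echo the parameter names used in the PageRank Scaladoc.

```scala
object PageRankSketch {
  // Follower edges (src follows dst), copied from graphx/data/followers.txt.
  val followers: Seq[(Long, Long)] = Seq(
    (2L, 1L), (3L, 1L), (4L, 1L), (6L, 1L), (3L, 2L), (6L, 2L),
    (7L, 2L), (6L, 3L), (7L, 3L), (7L, 6L), (6L, 7L), (3L, 7L))

  // Textbook PageRank (an assumption of this sketch, not GraphX's code):
  // iterate until no vertex's rank changes by more than `tol`.
  def pageRank(
      edges: Seq[(Long, Long)],
      resetProb: Double = 0.15,
      tol: Double = 0.0001): Map[Long, Double] = {
    val vertices = edges.flatMap { case (src, dst) => Seq(src, dst) }.distinct
    val n = vertices.size
    val outDegree: Map[Long, Int] =
      edges.groupBy(_._1).map { case (src, es) => src -> es.size }

    @annotation.tailrec
    def iterate(ranks: Map[Long, Double]): Map[Long, Double] = {
      // Rank held by vertices with no out-edges is spread over all vertices.
      val danglingMass = vertices.filterNot(outDegree.contains).map(ranks).sum
      // Each vertex splits its rank evenly among the vertices it endorses.
      val contribs: Map[Long, Double] = edges.groupBy(_._2).map {
        case (dst, es) =>
          dst -> es.map { case (src, _) => ranks(src) / outDegree(src) }.sum
      }
      val next = vertices.map { v =>
        val received = contribs.getOrElse(v, 0.0) + danglingMass / n
        v -> (resetProb / n + (1 - resetProb) * received)
      }.toMap
      val delta = vertices.map(v => math.abs(next(v) - ranks(v))).max
      if (delta < tol) next else iterate(next)
    }

    // Start from a uniform distribution over the vertices.
    iterate(vertices.map(_ -> 1.0 / n).toMap)
  }
}
```

Calling `PageRankSketch.pageRank(PageRankSketch.followers).toSeq.sortBy(-_._2)` lists the vertex ids from most to least endorsed. In this sketch the ranks sum to 1, and vertex 1, which is followed by four of the other five users, comes out on top.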
Spark includes an example social network dataset that we can run PageRank on. A set of users is given in `graphx/data/users.txt`, and a set of relationships between users is given in `graphx/data/followers.txt`. We can compute the PageRank of each user as follows:
{% highlight scala %}
// Load the implicit conversion to Algorithms
import org.apache.spark.graphx.lib._
// Load the datasets into a graph
val users = sc.textFile("graphx/data/users.txt").map { line =>
  val fields = line.split("\\s+")
  (fields(0).toLong, fields(1))
}
val followers = sc.textFile("graphx/data/followers.txt").map { line =>
  val fields = line.split("\\s+")
  Edge(fields(0).toLong, fields(1).toLong, 1)
}
val graph = Graph(users, followers)
// Run PageRank
val ranks = graph.pageRank(0.0001).vertices
// Join the ranks with the usernames
val ranksByUsername = users.leftOuterJoin(ranks).map {
  case (id, (username, rankOpt)) => (username, rankOpt.getOrElse(0.0))
}
// Print the result
println(ranksByUsername.collect().mkString("\n"))
{% endhighlight %}
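The join step in the example above can be tried without a Spark context. The following sketch mimics `leftOuterJoin` followed by `getOrElse(0.0)` using plain Scala collections: every user is kept, and a user with no computed rank falls back to 0.0. The names `JoinSketch` and `withRanks` are hypothetical, and the sketch only approximates the semantics of the RDD operations.

```scala
object JoinSketch {
  // Lines copied from graphx/data/users.txt.
  val userLines: Seq[String] = Seq(
    "1 BarackObama", "2 ericschmidt", "3 jeresig",
    "4 justinbieber", "6 matei_zaharia", "7 odersky")

  // Same parsing as the example: split on whitespace, vertex id first.
  val users: Seq[(Long, String)] = userLines.map { line =>
    val fields = line.split("\\s+")
    (fields(0).toLong, fields(1))
  }

  // Local stand-in for users.leftOuterJoin(ranks): keep every user and
  // default to 0.0 when no rank exists for that vertex id.
  def withRanks(
      users: Seq[(Long, String)],
      ranks: Map[Long, Double]): Seq[(String, Double)] =
    users.map { case (id, username) => (username, ranks.getOrElse(id, 0.0)) }
}
```

For instance, `JoinSketch.withRanks(JoinSketch.users, Map(1L -> 2.5, 7L -> 1.0))` pairs each username with its rank, where 2.5 and 1.0 are made-up placeholder values rather than real PageRank output, and the four users missing from the map receive 0.0.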
## Connected Components
## Shortest Path

graphx/data/followers.txt Normal file

@ -0,0 +1,12 @@
2 1
3 1
4 1
6 1
3 2
6 2
7 2
6 3
7 3
7 6
6 7
3 7

graphx/data/users.txt Normal file

@ -0,0 +1,6 @@
1 BarackObama
2 ericschmidt
3 jeresig
4 justinbieber
6 matei_zaharia
7 odersky


@ -106,7 +106,7 @@ object PageRank extends Logging {
* @tparam ED the original edge attribute (not used)
*
* @param graph the graph on which to compute PageRank
* @param tol the tolerance allowed at convergence (smaller => more * accurate).
* @param tol the tolerance allowed at convergence (smaller => more accurate).
* @param resetProb the random reset probability (alpha)
*
* @return the graph with each vertex containing the PageRank and each edge