PageRank is a recursive algorithm that ranks nodes(pages) in a directed graph(web). It works on the principal that a page is important if important pages link to it. This definition is circular, so a base case is created, where every page is equally important.

Formula

PR(A_0) = \frac{1}{|V|}$$ * B and C are nodes that point to A. * $d$ = Damping Constant (0.85 in most cases). It represents that chance that a user will click another link in the page he was currently on, instead of just searching for a new page * $L(n)$ = The out degree of n i.e. the number of outgoing edges. **If the node is a sink node, it is converted into a source node, by adding edges to every node (including itself!)** * $\frac{1-d}{|V|}$ represents the surfer probability, the chance that a random person outside the page finds it ### Example Consider the network shown below, with 4 nodes: ![500](1200px-Directed_graph_no_background.svg.png#BC) The PageRank formula and values would be as follows: | Node | Formula - PR(Node) | | ---- | ------------------------------------------------------------------------------------------ | | 1 | $\frac{1-0.85}{4} + 0.85( \frac{PR(2)}{4})$ | | 2 | $\frac{1-0.85}{4} + 0.85\left( \frac{PR(1)}{2} + \frac{PR(3)}{2} \right)$ | | 3 | $\frac{1-0.85}{4} + 0.85\left( \frac{PR(1)}{2}+ \frac{PR(2)}{4} + \frac{PR(4)}{1}\right)$ | | 4 | $\frac{1-0.85}{4} + 0.85\left( \frac{PR(2)}{4} + \frac{PR(3)}{1} \right)$ | | Iteration | 1 | 2 | 3 | 4 | | --------- | ---- | ---- | ---- | ---- | | 0 | 0.25 | 0.25 | 0.25 | 0.25 | | 1 | 0.09 | 0.30 | 0.41 | 0.20 | | 2 | 0.10 | 0.31 | 0.31 | 0.28 | | 3 | 0.10 | 0.28 | 0.38 | 0.24 | | 4 | 0.10 | 0.30 | 0.34 | 0.26 | | 10 | 0.10 | 0.29 | 0.36 | 0.25 | As you can see, node 3 is considered most important, with 2 being second, 4 being 3rd and 1 being last. ### Pseudocode ``` Algorithm PageRank(G): d ← 0.85 For each node n in G do: PR(n) ← 1/|V| #PR is be a node attribute, but this style shows recursion better If n.outdegree = 0 then: For each node n2 in G do: Add edge n-n2 to G End do End if End do Repeat until convergence: #Convergence can mean 100,1000,etc. For each node n in G: total ← 0 For each incoming node to n: total ← total + PR(incoming)/incoming.outdegree End do PR(n) ← (1-d)/|V| + d*(total) End do End do End Algorithm ``` ### Implementation ![PageRank](PageRank.py)