PageRank is a recursive algorithm that ranks nodes(pages) in a directed graph(web). It works on the principal that a page is important if important pages link to it. This definition is circular, so a base case is created, where every page is equally important.

Formula

  • B and C are nodes that point to A.
  • = Damping Constant (0.85 in most cases). It represents that chance that a user will click another link in the page he was currently on, instead of just searching for a new page
  • = The out degree of n i.e. the number of outgoing edges. If the node is a sink node, it is converted into a source node, by adding edges to every node (including itself!)
  • represents the surfer probability, the chance that a random person outside the page finds it

Example

Consider the network shown below, with 4 nodes:

The PageRank formula and values would be as follows:

NodeFormula - PR(Node)
1
2
3
4
Iteration1234
00.250.250.250.25
10.090.300.410.20
20.100.310.310.28
30.100.280.380.24
40.100.300.340.26
100.100.290.360.25

As you can see, node 3 is considered most important, with 2 being second, 4 being 3rd and 1 being last.

Pseudocode

Algorithm PageRank(G):
	d ← 0.85
	For each node n in G do:
		PR(n) ← 1/|V| #PR is be a node attribute, but this style shows recursion better
		If n.outdegree = 0 then:
			For each node n2 in G do:
				Add edge n-n2 to G
			End do
		End if
	End do

	Repeat until convergence: #Convergence can mean 100,1000,etc.
		For each node n in G:
			total ← 0 
				For each incoming node to n:
					total ← total + PR(incoming)/incoming.outdegree
				End do
			PR(n) ← (1-d)/|V| + d*(total)
		End do
	End do

End Algorithm

Implementation

PageRank.py