How to compute the pagerank of almost anything

Whenever two things have a directional relationship to each other, then you can compute the pagerank of those things. For example, you can observe a directional relationships between web pages that link to each other, scientists that cite each other, and chess players that beat each other. The relationship is directional because it matters in what direction the relationship points, e.g. who lost to who in chess.

Intuitively, you may think of directional relationships as a transferal of some abstract value between two parties. For example, when one chess player loses to another in chess, then the value (i.e. relative skill level) of the winner will increase and the value of the loser decrease. Furthermore, the amount of value that is transfered depends on the starting value of each party. For example, if a master chess player loses to a novice chess player in chess, then the relative skill level of the novice will dramatically increase. Conversely, if a novice chess player loses to a master chess player in chess, then that is to be expected. In this situation, the relative skill level of each player should remain roughly the same as before – the status quo.

Below you’ll see an illustration of a small graph with seven nodes and seven edges. The pagerank of each node is illustrated by shading it, where a darker color denotes a higher rank.

If you study this figure, you should notice that:

  • Nodes 1 through 4 all have low rank, because no other nodes point to them
  • Node 5 has a medium rank, because a low-rank node points to it
  • Node 6 has high rank, because many low-rank nodes point to it
  • Node 7 has the highest rank, because a high-rank node points to it, while it points to nothing

Compute pagerank with Python

The pageranks of the nodes in the example graph (see figure above) was computed in Python with the help of the networkx library, which can be installed with pip: pip install networkx. The code that creates a graph and computes pagerank is listed below:

import networkx as nx
# Initialize directed graph
G = nx.DiGraph()
# Add edges (implicitely adds nodes)
# Compute pagerank (keys are node IDs, values are pageranks)
pr = nx.pagerank(G)
  1: 0.06242340798778012, 
  2: 0.06242340798778012, 
  3: 0.06242340798778012, 
  4: 0.06242340798778012, 
  5: 0.08895357136701444, 
  6: 0.32374552689540625, 
  7: 0.33760726978645894

Notice that each nodes is represented by an integer ID, with no specific semantics tied to the nodes nor the edges. In other words, the graph could equally well represent relationships between web pages, scientists and chess players (or something else entirely).

If your relationships can be assigned weights, e.g. the strength of a victory in chess or the prominence of a link on a web page, then you can add weights to the edges in the graph. Luckily, weighted edges can be easily added in networkx:

G.add_edge(1, 2, weight=0.5)

Dealing with time

You may ask yourself, should a chess game that took place last year impact a player’s rank as much as a game that was won or lost just last week? In many situations, the most meaningful answer would be no. A good way to represent the passing of time in a relationship graph is to use edge weights that decrease over time by some function. For example, an exponential decay function can be used, such that relationships that were formed a long time ago have exponentially lower weight than recently formed relationships. This can be achieved in Python with the ** operator with a negative exponent:

time_decayed_weight = max(.00001, time_passed) ** -1
G.add_edge(1, 2, weight=time_decayed_weight)

We use the trick max(.00001, time_passed) to ensure that we do not raise zero to the power of a negative number. The unit of time passed depends on the domain, and is not essential to the computation. For example, the unit could be milliseconds, years or millennia.

To be continued…

Leave a Reply