Here’s a game I like to play. Select two wikipedia pages at random, and find a route from one to the other. I stated a theorem once that:
you can get from any page on wikipedia to the page on pollination in 7 steps or less. (it was actually another page, but let’s say it was pollination)
I devised a method for doing this using Google search. Let’s call the random page s, and the page you want to reach t, e.g. pollination. A given page on wikipedia has a set of incoming links (other pages linked to the page), and a set of outgoing links (other pages linked to by the page). Let’s call these two sets in[p] and out[p]. These two sets contain direct decendants and ancestors of p respectively.
Computing in[p] using google search
link:http://en.wikipedia.org/wiki/Pollination site:en.wikipedia.org (link)
This is trivial. To compute out[http://en.wikipedia.org/wiki/Ketchup] simply visit the page on Ketchup and look at the links.
The idea is to use BFS (breadth first search) simultaneously from s and t. Searching from s involves recursively computing out[p] to discover the set of decendants of s. Searching from t involves recursively computing in[p] to discover the set of ancestors of t. If there is a route from s to t, at some point the intersection between the decendants of s and the ancestors of t will be non-empty. This situation is illustrated by the green node:
Now, Google is a bit stingy with the requests, and do not allow requests made e.g. by curl. So for now the algorithm is run by hand.
Optimization using guided depth-first-search
One optimization when computing in[p], is to include a word that is very general or popular, e.g. the word “sex”. The idea is that the likelyhood that a given page p is both ancestor of t and a decendant of s, increases with the popularity of p. In this case the algorithm uses depth first to first explore paths going through popular pages.
As an example, the page on sex is a member of in[http://en.wikipedia.org/wiki/Pollination]. One of the reason that I chose Pollination to begin with…