If only more people wrote like Lamport

I’m half way through the Part-time parliament on Paxos article by Leslie Lamport. It is an article that describes a three-phase consensus protocol by telling a story of a (fictional) parliement on the greek island of Paxos, complete with fat priests, busy businessmen, and a vivid description of the parliament hall. The story illustrates how the members of the parliament, who would walk in and out at any time, could pass decrees and agree on what had been agreed upon, decree-wise.

Read more

Finding the most quoted main author using linux command line

I have a text file containing article references. It looks like this

- Miller HJ (2004) Tobler’s First Law and spatial analysis. Ann Assoc Am Geogr 94:284–289.
- Onsrud H, ed (2007) Research and Theory in Advanced Spatial Data Infrastructure Concepts (ESRI Press, Redlands, CA).
- Egenhofer M (2002) Toward the geospatial semantic web. Advances in Geographic Information Systems International Symposium, eds Makki Y, Pissinou N (Association for Computing Machinery, McLean, VA), pp 1–4.
- Anselin L, Florax R, Rey S, eds (2004) Advances in Spatial Econometrics: Methodology, Tools and Applications (Springer, Berlin). 
- Wang S, Armstrong M (2009) A theoretical approach to the use of cyberinfrastructure in geographical analysis. Int J Geogr Inf Sci 23:169–193. 
- Wang S (2010) A cyberGIS framework for the synthesis of cyberinfrastructure, GIS, and spatial analysis. Ann Assoc Am Geogr 100:535–557.
- Penninga F, Van Oosterom PJM (2008) A simplicial complex-based DBMS approach to 3D topographic data modelling. Int J Geogr Inf Sci 22:751–779. 
- Baker KS, Chandler CL (2008) Enabling long-term oceanographic research: Changing data practices, in- formation management strategies and informatics. Deep-Sea Res II 55(18–19):2132–2142.

I wanted to find out what the most common first author is in that long list of articles, and this is what I did:

cat refs-2009+.txt | \
sed -e '/^ *$/d' -e 's/^- //' | \
cut -d"(" -f1 | \
cut -d, -f1 | \
cut -d' ' -f1 | \
sort | \
uniq -c | \
sort -r > \

The result is this:

   6 Craglia
   4 Wang
   4 Rajabifard
   4 Onsrud
   4 Masser
   4 Grus
   4 Crompvoets
   3 Yang
   3 Steiniger
   3 Gartner
   3 European
   3 Anselin
   2 Wright
   2 Smits
   2 Sieber
   2 Ramsey
   2 Poore
   2 Miller
   2 Lance
   2 Helly
   2 Georgiadou
   2 Fox
   2 Foster
   2 Bregt
   1 Zhang
   1 World

Sticking bicycle paths in CouchDB

In this installment of How To Stick Anything In CouchDB, I’ll take some bicycle path data released by the municipality of Copenhagen and stick them in CouchDB. As always I’m working in Terminal under Mac OS X.

Bicycle paths in Copenhagen, served live by CouchDB:

The data is in shape format, so it’s almost too easy using shp2geocouch. First download the data.

wget http://www.kk.dk/Borger/ByOgTrafik/CyklernesBy/~/media/B12983F7B5E3451F8716F7C072A5B101.ashx
mv B12983F7B5E3451F8716F7C072A5B101.ashx bikepaths.zip
unzip bikepaths.zip

What have we downloaded?

$ ls *.shp
Cykelmidter_2006_r2_ETRS89zone32N.shp	Cykelmidter_2006_r2_WGS84.shp

Create a CouchDB database to hold the data. I’m using my CouchDB installation. You should use yours:

curl -X PUT username:password@gd.iriscouch.com/bikepathscph

Use shp2geocouch to upload the data. It shouldn’t matter which one of the shape files we use, because shp2geocouch does reprojection:

shp2geocouch Cykelmidter_2006_r2_WGS84.shp http://username:password@gd.iriscouch.com/bikepathscph


There is a problem with encoding. If you click on one of the street features on the map above, streets that contain the danish letters [æ,ø,å] are missing those letters. I tried testing a conversion to GeoJSON from the shapefiles with ogr2ogr and it’s the same. God I hate encoding!

By the way, this is how to convert to GeoJSON from Shape using ogr2ogr:

ogr2ogr -f "GeoJSON" cykelmidter.json Cykelmidter_2006_r2_WGS84.shp

If you don’t have shp2geocouch, it’s really easy to install (at least it was on my Mac):

sudo gem install shp2geocouch

There is a “Ruby gem” install home page here, and github here.

How to create JSON data from a text file on the internet

The following assumes a linux command line to be present (or Mac OS X terminal in my case).

I want to wrangle text from the internet, turn it into JSON data, and ultimately stick it in CouchDB. Here I’m trying to turn a random text file containing prime numbers into structured JSON data that looks like this:

[2, 3, 5, 7,...]

The original file is here: http://primes.utm.edu/lists/small/1000.txt. It is fairly structured to begin with, but it’s not JSON.

                         The First 1,000 Primes
                          (the 1,000th is 7919)
         For more information on primes see http://primes.utm.edu/

      2      3      5      7     11     13     17     19     23     29 
     31     37     41     43     47     53     59     61     67     71 
     73     79     83     89     97    101    103    107    109    113 

The following line does turns it into JSON:

curl http://primes.utm.edu/lists/small/1000.txt | \
tail +4 | \
tr -cs "[:digit:]" "," | \
sed -e 's/^,/\[/' -e 's/,$/\]/' \
> primes.json

Let’s look at it with cat to make sure:

$ cat primes.json

Explanation of the command

curl is used to download the file and print it on standard output in the terminal. With no arguments it issues a HTTP GET for http://primes.utm.edu/lists/small/1000.txt.

tail +4 discards the first four lines.

tr -cs "[:digit:]" "," converts the text into digits followed by commas. The new text has a comma before the first digit, and a comma after the last one. No linebreaks or spaces: ,2,3,5,7...,7919,

sed -e 's/^,/\[/' -e 's/,$/\]/' is perhaps a bit hard to read. It replaces the comma before the first digit with '[', and replaces the comma after the last digit with ']'.

Who gains from blocked content on YouTube?

When I want to hear a particular rap song from 1992 on YouTube, the video service shows me this:

Yeah yeah, copyright or whatever, but what is the point? Who gains what exactly? By the way, if you can hear the song in the country you’re in, then fuck you 🙂

Who’s involved, let’s see. Me (the user/customer), Google (the owner of YouTube), the EU (makers of regional copyright laws), Sony (the copyright holder), and CMW (the artist).

Does Google the owner of YouTube win?

No. Google looses straight away, because I can hear the song on GrooveShark just fine (albeit without the video):

Does the EU win?

No. The EU might gain a little bit, because CMW is an american band, so chances are that I’ll listen to a EU artist like Dizzie Rascal instead:

But that’s not going to happen, because I wanted to listen to CMW, and I’ve already found the song on another service, GrooveShark.

Does Sony Music Entertainment win?

No. I already bought the song on iTunes a couple of days ago. If I hadn’t bought it, I would have downloaded it with a torrent. The YouTube video being there or not, did not factor in to my decision to buy the song. I bought the song because it was insanely easy to do on my iPhone. Period. In fact, I might choose to not buy a song in the future if it’s owned by SME.

Does the artist win?

Hardly, in fact they loose. I’m sure they appreciate that I bought the song, though I’m sure Sony Music Entertainment appreciates it a hell of a lot more if I know anything about royalty splits! And the song being blocked on YouTube did not make me buy the song, as I’ve already said. I was about to make CMW more famous, by linking their video on my blog, but couldn’t. Sorry CMW.

Do concerned mothers win?

Does the fictional organization of “concerned mothers against gangster rap” gain anything by a blocked gangster rap tune on the internet? Sure, but that is mere coincidence, it could just as well have been a song about flowers or teddy bears or a praise for “concerned mothers against gangster rap”.

By the way, you may check out the song CMW sampled on “N 2 Deep”. It’s by Lyn Collins, and features the distinct sound of the JB’s. Apparently the copyright holder (Polydor) is not insane:


I find that this blogpost and video on innovation from Edinburg by Ed Parsons is somehow related to this issue.

By the way Ed. If you watch the ping back. Sorry that I stole your look for WordPress. I kinda liked it, and I do listen to gangster rap occasionally so my morals are questionable.

Using shp2geocouch to push OSM data into geocouch

Today I installed the utility shp2geocouch on Mac OS X 1.6.

First I needed to update RubyGems…

sudo gem update --system

Then I could install shp2geocouch

sudo gem install shp2geocouch

Next I downloaded OSM data for Copenhagen, Denmark

wget http://download.cloudmade.com/europe/northern_europe/denmark/copenhagen/copenhagen.shapefiles.zip
unzip copenhagen.shapefiles.zip
cd copenhagen.shapefiles

Finally I used shp2geocouch to upload one of the shape files to iriscouch.com (database gd.iriscouch.com/cphosm).

shp2geocouch europe_northern_europe_denmark_copenhagen_highway.shp gd.iriscouch.com/cphosm

This takes a while and the job is still running on my MacBook Pro (after ~10 minutes 16000 documents have been loaded into iriscouch.com). The final count was 33306 documents.

As a final touch, the script replicates geocouch-utils + map browser and tells me

view your data on a map at http://gd.iriscouch.com/cphosm/_design/geo/_rewrite

The map uses OSM tiles from cloudmade as background, and fetches clickable road data from iriscouch using XHR:

Clicking the link, gives you this: