Building osm2pgsql on Mac OS X using homebrew

General instructions are here:

Note: I’m running Snow Leopard (10.6.6).

1. Install homebrew

Check that you don’t have it already:

$ which brew

If you don’t have homebrew, install it from here:

For example, like this:

$ ruby -e "$(curl -fsSLk"

2. Install proj

$ brew install proj
$ which proj

3. Install geos

$ brew install geos

4. Install osm2pgsql

First add the PostgreSQL bin directory (which contains pg_config) to your PATH, then install osm2pgsql:

$ PATH=$PATH:/Library/PostgreSQL/9.0/bin/
$ brew install osm2pgsql
$ which osm2pgsql

You should now have osm2pgsql installed.

Import OSM data into PostgreSQL

I did the following to import OSM data into PostgreSQL.

# create a user for the osm database
createuser -U postgres osm-user
# create the osm database
createdb -U postgres -E utf8 -O osm-user osm
# download some OSM data; I chose Copenhagen, Denmark
# decompress it
bzip2 -d copenhagen.osm.bz2
# install PostGIS on the database first (this creates the spatial_ref_sys table)
psql -U postgres -d osm -f /Library/PostgreSQL/9.0/share/postgresql/contrib/postgis.sql
# then install the spherical Mercator projection 900913 (it is inserted into spatial_ref_sys)
psql -U postgres -d osm -f 900913.sql
# find the style to use with osm2pgsql
brew list osm2pgsql # list locations installed by homebrew, including the location of the default style
# Ready to import! Use -s (slim mode) if you chose a large OSM dataset; this keeps RAM usage down.
# Use the location of the style file found with brew list osm2pgsql
osm2pgsql -d osm -U postgres -S /usr/local/Cellar/osm2pgsql/HEAD/share/osm2pgsql/ copenhagen.osm

You should now have some OSM data in your PostgreSQL database.

Using SQLite databases on different machines as a single virtual database

You can use separate SQLite databases on different machines as a single virtual database by using the ATTACH command in SQLite.

I read about the technique here:

For this example, let’s use the simplest possible database: a contacts database containing nothing but email addresses.

1. Create SQL script for a simple contacts database

File createcontacts.sql:

CREATE TABLE contacts (
email VARCHAR(64));

2. Create two databases

Create the databases:

$ sqlite3 contacts1.db < createcontacts.sql
$ sqlite3 contacts2.db < createcontacts.sql

Note: I'm using SQLite on Mac OS X, which comes with /usr/bin/sqlite3 preinstalled.

Insert one contact in each database:

$ sqlite3 contacts1.db "insert into contacts (email) values ('');"
$ sqlite3 contacts2.db "insert into contacts (email) values ('');"

There are now two databases, each with a row of data:

$ ls
contacts1.db	contacts1.sql	contacts2.db	contacts2.sql
$ sqlite3 contacts1.db 'select * from contacts;'
$ sqlite3 contacts2.db 'select * from contacts;'

Next, we'll combine the databases using the SQLite ATTACH command.

3. Create a single virtual database

First, enter the sqlite3 console:

$ sqlite3 
SQLite version 3.6.12
Enter ".help" for instructions
Enter SQL statements terminated with a ";"

Then, attach the two databases:

sqlite> attach database 'contacts1.db' as c1;
sqlite> attach database 'contacts2.db' as c2;

Check that it went well with the .databases command:

sqlite> .databases
seq  name             file                                                      
---  ---------------  ----------------------------------------------------------
0    main                                                                       
2    c1               /Users/kostas/Documents/Dev/SqliteExp/attach/contacts1.db 
3    c2               /Users/kostas/Documents/Dev/SqliteExp/attach/contacts2.db 

Finally, select email addresses simultaneously from contacts1.db and contacts2.db:

sqlite> select 'c1',* from c1.contacts union select 'c2',* from c2.contacts;

This demonstrates how to select rows from multiple databases at once using the ATTACH command in SQLite. Now imagine that the databases actually live on separate machines and are accessed via e.g. NFS mounts.
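The same ATTACH technique can also be scripted. Below is a minimal Python sketch using the standard-library sqlite3 module. The helper names (make_contacts_db, combined_emails) and the e-mail addresses are hypothetical, and the c1, c2, … aliases just mirror the console session above. I used UNION ALL instead of UNION so that identical rows from different databases are not merged. Keep in mind that SQLite limits how many databases one connection can attach (10 by default).

```python
import os
import sqlite3
import tempfile


def make_contacts_db(path, email):
    """Create a contacts database (same schema as createcontacts.sql) holding one row."""
    conn = sqlite3.connect(path)
    with conn:  # commits the transaction on success
        conn.execute("CREATE TABLE contacts (email VARCHAR(64))")
        conn.execute("INSERT INTO contacts (email) VALUES (?)", (email,))
    conn.close()


def combined_emails(paths):
    """Attach each file as c1, c2, ... and UNION ALL their contacts tables."""
    conn = sqlite3.connect(":memory:")
    try:
        selects = []
        for i, path in enumerate(paths, start=1):
            alias = f"c{i}"
            # The file name is a bound parameter; the alias must be a plain identifier
            conn.execute(f"ATTACH DATABASE ? AS {alias}", (path,))
            selects.append(f"SELECT '{alias}' AS source, email FROM {alias}.contacts")
        query = " UNION ALL ".join(selects) + " ORDER BY source"
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, "contacts1.db"), os.path.join(tmp, "contacts2.db")]
        make_contacts_db(paths[0], "alice@example.com")  # hypothetical addresses
        make_contacts_db(paths[1], "bob@example.com")
        print(combined_emails(paths))
        # [('c1', 'alice@example.com'), ('c2', 'bob@example.com')]
```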

Next, I'll look at spatial data in SQLite.

How to import OpenStreetMap data into PostgreSQL

1. Download data and software

The instructions are fairly generic, so they should work on Windows, Linux and Mac OS X. I wrote them for Windows, but I’ve since switched to Mac OS X.


I assume that you do not already have Postgres/PostGIS installed on your system.

Download PostgreSQL+PostGIS for all platforms here:

Follow the instructions provided to install the database software.

OSM data

Download OpenStreetMap data (.osm file):

I chose Europe -> Denmark.


Download osm2pgsql for your platform (Windows, Linux, Mac OS X etc.):

2. Create and prepare database

If you add the PostgreSQL bin folder to your path, you’ll have access to commands like createuser and createdb in your shell.

Create a user (after running the command, answer ‘y’ when asked whether the new role should be a superuser):

createuser -U postgres <enter your username, I entered kostas>

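Create the database. It should be called ‘gis’, because that is the database osm2pgsql connects to by default (the import command in step 3 does not pass -d):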

createdb -U postgres -E UTF8 -O kostas gis

Install the PL/pgSQL language (on my system it was already installed):

createlang -U postgres plpgsql gis

Add PostGIS functionality to the database (you should get pages of output as functions etc. are created):

psql -U postgres -d gis -f PATH_TO_POSTGRES/share/contrib/postgis-1.5/postgis.sql

Download the file 900913.sql. The following adds the spherical Mercator projection (SRID 900913) to the database:

psql -U postgres -d gis -f PATH_TO_FILE/900913.sql

3. Add OSM data to database

Change directory to where you unzipped osm2pgsql:


Import OSM data:

osm2pgsql -U postgres -s -S ./ PATH_TO_OSM_FILE/denmark.osm

Options for osm2pgsql:

  • -U postgres
    Connect to the database as the PostgreSQL admin user
  • -s
    Run in “slim” mode, which keeps intermediate data in database tables instead of in RAM; this was necessary on my system
  • -S ./
    On Windows (and possibly on Linux and other operating systems) you must specify the path to the style file. Use the default style that comes with osm2pgsql.

That’s it! You now have a database full of street data.

4. Testing the database

This is where I live:

select name, ST_AsText(way) from planet_osm_line where name='Anders Henriksens Gade';

Which gives the name of the road plus a WKT representation of the geometry:

Anders Henriksens Gade

LINESTRING(1402528.63 7491973.55,1402602.4 7491829.85)
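These coordinates are not degrees. The geometry is stored in the spherical Mercator projection (900913), so the units are metres. As a sanity check, here is a minimal Python sketch of the inverse spherical Mercator formulas that converts them back to longitude and latitude. It assumes the usual sphere radius of 6378137 m, and mercator_to_lonlat is a hypothetical helper, not part of PostGIS.

```python
import math

R = 6378137.0  # sphere radius in metres assumed by spherical Mercator (900913)


def mercator_to_lonlat(x, y):
    """Convert spherical Mercator metres to (longitude, latitude) in degrees."""
    lon = math.degrees(x / R)
    # Inverse of y = R * ln(tan(pi/4 + lat/2))
    lat = math.degrees(2.0 * math.atan(math.exp(y / R)) - math.pi / 2.0)
    return lon, lat


if __name__ == "__main__":
    # First vertex of the LINESTRING returned by the query
    print(mercator_to_lonlat(1402528.63, 7491973.55))
```

The result lands in Copenhagen, as expected. Inside the database, ST_Transform(way, 4326) performs the same conversion.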


It works, but the tables are created in the ‘public’ schema of the ‘gis’ database, which is not so nice. I’d prefer the tables to be created in e.g. an ‘osm’ schema. When I’ve looked into how this is done, I’ll update this post.

I’d like to write a howto that uses Osmosis to continuously update the local OSM data with diffs from OSM.


Update: The previous version of this howto was a bit unclear or even erroneous, and some people had problems getting it to work. I have now rewritten it and tested it with SOLR 3.5.0 and jQuery 1.7.1.

Making SOLR return results suitable for consumption using JSONP is very easy!

Calling SOLR in JSONP style

Enter the following into your browser to see an example of how this works (change values for ‘’ and ‘yoursearchgoeshere’):


Looking at the structure of the JSON will give you an idea about how to process it in the following code.

The two important parameters in the URL are wt=json and json.wrf=callback. Of course, callback can be anything (it’s just the name of the function to call), so json.wrf=foo works as well. jQuery will autogenerate a name for you, so you don’t need to worry about it.

To perform the above call with jQuery in JSONP style, use the following code snippet and process the result in the success function:

$.ajax({
  'url': '',
  'data': {'wt':'json', 'q':'your search goes here'},
  'success': function(data) { /* process the results in data here */ },
  'dataType': 'jsonp',
  // name of the callback parameter; SOLR expects json.wrf
  'jsonp': 'json.wrf'
});
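To see what wt=json and json.wrf do without a browser, here is a minimal Python sketch using only the standard library. solr_jsonp_params builds the query string, and unwrap_jsonp strips the callback wrapper from a JSONP body such as foo({...}). Both helper names and the trimmed sample response are hypothetical. The only assumption is the behaviour described above: SOLR wraps its JSON in a call to the function named by json.wrf.

```python
import json
import re
from urllib.parse import urlencode

# Matches e.g. 'foo({...})' or 'jQuery171_123({...});', capturing name and payload
_JSONP_RE = re.compile(r"^\s*([A-Za-z_$][\w$.]*)\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def solr_jsonp_params(query, callback="callback"):
    """Build a SOLR query string that asks for JSON wrapped in a callback."""
    # wt=json selects JSON output; json.wrf names the wrapping function
    return urlencode({"q": query, "wt": "json", "json.wrf": callback})


def unwrap_jsonp(body):
    """Split a JSONP body into (callback_name, parsed_json)."""
    match = _JSONP_RE.match(body)
    if match is None:
        raise ValueError("body is not a JSONP call")
    return match.group(1), json.loads(match.group(2))


if __name__ == "__main__":
    print(solr_jsonp_params("your search goes here"))
    # q=your+search+goes+here&wt=json&json.wrf=callback
    name, data = unwrap_jsonp('foo({"response": {"numFound": 0, "docs": []}})')
    print(name, data["response"]["numFound"])
```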

Google Fusion Tables cheat sheet

See below for commands using the Fusion Tables API. The example table is the oldschool message wall public table. Note that the examples are shown without the required URL encoding.

Authenticating: Getting the auth token

To authenticate you may use the following test account myjdoe.

  • account:
  • password: JoesSecret



The value of Auth in the response is the token.

To make an authenticated POST request, include the token in the following header: Authorization: GoogleLogin auth=DQAAAHoAAA...

Query with SELECT

Querying data is done with HTTP GET and the SELECT command. It does not require authentication for public, exportable tables such as the oldschool message wall public table.

Select all rows:

SELECT * FROM 380584

Select rows with a WHERE clause:

SELECT Message FROM 380584 WHERE User='Kostas'

Select rows with a spatial clause:

SELECT * FROM 380584 WHERE ST_INTERSECTS(Geo, CIRCLE(LATLNG(55.67892,12.58338),5000))
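The SELECT examples above are shown without URL encoding. Here is a minimal Python sketch of that encoding step using urllib.parse. It assumes the statement is sent in a query parameter named sql. That parameter name is my assumption and does not come from this cheat sheet, and the endpoint URL is left out.

```python
from urllib.parse import parse_qs, urlencode


def encode_query(sql):
    """URL-encode a Fusion Tables SQL statement as a query-string parameter.

    The parameter name 'sql' is an assumption, not confirmed by this post.
    """
    return urlencode({"sql": sql})


if __name__ == "__main__":
    examples = [
        "SELECT * FROM 380584",
        "SELECT Message FROM 380584 WHERE User='Kostas'",
        "SELECT * FROM 380584 WHERE ST_INTERSECTS(Geo, CIRCLE(LATLNG(55.67892,12.58338),5000))",
    ]
    for sql in examples:
        encoded = encode_query(sql)
        # Decoding the query string must give back the original statement
        assert parse_qs(encoded)["sql"] == [sql]
        print(encoded)
```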

Add data with INSERT

Adding rows of data is done with HTTP POST and the INSERT command, which requires authentication.

Note that we are using the token retrieved in the authentication step.



Get column names with DESCRIBE

Discovering column names is done with HTTP POST and the DESCRIBE command, which requires authentication.

Note that we are using the token retrieved in the authentication step.


column id,name,type

Client libraries by Google

To help create the API calls, you can use the client libraries developed and shared by Google instead of curl.

Libraries exist for the following languages:

  • Java: gdata-java-client
  • JavaScript: gdata-javascript-client
  • .NET: google-gdata
  • PHP: distributed as part of Zend
  • Python: gdata-python-client
  • Objective-C: gdata-objectivec-client

How to load Javascript dependencies dynamically

Loading jQuery using plain JavaScript:

// inject e.g. jQuery into a webpage
var thescript = '';  // URL of the script to load
var newscript = document.createElement( 'script' );
newscript.setAttribute( 'src', thescript );
newscript.setAttribute( 'type', 'text/javascript' );
var head = document.getElementsByTagName("head")[0];
// the browser starts loading the script once it is added to the page
head.appendChild( newscript );

To run that code only when jQuery has not been loaded yet, wrap it in a test:

// Test if jQuery is loaded
if (typeof jQuery === 'undefined') {
  // code from previous listing
}

Turning big hard problems into smaller, less hard problems.

Here I have captured a thought process I had while reading about algorithms for hard graph problems. The thoughts are inspired by MapReduce, distributed merge sort and the more colorful newspapers of the world.

Summary of thoughts

Given an instance of a problem (think Max Clique, Traveling Salesman or another hard graph problem)…

Thought 1:

Compute an instance that is “easier” but has the same optimal solution. This is done by a “reducer algorithm”.

Thought 2:

Reducer algorithms may run in parallel.

Thought 3:

Reducer algorithms may be different.

Thought 4:

Reducer algorithms can “gossip” with each other during execution. Gossip helps an algorithm by yielding new information about the problem being solved.

Thought 5:

Gossip is either a suboptimal solution or a reduced problem instance. This information can be used as a lower bound, or in other ways.

Thought 6:

“Merger algorithms” can combine problem instances from different reducer algorithms into one.

A full example of reducing and merging: Maximum Clique Problem.

Here is an instance of the Maximum Clique Problem, in this case a non-planar graph. By the way, planar graphs are boring because they can only contain cliques of size 4 or smaller.

A graph that contains cliques of different sizes.

Let’s see what could happen when running two different reducers (reducer 1 and reducer 2) on this problem instance, and then merging the returned instances.

Reducer 1 works by randomly finding a clique in the graph, and repeatedly deleting nodes that have degree less than the size of the clique. The clique found is emitted as a gossip message (reducer 2 will use this as a lower bound).
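Reducer 1 can be sketched in a few lines of Python. This illustration rests on my own assumptions: the graph is a dict mapping each node to a set of neighbours, the "random" clique is grown greedily from a random start node, and the helper names (greedy_clique, peel, reducer1) are hypothetical. The repeated deletion is classic k-core peeling. A clique with more than k nodes needs every member to have degree at least k, so nodes of lower degree can be removed, and each removal may lower the degree of the node's neighbours.

```python
import random


def greedy_clique(adj, rng):
    """Grow a maximal clique greedily from a random start node."""
    start = rng.choice(sorted(adj))
    clique = {start}
    candidates = set(adj[start])  # nodes adjacent to every clique member so far
    while candidates:
        v = rng.choice(sorted(candidates))
        clique.add(v)
        candidates &= adj[v]
    return clique


def peel(adj, k):
    """Repeatedly delete nodes of degree < k; returns a reduced copy of adj."""
    adj = {v: set(ns) for v, ns in adj.items()}
    stack = [v for v, ns in adj.items() if len(ns) < k]
    while stack:
        v = stack.pop()
        if v not in adj:
            continue  # already deleted
        for u in adj.pop(v):
            adj[u].discard(v)
            if len(adj[u]) < k:
                stack.append(u)
    return adj


def reducer1(adj, seed=None):
    """Return (clique, reduced instance); the clique is the gossip message."""
    rng = random.Random(seed)
    clique = greedy_clique(adj, rng)
    # A clique with more than k nodes needs every member to have degree >= k
    return clique, peel(adj, len(clique))


if __name__ == "__main__":
    # Hypothetical toy graph: a 5-clique {0..4} joined to a triangle {5, 6, 7}
    graph = {0: {1, 2, 3, 4}, 1: {0, 2, 3, 4}, 2: {0, 1, 3, 4}, 3: {0, 1, 2, 4},
             4: {0, 1, 2, 3, 5}, 5: {4, 6, 7}, 6: {5, 7}, 7: {5, 6}}
    clique, reduced = reducer1(graph, seed=1)
    print("gossip:", sorted(clique), "remaining:", sorted(reduced))
```

If the reduced instance comes back empty, no larger clique exists and the gossiped clique is already a maximum clique.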

Here is the result of running reducer 1:

Reduced instance
The red nodes are the clique found by reducer 1 and gossiped to reducer 2

Let’s look at reducer 2. While it runs, reducer 2 could receive a gossip message from reducer 1 saying that a clique of size 4 has been found, and use this as a lower bound. Reducer 2 targets nodes whose degree is around the lower bound. It works (slowly) by examining each targeted node to find out whether it is part of a clique larger than the lower bound. If not, the node is deleted from the graph.
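Reducer 2's check can be made concrete. A node v is in a clique of size k + 1 exactly when its neighbourhood contains a k-clique. The Python sketch below tests this by brute force over subsets of neighbours (hence "slowly") and deletes the targeted nodes that fail. The lower bound k comes from gossip. As an assumption, "degree around the lower bound" is read as degree at most max_degree, which defaults to k. The function names are hypothetical.

```python
from itertools import combinations


def in_larger_clique(adj, v, k):
    """True if v is in some clique of size k + 1, i.e. N(v) contains a k-clique."""
    for group in combinations(sorted(adj[v]), k):
        if all(b in adj[a] for a, b in combinations(group, 2)):
            return True
    return False


def reducer2(adj, lower_bound, max_degree=None):
    """Delete targeted nodes that cannot be in a clique larger than lower_bound."""
    if max_degree is None:
        max_degree = lower_bound  # assumed targeting rule: degree <= lower bound
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for v in sorted(adj):
            if (v in adj and len(adj[v]) <= max_degree
                    and not in_larger_clique(adj, v, lower_bound)):
                for u in adj.pop(v):
                    adj[u].discard(v)
                changed = True
    return adj


if __name__ == "__main__":
    # Hypothetical toy graph: a 5-clique {0..4} joined to a triangle {5, 6, 7};
    # suppose reducer 1 has gossiped a 4-clique, so the lower bound is 4
    graph = {0: {1, 2, 3, 4}, 1: {0, 2, 3, 4}, 2: {0, 1, 3, 4}, 3: {0, 1, 2, 4},
             4: {0, 1, 2, 3, 5}, 5: {4, 6, 7}, 6: {5, 7}, 7: {5, 6}}
    print(sorted(reducer2(graph, lower_bound=4)))
    # [0, 1, 2, 3, 4]
```

Deleting such a node is safe. A clique of size k is already known, so any better solution must be a larger clique, and that clique cannot contain the node.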

This could be the result of running reducer 2 (and accepting gossip from reducer 1):

Reducer 2 result
After receiving gossip that a 4-clique has been found, reducer 2 targets nodes with degree 4 and removes them if they are not in a clique.

In this made-up example, reducer 1 managed to remove more nodes than reducer 2, but the point is that they removed different nodes.

Running the merger (which computes the intersection) on the two reduced instances yields this:

Merged graphs
The result of merging the output of reducer 1 and reducer 2
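The merger is the simplest piece. Here is a minimal Python sketch. It assumes that reducers only ever delete nodes from the original graph, as both reducers described above do, so the merger keeps the nodes that every reducer kept along with the edges between them. merge is a hypothetical name.

```python
def merge(*instances):
    """Intersect reduced instances produced from the same original graph.

    Assumes reducers only delete nodes (never edges), so a node kept by every
    reducer keeps exactly its original edges to the other kept nodes.
    """
    if not instances:
        raise ValueError("need at least one instance")
    common = set(instances[0]).intersection(*instances[1:])
    return {v: instances[0][v] & common for v in common}


if __name__ == "__main__":
    # Hypothetical outputs of two reducers run on the same graph
    by_reducer1 = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: set()}
    by_reducer2 = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
    print(merge(by_reducer1, by_reducer2))
```

The merged instance is an ordinary graph again, so it can be fed straight back into a reducer.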

Yay, an even smaller instance. And while we have the reducers up and running, why not restart reducer 1 with this instance as input? Let’s see what we get.

reducer 1 rerun
Feeding the reduced instance into reducer 1 for further reduction eliminates even more nodes

This looks pretty good. The graph now contains only 23 nodes, approximately half of the original graph, and we got there by discovering a relatively small clique of size 4 (compared to the largest one, of size 7).

Conclusion and a small disclaimer

Most people who deal with such problems call this sort of thing preprocessing. I call it a “reducer network”, mainly because it sounds cooler, but also because I think there might be a novel idea here: running a host of algorithms in a distributed environment to perform the preprocessing while they emit and accept gossip. Of course, this is very similar to the ideas behind Google MapReduce and similar services, and it might be exactly the same thing. I just felt the need to document my thought process, and so this post was created 🙂

This blog post is based on ideas and thoughts I had while reading “The Algorithm Design Manual” by Skiena (great book). The thoughts are just that: thoughts.