Sticking bicycle paths in CouchDB

In this installment of How To Stick Anything In CouchDB, I’ll take some bicycle path data released by the municipality of Copenhagen and stick them in CouchDB. As always I’m working in Terminal under Mac OS X.

Bicycle paths in Copenhagen, served live by CouchDB:

The data is in shape format, so it’s almost too easy using shp2geocouch. First download the data.

wget http://www.kk.dk/Borger/ByOgTrafik/CyklernesBy/~/media/B12983F7B5E3451F8716F7C072A5B101.ashx
mv B12983F7B5E3451F8716F7C072A5B101.ashx bikepaths.zip
unzip bikepaths.zip

What have we downloaded?

$ ls *.shp
Cykelmidter_2006_r2_ETRS89zone32N.shp	Cykelmidter_2006_r2_WGS84.shp

Create a CouchDB database to hold the data. I’m using my CouchDB installation. You should use yours:

curl -X PUT username:password@gd.iriscouch.com/bikepathscph

Use shp2geocouch to upload the data. It shouldn’t matter which one of the shape files we use, because shp2geocouch does reprojection:

shp2geocouch Cykelmidter_2006_r2_WGS84.shp http://username:password@gd.iriscouch.com/bikepathscph

Notes

There is a problem with encoding. If you click on one of the street features on the map above, streets that contain the danish letters [æ,ø,å] are missing those letters. I tried testing a conversion to GeoJSON from the shapefiles with ogr2ogr and it’s the same. God I hate encoding!

By the way, this is how to convert to GeoJSON from Shape using ogr2ogr:

ogr2ogr -f "GeoJSON" cykelmidter.json Cykelmidter_2006_r2_WGS84.shp

If you don’t have shp2geocouch, it’s really easy to install (at least it was on my Mac):

sudo gem install shp2geocouch

There is a “Ruby gem” install home page here, and github here.

Using shp2geocouch to push OSM data into geocouch

Today I installed the utility shp2geocouch on Mac OS X 1.6.

First I needed to update RubyGems…

sudo gem update --system

Then I could install shp2geocouch

sudo gem install shp2geocouch

Next I downloaded OSM data for Copenhagen, Denmark

wget http://download.cloudmade.com/europe/northern_europe/denmark/copenhagen/copenhagen.shapefiles.zip
unzip copenhagen.shapefiles.zip
cd copenhagen.shapefiles

Finally I used shp2geocouch to upload one of the shape files to iriscouch.com (database gd.iriscouch.com/cphosm).

shp2geocouch europe_northern_europe_denmark_copenhagen_highway.shp gd.iriscouch.com/cphosm

This takes a while and the job is still running on my MacBook Pro (after ~10 minutes 16000 documents have been loaded into iriscouch.com). The final count was 33306 documents.

As a final touch, the script replicates geocouch-utils + map browser and tells me

view your data on a map at http://gd.iriscouch.com/cphosm/_design/geo/_rewrite

The map uses OSM tiles from cloudmade as background, and fetches clickable road data from iriscouch using XHR:

Clicking the link, gives you this:

Installing spatial databases on EC2

The spatial databases covered are PostGIS, MySQL spatial and MongoDB, Apache Cassandra.

UPDATE: I’ll change this post or create a page to give the actual linux commands to run on the remote server.

PostGIS on EC2

I have found a nice tutorial that describes setting up Postgres on EC2 on an Ubuntu instance with all the trimmings. The blogger (Ron Evans) explains how he does things, including choice of filesystem on EBS, setting up security groups and general architectural decisions. It is quite detailed so you might even learn some linux admin tips from reading it.

I’m using the Amazon Linux AMI for now, and most of what is described should apply for that image as well. I noticed that he installs Postgres with the package manager (apt-get), and Amazon Linux AMI’s come with yum.

There is a different tutorial that describes using yum instead of apt-get to install postgres. As a sidenote that writer also seems to prefer the EXT3 filesystem over the XFS filesystem.

There is also a tutorial for installing Postgres 9.0 with yum that includes installation of PostGIS, which is probably the one I’ll end up following. There is a separate description for Postgres 8.4.

I recommend following this tutorial up to the point of installing Postgres, and then switching to this tutorial.

MySQL with Spatial Extension on EC2

The procedure for installing MySQL on EC2 is described on the MySQL website. The examples given include one using yum, so that is as easy as it gets.

It should be noted that there are community images on EC2 which come preinstalled with MySQL.

ec2-describe-images -a | grep -i mysql

The MySQL website also has a very good section for setting up replication for MySQL on EC2 and related subjects.

One aspect that is mentioned is about scalability, and that it is “easier to create more EC2 instances to support more users than to upgrade the instance to a larger machine”. Good point I think, and there are more, so I recommend reading that page and many of the hints also apply directly to running Postgres and MongoDB on EC2.

Another tutorial by Sam Starling describes setting MySQL on an Amazon Linux AMI instance, which is the image that I’m using.

Spatial extensions are included in MySQL from version 4.1 and up.

MongoDB on EC2

UPDATE: All posts I’ve come across on MongoDB and spatial data seem to mention some kind of problem. Either query times are long or there is inacuracy. Perhaps I shoud take a look at Apache Cassandra for spatial data instead..

There is a tutorial for installing MongoDB on an Amazon Linux AMI 64 bit instance using yum, which is exactly what I have.

The MongoDB homepage also has a section specifically for installing MongoDB on EC2. Either way it seems easy enough.

The spatial capabilities of MongoDB are described on the MongoDB homepage, and also here.

I’ve come across criticism of MongoDB for spatial purposes. I’ll look at MongoDB and form my own oppinion but keep this poster in mind if I run into problems. I’d like to understand the algorithms and datastructures used in MongoDB before forming a final oppinion.

Apache Cassandra on EC2

A colleague at the university sent me a link describing using Apache Cassandra for spatial data. An overview of Apache Cassandra articles can be found on the Cassandra website.

It seems that Cassandra can not be installed via a package manager. Installation instructions are given as a quick guide. It requires Java 1.6 update 19 or later, and Amazon Linux AMI’s come with Java 1.6 update 20 at present.

wget http://apache.mirrors.webname.dk//cassandra/0.7.6/apache-cassandra-0.7.6-2-bin.tar.gz
tar -zxf apache-cassandra-0.7.6-2-bin.tar.gz
cd apache-cassandra-0.7.6-2
less README.txt

General tips

When running databases instances on EC2 use EBS (Elastic Block Storage) to store the data. That way the data is persisted even when the database instance crashes and burns.

Create separate security groups for different tiers like database, web and others.

Do what this page describes with regards to replication etc.

Oh, and running applications with high demands for availability should perhaps be spread out over multiple EC2 regions.

Building osm2pgsql on Mac OS X using homebrew

General instructions are here: http://wiki.openstreetmap.org/wiki/Osm2pgsql#Mac_OS_X

Note: I’m running Snow Leopard (10.6.6 )

1. Install homebrew

Check that you don’t have it already:

$ which brew

If you don’t have homebrew install it from here:

E.g. like this:

$ ruby -e "$(curl -fsSLk https://gist.github.com/raw/323731/install_homebrew.rb)"

2. Install proj

$ brew install proj
$ which proj
/usr/local/bin/proj

3. Install geos

$ brew install geos

4. Install osm2pgsql

First add pg_config to the path, then install osm2pgsql:

$ PATH=$PATH:/Library/PostgreSQL/9.0/bin/
$ brew install osm2pgsql
$ which osm2pgsql
/usr/local/bin/osm2pgsql

You should now have osm2pgsql installed.

Import OSM data into PostgreSQL

I did the following to import OSM data into PostgreSQL.

# create a user for the osm database
createuser -U postgres osm-user
# create the osm database
createdb -U postgres -E utf8 -O osm-user osm
# download som osm data from cloudmade.com, I chose Copenhagen, Denmark.
wget http://downloads.cloudmade.com/europe/northern_europe/denmark/copenhagen/copenhagen.osm.bz2
# unzip it
bzip2 -d copenhagen.osm.bz2
# install the mercator projection 900913 on the database
wget http://svn.openstreetmap.org/applications/utils/export/osm2pgsql/900913.sql
psql -U postgres -d osm -f 900913.sql
# install PostGIS on database
psql -U postgres -d osm -f /Library/PostgreSQL/9.0/share/postgresql/contrib/postgis.sql
# find the style to use with osm2pgsql 
brew list osm2pgsql # list locations installed by homebrew, including location of the default style
 
# Ready to import! use -s if you chose a large OSM dataset, this keeps RAM usage down.
# Use location of style file found with brew list osm2pgsql
osm2pgsql -d osm -U postgres -S /usr/local/Cellar/osm2pgsql/HEAD/share/osm2pgsql/default.style copenhagen.osm

You should now have some OSM data in your PostgreSQL database.

Using SQLite databases on different machines as a single virtual database

You can use separate SQLite database on different machines as a single virtual database by using the attach command in SQLite.

I read about the technique here: http://souptonuts.sourceforge.net/readme_sqlite_tutorial.html

For this example let’s use the simplest possible database. A contact database containing nothing but email addresses.

1. Create SQL script for a simple contacts database

File createcontacts.sql:

CREATE TABLE contacts (
id INTEGER PRIMARY KEY,
email VARCHAR(64));

2. Create two databases

Create the databases:

$ sqlite3 contacts1.db < createcontacts.sql
$ sqlite3 contacts2.db < createcontacts.sql

Notice: I'm using SQLite on Mac OS X, which comes preinstalled with /usr/bin/sqlite3.

Insert one contact in each database:

$ sqlite3 contacts1.db "insert into contacts (email) values ('john@example.com');"
$ sqlite3 contacts2.db "insert into contacts (email) values ('jane@example.com');"

The are now two databases, each with a row of data:

$ ls
contacts1.db	contacts1.sql	contacts2.db	contacts2.sql
$ sqlite3 contacts1.db 'select * from contacts;'
1|john@example.com
$ sqlite3 contacts2.db 'select * from contacts;'
1|jane@example.com

Next, we'll combine the databases using the SQLite command attach.

3. Create a single virtual database

First, enter the sqlite3 console:

$ sqlite3 
SQLite version 3.6.12
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> 

Then, attach the two databases:

sqlite> attach database 'contacts1.db' as c1;
sqlite> attach database 'contacts2.db' as c2;

Check that it went well with the .databases command:

sqlite> .databases
seq  name             file                                                      
---  ---------------  ----------------------------------------------------------
0    main                                                                       
2    c1               /Users/kostas/Documents/Dev/SqliteExp/attach/contacts1.db 
3    c2               /Users/kostas/Documents/Dev/SqliteExp/attach/contacts2.db 

Finally, select email addresses simultaneously from contacts1.db and contacts2.db:

sqlite> select 'c1',* from c1.contacts union select 'c2',* from c2.contacts;
c1|1|john@example.com
c1|2|janet@example.com
c1|3|joe@example.com
c2|1|jane@example.com

This demonstrates how one can select rows from multiple databases at once, using the attach command in SQLite. Imagine that the databases are actually on separate machines and accessed via e.g. NFS mounts.

Next, I'll look at spatial data in SQLite.

How to import OpenStreetMap data into PostgreSQL

1. Download data and software

The instructions are fairly generic, so should work for both Windows, Linux and Mac OS X. I wrote them for Windows, but I’ve since then switched to Mac OS X.

PostgreSQL+PostGIS

I assume that you do not already have Postgres/PostGIS installed on your system.

Download PostgreSQL+PostGIS for all platforms here: http://www.postgresql.org/download/

Follow the instructions provided to install the database software.

OSM data

Download OpenStreetMap data (.osm file):
downloads.cloudmade.com

I chose europe -> denmark.

osm2pgsql

Download the version for your platform (Windows, Linux, Mac OS X etc):
wiki.openstreetmap.org/wiki/Osm2pgsql

2. Create and prepare database

If you add the PostgreSQL bin folder to your path, you’ll have access to commands like createuser and createdb in your shell.

Create a user (after running command, answer ‘y’ to superuser):

createuser -U postgres <enter your username, I entered kostas>

Create the database (should be called ‘gis’):

createdb -U postgres -E UTF8 -O kostas gis

Install language plpgsql (on my system this was already installed):

createlang -U postgres plpgsql gis

Add PostGIS functionality to database (you should get pages of output as functions etc are created):

psql -U postgres -d gis -f PATH_TO_POSTGRES/share/contrib/postgis-1.5/postgis.sql

Download the file 900913.sql. The following will add spherical mercator projection to the database:

psql -U postgres -d gis -f PATH_TO_FILE/900913.sql

3. Add OSM data to database

Change directory to where you unzipped osm2pgsql:

cd PATH_TO_OSM2PGSQL

Import OSM data:

osm2pgsql -U postgres -s -S ./default.style PATH_TO_OSM_FILE/denmark.osm

Options for osm2pgsql:

  • -U postgres
    Run program with the PostgreSQL admin user
  • -s
    Run in “slim” mode, which means running the program not in RAM, which is necessary on my system
  • -S ./default.style
    On windows (maybe also Linux and other OS) you must specify path to style file. Use default which comes with osm2pgsql.

That’s it! You now have a database full of street data.

4. Testing the database

This is where I live:

select name, astext(way) from planet_osm_line where name='Anders Henriksens Gade'

Which gives the name of the road plus a WKT representation of the geometry:

Anders Henriksens Gade

LINESTRING(1402528.63 7491973.55,1402602.4 7491829.85)

Conclusion

It works, but tables are created in the ‘public’ schema of the ‘gis’ database. This is not so nice. I’d prefer that tables were created e.g. in ‘osm’ schema. When I’ve looked into how this is done, I’ll update this post.

I’d like to write a howto that uses Osmosis to continuously update the local OSM data with diffs from OSM.

Google fusion tables cheat sheet

See below for commands using the Fusion Tables API. Example table is the oldschool message wall public table. Note that examples are shown first without the required url-encoding.

Authenticating: Getting the auth token

To authenticate you may use the following test account myjdoe.

  • account: myjdoe@gmail.com
  • password: JoesSecret

Response:

SID=DQAAAHcAAAD90F7mQKULu7fY44z-maGyaTBElaKsaBvypHqU88b7mqVU93Lyf_1sr7ZbxSosjx2e6dT4LSOswAYG3fzAFVjQ-Z8VBS7-oloLNfNZF3A9qMhhI-cpDxoUe_3SUfdUsTfmk34a4wnkok4im77J-vBM5OqmMxLTE8JaDlylHgG2RA
LSID=DQAAAHoAAADkmXi_iKlUyVI_tuZ7n__6R9PkF3jIior9YnbO5a6lYFtNXxez2Ymw3sXbHKU48IwWhztVfDVUBF4UcCuH2rs3v6s5M1-duCG8gXaJph0oJByr3_Rj9olzm9Zlo5JKgcOiSNN38PY0jnONuuqR2G22KT-meIls05soufg8GEX-Xg
Auth=DQAAAHoAAAA3fDq7sZbeekb_x96Z7icle8VzWyucA50O2HnzUxbG_Y_PfaLfv5HmljUvNJoN1Owrgei796p-LGX-3l1KDRacWF_QhhSpAusdAVgSvxVlqrsJAc52Wu0CFb60_m19AwbbnLjd-CvIy-A-gwBI2Oi0O29lEU0qeL-JMOmiq_UtuQ

Auth is the token.

To make an authenticated POST request use the following header: Authorization: GoogleLogin auth=DQAAAHoAAA... which includes the token.

Query with SELECT

Querying data is done with HTTP GET and the SELECT command. Does not require authentication for public, exportable tables like the oldschool message wall public table.

Select all rows

http://www.google.com/fusiontables/api/query?sql=SELECT * FROM 380584

Try it.

Select rows with WHERE clause

http://www.google.com/fusiontables/api/query?sql=SELECT Message FROM 380584 WHERE User='Kostas'

Try it.

Select rows with spatial clause

http://www.google.com/fusiontables/api/query?sql=SELECT * FROM 380584 WHERE ST_INTERSECTS(Geo, CIRCLE(LATLNG(55.67892,12.58338),5000))

Try it.

Add data with INSERT

Adding rows of data is done with HTTP POST and the INSERT command. Requires authentication.

Notice we are using the token retrieved in the authentication step.

Response:

rowid
401

Get column names with DESCRIBE

Discovering column names is done with HTTP POST and the DESCRIBE command. Requires authentication.

Notice we are using the token retrieved in the authentication step.

Response:

column id,name,type
col0,User,string
col4,Message,string
col5,Geo,location

Client libraries by Google

To help create the API calls, you can use the client libraries developed and shared by Google instead of curl.

Libraries exist for the following languages:

Client libraries
Java gdata-java-client
Javascript gdata-javascript-client
.NET google-gdata
PHP Distributed as part of zend.
Python gdata-python-client
Objective C gdata-objectivec-client