The spatial databases covered are PostGIS, MySQL with its spatial extension, MongoDB and Apache Cassandra.
UPDATE: I'll change this post or create a page to give the actual Linux commands to run on the remote server.
PostGIS on EC2
I found a nice tutorial that describes setting up Postgres on an Ubuntu instance on EC2 with all the trimmings. The blogger (Ron Evans) explains how he does things, including the choice of filesystem on EBS, setting up security groups and general architectural decisions. It is quite detailed, so you might even pick up some Linux admin tips from reading it.
I'm using the Amazon Linux AMI for now, and most of what is described should apply to that image as well. One difference: he installs Postgres with apt-get, whereas Amazon Linux AMIs come with yum.
A different tutorial describes using yum instead of apt-get to install Postgres. As a side note, that writer also seems to prefer the ext3 filesystem over XFS.
There is also a tutorial for installing Postgres 9.0 with yum that includes installation of PostGIS, which is probably the one I'll end up following. There is a separate description for Postgres 8.4.
- An almost idiot's guide to install and upgrade to 8.4 with yum
- An almost idiot's guide to install and upgrade to 9.0 with yum
I recommend following Ron Evans' tutorial up to the point of installing Postgres, and then switching to one of the yum-based guides listed above.
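Since the tutorials differ mainly in the package manager they assume, a quick check of what the instance actually provides can save some confusion. This is only a sketch: `detect_pkg_manager` and `pkg_search_hint` are hypothetical helper names, and the exact Postgres and PostGIS package names vary per image, so the helper only prints a search command rather than installing anything.

```shell
# Hypothetical helper: report which package manager this image provides.
detect_pkg_manager() {
    if command -v yum >/dev/null 2>&1; then
        echo yum
    elif command -v apt-get >/dev/null 2>&1; then
        echo apt-get
    else
        echo unknown
    fi
}

# Hypothetical helper: print a command that lists the Postgres packages
# available on this image. Package names differ between distributions,
# so look them up instead of assuming them.
pkg_search_hint() {
    case "$(detect_pkg_manager)" in
        yum)     echo "yum search postgresql" ;;
        apt-get) echo "apt-cache search postgresql" ;;
        *)       echo "no supported package manager found" ;;
    esac
}
```

Running `pkg_search_hint` on an Amazon Linux instance should print the yum variant, and on an Ubuntu instance the apt variant.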
MySQL with Spatial Extension on EC2
The procedure for installing MySQL on EC2 is described on the MySQL website. The examples given include one using yum, so that is as easy as it gets.
It should be noted that there are community images on EC2 which come preinstalled with MySQL.
ec2-describe-images -a | grep -i mysql
The MySQL website also has a very good section for setting up replication for MySQL on EC2 and related subjects.
One point made there concerns scalability: it is "easier to create more EC2 instances to support more users than to upgrade the instance to a larger machine". That is a good point, I think, and there are more, so I recommend reading that page. Many of the hints also apply directly to running Postgres and MongoDB on EC2.
Another tutorial by Sam Starling describes setting up MySQL on an Amazon Linux AMI instance, which is the image that I'm using.
Spatial extensions are included in MySQL from version 4.1 and up.
MongoDB on EC2
UPDATE: All posts I've come across on MongoDB and spatial data seem to mention some kind of problem. Either query times are long or there is inaccuracy. Perhaps I should take a look at Apache Cassandra for spatial data instead.
There is a tutorial for installing MongoDB on a 64-bit Amazon Linux AMI instance using yum, which is exactly what I have.
The MongoDB homepage also has a section specifically for installing MongoDB on EC2. Either way it seems easy enough.
The spatial capabilities of MongoDB are described on the MongoDB homepage, and also here.
I've come across criticism of MongoDB for spatial purposes. I'll look at MongoDB and form my own opinion, but keep this poster in mind if I run into problems. I'd like to understand the algorithms and data structures used in MongoDB before forming a final opinion.
Apache Cassandra on EC2
A colleague at the university sent me a link that describes using Apache Cassandra for spatial data. An overview of Apache Cassandra articles can be found on the Cassandra website.
It seems that Cassandra cannot be installed via a package manager. Installation instructions are given as a quick guide. It requires Java 1.6 update 19 or later, and Amazon Linux AMIs come with Java 1.6 update 20 at present.
wget http://apache.mirrors.webname.dk//cassandra/0.7.6/apache-cassandra-0.7.6-2-bin.tar.gz
tar -zxf apache-cassandra-0.7.6-2-bin.tar.gz
cd apache-cassandra-0.7.6-2
less README.txt
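The Java requirement (1.6 update 19 or later) can be checked before unpacking anything. The following is a minimal sketch: `java_version_ok` is a hypothetical function name, and it assumes a version string of the form `1.6.0_20`, which is how `java -version` reports it on this kind of image.

```shell
# Hypothetical helper: succeed if a Java version string such as "1.6.0_20"
# is at least 1.6 update 19.
java_version_ok() {
    version=$1
    base=${version%%_*}                 # part before "_", e.g. "1.6.0"
    case $version in
        *_*) update=${version#*_} ;;    # part after "_", e.g. "20"
        *)   update=0 ;;                # no update suffix at all
    esac
    update=${update%%[!0-9]*}           # drop suffixes such as "-b02"
    update=${update:-0}
    major=${base%%.*}                   # "1"
    rest=${base#*.}
    minor=${rest%%.*}                   # "6"

    # Anything newer than the 1.x line, or newer than 1.6, is fine.
    if [ "$major" -gt 1 ]; then return 0; fi
    if [ "$major" -eq 1 ] && [ "$minor" -gt 6 ]; then return 0; fi
    # Otherwise it must be 1.6 with update 19 or later.
    [ "$major" -eq 1 ] && [ "$minor" -eq 6 ] && [ "$update" -ge 19 ]
}
```

On the instance itself the version string can be extracted from the first line that `java -version` writes to stderr, for example `java_version_ok "$(java -version 2>&1 | sed -n '1s/.*"\(.*\)".*/\1/p')" || echo "Java too old for Cassandra"`.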
General tips
When running database instances on EC2, use EBS (Elastic Block Store) to store the data. That way the data persists even when the database instance crashes and burns.
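One way to make sure the data really ends up on the attached volume is to compare the device behind the data directory with the device behind the root filesystem. This is a rough sketch under assumptions: the helper names are made up, it assumes the EBS volume is attached and mounted separately from the root filesystem, and the data directory path in the usage line is only an example.

```shell
# Hypothetical helper: print the device that holds a given path
# (first column of the POSIX-format df output).
device_for_path() {
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

# Hypothetical helper: succeed if the path lives on a different device
# than the root filesystem, i.e. on a separately mounted volume.
on_separate_volume() {
    [ "$(device_for_path "$1")" != "$(device_for_path /)" ]
}
```

For example, `on_separate_volume /var/lib/pgsql || echo "data directory is on the root device"` would warn if the (assumed) Postgres data directory has not been moved to its own volume.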
Create separate security groups for different tiers like database, web and others.
Do what this page describes with regard to replication etc.
Oh, and applications with high availability demands should perhaps be spread out over multiple EC2 regions.