Trying a Python R-tree implementation

Rtree is a ctypes Python wrapper of libspatialindex that provides a number of advanced spatial indexing features for the spatially curious Python user.

Installation of R-tree for Python

  1. Install libspatialindex:
    • wget http://download.osgeo.org/libspatialindex/spatialindex-src-1.8.0.tar.gz
    • tar xzvf spatialindex-src-1.8.0.tar.gz
    • cd spatialindex-src-1.8.0
    • configure && make && make install
  2. Install R-tree Python wrapper: pip install rtree

Getting started

The library has good documentation and a fine tutorial on toblerity.github.com/rtree/. Actually it is better than most open source project you may stumble upon. The R-tree is very tunable and you can pass an rtree.index.Property object to the index constructor to control a whole lot of things like page sizes and many, many other options (too many to list). The R-tree is designed to be disk resident, so many of the options have to do with this.

One of the nice things is that while you have a ton of configuration options, you can just be lazy and use the defaults, which is incredibly easy.

No point in me simply repeating the tutorial here, so go have a look at it.

Trying some stuff: SpaceBase Python clone

The interface to the R-Tree is not too dissimilar to the SpaceBase API. This makes me wonder how difficult it would be to reimplement the interface in Python. SpaceBase was created for the Israeli Air Force as a high performance spatial cache, and comes with a distribution mechanism through Galaxy. So let's not expect too much :-)

A good starting point is the tutorial section using Rtree as a cheapo spatial database.

Import the R-tree module:

from rtree import index

Next, create an object to store. Here is a GeoJSON feature of the street were I live in Copenhagen:

mystreet = {
    "type": "Feature",
    "geometry": {
        "type":"MultiLineString",
        "coordinates":[[
            [1402528.63585071451962,7491973.552016104571521],
            [1402597.665066956076771,7491840.036925406195223]
        ]]
    },
    "properties": {
        "name": "Anders Henriksens Gade",
        "oneway": "yes"
    },
    "crs": {
        "type":"link",
        "properties": {
            "http://spatialreference.org/ref/sr-org/6864/proj4/",
            "proj4"
        }
    }
}
 
# minimum bounding rectangle for feature
left, bottom, right, top = (
    1402528.63585071, 
    7491840.03692541, 
    1402597.66506696, 
    7491973.5520161)

Create the R-tree index and store the feature in it:

# create an in-memory R-tree index
idx = index.Index()
# disk-based R-tree: 
# idx = index.Index('spatial.db')
 
# Store the geojson object in the (now clustered) index
# In tutorial: idx.insert(id=0, bounds=(left, bottom, right, top), obj=mystreet)
# What actually worked... no keywords args for id and bounds:
idx.insert(0, (left, bottom, right, top), obj=mystreet)

Query the clustered index for feature (using intersection, nearest neighbor):

feature1 = list(idx.intersection((left, bottom, right, top), objects=True))[0].object
feature2 = list(idx.nearest((left, bottom, right, top), objects=True))[0].object

That's the quick and dirty version. If you used the disk-based R-tree, there should now be a file called spatial.db in the current directory. This is very far from being a SpaceBase clone, but it is as far as I could get in the hour I spent on this. I'm convinced however that it would be doable, without an insane amount of effort. Perhaps using the multiprocessing library for the parallelization.

Some useful snippets in to get started. How many cores are available?

import multiprocessing
how_many_cores= multiprocessing.cpu_count()

For the distributed version, we could create an in-memory data grid, akin to Galaxy. There is a post on highscalability.com about in-memory data grids to get started on.

That's all for now. Time to catch some Zs.

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.