Having a look at SpaceBase

SpaceBase is a spatial datastore that began life as military-grade software, which at least sounds kind of cool. It’s really an in-memory database, so switch off the cluster and the data is gone. Apparently the same idea was invented in the 90s (unknown to the SpaceBase people?) by some Americans who also had the military as their first customer.

100% in-memory: (SpaceBase) does not persist data to disk, so it cannot be used for long-term storage. Nor is it meant to be used for analytics: It’s OLTP; not OLAP. If you do require long-term storage, you can use SpaceBase as a temporary cache… occasionally query the data in SpaceBase and flush it to a slower database. – SpaceBase Blog

There are really two obvious ways to use SpaceBase: either as an embedded database, which offers the opportunity to exploit the parallelization features, or as a kind of spatial Redis, i.e. a network cache server (well, almost, because Redis does persist to disk).

If it had optional persistence, even an asynchronous kind like Redis, it would be a killer spatial database. It does all of the bread-and-butter spatial queries that PostGIS supports, e.g. spatial joins, selecting objects that intersect a bounding box, etc. I don’t see anything like a relational algebra, but that is because SpaceBase is not a relational database. Custom queries are essentially full table scans that can be cut short: the bounding box of the query is compared against the bounding boxes of the internal nodes of the R-tree that stores the data, so subtrees that cannot match are skipped. The visitor pattern is so general that I don’t see why SpaceBase would be any less powerful than PostGIS, query-wise.
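To make the "scan that can be cut short" idea concrete, here is a minimal sketch of a visitor-style query over a tree of bounding boxes. It is not the SpaceBase API: the names BoundsPruningSketch, Box, Node, query and sampleTree are all made up for illustration, and the tree is built by hand rather than by a real R-tree insertion algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; none of these names come from SpaceBase.
final class BoundsPruningSketch {
    /** Axis-aligned 2D bounding box. */
    static final class Box {
        final double minX, minY, maxX, maxY;

        Box(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }

        boolean intersects(Box o) {
            return minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY;
        }
    }

    /** Tree node: internal nodes have children, leaves carry a value. */
    static final class Node {
        final Box bounds;
        final String value; // null for internal nodes
        final List<Node> children = new ArrayList<>();

        Node(Box bounds, String value) { this.bounds = bounds; this.value = value; }

        Node add(Node child) { children.add(child); return this; }
    }

    /** Visits every leaf whose box intersects queryBox; returns the number of nodes examined. */
    static int query(Node node, Box queryBox, Consumer<String> visitor) {
        if (!node.bounds.intersects(queryBox)) {
            return 1; // prune: nothing below this node can intersect the query
        }
        if (node.value != null) {
            visitor.accept(node.value);
        }
        int examined = 1;
        for (Node child : node.children) {
            examined += query(child, queryBox, visitor);
        }
        return examined;
    }

    /** Hand-built two-level tree with four leaves (seven nodes in total). */
    static Node sampleTree() {
        Node left = new Node(new Box(0, 0, 4, 4), null)
                .add(new Node(new Box(1, 1, 2, 2), "a"))
                .add(new Node(new Box(3, 3, 4, 4), "b"));
        Node right = new Node(new Box(6, 6, 10, 10), null)
                .add(new Node(new Box(6, 6, 7, 7), "c"))
                .add(new Node(new Box(9, 9, 10, 10), "d"));
        return new Node(new Box(0, 0, 10, 10), null).add(left).add(right);
    }

    public static void main(String[] args) {
        List<String> hits = new ArrayList<>();
        int examined = query(sampleTree(), new Box(0, 0, 2.5, 2.5), hits::add);
        System.out.println(hits + " found after examining " + examined + " of 7 nodes");
    }
}
```

With the query box in the lower-left corner, the right-hand subtree is rejected at its root, so the leaves "c" and "d" are never looked at. The visitor itself is just a callback, which is why this style of query can express pretty much anything.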

SpaceBase also supports a multi-server setup using an open source, in-memory data grid called Galaxy for “limitless scaling”, which I take to be linear scalability of reads and writes (but still no durability?).

What is a data grid? A data grid is an architecture or set of services that gives individuals or groups of users the ability to access, modify and transfer extremely large amounts of geographically distributed data for research purposes [1], [2].

A good explanation of what Galaxy is can be found in an article on highscalability.com.

Finally, SpaceBase is implemented in Java.

Getting more detailed information

The most detailed information about SpaceBase can be found in the SpaceBase blog, which at the moment contains 7 posts, most of which are about Galaxy. A fairly good understanding of the features of SpaceBase can be obtained by reading the SpaceBase Java User Guide, which covers, for example, how custom queries are written and which queries can be made out of the box, but not how the distributed version works.

Getting more detailed information about SpaceBase itself is tricky, as the database is not open source. One can do some guessing of course 🙂

Current features

Quick feature overview of SpaceBase (more or less copied from the feature section):

  • Atomic multi-object transactions
  • Stores 2D and 3D spatial objects with or without extents
  • Promises fast insert, update and delete operations
  • Pre-built (e.g. bbox and intersects) but also “user-made” queries (these sound interesting)
  • Promises “fast join queries”, which sound like fast self-joins in the description (again, not sure what the data model is)
  • Spatial processing parallelization (this sounds particularly interesting)
  • Load balancing (which should be pretty standard for a distributed database)
  • Optimized for high write rates

The product page states that “SpaceBase does not particularly excel at storing a high-volume of static objects”, probably because data has to fit in main memory. I do wonder whether there is some way to persist to disk.

Features I’d like

I’d really like support for either a hybrid in-memory/disk model or at least the option to generate consistent snapshots, which can be persisted to disk. The folks at SpaceBase suggest querying the entire database and dumping the result, but is this consistent? This is of course a general question I have about queries. I’m sure the answer is in the documentation somewhere; I just have to dig deeper.
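To pin down what I mean by a consistent snapshot, here is a small sketch of a toy in-memory store. It says nothing about how SpaceBase actually behaves: SnapshotSketch, moveAll and snapshot are invented names, and a plain read-write lock stands in for whatever isolation a real transactional store provides. The point is only that a "query everything and dump" pass is consistent if it cannot interleave with a multi-object update.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical toy store; not SpaceBase and not its API.
final class SnapshotSketch {
    private final Map<String, double[]> points = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Applies all updates as one unit, in the spirit of a multi-object transaction. */
    void moveAll(Map<String, double[]> updates) {
        lock.writeLock().lock();
        try {
            for (Map.Entry<String, double[]> e : updates.entrySet()) {
                points.put(e.getKey(), e.getValue().clone());
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Copies the whole store under the read lock, so no half-applied update is visible. */
    Map<String, double[]> snapshot() {
        lock.readLock().lock();
        try {
            Map<String, double[]> copy = new HashMap<>();
            for (Map.Entry<String, double[]> e : points.entrySet()) {
                copy.put(e.getKey(), e.getValue().clone());
            }
            return copy;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        SnapshotSketch store = new SnapshotSketch();
        Map<String, double[]> update = new HashMap<>();
        update.put("a", new double[] {1.0, 2.0});
        update.put("b", new double[] {3.0, 4.0});
        store.moveAll(update);

        // The copy can be written to a slower database after the lock is released.
        Map<String, double[]> snap = store.snapshot();
        System.out.println("snapshot holds " + snap.size() + " objects");
    }
}
```

The copy is taken while holding the lock and can then be flushed to disk at leisure, so writers are only blocked for the duration of the in-memory copy. Whether a full-database query in SpaceBase gives a comparable guarantee is exactly the open question.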

Under the covers

Like I said, there is not a lot of information about the SpaceBase implementation or, for instance, the algorithms used. The product info page mentions that objects in SpaceBase are indexed using an R-tree (distributed?), which is not surprising given that objects are indexed by their bounding box.

Most of the information in the blog seems to be about Galaxy, the data grid used to build a distributed SpaceBase system.

Being a PhD student, I’m most interested in how SpaceBase works, so I hope I can dig up something more concrete than what the API alludes to!
