Using the Python debugger
A few days ago I found out that using the Python debugger is so easy that I can’t believe I hadn’t used it before. Import the module: import pdb Set a breakpoint somewhere in your code: def some_function(self, x, y, z):…
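To make the two steps above concrete, here is a minimal sketch. The function body and the `debug` flag are my own illustration, not code from the original post; the flag is only there so the script still runs unattended when you don't want to stop at the breakpoint.

```python
import pdb


def some_function(x, y, z, debug=False):
    """Hypothetical function used only to show where a breakpoint goes."""
    total = x + y
    if debug:
        # Execution pauses here and drops into the interactive (Pdb) prompt.
        pdb.set_trace()
    return total * z


if __name__ == "__main__":
    # Pass debug=True to stop at the breakpoint and inspect `total`.
    print(some_function(1, 2, 3))
```

At the `(Pdb)` prompt, `p total` prints a variable, `n` steps to the next line, `s` steps into a call, `l` lists the surrounding source, `c` continues, and `q` quits.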
OK, calling it a benchmark is a bit of an overstatement. It’s taking two different database libraries for a quick spin, and seeing how fast they can write a bunch of integers to disk. A second benchmark checks how fast…
In a project I’m going to use clustering algorithms implemented in Python, such as k-means. Clustering: scipy.cluster has been reported to have some problems, so for now I’ll use PyCluster (following advice given on Stack Overflow). Install PyCluster: pip install…
Fast compression algorithms like Snappy, QuickLZ and LZ4 are designed for a general stream of bytes, and typically don’t treat byte-sequences representing numbers in any special way. Geospatial data is special in the sense that it often contains a large…
Question: When can a distance function d(x,y) be called metric, pseudo-metric, quasi-metric or semi-metric? Constraint Metric Pseudo Quasi Semi Non-negativity: d(x,y) ≥ 0 x x x x Identity of indiscernibles: d(x,y)=0 ⇒ x=y x x x Symmetry: d(x,y) = d(y,x)…
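The constraints in the table can be tested by brute force on a small sample of points. The sketch below is only an illustration: `check_axioms` is a name I made up, and the triangle inequality is included on the assumption that it is the usual fourth constraint (it is not visible in the excerpt above). Passing on a sample does not prove a property in general, but a failure is a genuine counterexample.

```python
from itertools import product


def check_axioms(d, points, tol=1e-12):
    """Report which distance axioms hold for d on a finite sample of points."""
    pts = list(points)
    pairs = list(product(pts, repeat=2))
    return {
        # d(x, y) >= 0
        "non_negativity": all(d(x, y) >= -tol for x, y in pairs),
        # d(x, y) = 0 implies x = y (checked as: x != y implies d(x, y) > 0)
        "identity_of_indiscernibles": all(
            d(x, y) > tol for x, y in pairs if x != y
        ),
        # d(x, y) = d(y, x)
        "symmetry": all(abs(d(x, y) - d(y, x)) <= tol for x, y in pairs),
        # d(x, z) <= d(x, y) + d(y, z)  (assumed fourth constraint)
        "triangle_inequality": all(
            d(x, z) <= d(x, y) + d(y, z) + tol
            for x, y, z in product(pts, repeat=3)
        ),
    }


if __name__ == "__main__":
    sample = [0.0, 1.0, 2.0, 3.5]
    print(check_axioms(lambda a, b: abs(a - b), sample))
    print(check_axioms(lambda a, b: (a - b) ** 2, sample))
```

On this sample the absolute difference satisfies all four checks, while the squared difference fails the triangle inequality (take 0, 1 and 2).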
Rtree is a ctypes Python wrapper of libspatialindex that provides a number of advanced spatial indexing features for the spatially curious Python user.
A friend of mine, who is the CEO of a company that develops an embedded database, asked me to do a presentation on spatial indexing. This was an opportunity for me to brush up on R-trees and similar data structures. Download…
Spacebase is a spatial datastore that began life as military-grade software, which at least sounds kind of cool. It’s an in-memory database, really, so switch off the cluster and the data is gone. Apparently the same thing was (unknown to…
A distribution algorithm is used to map keys to servers in a distributed key-value store. There are several different ones, implemented in different systems, and with different properties. In this blog post I’ll briefly cover the best-known key hashing schemes,…
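As a rough illustration of what mapping keys to servers looks like, the sketch below contrasts two schemes I am assuming belong to the "best-known" group: plain modulo hashing and a consistent-hashing ring with virtual nodes. All names (`modulo_server`, `HashRing`, `replicas`) are hypothetical and not taken from any particular key-value store.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer using MD5 (chosen here for spread, not security)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def modulo_server(key, servers):
    """Modulo hashing: simple, but most keys move when the server count changes."""
    return servers[_hash(key) % len(servers)]


class HashRing:
    """Consistent hashing: each server owns several points on a ring of hash values."""

    def __init__(self, servers, replicas=100):
        # Place `replicas` virtual nodes per server on the ring, sorted by hash.
        self._ring = sorted(
            (_hash("%s#%d" % (server, i)), server)
            for server in servers
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, key):
        # The first ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    keys = ["key%d" % i for i in range(1000)]
    before = HashRing(["a", "b", "c"])
    after = HashRing(["a", "b"])
    moved = sum(before.server_for(k) != after.server_for(k) for k in keys)
    print("keys remapped after removing server c:", moved)
```

With the ring, removing a server only remaps the keys that server owned; every other key keeps its assignment, which is the property that makes this scheme attractive when the cluster changes size.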
Install KC: wget tar xzvf kyotocabinet-1.2.76.tar.gz cd kyotocabinet-1.2.76 ./configure && make && make install # takes a couple of minutes