Cloud Systems

Twitter HyperLogLog monoids in Spark

Want to count unique elements in a stream without blowing up memory? More specifically, do you want to use a HyperLogLog counter in Spark? Until today, I’d never heard the word “monoid”. However, Twitter Algebird is a project that contains a collection of monoids, including a HyperLogLog monoid, which can be used […]
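To make the "HyperLogLog as a monoid" idea concrete, here is a minimal sketch in plain Python. It is not Algebird's API and not Spark code: the class name `HyperLogLog`, the SHA-1 hashing and the default of 1024 registers are all assumptions chosen for illustration. The point it tries to show is that two sketches combine with an element-wise max, and that an empty sketch acts as the identity element, which is what lets partial counts from different partitions be merged in any order.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch (hypothetical class, not Algebird's)."""

    def __init__(self, p=10, registers=None):
        self.p = p              # number of index bits (assumed default)
        self.m = 1 << p         # number of registers
        self.registers = list(registers) if registers is not None else [0] * self.m

    def add(self, item):
        # Take 64 bits of a SHA-1 digest as the hash value (an assumption;
        # any well-mixed 64-bit hash would do).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # The monoid "plus": element-wise max of the registers.
        merged = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return HyperLogLog(self.p, merged)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)       # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    left, right = HyperLogLog(), HyperLogLog()
    for i in range(6000):
        left.add(i)
    for i in range(4000, 10000):
        right.add(i)
    # Merging two partial sketches approximates the count of the union.
    print(round(left.merge(right).estimate()))
```

Because merging is associative and commutative, a framework can build one sketch per partition and reduce them pairwise; the result is the same sketch you would get from feeding every element into a single counter.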

Command line Fu Systems

Easiest way to install a PostgreSQL/PostGIS database on Mac

Installing Postgres+PostGIS on a Mac has never been easier. In fact, it is now an app! You download the app file, place it in your Applications folder, and you’re done. Really. If you think that was over too fast, there is one more thing you can do. […]

Data Systems

Linked Data: First Blood

Knowing a lot about something makes me more prone to appreciating its value. Unfortunately, I know very little about linked data. For this reason, I’ve had a very biased and shamefully low opinion of the concept. I’ve decided to change this. A repository of linked data that I’ve recently taken an interest […]

Programming Spatial stuff Systems

Geocoding Python function for PostgreSQL

Gratefully making use of what others have provided, i.e. geopy, Google and plpythonu. Type to hold the result of geocoding: CREATE TYPE geocoding AS ( place text, latitude double precision, longitude double precision ); Function that does the actual geocoding (to be extended […]

Audio and video Programming Systems

Things related to Docker

Docker is a cool idea and an open-source product that seems to be taking the tech community by storm. Wired will tell you why it is cool in a story titled The Man Who Built a Computer the Size of the Internet. The short version goes: Docker is a way to deploy and move applications with […]

Systems Videos

Watched the RAMCloud video

Today I watched a video on RAMCloud. I have made an index of the various sections of the video, with direct links. You’ll find this index at the bottom of this post. “The RAMCloud project is creating a new class of storage, based entirely in DRAM, that is 2-3 orders of magnitude faster than existing […]

Command line Fu Systems

How many requests per second can I get out of Redis?

Warning: This is not a very interesting post. I’m toying around with the Redis benchmarking tool. What would be significantly more interesting would be to toy around with the Lua API in Redis, which I’ll do in a subsequent post. In this post, I’ll try to squeeze as many get/set requests out of Redis as […]


A stop watch for Postgres

To time the execution of various stages of a long transaction, I’m using the following function: CREATE OR REPLACE FUNCTION CVL_TimerLap() RETURNS double precision AS $$ import time now = time.time() if not SD.has_key('t_last'): SD['t_last'] = now elapsed = now - SD['t_last'] SD['t_last'] = now return elapsed $$ LANGUAGE plpythonu; […]
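The lap logic inside that function can be tried outside the database. The following is a rough sketch in plain Python, assuming an ordinary dictionary as a stand-in for PL/Python's per-function `SD` dictionary; the name `timer_lap` and the injectable `clock` parameter are hypothetical and only there to make the behaviour easy to check.

```python
import time

# Stand-in for PL/Python's SD dictionary, which persists between calls
# to the same function within a session (assumption for this sketch).
_state = {}


def timer_lap(state=_state, clock=time.time):
    """Return seconds elapsed since the previous call (0.0 on the first call)."""
    now = clock()
    # On the first call there is no previous timestamp, so the lap is zero.
    last = state.get("t_last", now)
    state["t_last"] = now
    return now - last


if __name__ == "__main__":
    timer_lap()                 # start the stopwatch
    time.sleep(0.1)             # some stage of work
    print(timer_lap())          # lap time for that stage
```

Each call both reports the time since the last call and resets the reference point, so calling it between stages of a transaction gives one lap time per stage.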

Algorithms Systems

Running LP-solver in Postgres

Having reinstalled PostgreSQL with support for Python, pointing at my non-system Python, it is time to test whether I can use the convex optimization library I’ve installed in my Python 2.7 (pip install cvxopt). Install PL/Python if not already installed (doesn’t hurt): CREATE EXTENSION plpythonu; […]

Programming Systems

Some good slides for using PostgreSQL with Python

Peter Eisentraut has written some good slides on coding PostgreSQL clients in Python and on using Python as a stored procedure language in PostgreSQL. The first half deals with using Python as a Postgres client; the second half deals with coding stored procedures in Python.