Category Systems

Twitter HyperLogLog monoids in Spark

Want to count unique elements in a stream without blowing up memory? More specifically, do you want to use a HyperLogLog counter in Spark? Until today, I’d never heard the word “monoid”. However, Twitter Algebird is a…

Linked Data: First Blood

Knowing a lot about something makes me more prone to appraising its value. Unfortunately, I know very little about Linked Data. For this reason, I’ve had a very biased and shamefully low opinion of the concept of Linked Data. I’ve…

Watched the RAMCloud video

Today I watched a video on RAMCloud. I have made an index of the various sections of the video, with direct links. You’ll find this index at the bottom of this post. “The RAMCloud project is creating a new class…

A stop watch for Postgres

To time the execution of various stages of a long transaction, I’m using the following function: CREATE OR REPLACE FUNCTION CVL_TimerLap() RETURNS double precision AS $$ import time now = time.time() if not SD.has_key('t_last'): SD['t_last'] = now elapsed = now…

Running LP-solver in Postgres

Having reinstalled PostgreSQL with Python support and pointed it at my non-system Python, it is time to test whether I can use the convex optimizer library I’ve installed in my Python 2.7 (pip install cvxopt). Install PL/Python if not already…