Geocoding Python function for PostgreSQL

Gratefully making use of what others have provided, i.e. geopy, Google and plpythonu.

Type to hold result of geocoding:

CREATE TYPE geocoding AS (
  place text,
  latitude DOUBLE PRECISION,
  longitude DOUBLE PRECISION
);

Function that does the actual geocoding. It takes an arbitrary input string to be geocoded. More vendors can be added later (hint: look at the geopy wiki):

CREATE OR REPLACE FUNCTION python_geocode
(
  input text,
  vendor text DEFAULT 'google'
) RETURNS SETOF geocoding AS
$$
  import time
  from geopy import geocoders
  # https://code.google.com/p/geopy/wiki/GettingStarted
 
  time.sleep(0.2)
  # TODO: Add other available vendors, e.g. Yahoo.
  if vendor.lower() == 'google':
    geocoder = geocoders.GoogleV3()
  else:
    raise ValueError("Invalid geocoder: %s" % vendor)
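  try:
    # geocode() returns None when nothing matches; treat that as no rows.
    results = geocoder.geocode(input, exactly_one=False) or []
  except Exception as e:
    # Report the failure as a warning instead of hiding it; return no rows.
    plpy.warning("Geocoding failed: %s" % e)
    results = []
  for res in results:
    yield {'place': res[0], 'latitude': res[1][0], 'longitude': res[1][1]}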
$$ LANGUAGE plpythonu VOLATILE;

Example:

SELECT place, ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)
FROM python_geocode('Kostas');
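
The row-building step of the function can be tried outside PostgreSQL. The following is a minimal sketch, not part of the original function: rows_from_results and to_ewkt are hypothetical helper names, and the sample result is made up rather than real geocoder output. It assumes each result can be indexed as (place, (latitude, longitude)), as in the function above.

```python
def rows_from_results(results):
    """Turn geocoder-style results into dicts shaped like the geocoding type."""
    # A geocoder may return None when nothing matches; treat that as no rows.
    for res in results or []:
        yield {"place": res[0], "latitude": res[1][0], "longitude": res[1][1]}


def to_ewkt(row):
    """Format a row as EWKT with SRID 4326. Longitude comes first (x, y order)."""
    return "SRID=4326;POINT(%r %r)" % (row["longitude"], row["latitude"])


# Made-up sample in the assumed (place, (latitude, longitude)) shape.
SAMPLE_RESULTS = [("Example Place", (12.5, -45.25))]
```

Note that the point is built with longitude first, which matches ST_MakePoint(longitude, latitude) in the example query.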

Things related to Docker

Docker is a cool idea and an open-source product that seems to be taking the tech community by storm. Wired will tell you why it is cool in a story titled “The Man Who Built a Computer the Size of the Internet”.

The short version goes: Docker is a way to deploy and move applications with dependencies between Linux servers, using a container concept. The idea is similar to how applications are installed on a Mac, i.e. “everything in a single package”.

There are a number of supporting and related technologies, which I will now list:

  • Google Borg and Apache Mesos are related technologies, and perhaps Borg is the original role model for Docker. Borg is apparently being replaced by a new system codenamed Omega (video). According to a Wired story, Borg influenced Twitter to develop Mesos (originally created by researchers at the University of California at Berkeley, now Apache Mesos) to do a similar job. It might be fair to say that Docker is an easy version of Borg/Mesos/Omega for those of us who are not the geniuses generally hired by Google, Twitter etc.
  • CoreOS is a supporting technology, an OS designed for deploying containers such as Docker. As mentioned in Wired, the project is based on Google’s ChromeOS. According to the website of this operating system, CoreOS is “Linux kernel + systemd. That’s about it.”

This is it for now about Docker. Just heard about it a few hours ago in an email from a friend and supervisor.

Hello GNU profiling

The profiling tool in GNU is called gprof. Here is a short, boring example of how to use it.

1) Write hello world in C (hello.c)

#include <stdio.h>
 
int foo() {
  int b = 54324;
  int j;
  for (j=0; j < 1000000; j++) {
    b = b^j;
  }
  return b;
}
 
int main() {
  int a = 321782;
  int i;
  for(i=0; i<1000; i++) {
    a = a ^ foo();
  }
  printf("Hello foo: %d\n", a);
  return 0;
}
 

2) Compile with -pg option

gcc -pg hello.c

3) Run the program to generate profiling information

./a.out # this generates gmon.out file

4) Run gprof on the program and read output in less:

gprof a.out gmon.out | less
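
Steps 2 to 4 can also be scripted. This is a minimal sketch, assuming gcc and gprof are on the PATH and hello.c is in the current directory. gprof_commands and profile are hypothetical helper names, and the -o flag is added only so that the binary name is explicit.

```python
import subprocess


def gprof_commands(source="hello.c", binary="a.out"):
    """Return the commands for steps 2 to 4 as argument lists."""
    return [
        ["gcc", "-pg", "-o", binary, source],  # step 2: compile with profiling enabled
        ["./" + binary],                       # step 3: run; gmon.out is written on normal exit
        ["gprof", binary, "gmon.out"],         # step 4: turn gmon.out into a report
    ]


def profile(source="hello.c", binary="a.out"):
    """Run the commands in order and return the text printed by gprof."""
    report = ""
    for argv in gprof_commands(source, binary):
        done = subprocess.run(argv, check=True, capture_output=True, text=True)
        report = done.stdout  # after the loop this holds the output of the last command
    return report
```

Calling print(profile()) next to hello.c should show the same report that the gprof command above pipes into less.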

Using the Python debugger

A few days ago I found out that using the Python debugger is so easy, I can’t believe I haven’t used it before.

Import the module:

import pdb

Set a breakpoint somewhere in your code:

def some_function(self, x, y, z):
    pdb.set_trace()
    ...

Run your program. Now every time ‘some_function’ is called, execution pauses at the breakpoint and you get a (Pdb) prompt. At this point you could:

  • type x (or p x) and press Enter to inspect the argument passed to the x parameter
  • type ‘n’ to execute the next line of code, stepping over any function calls in it
  • type ‘c’ to resume the program
  • type ‘h’ to get help

Easy enough?
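
To see the commands in action without typing them, the debugger can be fed a script. The sketch below is an illustration only: it uses pdb.Pdb with in-memory stdin and stdout instead of the interactive pdb.set_trace(), and run_scripted is a hypothetical helper name.

```python
import io
import pdb


def some_function(x, y, debugger):
    # Interactively this line would simply be pdb.set_trace().
    debugger.set_trace()
    total = x + y
    return total


def run_scripted(x, y, commands="p x\nc\n"):
    """Call some_function with a debugger that reads `commands` instead of the keyboard."""
    output = io.StringIO()
    debugger = pdb.Pdb(stdin=io.StringIO(commands), stdout=output,
                       nosigint=True, readrc=False)
    result = some_function(x, y, debugger)
    # The transcript holds the prompts plus whatever the commands printed.
    return result, output.getvalue()
```

The default script prints x and then continues, so the returned transcript contains the value passed as x, and the function still returns its normal result.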

Sequential writes leveldb versus system_x

OK, calling it a benchmark is a bit of an overstatement. It’s taking two different database libraries for a quick spin, and seeing how fast they can write a bunch of integers to disk. A second benchmark checks how fast we can read them.

In this mini-test, I’m running leveldb against a new embedded database library, let’s call it system_x. The purpose is really just so that I can remember some rough numbers regarding these useful database libraries.

I used the time command to gather results, which shows real, user and sys time spent.
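
To make the three numbers concrete, here is a minimal sketch that reports real, user and sys time around a sequential write of integers. It writes to a plain file as a stand-in, not to leveldb or system_x, and timed_sequential_write is a hypothetical helper name. The figures it gives say nothing about either library.

```python
import os
import struct
import time


def timed_sequential_write(path, count):
    """Write `count` 8-byte integers to `path`; return (real, user, sys) in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    with open(path, "wb") as f:
        for i in range(count):
            f.write(struct.pack("<q", i))  # little-endian signed 64-bit integer
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to disk before timing stops
    cpu_end = os.times()
    real = time.perf_counter() - wall_start
    user = cpu_end.user - cpu_start.user
    system = cpu_end.system - cpu_start.system
    return real, user, system
```

Real is wall-clock time, user is CPU time spent in the process itself, and sys is CPU time spent in the kernel on its behalf, for example while handling the writes.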
