Writing a parser in Python

This is my base pattern for writing a parser in Python using the pyparsing library. It is slightly more complicated than a pyparsing hello world, but I think that makes it more useful as a small example of writing a parser for a real grammar.

A base class, PNode, provides utility functions to the classes implementing parse tree nodes, e.g. turning a parse tree back into the original string, except that tokens are joined by single spaces (so 1,2,3 comes back as 1 , 2 , 3). It assumes that tokens in the input were separated by whitespace, and that all whitespace is equivalent.

For a particular grammar, I use Python classes to represent nodes in the parse tree; instances of these classes are created by passing each class to the setParseAction method of the corresponding BNF element. I like having these classes because they add a nice structure to the parse tree.

# Python 3 version; requires the pyparsing package
from pyparsing import Literal, StringEnd, Word, ZeroOrMore, nums


class PNode:
    """Base class for parse tree nodes"""
    def __init__(self, tokens):
        self.tokens = tokens

    def __str__(self):
        # Rebuild the input text, joining the tokens with single spaces
        return " ".join(str(token) for token in self.tokens)

    def __repr__(self):
        return self.__str__()

# Target classes

class Integer(PNode):
    def __init__(self, tokens):
        super().__init__(tokens)
        self.value = int(tokens[0])

class Comma(PNode):
    """A literal comma separator; needs nothing beyond PNode"""

class IntegerList(PNode):
    def __init__(self, tokens):
        super().__init__(tokens)
        # Keep only the Integer nodes, skipping the Comma separators
        self.integers = [t for t in tokens if isinstance(t, Integer)]

# BNF
# pyparsing instantiates each class with the matched tokens

comma = Literal(',').setParseAction(Comma)
integer = Word(nums).setParseAction(Integer)
integer_list = (integer + ZeroOrMore(comma + integer)).setParseAction(IntegerList)

# Require the whole input to match; "+" (unlike "+=") leaves integer_list unchanged
bnf = integer_list + StringEnd()

# Try parser

parsed_list = bnf.parseString('1,2,3')[0]

print(parsed_list)  # prints: 1 , 2 , 3

When to be most careful about catching the flu?

Continuing my blogification of Peter Norvig's excellent talk, the question is: when should you watch out for the flu, e.g. if you live in Denmark?

1) Go to www.google.com/trends/
2) Type in the word “influenza”
3) Select your geographical region (Denmark in my case)
4) Look only at the data up to 2008, so the graph is not squashed by the 2009 A(H1N1) outbreak (which led to unusually many people searching for the flu)

Turns out the answer is: watch out in October and February.
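Reading the graph in step 4 boils down to finding the calendar months with the highest average search interest. Below is a minimal sketch of that calculation, assuming the Trends data has been exported as (date, interest) pairs; `peak_months` is a hypothetical helper, and the toy series is made up purely to exercise the function, not real Trends data.

```python
from collections import defaultdict
from datetime import date


def peak_months(series, top=2, before_year=2009):
    """Return the `top` calendar months (1-12) with the highest average interest.

    `series` is an iterable of (datetime.date, interest) pairs. Points from
    `before_year` onwards are skipped, so the 2009 A(H1N1) spike is ignored.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, interest in series:
        if day.year >= before_year:
            continue
        totals[day.month] += interest
        counts[day.month] += 1
    averages = {month: totals[month] / counts[month] for month in totals}
    # Months ordered by average interest, highest first
    return sorted(averages, key=averages.get, reverse=True)[:top]


# Made-up toy data: one value per month for 2007-2008, plus a 2009 outlier
toy = [(date(year, month, 1), 10 + (30 if month in (2, 10) else 0))
       for year in (2007, 2008) for month in range(1, 13)]
toy.append((date(2009, 5, 1), 100))
print(peak_months(toy))  # [2, 10]
```

Averaging per calendar month (rather than taking the single highest point) keeps one unusual year from deciding the answer.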

How long is a year?

How can you find out how long a year on Earth is by analyzing only text? This approach is lifted straight from an excellent and very inspiring talk by Peter Norvig.

1) Go to www.google.com/trends/
2) Type in the word “Icecream”
3) Measure the distance between the peaks (turns out that the average is exactly the length of a year)
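Step 3 can be sketched in code: find the local maxima of a weekly series and average the gaps between consecutive peaks. This is a minimal sketch assuming one data point per week; `peak_indices` and `mean_peak_distance` are hypothetical helper names, and the input is a synthetic sine wave with a 52-week cycle rather than real Trends data.

```python
import math


def peak_indices(values):
    """Indices of strict local maxima (larger than both neighbours)."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]


def mean_peak_distance(values):
    """Average number of samples between consecutive peaks, or None."""
    peaks = peak_indices(values)
    if len(peaks) < 2:
        return None
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return sum(gaps) / len(gaps)


# Synthetic weekly "interest" with a 52-week cycle, over five years
weekly = [math.sin(2 * math.pi * week / 52) for week in range(52 * 5)]
print(mean_peak_distance(weekly))  # 52.0
```

Real search data is noisy, so it would probably need smoothing (e.g. a moving average) before this naive peak detection gives sensible distances.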

Geocoding Python function for PostgreSQL

Gratefully making use of what others have provided, namely geopy, Google and PL/Python.

Type to hold result of geocoding:

CREATE TYPE geocoding AS (
  place text,
  latitude DOUBLE PRECISION,
  longitude DOUBLE PRECISION
);

Function that does the actual geocoding (to be extended with more vendors; hint: look at the geopy wiki). It takes an arbitrary input string to be geocoded:

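CREATE OR REPLACE FUNCTION python_geocode
(
  input text,
  vendor text DEFAULT 'google'
) RETURNS SETOF geocoding AS
$$
  import time
  from geopy import geocoders
  from geopy.exc import GeopyError
  # https://code.google.com/p/geopy/wiki/GettingStarted

  # Crude rate limiting: pause before each call to the geocoding service
  time.sleep(0.2)
  # TODO: Add other available vendors, e.g. Yahoo.
  if vendor.lower() == 'google':
    # Google's Geocoding API now requires a key: GoogleV3(api_key=...)
    geocoder = geocoders.GoogleV3()
  else:
    raise ValueError("Invalid geocoder: %s" % vendor)
  try:
    # geocode() returns None when nothing matches
    results = geocoder.geocode(input, exactly_one=False) or []
  except GeopyError:
    # Treat service errors (timeouts, quota exceeded, ...) as "no result"
    results = []
  for res in results:
    # Each result indexes as (address, (latitude, longitude))
    yield {'place': res[0], 'latitude': res[1][0], 'longitude': res[1][1]}
$$ LANGUAGE plpython3u VOLATILE;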

Example (ST_SetSRID and ST_MakePoint are PostGIS functions):

SELECT place, ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)
FROM python_geocode('Kostas');

Playing with GraphViz and MathGenealogy data

The Mathematics Genealogy Project is a great initiative (you can donate online). Sven Köhler from Potsdam, Germany, has written a Python script for visualizing its database, which I'm going to try.

First step is to clone the git repo:

$ git clone git@github.com:tzwenn/MathGenealogy.git

His instructions are quite simple:

$ ./genealogy.py --search 38586  # 30 seconds
$ ./genealogy.py --display 38586 > euler.dot  # 0.1 seconds

The next step is to install GraphViz, which is needed to render the dot file as a graph. Go to the GraphViz download page and follow the instructions for your OS.

This should also install the dot command-line tool. Now you can visualize Leonhard Euler's academic family tree (his direct descendants) like this:

$ dot euler.dot -Tpng -o euler.png
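A .dot file is just plain text describing a directed graph, which dot then lays out. As a rough illustration (not necessarily the exact format genealogy.py emits), here is a minimal sketch that writes a DOT digraph from (advisor, student) pairs; `to_dot` is a hypothetical helper, and the two pairs are taken from the mentoring chain further down this page.

```python
def to_dot(edges, name="genealogy"):
    """Render (advisor, student) pairs as a Graphviz DOT digraph.

    `name` must be a plain DOT identifier (letters, digits, underscores).
    """

    def quote(label):
        # DOT string literals are double-quoted; escape embedded quotes
        return '"' + label.replace('"', '\\"') + '"'

    lines = ["digraph %s {" % name]
    for advisor, student in edges:
        lines.append("    %s -> %s;" % (quote(advisor), quote(student)))
    lines.append("}")
    return "\n".join(lines)


pairs = [
    ("Carl Friedrich Gauß", "Christian Ludwig Gerling"),
    ("Christian Ludwig Gerling", "Julius Plücker"),
]
print(to_dot(pairs))
```

Saved to a UTF-8 file, the output can be rendered with the same dot command as above; Graphviz reads UTF-8 by default, so names like Gauß survive.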

Looking at the database is easy. Every invocation of ./genealogy.py --search writes to a sqlite3 database file (genealogy.db).

$ sqlite3 genealogy.db

This opens up a prompt. Have a look at the schema of the database like this:

sqlite> .schema

And see what is inside the thesis table like this:

sqlite> select * from thesis;
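The same inspection can be done from Python with the standard sqlite3 module, which is handy for post-processing the data. A minimal sketch: it reads the table definitions from sqlite_master (roughly what .schema prints) and counts the rows of a table; the helper names are hypothetical, and no column names of the genealogy.db tables are assumed.

```python
import sqlite3


def table_schemas(conn):
    """Map each table name to its CREATE TABLE statement, like .schema."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return dict(rows.fetchall())


def count_rows(conn, table):
    """Number of rows in `table`, after checking the table exists."""
    if table not in table_schemas(conn):
        raise ValueError("no such table: %s" % table)
    # Table names cannot be bound as query parameters, hence the check above
    return conn.execute('SELECT COUNT(*) FROM "%s"' % table).fetchone()[0]
```

Usage, from the directory containing genealogy.db: `conn = sqlite3.connect("genealogy.db")`, then `print(count_rows(conn, "thesis"))`. Note that sqlite3.connect silently creates an empty database if the file does not exist, in which case count_rows raises ValueError.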

Gregory Palamas 1363

This stuff blows my mind.

Gregory Palamas (1296–1359), Γρηγόριος Παλαμάς in Greek, was a monk on Mount Athos, a place I've visited twice with my father. It is a beautiful peninsula in northern Greece, dotted with old monasteries, and the entire peninsula is the sole domain of men.

Simonopetra, Mount Athos.

Palamas eventually became the Archbishop of Thessaloniki, a city I happened to live in from 1995 to 1996. Below is a picture of Gregory Palamas in the form of an icon.

Gregorio Palamas

In his youth, my father (Georgios Kefaloukos) was also a monk on Mount Athos. There he learned the art of icon painting, and he could have painted an icon of Palamas, although I don't think he did. Below is a picture of my father taken on Mount Athos.

My father, Georgios Kefaloukos, on Mount Athos in 1966.

When I first heard about the Mathematics Genealogy Project, I was thrilled to find out that Gregory Palamas, who lived long ago and was the Archbishop of Thessaloniki, apparently had a transitive relationship with people in science through an unbroken chain of mentoring (112861 “descendants” in total). I became curious and wanted to find out which famous people he might be connected to.

While Palamas was the Archbishop of Thessaloniki he mentored Nilos Kabasilas (1298-1363), who later replaced him as Archbishop. Nilos in turn mentored Demetrios Kydones (1333-1397) and this lineage of mentoring continues in an unbroken line, through many scholars and countries, until we eventually arrive in Germany and at the famous mathematician Carl Friedrich Gauß in 1799.

Gauß himself mentored a few students, one of whom was Christian Ludwig Gerling (1788-1864), who went on to mentor Julius Plücker (1801-1868), and so forth. Again the chain of mentoring continues until we reach Marcos Vaz Salles, a Brazilian tenure-track assistant professor at the University of Copenhagen, the city I was born in. And here comes the surprising part, for me at least: Marcos is now mentoring me, together with Professor Martin Zachariasen!

An unbroken line of guys mentoring guys:

  1. Gregory Palamas
  2. Nilos Kabasilas, 1363
  3. Demetrios Kydones
  4. Manuel Chrysoloras
  5. Guarino da Verona, 1408
  6. Vittorino da Feltre, Università di Padova, 1416
  7. Ognibene (Omnibonus Leonicenus) Bonisoli da Lonigo, Università di Mantova
  8. Niccolò Leoniceno, Medicinae Dr., Università di Padova, 1453
  9. Antonio Musa Brasavola, Medicinae Dr., Università degli Studi di Ferrara, 1520
  10. Gabriele Falloppio, Medicinae Dr., Università di Padova / Università degli Studi di Ferrara, 1547
  11. Hieronymus (Girolamo Fabrici d’Acquapendente) Fabricius, Medicinae Dr., Università di Padova, 1559
  12. Adriaan van den Spieghel, Medicinae Dr., Université Catholique de Louvain / Università di Padova, 1603
  13. Adolph Vorstius, Philosophiae Dr., Medicinae Dr., Universiteit Leiden / Università di Padova, 1619, 1622
  14. Franciscus de le Boë Sylvius, Medicinae Dr., Universiteit Leiden / Universität Basel, 1634, 1637
  15. Rudolf Wilhelm Krause, Medicinae Dr., Universiteit Leiden, 1671
  16. Simon Paul Hilscher, Medicinae Dr., Friedrich-Schiller-Universität Jena, 1704
  17. Johann Andreas Segner, Magister artium, Medicinae Dr., Friedrich-Schiller-Universität Jena, 1726, 1734
  18. Johann Georg Büsch, Magister, Georg-August-Universität Göttingen, 1752
  19. Johann Elert Bode, Handelsakademie Hamburg
  20. Johann Friedrich Pfaff, Dr. phil., Georg-August-Universität Göttingen, 1786
  21. Carl Friedrich Gauß, Ph.D., Universität Helmstedt, 1799
  22. Christian Ludwig Gerling, Dr. phil., Georg-August-Universität Göttingen, 1812
  23. Julius Plücker, Ph.D., Philipps-Universität Marburg, 1823
  24. C. Felix (Christian) Klein, Dr. phil., Rheinische Friedrich-Wilhelms-Universität Bonn, 1868
  25. Philipp Furtwängler, Ph.D., Georg-August-Universität Göttingen, 1896
  26. Nikolaus Hofreiter, Dr. phil., Universität Wien, 1927
  27. Edmund Hlawka, Dr. phil., Universität Wien, 1938
  28. Hermann Adolf Maurer, Ph.D., Technische Universität Wien, 1965
  29. Hans-Peter Kriegel, Dr. rer. nat., Universität Fridericiana zu Karlsruhe, 1976
  30. Bernhard Seeger, Dr.-Ing., Universität Bremen, 1989
  31. Jens-Peter Dittrich, Dr. rer. nat., Philipps-Universität Marburg, 2002
  32. Marcos Antonio Vaz Salles, Ph.D., Eidgenössische Technische Hochschule Zürich, 2008
  33. Me, getting mentored in 2013

Daniel Grosu, an associate professor at Wayne State University, has managed to trace the mentor lineage through Palamas and even further back, to John Mauropous (990-1092), a scholar at the University of Constantinople. He was a Byzantine Greek poet, hymnographer and author of letters and orations, living in the 11th century AD. And that is where the tale ends. For now.

Things related to Docker

Docker is a cool idea and an open-source product that seems to be taking the tech community by storm. Wired will tell you why it is cool in a story titled The Man Who Built a Computer the Size of the Internet.

The short version goes: Docker is a way to package applications together with their dependencies and move them between Linux servers, using containers. The idea is similar to how applications are installed on a Mac, i.e. “everything in a single package”.

There are a number of supporting and related technologies, which I will now list:

  • Google Borg/Apache Mesos are related technologies, and Borg is perhaps the original role model for Docker. Borg is apparently being replaced by a new system codenamed Omega (video). According to a Wired story, Borg influenced Twitter to develop Mesos (originally developed by researchers at the University of California at Berkeley), now Apache Mesos, to do a similar thing. It might be fair to say that Docker is an easy version of Borg/Mesos/Omega for non-geniuses (the geniuses generally being hired by Google, Twitter etc.).
  • CoreOS is a supporting technology, an OS designed for deploying containers such as Docker. As mentioned in Wired, the project is based on Google’s ChromeOS. According to the website of this operating system, CoreOS is “Linux kernel + systemd. That’s about it.”

That's it for now about Docker. I only heard about it a few hours ago, in an email from a friend and supervisor.

Watched the RAMCloud video

Today I watched a video on RAMCloud. I have made an index of the various sections of the video, with direct links. You'll find this index at the bottom of this post.

“The RAMCloud project is creating a new class of storage, based entirely in DRAM, that is 2-3 orders of magnitude faster than existing storage systems”

Notable features, in my opinion: a fast, DRAM-backed key-value interface with durability, fast recovery and the potential for adding transactions.
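To make “key-value store in memory, but durable and recoverable” concrete, here is a toy sketch. It is not RAMCloud's API or design: it simply keeps the data in a Python dict and appends every write to a log file first, so a restarted process can rebuild its state by replaying the log. RAMCloud, as I understand it, replicates its log to backup servers and recovers in parallel across many machines; none of that is modelled here. The class name LoggedStore is hypothetical.

```python
import json
import os


class LoggedStore:
    """Toy in-memory key-value store made durable by an append-only log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._recover()

    def _recover(self):
        # Rebuild the in-memory dict by replaying every logged write in order
        try:
            with open(self.log_path, encoding="utf-8") as log:
                for line in log:
                    entry = json.loads(line)
                    if entry["op"] == "put":
                        self.data[entry["key"]] = entry["value"]
                    else:
                        self.data.pop(entry["key"], None)
        except FileNotFoundError:
            pass  # no log yet: start empty

    def _append(self, entry):
        # Log first and force it to disk, so an acknowledged write survives a crash
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(entry) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def put(self, key, value):
        self._append({"op": "put", "key": key, "value": value})
        self.data[key] = value

    def delete(self, key):
        self._append({"op": "delete", "key": key})
        self.data.pop(key, None)

    def get(self, key, default=None):
        return self.data.get(key, default)
```

Creating a second LoggedStore on the same log file, as a restarted server would, recovers the same contents. Recovery time grows with the length of the log, and a half-written last line after a crash is not handled; both are problems a real system has to solve.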

Also, RAMCloud is open source.