The scale of the Danish cyber effort

How much money does Denmark spend on cyber defense compared to the U.S., in total and per citizen? That is what I’ll look at in this post. I’ll also try to get an initial idea of what is going on. Why am I doing this? Actually, just out of curiosity, and to kill some time before I have my hair cut.

Picking up the paper-paper (Politiken) this morning, I read a short opinion piece about the intelligence branch of the Danish armed forces (FE: Forsvarets Efterretningstjeneste), and in particular the new Center for Cybersecurity. The concern is that this new center is going to spy on ordinary Danish citizens (NSA-style). It made me curious, and I decided to investigate for myself.

Web soldiers during combat. Not entirely sure that’s not World of Warcraft.

The center was established in 2011 with a fairly modest budget of 35 million DKK a year (out of a 90 million DKK budget in 2014 for cyber efforts by the Danish Ministry of Defense; increased to 150 million DKK in 2016). This is a modest budget, given what truly skilled IT professionals charge per hour and what IT equipment costs in general, but also compared to what other institutions in Denmark receive. For example, the Danish Geodata Agency, which I’ve had the great pleasure of working for, has an annual budget of more than 200 million DKK.

So 90 million DKK for cyber defense versus 200 million DKK for geographical data (2014).

In the United States, the Defense Department allocates $4.7 billion in its annual budget to “cyber efforts”. Converting the currency, that is 25 billion DKK versus 90 million DKK, a ratio of about 277:1.

Red square is Danish budget, blue square is U.S. budget:

The population of the United States is 313 million people. The population ratio between the U.S. and Denmark is approximately 62:1. The United States thus spends roughly 4.5 times more money per capita on cyber efforts than Denmark.
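The arithmetic behind these ratios can be checked with a few lines of Python. This is only a sketch using the rounded figures quoted in this post; the variable names are mine, and the result is only as good as those inputs.

```python
# Figures as quoted in this post (rounded)
US_BUDGET_DKK = 25e9      # $4.7 billion converted to DKK
DK_BUDGET_DKK = 90e6      # Danish Ministry of Defense cyber budget, 2014
POPULATION_RATIO = 62.0   # U.S. : Denmark, as stated above

# How many times larger the U.S. budget is in absolute terms
budget_ratio = US_BUDGET_DKK / DK_BUDGET_DKK

# Dividing by the population ratio gives the per-capita ratio
per_capita_ratio = budget_ratio / POPULATION_RATIO

print("budget ratio: %.1f:1" % budget_ratio)
print("per-capita ratio: %.1f:1" % per_capita_ratio)
```

The budget ratio comes out just under 278:1, and dividing by 62 gives the roughly 4.5:1 per-capita figure.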

Dollars spent on cyber efforts per person in the U.S. and in Denmark:

When trying to understand the motivation for national cyber efforts, Danish independent media seems to focus on the threat posed by industrial espionage (from abroad?) against Danish companies (1, 2, 3). This is surely a real threat, and should be a primary mission IMO.

The stated mission of the center, as described on the homepage for the center, is a bit more vague. It goes something like this:

Strengthen Denmark’s resilience against threats directed at critical information and communication technology (ICT); ensure the preconditions for a robust ICT infrastructure in Denmark; warn of and counter cyber attacks in order to strengthen the protection of Danish interests.

I’m not really sure what that means concretely. What the paper-paper (Politiken) is concerned about is that the center is going to spy on Danish and foreign citizens. Given the modest annual budget and the usual burn-rate in public administration, I think this is going to be a rather weak threat to our privacy. Another question is what the primary mission of the center should be, and how that mission should be accomplished. In any event, 90 million DKK does not go a long way towards anything. That being said, I’m certainly curious about what the money IS spent on. If I find out, I’m not sure I’ll post it on my personal blog, so don’t hold your breath.

This was primarily a way to pass some time before I have my hair cut (in five minutes).

Finding the haystack

Until this summer, many people believed they knew the truth that it is hard to find the needle in the haystack. Only very few knew that the proverb has been turned on its head in the intelligence world. There, the saying goes: »to find the needle, we need the haystack«.

http://www.information.dk/471968

Writing a parser in Python

This is my base pattern for writing a parser in Python by using the pyparsing library. It is slightly more complicated than a hello world in pyparsing, but I think it is more useful as a small example of writing a parser for a real grammar.

A base class PNode is used to provide utility functions to classes implementing parse tree nodes, e.g. turning a parse tree back into the original string (except that all whitespace is replaced by a single space). It assumes that tokens in the input were separated by whitespace, and that all whitespace is equivalent.
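To make the string round trip concrete, here is a minimal sketch that builds a tiny tree by hand, without pyparsing. The PNode below is a stripped-down stand-in for the real base class, and the names one, comma, two and tree exist only for this illustration. It assumes Python 3.

```python
class PNode(object):
    """Stand-in base class: joins its child tokens with single spaces."""
    def __init__(self, tokens):
        self.tokens = tokens

    def __str__(self):
        # str() recurses into child nodes and leaves plain strings as they are
        return " ".join(str(t) for t in self.tokens)


# Hand-built tree for the input "1,2"; plain strings act as the raw tokens.
one = PNode(["1"])
comma = PNode([","])
two = PNode(["2"])
tree = PNode([one, comma, two])

print(str(tree))  # 1 , 2
```

Note that the input "1,2" comes back as "1 , 2": a single space is inserted between all tokens, whether or not the input had whitespace there.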

For a particular grammar, I use Python classes to represent nodes in the parse tree; these classes get created by calling the setParseAction method on the corresponding BNF element. I like having these classes because it adds a nice structure to the parse tree.

from pyparsing import Literal, StringEnd, Word, ZeroOrMore, nums

class PNode(object):
    """Base class for parse tree nodes"""
    def __init__(self, tokens):
        super(PNode, self).__init__()
        self.tokens = tokens

    def __str__(self):
        # Join the string form of every token with a single space
        return " ".join(str(x) for x in self.tokens)

    def __repr__(self):
        return self.__str__()

# Target classes

class Integer(PNode):
    def __init__(self, tokens):
        super(Integer, self).__init__(tokens)
        self.value = int(tokens[0])

class Comma(PNode):
    def __init__(self, tokens):
        super(Comma, self).__init__(tokens)

class IntegerList(PNode):
    def __init__(self, tokens):
        super(IntegerList, self).__init__(tokens)
        # Keep only the Integer nodes, dropping the Comma nodes
        self.integers = [x for x in tokens if isinstance(x, Integer)]

# BNF

comma = Literal(',').setParseAction(Comma)
integer = Word(nums).setParseAction(Integer)
integer_list = (integer + ZeroOrMore(comma + integer)).setParseAction(IntegerList)

# Require the whole input to match
bnf = integer_list + StringEnd()

# Try parser (Python 3)

parsed_list = bnf.parseString('1,2,3')[0]

print(parsed_list)

When to be most careful about catching the flu?

Continuing my blogification of Peter Norvig’s excellent talk, the question is: when should you watch out for the flu, e.g. if you live in Denmark?

1) Go to www.google.com/trends/
2) Type in the word “influenza”
3) Select your geographical region (Denmark in my case)
4) See data up to year 2008, to avoid the graph being squished by the outbreak of A(H1N1) (which leads to unusually many people talking about the flu)

Turns out the answer is: watch out in October and February.

Geocoding Python function for PostgreSQL

Gratefully making use of what others have provided, i.e. geopy, Google and plpythonu.

Type to hold result of geocoding:

CREATE TYPE geocoding AS (
  place text,
  latitude DOUBLE PRECISION,
  longitude DOUBLE PRECISION
);

Function that does the actual geocoding (to be extended with more vendors. Hint: look at geopy wiki). Takes an (arbitrary) input string to be geocoded:

CREATE OR REPLACE FUNCTION python_geocode
(
  input text,
  vendor text DEFAULT 'google'
) RETURNS SETOF geocoding AS
$$
  import time
  from geopy import geocoders
  # https://code.google.com/p/geopy/wiki/GettingStarted
 
  time.sleep(0.2)
  # TODO: Add other available vendors, e.g. Yahoo.
  if vendor.lower() == 'google':
    geocoder = geocoders.GoogleV3()
  else:
    raise ValueError("Invalid geocoder: %s" % vendor)
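  try:
    # geocode() returns None when nothing matches, so fall back to an empty list
    for res in geocoder.geocode(input, exactly_one=False) or []:
      yield {'place': res[0], 'latitude': res[1][0], 'longitude': res[1][1]}
  except Exception:
    # Best effort: any geocoding error (network, quota, ...) simply yields no rows
    pass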
$$ LANGUAGE plpythonu VOLATILE;

Example:

SELECT place, ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)
FROM python_geocode('Kostas');
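The row-building step can also be tried outside PostgreSQL. The sketch below assumes each result has the (place, (latitude, longitude)) shape that the function above indexes into. The helper names to_row and to_rows are mine, and the sample value is a made-up placeholder, not real geocoder output.

```python
def to_row(res):
    """Map one (place, (latitude, longitude)) result to a geocoding row."""
    place, (latitude, longitude) = res
    return {'place': place, 'latitude': latitude, 'longitude': longitude}


def to_rows(results):
    """Turn a list of results (or None, when nothing matched) into rows."""
    return [to_row(res) for res in (results or [])]


# Made-up placeholder result, only to show the shape of the data.
sample = [("Example Place", (1.5, 2.5))]

print(to_rows(sample))
print(to_rows(None))
```

Each dictionary has the same keys as the columns of the geocoding type, which is what lets PL/Python map it to a result row.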

Playing with GraphViz and MathGenealogy data

The Mathematics Genealogy Project is a great project (donate online). Sven Köhler from Potsdam, Germany has written a Python script for visualizing the database, which I’m going to try.

First step is to clone the git repo:

$ git clone git@github.com:tzwenn/MathGenealogy.git

His instructions are quite simple:

$ ./genealogy.py --search 38586  # 30 seconds
$ ./genealogy.py --display 38586 > euler.dot  # 0.1 seconds

Next step is to install e.g. GraphViz, which is needed to visualize the dot file as a graph. Go to the download page for GraphViz, and follow instructions for your OS.

This should also install the command-line tool. Now you can visualize Leonhard Euler’s supervisor family tree (direct descendants) like this:

$ dot euler.dot -Tpng -o euler.png

Looking at the database is easy. Every invocation of ./genealogy.py --search writes to a sqlite3 database file (genealogy.db).

$ sqlite3 genealogy.db

This opens up a prompt. Have a look at the schema of the database like this:

sqlite> .schema

And see what is inside the thesis table like this:

sqlite> select * from thesis;
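The same inspection can be done from Python with the standard sqlite3 module. This is a minimal sketch: the helper names table_names and row_count are mine, and the only things it assumes about the database are the file name genealogy.db and the thesis table mentioned above.

```python
import os
import sqlite3


def table_names(conn):
    """Return the names of all tables in the database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]


def row_count(conn, table):
    """Count the rows in one table, after checking the table exists."""
    if table not in table_names(conn):
        raise ValueError("no such table: %s" % table)
    return conn.execute('SELECT COUNT(*) FROM "%s"' % table).fetchone()[0]


if __name__ == "__main__":
    db_path = "genealogy.db"
    # Only touch the file if a previous --search run has created it
    if os.path.exists(db_path):
        conn = sqlite3.connect(db_path)
        for name in table_names(conn):
            print(name, row_count(conn, name))
        conn.close()
```

Checking the table name against sqlite_master first avoids pasting an arbitrary string into the SQL statement.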