Which files has a *nix process opened?

To list all files that are opened by a *nix process with a given pid, say 42, use the lsof command:

(sudo) lsof -p 42

Of course, a process may have many files open. To list only files that have a name containing “log”, use the grep command:

(sudo) lsof -p 42 | grep log

This of course assumes you know the process id (pid) of the process. To find the pid of processes with a given name, e.g. httpd, use the ps command together with the grep command:

# notice that this prints the grep process as well
ps aux | grep httpd

How long is the Doom Loop cycle currently?

Take a look at this Chomsky presentation, time it around 46:30. It seems that the most rational prediction would be that we are heading for another financial crisis, since financial systems are running a quote “Doom Loop”: Make huge gambles, make huge gains or fail. In the case of failure, get bailed out. This pattern of behaviour is rational, seen from the point of view of the financial sector, given the current environment. So, the good question is, what would the rational course of action be for us, the citizens, given that the financial sector is apparently acting, fully rationally, inside a Doom Loop?

The rational question would be, when is the next financial crisis coming? Given a good prediction of this point in time, how should we rationally act, e.g. in the real-estate market? If we should aspire to make rational decisions, we should not hope that another financial crisis will be avoided. We should expect it, and make rational decisions based upon it. For our own gain, if we so desire. Now, how do you do that? That is another question. It seems obvious that decisions in many areas should be influenced by this apparent fact, e.g. decisions in real-estate, entrepreneurship, family planning. If there is money to be made, somehow, in betting on the next financial crisis, maybe that would be the rational thing to do.

Must remember to salt my hashes

While a sha-256 hash may seem unbreakable, for many input strings it takes seconds to crack. If you don’t believe me, try the following or simply read this webpage:

$ python
>>> import hashlib
>>> print hashlib.sha256('megabrain').hexdigest()

Go to crackstation.net, paste the hash into the text area, click “crack hashes” and see it the admittedly super lame password cracked in a second. The basic concept behind the cracking is to precompute hashes for a lot of passwords, and doing reverse lookups – from hashcode to password. This way, it really does not make any difference how “good” your hashing algorithm is. This is not an attack against hashing algorithms, but an attack against common hashcodes. In the wild, you are more likely to encounter the hash of “megabrain” than the hash of “2f2f0a446f828f”. While you should encourage everybody you meet to choose strong passwords, it is perhaps more sustainable to strengthen the security around weak passwords.


This is where salts come in, as weak passwords can be made stronger by salting. A salt is just a sequence of bytes, e.g. “c039b8f8a8…” that you concatenate with a password before computing a password hash. It is ineffective to use the same salt for all passwords, so by all means read this page to get the inside scoop on how to do this correctly.

$ python
>>> os
>>> import hashlib
>>> password = 'megabrain'
>>> salt = os.urandom(32)
>>> stored_hash = hashlib.sha256(salt + password).hexdigest()

If you try to crack the stored_hash on crackstation.net, you will see that it is not successful. So the moral of the story is, a bad password + a good salt = a good password. Users only have to remember their (bad) password, while you should remember the good salt.

To authenticate as user with a salted password, you will again combine the salt and password before comparing to a stored hash:

>>> password_to_authenticate = 'megabrain'
>>> if hashlib.sha256(salt + password_to_authenticate).hexdigest() == stored_hash:
>>>     print "User has been authenticated!"
>>> else:
>>>     print "Wrong password!"

Docker on Ubuntu VM running on Mac using Vagrant

Docker allows you to develop, ship and run any application, anywhere. The metaphor is that of the standard shipping container that fits on any ship, can be handled by any crane, and loaded onto any train or truck.

In a previous post, I covered how to run Ubuntu on Mac using Vagrant. In this post, I will show how to run Docker on the Ubuntu box we got running with Vagrant.

I will cover how to:

Provisioning Docker on “vagrant up”

First, create a Vagrant setup like previously described. Then, edit the install.sh script, and enter some Docker installation commands:


curl -sSL https://get.docker.io/ | sh

Now, let’s test that docker was installed as intended:

vagrant up
vagrant ssh

(Fix) chown the docker socket:

# Now on vagrant machine
sudo chown vagrant /var/run/docker.sock  # TODO: need to address this issue in a different way

Check docker version:

docker version

Run a hello world:

docker pull ubuntu
docker run ubuntu echo "Hello, world"

Basic Docker usage

Get your applications into Docker containers


Shipping containers to team members


Deploying applications to production


Aside: Deploying containers on AWS



This post shows how to get up and running with Vagrant and Docker using the install scripts provided at get.docker.io. In the next post I will show how to use the “new” way to use Docker with Vagrant (thanks to Jens Roland for pointing me in the right direction).

Running Ubuntu on Mac with Vagrant

Vagrant is cool:

Vagrant provides easy to configure, reproducible, and portable work environments built on top of industry-standard technology and controlled by a single consistent workflow to help maximize the productivity and flexibility of you and your team.


Vagrant stands on the shoulders of giants. Machines are provisioned on top of VirtualBox, VMware, AWS, or any other provider. Then, industry-standard provisioning tools such as shell scripts, Chef, or Puppet, can be used to automatically install and configure software on the machine.

In this post I’ll show you how to get started with Vagrant using a virtual Ubuntu Linux box. Moreover, I will cover how to use a simple provisioning technique (shell provisioning) for installing custom stuff into your virtual box on boot-up.

Vagrant basics

To follow along, you must first install Vagrant and VirtualBox. When you are done, cd to some folder (e.g. cd ~/Documents/trying-vagrant) and let’s get started:

Initialize Vagrant (“vagrant init”) using an Ubuntu Trusty Tahr image (https://vagrantcloud.com/ubuntu/trusty64/version/1/provider/virtualbox.box):

vagrant init ubuntu/trusty64

This creates a new file, Vagrantfile:

$ ls

Now, using the Vagrantfile that was created, boot the box (“vagrant up”). If this is the first time, vagrant will first download the image from the cloud (could take a while):

vagrant up

When done with the booting up, SSH into the machine (“vagrant ssh”):

vagrant ssh
# do some stuff, like ls and what not
^D  # to quit

Bring down the box (“vagrant destroy”). Oh I love this, can’t help myself but sound it out “deSTROY” in a super villain voice:

vagrant destroy

If this worked, let’s move on to installing some custom stuff on boot-up.

Installing stuff on “vagrant up”

There are many ways to install stuff on vagrant up, e.g. using shell scripts, Chef, or Puppet. Here, I will use a shell script because it is simple and clean.

Shell provisioning is a simple way to install stuff on “vagrant up”. First, let us create a shell script (“install.sh”) that we will later reference from the Vagrantfile. Furthermore, let’s live a little and install BrainFuck along with a hello world program.


sudo apt-get install bf
echo '++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.' > helloworld.b

(Remember to “chmod 744” the install.sh script). Now, add a few lines of code to your Vagrantfile and you’re golden. After the edit, the file should look like this.


# -*- mode: ruby -*-
# vi: set ft=ruby :

# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.provision "shell", path: "install.sh"

Now, let’s test that it worked:

vagrant up
vagrant ssh
# now on virtual machine:
$ bf helloworld.b
Hello World!


In this post, I showed you how to get started with Vagrant, and how to provision stuff on “vagrant up” using a shell script.

Screen capturing with PhantomJS

PhantomJS is a headless browsers that you can use, e.g. to test web UIs and to screen capture webpages. I will focus on the last use case.

Since PhantomJS knows how to execute Javascript, it can create a screen shot of most webpages, even those that render their part of their GUI using Javascript.


To get started with PhantomJS, download and unzip a PhantomJS binary for your system. In the unzip’ed directory structure you’ll find bin/phantomjs, which is ready to use binary program. You can add that directory to your PATH if you like.

PhantomJS is controlled by Javascript. The script rasterize.js is a useful multi-purpose script for creating screen shots. We will use this script, so download and store it somewhere convenient.

Hello world

I have created a simple test page that partly produces the page content using Javascript. If Javascript is enabled, the page will read “Hello Javascript”. Otherwise, the page reads “Hello”. Let us now screen capture this page using PhantomJS:

# Copy paste everything into a terminal window and run it
# You need to specify the right paths to:
# - phantomjs (e.g. add phantom "bin" dir to PATH)
# - rasterize.js (e.g. run below command in dir containing script)
phantomjs rasterize.js http://skipperkongen.dk/files/hello_javascript.html hello_javascript.pdf

If that went well, you should now have a PDF file called hello_javascript.pdf in the directory where you ran the command. Open the PDF and confirm that it contains the text “Hello Javascript” just like the web page does.

Screen capturing a real blog post

Hopefully, the above experiment worked. However, the content in the generated PDF was not too interesting. Let’s repeat the above experiment with a real blog post, namely the first blog post I ever wrote on skipperkongen.dk:

# Copy paste everything into a terminal window and run it
# You need to specify the right paths to:
# - phantomjs (e.g. add phantom "bin" dir to PATH)
# - rasterize.js (e.g. run below command in dir containing script)
phantomjs rasterize.js \
http://skipperkongen.dk/2010/11/14/hard-to-less-hard/ skipperkongen.pdf

If you open the generated PDF you will see that it is not the prettiest sight. The PDF has only a passing resemblance to what the original blog post looks like if you open it in a “normal” browser. This is perhaps all according to specifications, but I (and I’m guessing you) would like a more aesthetically pleasing result.

Inspecting the generated PDF

Before we begin to understand why the generated PDF looks in a particular way, let us describe what we are seeing. So what does the PDF look like?

First, the generated PDF is missing the content header found on the web page. Second, the rendered PDF has an incredibly narrow page layout or uses a very big font size. Third, on my Mac there is a weird “private use” symbol in several places in the pdf. Regarding the third issue, there is a fun discussion over at StackExchange for Mac OS X about the “private use” symbol with some interesting background information.

Why does the generated PDF look this way?

In order to understand why PhantomJS renders a page in a certain way, it is relevant to look at the following pages:

There is honestly not a lot of content there, so let’s try to analyze the issues ourselves. Regarding the missing header, the HTML source code for the blog post specifies a “print” CSS style with the following CSS definition:

<style type="text/css" media="print">#wpadminbar { display:none; }</style>

Regarding the missing content header, tt seems that PhantomJS uses the “print” CSS style if available when generating a PDF.

Regarding the narrow layout, recall that we used rasterize.js as the control script for phantomjs. The code in the script will have a big impact on what we are seeing, which could include layout. Inside the rasterize.js script we find the following line:

page.viewportSize = { width: 600, height: 600 };

That partly explains the narrow layout. If we change these settings to width: 1800 and height: 1000 in a copy of the file (rasterize2.js) and rerun the screen capture we get a wider PDF canvas. However, the actual content layout is only partly fixed by this. A full solution will require more, e.g. working with the page CSS.

In the next part of this post, I’ll dig more into the PhantomJS API.

Twitter HyperLogLog monoids in Spark

Want to count unique elements in a stream without blowing up memory? In more specific words, do you want to use a HyperLogLog counter in Spark? Until today, I’d never heard the word “monoid” before. However, Twitter Algebird is a project that contains a collection of monoids including a HyperLogLog monoid, which can be used to aggregate a stream into unique elements. The code looks like this:

import com.twitter.algebird._
val aggregator = new HyperLogLogMonoid(12)
inputData.reduceByKey(aggregator.plus(_, _))

This young man tells you all about it, and then some:

The video also mentions another Twitter project, the Storehaus project, which can be used to integrate Spark with a lot of NoSQL databases like DynamoDB. Looks very useful indeed.

And just to go completely crazy with the Twitter project references, the talk also brings on Summingbird. The Twitter team has a separate blog post
about using Summingbird with Spark Streaming.

Easiest way to install a PostgreSQL/PostGIS database on Mac

Installing Postgres+PostGIS has never been easier on Mac. In fact, it is now an app! You download the app-file from postgresapp.com, place it in your Applications folder, and you’re done. Really.

If you think that was over too fast

If you think that was over too fast, there is one more thing you can do. Add the postgreapp “bin” directory to PATH.

vi ~/.bash_profile
# add line: export PATH=$PATH:/Applications/Postgres.app/Contents/Versions/9.3/bin

Next time you open terminal you will be able to execute all of the following commands:


clusterdb createdb createlang createuser dropdb droplang
dropuser ecpg initdb oid2name pg_archivecleanup 
pg_basebackup pg_config pg_controldata pg_ctl pg_dump 
pg_dumpall pg_receivexlog pg_resetxlog pg_restore 
pg_standby pg_test_fsync pg_test_timing pg_upgrade 
pgbench postgres postmaster psql reindexdb vacuumdb 


cs2cs geod invgeod invproj nad2bin proj


gdal_contour gdal_grid gdal_rasterize gdal_translate 
gdaladdo gdalbuildvrt gdaldem gdalenhance gdalinfo 
gdallocationinfo gdalmanage gdalserver gdalsrsinfo 
gdaltindex gdaltransform gdalwarp nearblack ogr2ogr 
ogrinfo ogrtindex testepsg


pgsql2shp raster2pgsql shp2pgsql

That is pretty f’ing awesome!!

Poor man’s wget

The command wget is useful, but unfortunately doesn’t come preinstalled with Mac. Yeah, you can install it of course, but if you’re doing it from source, the process has a few steps to satisfy all the dependencies; start by configure make‘ing the wget source and work your was backwards until ./configure runs for your wget source without hiccups.

This is how to get a poor mans wget, or simply realize that you can use curl -O, unless you’re getting content via https.

alias wget="curl -O"