How to merge two disjoint random samples?

The problem: Given two random samples, s1 and s2, of size k over two disjoint populations, p1 and p2, how to combine the two k-sized random samples into one k-sized random sample over p1 ∪ p2?

The solution: k times, draw an element from s1 ∪ s2 as follows: with probability d1 = |p1| / |p1 ∪ p2|, take the next element from s1; with probability d2 = 1 - d1, take the next element from s2.

(The solution was found on Stack Overflow.)

In Python:

import random
import numpy
# sizes
e1 = 1000
e2 = 1000000
# populations: p1 = [0, e1), p2 = [e1, e2)
p1 = range(e1)
p2 = range(e1, e2)
# sample size
k = 500
# random samples (random.sample returns them in random order)
s1 = random.sample(p1, k)
s2 = random.sample(p2, k)
# probability of taking the next element from s1
d1 = len(p1) / float(len(p1) + len(p2))
# merge samples
merge = []
for i in range(k):
    if s1 and s2:
        merge.append(s1.pop() if random.random() < d1 else s2.pop())
    elif s1:
        merge.append(s1.pop())
    else:
        merge.append(s2.pop())
# Validate
hist = numpy.histogram(merge, bins=[0, 500000, 1000000])
# The two bins should be roughly equal, i.e. the error should be small.
print(abs(hist[0][0] - hist[0][1]) / float(k))
# alternatively, count the values below 500K directly
print(abs(len([x for x in merge if x < 500000]) - 250) / 500.0)
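With a fixed d1, every draw is independent, so the number of elements taken from s1 is binomially distributed, whereas in a truly uniform k-sample of p1 ∪ p2 it is hypergeometric. The two are very close when k is much smaller than |p1 ∪ p2|, as in the script above. If you want an exact merge, one variant (not the Stack Overflow answer) recomputes the probability before each draw from how many population elements of each kind are still undrawn. A minimal sketch, assuming s1 and s2 are in random order, as random.sample returns them; the function name merge_samples is only illustrative:

```python
import random


def merge_samples(s1, n1, s2, n2, k):
    """Merge random k-samples s1 and s2 of disjoint populations of sizes
    n1 and n2 into one uniform random k-sample of the union.

    Assumes s1 and s2 are in random order and hold at least k elements each.
    """
    s1, s2 = list(s1), list(s2)  # copies, so the caller's lists are not consumed
    r1, r2 = n1, n2              # population elements not yet drawn
    merged = []
    for _ in range(k):
        # Take from s1 with probability r1 / (r1 + r2), i.e. in proportion
        # to how many undrawn elements each population still has.
        if random.randrange(r1 + r2) < r1:
            merged.append(s1.pop())
            r1 -= 1
        else:
            merged.append(s2.pop())
            r2 -= 1
    return merged
```

With the script's variables, this would be called as merge_samples(s1, len(p1), s2, len(p2), k), before s1 and s2 have been popped.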

How to compute the Fibonacci sequence in SQL

Inspired by and simplified from a set of slides on using an RDBMS for storing, managing, and querying graphs:

WITH RECURSIVE fib(i, j) AS (
    SELECT 0, 1
    UNION ALL
    SELECT j, i+j FROM fib WHERE j < 1000
)
SELECT i FROM fib;
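To try the query without a database server, here is a sketch that runs it with Python's built-in sqlite3 module (SQLite has supported WITH RECURSIVE since version 3.8.3) and checks it against a plain loop. The limit is passed as a query parameter instead of the hard-coded 1000, an ORDER BY is added because SQL does not otherwise guarantee row order, and the function names are illustrative:

```python
import sqlite3

# The recursive query from above, with the limit as a parameter.
FIB_SQL = """
WITH RECURSIVE fib(i, j) AS (
    SELECT 0, 1
    UNION ALL
    SELECT j, i + j FROM fib WHERE j < ?
)
SELECT i FROM fib ORDER BY i
"""


def fib_sql(limit):
    """Run the recursive CTE in a throwaway in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        return [row[0] for row in conn.execute(FIB_SQL, (limit,))]
    finally:
        conn.close()


def fib_python(limit):
    """The same sequence computed iteratively, mirroring the CTE row by row."""
    i, j = 0, 1
    out = [i]
    while j < limit:
        i, j = j, i + j
        out.append(i)
    return out
```

With a limit of 1000, both functions return the Fibonacci numbers from 0 up to 987.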

Which files has a *nix process opened?

To list all files that are opened by a *nix process with a given pid, say 42, use the lsof command:

(sudo) lsof -p 42

Of course, a process may have many files open. To list only files that have a name containing “log”, use the grep command:

(sudo) lsof -p 42 | grep log
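On Linux, the file-descriptor part of what lsof -p reports can also be read directly from /proc/<pid>/fd, where each entry is a symlink to the open file (or to a pseudo-target such as socket:[12345] or pipe:[12345]). A Linux-only sketch that does the lsof -p 42 | grep log combination in one go; open_files is a hypothetical helper name, and, as with lsof, inspecting another user's process generally requires root:

```python
import os


def open_files(pid, pattern=None):
    """Return the sorted targets of the file descriptors a process holds open.

    Linux only: reads the symlinks under /proc/<pid>/fd. If pattern is
    given, keep only targets containing it (a substring match, like grep).
    """
    fd_dir = "/proc/%d/fd" % pid
    targets = []
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # descriptor was closed while we were listing
        if pattern is None or pattern in target:
            targets.append(target)
    return sorted(targets)
```

For example, open_files(42, "log") lists the log files held open by process 42.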

This of course assumes you know the process id (pid) of the process. To find the pid of processes with a given name, e.g. httpd, use the ps command together with the grep command:

# notice that this prints the grep process as well
ps aux | grep httpd
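Similarly, on Linux the command name of every process is available in /proc/<pid>/comm, so a PID lookup by exact name does not need grep at all (the pgrep command offers a similar lookup from the shell). A Linux-only sketch; pids_by_name is a hypothetical helper name. The kernel truncates comm to 15 characters, so the sketch compares only the first 15 characters of the requested name:

```python
import os


def pids_by_name(name):
    """Return the sorted PIDs whose command name equals name (Linux only)."""
    wanted = name[:15]  # the kernel stores at most 15 characters in comm
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open("/proc/%s/comm" % entry) as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited in the meantime
        if comm == wanted:
            pids.append(int(entry))
    return sorted(pids)
```

Because this is an exact name match rather than a substring search, no grep process shows up in the result.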

How long is the Doom Loop cycle currently?

Take a look at this Chomsky presentation, starting at around 46:30. It seems that the most rational prediction would be that we are heading for another financial crisis, since financial systems are running a so-called “Doom Loop”: make huge gambles, and either make huge gains or fail. In the case of failure, get bailed out. Seen from the point of view of the financial sector, this pattern of behaviour is rational, given the current environment. So the real question is: what would the rational course of action be for us, the citizens, given that the financial sector is apparently acting, fully rationally, inside a Doom Loop?

The rational question would be: when is the next financial crisis coming? Given a good prediction of this point in time, how should we act rationally, e.g. in the real-estate market? If we aspire to make rational decisions, we should not hope that another financial crisis will be avoided. We should expect it, and make rational decisions based upon it, for our own gain if we so desire. Now, how do you do that? That is another question. It seems obvious that decisions in many areas, e.g. real estate, entrepreneurship, and family planning, should be influenced by this apparent fact. If there is money to be made, somehow, by betting on the next financial crisis, maybe that would be the rational thing to do.

Must remember to salt my hashes

While a SHA-256 hash may seem unbreakable, the hashes of many common input strings can be cracked in seconds. If you don’t believe me, try the following or simply read this webpage:

$ python
>>> import hashlib
>>> print(hashlib.sha256(b'megabrain').hexdigest())

Go to an online hash-cracking site, paste the hash into the text area, click “crack hashes”, and see the admittedly super lame password cracked in a second. The basic concept behind the cracking is to precompute hashes for a lot of passwords and then do reverse lookups, from hash code to password. This way, it really does not make any difference how “good” your hashing algorithm is. This is not an attack against hashing algorithms, but an attack against common hash codes. In the wild, you are more likely to encounter the hash of “megabrain” than the hash of “2f2f0a446f828f”. While you should encourage everybody you meet to choose strong passwords, it is perhaps more sustainable to strengthen the security around weak passwords.
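The precomputation idea fits in a few lines: hash a list of common passwords once, keep a dictionary from hash code to password, and every unsalted hash of a listed password is then cracked with a single lookup. The five-entry password list below is made up for illustration; real cracking tables are vastly larger:

```python
import hashlib

# A tiny, made-up list of common passwords (illustration only).
COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein", "megabrain"]

# Precompute once: hash code -> password.
LOOKUP = {hashlib.sha256(p.encode("utf-8")).hexdigest(): p for p in COMMON_PASSWORDS}


def crack(hash_hex):
    """Reverse-look-up an unsalted SHA-256 hex digest; None if not in the table."""
    return LOOKUP.get(hash_hex)
```

The cost of building the table is paid once, no matter how many stolen hashes are looked up afterwards.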


This is where salts come in: salting protects the hashes of weak passwords against precomputed lookups like the one above. A salt is just a sequence of random bytes, e.g. “c039b8f8a8…”, that you concatenate with a password before computing the password hash. Using the same salt for all passwords defeats much of the purpose, since an attacker can then build a single lookup table for that salt, so by all means read this page to get the inside scoop on how to do this correctly.

$ python
>>> import os
>>> import hashlib
>>> password = b'megabrain'
>>> salt = os.urandom(32)
>>> stored_hash = hashlib.sha256(salt + password).hexdigest()

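If you try to crack the stored_hash on the same site, you will see that it is not successful. So the moral of the story is: a bad password + a good salt = a hash that precomputed lookup tables cannot crack. (An attacker who steals both the salt and the hash can still brute-force a weak password one guess at a time, so salting does not turn a bad password into a good one.) Users only have to remember their (bad) password, while you store the salt alongside the hash.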

To authenticate a user with a salted password, combine the stored salt with the submitted password again, and compare the resulting hash to the stored hash:

>>> password_to_authenticate = b'megabrain'
>>> if hashlib.sha256(salt + password_to_authenticate).hexdigest() == stored_hash:
...     print("User has been authenticated!")
... else:
...     print("Wrong password!")
...
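A single SHA-256 pass is fast, which also makes it fast to brute-force one salted hash at a time. Python's standard library offers a deliberately slow, salted key derivation function, hashlib.pbkdf2_hmac, and a constant-time comparison, hmac.compare_digest. A minimal sketch of the same store/authenticate flow using them; the iteration count of 600,000 is an assumption to tune for your hardware, and the function names are illustrative:

```python
import hashlib
import hmac
import os

ITERATIONS = 600000  # assumption: tune so one hash takes a noticeable fraction of a second


def hash_password(password, salt=None):
    """Return (salt, derived_key) for a str password, using a fresh salt by default."""
    if salt is None:
        salt = os.urandom(32)  # a new random salt for every password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def authenticate(password, salt, stored_key):
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, stored_key)
```

Store the salt and the derived key for each user; at login, pass both to authenticate together with the submitted password.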

Docker on Ubuntu VM running on Mac using Vagrant

Docker allows you to develop, ship and run any application, anywhere. The metaphor is that of the standard shipping container that fits on any ship, can be handled by any crane, and loaded onto any train or truck.

In a previous post, I covered how to run Ubuntu on Mac using Vagrant. In this post, I will show how to run Docker on the Ubuntu box we got running with Vagrant.

I will cover how to:

Provisioning Docker on “vagrant up”

First, create a Vagrant setup as previously described. Then edit the provisioning shell script and add the Docker installation command:

curl -sSL | sh

Now, let’s test that Docker was installed as intended:

vagrant up
vagrant ssh

(Fix) chown the Docker socket, so that the vagrant user can talk to the Docker daemon without sudo:

# Now on vagrant machine
sudo chown vagrant /var/run/docker.sock  # TODO: need to address this issue in a different way

Check docker version:

docker version

Run a hello world:

docker pull ubuntu
docker run ubuntu echo "Hello, world"

Basic Docker usage

Get your applications into Docker containers


Shipping containers to team members


Deploying applications to production


Aside: Deploying containers on AWS



This post showed how to get up and running with Vagrant and Docker using the Docker install script fetched with curl above. In the next post I will show the “new” way of using Docker with Vagrant (thanks to Jens Roland for pointing me in the right direction).

Running Ubuntu on Mac with Vagrant

Vagrant is cool:

Vagrant provides easy to configure, reproducible, and portable work environments built on top of industry-standard technology and controlled by a single consistent workflow to help maximize the productivity and flexibility of you and your team.


Vagrant stands on the shoulders of giants. Machines are provisioned on top of VirtualBox, VMware, AWS, or any other provider. Then, industry-standard provisioning tools such as shell scripts, Chef, or Puppet, can be used to automatically install and configure software on the machine.

In this post I’ll show you how to get started with Vagrant using a virtual Ubuntu Linux box. Moreover, I will cover how to use a simple provisioning technique (shell provisioning) for installing custom stuff into your virtual box on boot-up.

Vagrant basics

To follow along, you must first install Vagrant and VirtualBox. When you are done, cd to some folder (e.g. cd ~/Documents/trying-vagrant) and let’s get started:

Initialize Vagrant (“vagrant init”) using the Ubuntu Trusty Tahr (14.04) image:

vagrant init ubuntu/trusty64

This creates a new file, Vagrantfile:

$ ls
Vagrantfile

Now, using the Vagrantfile that was created, boot the box (“vagrant up”). If this is the first time, vagrant will first download the image from the cloud (could take a while):

vagrant up

When the box has finished booting, SSH into the machine (“vagrant ssh”):

vagrant ssh
# do some stuff, like ls and what not
^D  # to quit

Stop and delete the box (“vagrant destroy”). Oh, I love this; I can’t help sounding it out as “deSTROY” in a super-villain voice:

vagrant destroy

If this worked, let’s move on to installing some custom stuff on boot-up.

Installing stuff on “vagrant up”

There are many ways to install stuff on vagrant up, e.g. using shell scripts, Chef, or Puppet. Here, I will use a shell script because it is simple and clean.

Shell provisioning is a simple way to install stuff on “vagrant up”. First, let us create a shell script (“”) that we will later reference from the Vagrantfile. Furthermore, let’s live a little and install BrainFuck along with a hello world program.

sudo apt-get update
sudo apt-get install -y bf  # -y: answer yes, since provisioning runs without a prompt
echo '++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.' > helloworld.b

(Remember to “chmod 744” the script.) Now add a few lines to your Vagrantfile and you’re golden. After the edit, the file should look like this:


# -*- mode: ruby -*-
# vi: set ft=ruby :

# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.provision "shell", path: ""
end

Now, let’s test that it worked:

vagrant up
vagrant ssh
# now on virtual machine:
$ bf helloworld.b
Hello World!


In this post, I showed you how to get started with Vagrant, and how to provision stuff on “vagrant up” using a shell script.

Starting a web server and other PHP tricks

Start PHP’s built-in web server (serving the current directory):

php -S localhost:8080  # starts http server on port 8080

Start the interactive PHP shell (with an illustrative example):

php -a
php >  echo base64_decode('QWxhZGRpbjpvcGVuIHNlc2FtZQ==');
Aladdin:open sesame
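For comparison, the same decode with Python’s standard base64 module, handy where PHP is not installed; the encoded string is the one from the PHP session above:

```python
import base64

encoded = "QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
decoded = base64.b64decode(encoded).decode("ascii")
print(decoded)  # Aladdin:open sesame

# Encoding goes the other way and reproduces the original string.
assert base64.b64encode(decoded.encode("ascii")).decode("ascii") == encoded
```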

Screen capturing with PhantomJS

PhantomJS is a headless browser that you can use, e.g., to test web UIs and to take screen captures of web pages. I will focus on the latter use case.

Since PhantomJS knows how to execute Javascript, it can create a screenshot of most web pages, even those that render part of their GUI using Javascript.


To get started with PhantomJS, download and unzip a PhantomJS binary for your system. In the unzipped directory structure you’ll find bin/phantomjs, which is a ready-to-use binary. You can add that directory to your PATH if you like.

PhantomJS is controlled by Javascript. The script rasterize.js is a useful multi-purpose script for creating screen shots. We will use this script, so download and store it somewhere convenient.

Hello world

I have created a simple test page that partly produces the page content using Javascript. If Javascript is enabled, the page will read “Hello Javascript”. Otherwise, the page reads “Hello”. Let us now screen capture this page using PhantomJS:

# Copy paste everything into a terminal window and run it
# You need to specify the right paths to:
# - phantomjs (e.g. add phantom "bin" dir to PATH)
# - rasterize.js (e.g. run below command in dir containing script)
phantomjs rasterize.js hello_javascript.pdf

If that went well, you should now have a PDF file called hello_javascript.pdf in the directory where you ran the command. Open the PDF and confirm that it contains the text “Hello Javascript” just like the web page does.

Screen capturing a real blog post

Hopefully, the above experiment worked. However, the content in the generated PDF was not too interesting. Let’s repeat the above experiment with a real blog post, namely the first blog post I ever wrote on

# Copy paste everything into a terminal window and run it
# You need to specify the right paths to:
# - phantomjs (e.g. add phantom "bin" dir to PATH)
# - rasterize.js (e.g. run below command in dir containing script)
phantomjs rasterize.js \ skipperkongen.pdf

If you open the generated PDF you will see that it is not the prettiest sight. The PDF has only a passing resemblance to what the original blog post looks like if you open it in a “normal” browser. This is perhaps all according to specifications, but I (and I’m guessing you) would like a more aesthetically pleasing result.

Inspecting the generated PDF

Before we begin to understand why the generated PDF looks in a particular way, let us describe what we are seeing. So what does the PDF look like?

First, the generated PDF is missing the content header found on the web page. Second, the rendered PDF has an incredibly narrow page layout or uses a very big font size. Third, on my Mac there is a weird “private use” symbol in several places in the pdf. Regarding the third issue, there is a fun discussion over at StackExchange for Mac OS X about the “private use” symbol with some interesting background information.

Why does the generated PDF look this way?

In order to understand why PhantomJS renders a page in a certain way, it is relevant to look at the following pages:

There is honestly not a lot of content there, so let’s try to analyze the issues ourselves. Regarding the missing header, the HTML source code for the blog post specifies a “print” CSS style with the following CSS definition:

<style type="text/css" media="print">#wpadminbar { display:none; }</style>

Regarding the missing content header, it seems that PhantomJS uses the “print” CSS style, if available, when generating a PDF.

Regarding the narrow layout, recall that we used rasterize.js as the control script for phantomjs. The code in the script will have a big impact on what we are seeing, which could include layout. Inside the rasterize.js script we find the following line:

page.viewportSize = { width: 600, height: 600 };

That partly explains the narrow layout. If we change these settings to width: 1800 and height: 1000 in a copy of the file (rasterize2.js) and rerun the screen capture we get a wider PDF canvas. However, the actual content layout is only partly fixed by this. A full solution will require more, e.g. working with the page CSS.

In the next part of this post, I’ll dig more into the PhantomJS API.