How to randomly sample k lines from a file in *nix

You can use the shell to extract a random sample of lines from a file in *nix. The two commands you need are “shuf” and “head” (plus “tail” for CSV files with a header). The shuf command randomly shuffles all the lines of its input. The head command cuts off the input after the first k lines. Examples for both general files and CSV files are given below.

General pattern

To randomly sample 100 lines from any file in *nix:

shuf INPUT_FILE | head -n 100 > SAMPLE_FILE
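As a small variation, GNU shuf can also limit the output itself with its -n option, so the pipe to head is optional. A minimal sketch, where the function name sample_lines is just a name I made up for illustration:

```shell
# sample_lines K FILE: print K randomly chosen lines of FILE.
# (sample_lines is a hypothetical helper name, not a standard command.)
sample_lines() {
    # shuf -n outputs at most K lines; a file with fewer lines is printed in full.
    shuf -n "$1" -- "$2"
}

# Usage: sample_lines 100 INPUT_FILE > SAMPLE_FILE
```

Both forms should give a uniform sample without replacement; the -n form just saves a process.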

Pattern for CSV

If your file is a CSV file, you probably want to keep the header and sample only the body. You can use the head and tail commands, respectively, to extract the header and the body of the CSV file.

Extract the header of the CSV file:

head -n 1 INPUT_FILE.csv > SAMPLE_FILE.csv

Sample 100 lines from the body of the CSV file and append to sample file (notice “>” above versus “>>” below):

tail -n +2 INPUT_FILE.csv | shuf | head -n 100 >> SAMPLE_FILE.csv
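If you do this often, the two steps can be wrapped in one function. This is only a sketch: sample_csv is a hypothetical name, and it assumes the CSV has exactly one header line and no quoted fields that span several lines.

```shell
# sample_csv K FILE: print the header of FILE followed by K random body lines.
# (sample_csv is a hypothetical helper; assumes a single-line header.)
sample_csv() {
    # The header is copied through unchanged.
    head -n 1 -- "$2"
    # Everything from line 2 onwards is the body; sample K lines from it.
    tail -n +2 -- "$2" | shuf -n "$1"
}

# Usage: sample_csv 100 INPUT_FILE.csv > SAMPLE_FILE.csv
```

Because everything goes to standard output, a single “>” is enough and there is no “>” versus “>>” to mix up.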

Install dependencies on Mac

On Mac, the shuf command is not shipped with the OS. You can get it via brew. It will be named “gshuf”:

brew install coreutils

So, on Mac you should replace shuf with gshuf in the examples above.
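If the same script has to run on both Linux and Mac, one option is to pick whichever command is available at runtime. A minimal sketch; the variable name SHUF is my own choice:

```shell
# Pick shuf if it exists, otherwise fall back to gshuf (from brew coreutils).
# SHUF is a hypothetical variable name used only in this sketch.
if command -v shuf >/dev/null 2>&1; then
    SHUF=shuf
elif command -v gshuf >/dev/null 2>&1; then
    SHUF=gshuf
else
    echo "neither shuf nor gshuf found; on Mac try: brew install coreutils" >&2
    SHUF=
fi

# Usage: "$SHUF" INPUT_FILE | head -n 100 > SAMPLE_FILE
```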

Log devices on your network

Fing logger (finglogger.sh):

#!/bin/sh
 
FING_LOG_FILE=/path/to/fing.log
 
# append current public ip
echo "$(date +"%Y/%m/%d %T");publicip;$(curl -s ipecho.net/plain);;;" >> "$FING_LOG_FILE"
 
# append current fing output
/usr/bin/fing -r1 -o log,csv,$FING_LOG_FILE,1000 --silent

Add to cron (run every hour):

0 * * * * /path/to/finglogger.sh
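Once the log has been collecting for a while, the “publicip” lines written by the script are easy to pull back out. A small sketch that lists every distinct public IP seen so far; public_ips is a hypothetical helper name, and it relies only on the semicolon-separated format of the echo line in finglogger.sh:

```shell
# public_ips LOGFILE: list each distinct public IP recorded by finglogger.sh.
# (public_ips is a hypothetical helper name.)
public_ips() {
    # Lines look like: 2014/05/01 10:00:00;publicip;203.0.113.7;;;
    # so the address is the third semicolon-separated field.
    grep ';publicip;' "$1" | cut -d ';' -f 3 | sort -u
}

# Usage: public_ips /path/to/fing.log
```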

Which files does a *nix process have open?

To list all files that are opened by a *nix process with a given pid, say 42, use the lsof command:

(sudo) lsof -p 42

Of course, a process may have many files open. To list only files that have a name containing “log”, use the grep command:

(sudo) lsof -p 42 | grep log

This of course assumes you know the process id (pid) of the process. To find the pid of processes with a given name, e.g. httpd, use the ps command together with the grep command:

# notice that this prints the grep process as well
ps aux | grep httpd
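On systems that ship pgrep (part of procps on most Linux distributions, and also available on Mac), you can skip the grep step and avoid matching the grep process itself. A sketch, with pids_of as a made-up helper name:

```shell
# pids_of NAME: print the pids of all processes named exactly NAME,
# separated by commas. (pids_of is a hypothetical helper name.)
pids_of() {
    # -x matches the process name exactly; -d, joins the pids with commas.
    pgrep -d, -x "$1"
}

# Usage: pids_of httpd
# lsof accepts a comma-separated pid list, so the two steps can be combined:
#   (sudo) lsof -p "$(pids_of httpd)" | grep log
```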

Docker on Ubuntu VM running on Mac using Vagrant

Docker allows you to develop, ship and run any application, anywhere. The metaphor is that of the standard shipping container that fits on any ship, can be handled by any crane, and loaded onto any train or truck.

In a previous post, I covered how to run Ubuntu on Mac using Vagrant. In this post, I will show how to run Docker on the Ubuntu box we got running with Vagrant.

I will cover how to:

Provisioning Docker on “vagrant up”

First, create a Vagrant setup like previously described. Then, edit the install.sh script, and enter some Docker installation commands:

install.sh:

#!/bin/sh
curl -sSL https://get.docker.io/ | sh

Now, let’s test that docker was installed as intended:

vagrant up
vagrant ssh

(Fix) chown the docker socket:

# Now on vagrant machine
sudo chown vagrant /var/run/docker.sock  # TODO: need to address this issue in a different way
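One common alternative to the chown is to add the user to the “docker” group, which owns the socket. I have not verified that this is the fix the TODO has in mind; the sketch assumes that the Docker install script created a group named docker, and the function name is hypothetical:

```shell
# allow_docker_without_sudo [USER]: add USER (default: vagrant) to the
# "docker" group so the docker socket is accessible without sudo.
# Assumes the Docker installation has already created the "docker" group.
allow_docker_without_sudo() {
    user=${1:-vagrant}
    # -a appends to the user's existing groups, -G names the group to add.
    sudo usermod -aG docker "$user"
}

# Usage (on the vagrant machine): allow_docker_without_sudo
# Group membership only applies to new logins, so exit and "vagrant ssh" again.
```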

Check docker version:

docker version

Run a hello world:

docker pull ubuntu
docker run ubuntu echo "Hello, world"

Basic Docker usage

Get your applications into Docker containers

TODO

Shipping containers to team members

TODO

Deploying applications to production

TODO

Aside: Deploying containers on AWS

HOW TO

Summary

This post shows how to get up and running with Vagrant and Docker using the install scripts provided at get.docker.io. In the next post I will show how to use the “new” way to use Docker with Vagrant (thanks to Jens Roland for pointing me in the right direction).

Running Ubuntu on Mac with Vagrant

Vagrant is cool:

Vagrant provides easy to configure, reproducible, and portable work environments built on top of industry-standard technology and controlled by a single consistent workflow to help maximize the productivity and flexibility of you and your team.

Furthermore:

Vagrant stands on the shoulders of giants. Machines are provisioned on top of VirtualBox, VMware, AWS, or any other provider. Then, industry-standard provisioning tools such as shell scripts, Chef, or Puppet, can be used to automatically install and configure software on the machine.

In this post I’ll show you how to get started with Vagrant using a virtual Ubuntu Linux box. Moreover, I will cover how to use a simple provisioning technique (shell provisioning) for installing custom stuff into your virtual box on boot-up.

Vagrant basics

To follow along, you must first install Vagrant and VirtualBox. When you are done, cd to some folder (e.g. cd ~/Documents/trying-vagrant) and let’s get started:

Initialize Vagrant (“vagrant init”) using an Ubuntu Trusty Tahr image (https://vagrantcloud.com/ubuntu/trusty64/version/1/provider/virtualbox.box):

vagrant init ubuntu/trusty64

This creates a new file, Vagrantfile:

$ ls
Vagrantfile

Now, using the Vagrantfile that was created, boot the box (“vagrant up”). If this is the first time, vagrant will first download the image from the cloud (could take a while):

vagrant up

When done with the booting up, SSH into the machine (“vagrant ssh”):

vagrant ssh
# do some stuff, like ls and what not
^D  # to quit

Bring down the box (“vagrant destroy”). Oh I love this, can’t help myself but sound it out “deSTROY” in a super villain voice:

vagrant destroy

If this worked, let’s move on to installing some custom stuff on boot-up.

Installing stuff on “vagrant up”

There are many ways to install stuff on vagrant up, e.g. using shell scripts, Chef, or Puppet. Here, I will use a shell script because it is simple and clean.

Shell provisioning is a simple way to install stuff on “vagrant up”. First, let us create a shell script (“install.sh”) that we will later reference from the Vagrantfile. Furthermore, let’s live a little and install BrainFuck along with a hello world program.

install.sh:

#!/bin/sh
sudo apt-get install -y bf  # -y: provisioning is non-interactive, so do not prompt
echo '++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.' > helloworld.b

(Remember to “chmod 744” the install.sh script). Now, add a few lines of code to your Vagrantfile and you’re golden. After the edit, the file should look like this.

Vagrantfile:

# -*- mode: ruby -*-
# vi: set ft=ruby :

# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.provision "shell", path: "install.sh"
end

Now, let’s test that it worked:

vagrant up
vagrant ssh
# now on virtual machine:
$ bf helloworld.b
Hello World!

Summary

In this post, I showed you how to get started with Vagrant, and how to provision stuff on “vagrant up” using a shell script.

Easiest way to install a PostgreSQL/PostGIS database on Mac

Installing Postgres+PostGIS has never been easier on Mac. In fact, it is now an app! You download the app-file from postgresapp.com, place it in your Applications folder, and you’re done. Really.

If you think that was over too fast

If you think that was over too fast, there is one more thing you can do. Add the Postgres.app “bin” directory to PATH.

vi ~/.bash_profile
 
# add line: export PATH=$PATH:/Applications/Postgres.app/Contents/Versions/9.3/bin
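If you prefer not to open an editor, the same line can be appended from the shell. A minimal sketch that also avoids adding the line twice; add_path_line is a hypothetical helper name:

```shell
# add_path_line DIR PROFILE: append an "export PATH" line for DIR to PROFILE,
# unless exactly that line is already present.
# (add_path_line is a hypothetical helper name.)
add_path_line() {
    # \$PATH stays literal so it is expanded when the profile is sourced.
    line="export PATH=\$PATH:$1"
    # -x: whole-line match, -F: fixed string, -q: quiet.
    grep -qxF -- "$line" "$2" 2>/dev/null || printf '%s\n' "$line" >> "$2"
}

# Usage:
#   add_path_line /Applications/Postgres.app/Contents/Versions/9.3/bin ~/.bash_profile
```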

Next time you open terminal you will be able to execute all of the following commands:

PostgreSQL:

clusterdb createdb createlang createuser dropdb droplang
dropuser ecpg initdb oid2name pg_archivecleanup 
pg_basebackup pg_config pg_controldata pg_ctl pg_dump 
pg_dumpall pg_receivexlog pg_resetxlog pg_restore 
pg_standby pg_test_fsync pg_test_timing pg_upgrade 
pgbench postgres postmaster psql reindexdb vacuumdb 
vacuumlo

PROJ.4:

cs2cs geod invgeod invproj nad2bin proj

GDAL:

gdal_contour gdal_grid gdal_rasterize gdal_translate 
gdaladdo gdalbuildvrt gdaldem gdalenhance gdalinfo 
gdallocationinfo gdalmanage gdalserver gdalsrsinfo 
gdaltindex gdaltransform gdalwarp nearblack ogr2ogr 
ogrinfo ogrtindex testepsg

PostGIS:

pgsql2shp raster2pgsql shp2pgsql

That is pretty f’ing awesome!!

Poor man’s wget

The command wget is useful, but unfortunately doesn’t come preinstalled on Mac. Yeah, you can install it of course, but if you’re doing it from source, the process takes a few steps to satisfy all the dependencies: start by running ./configure and make on the wget source, and work your way backwards through the missing dependencies until ./configure runs without hiccups.

This is how to get a poor man’s wget: simply realize that you can use curl -O, unless you’re getting content via https.

alias wget="curl -O"
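A slightly more wget-like variant is a function instead of an alias. This is only a sketch under my own assumptions: the name wget_lite is made up, -L makes curl follow redirects (which wget does by default), and --remote-name-all applies the -O behaviour to every URL given.

```shell
# wget_lite URL...: download each URL into the current directory,
# naming each file after the last part of its URL.
# (wget_lite is a hypothetical name; it is not a full wget replacement.)
wget_lite() {
    # -L follows redirects; --remote-name-all acts like -O for every URL.
    curl -L --remote-name-all "$@"
}

# Usage: wget_lite http://example.com/file.tar.gz
```

A function also works in scripts, where aliases are not expanded by default.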

Playing with GraphViz and MathGenealogy data

The Mathematics Genealogy Project is a great project (donate online). Sven Köhler from Potsdam, Germany has written a Python script for visualizing the database, which I’m going to try.

First step is to clone the git repo:

$ git clone git@github.com:tzwenn/MathGenealogy.git

His instructions are quite simple:

$ ./genealogy.py --search 38586  # 30 seconds
$ ./genealogy.py --display 38586 > euler.dot  # 0.1 seconds

The next step is to install a tool such as GraphViz, which is needed to visualize the dot file as a graph. Go to the download page for GraphViz, and follow the instructions for your OS.

This should install the command-line tool as well. Now you can visualize Leonhard Euler’s supervisor family tree (direct descendants) like this:

$ dot euler.dot -Tpng -o euler.png

Looking at the database is easy. Every invocation of ./genealogy.py --search writes to a sqlite3 database file (genealogy.db).

$ sqlite3 genealogy.db

This opens up a prompt. Have a look at the schema of the database like this:

sqlite> .schema

And see what is inside the thesis table like this:

sqlite> select * from thesis;