How to put each word in a file on a separate line

Place each word on a separate line with sed and awk:

sed -e 's/[^[:alpha:]]/ /g' | awk '{ for (i=1;i<=NF;i++) print $i }'

sed replaces every non-alphabetic character with a space. This step is optional; use it only if you want to strip punctuation and digits.

awk then prints each whitespace-separated word on its own line.
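As a minimal sketch, the pipeline can be wrapped in a small shell function and tried on a single line of input. The function name split_words and the sample text are made up for illustration.

```shell
# split_words: read text on stdin and print one word per line.
# (Hypothetical helper name; the pipeline is the one shown above.)
split_words() {
    sed -e 's/[^[:alpha:]]/ /g' | awk '{ for (i = 1; i <= NF; i++) print $i }'
}

# Punctuation and digits are turned into spaces, so only the words remain.
printf 'Hello, world! 42 times\n' | split_words
# Hello
# world
# times
```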

Taking it a step further, you can keep only the unique words by lowercasing the text and piping it through sort -u (or sort | uniq, if you prefer):

awk '{ for (i=1;i<=NF;i++) print $i }' | tr "[:upper:]" "[:lower:]" | sort -u
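A small sketch of the same idea as a reusable function, run on two lines of sample text. The name unique_words and the input are assumptions made for the example.

```shell
# unique_words: read text on stdin, print each distinct word once,
# lowercased and sorted. (Hypothetical helper name.)
unique_words() {
    awk '{ for (i = 1; i <= NF; i++) print $i }' |
        tr '[:upper:]' '[:lower:]' |
        sort -u
}

# "The", "the", "Cat", "cat", "and" and "AND" collapse after lowercasing.
printf 'The cat and the Cat\nAND a dog\n' | unique_words
# a
# and
# cat
# dog
# the
```

Lowercasing has to happen before sort -u, otherwise "The" and "the" would be kept as two different words.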

How to split a log file into smaller files

In this example I had a big log file (many millions of lines) that I wanted to split into smaller log files (one million lines each) for processing on Elastic MapReduce.

-rw-r--r--  1 kostas staff 543067012012 Oct 11 13:45 huge_logfile

This is a job for the split command. Because individual lines in the log file must be kept intact, the -l option is used to specify the number of lines in each file. In this example, certain lines are first filtered out with grep, to show how split is used when data is piped in:

grep 'some-pattern' huge_logfile | split -a 6 -l 1000000 - log_

The dash tells split to read from standard input, and log_ is the prefix for the generated file names. The -a 6 option tells split to use a six-character suffix after the prefix when naming files. Afterwards the directory looks like this:

huge_logfile
log_aaaaaa
log_aaaaab
log_aaaaac
log_aaaaad
log_aaaaae
...
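To see how the options interact without a huge file at hand, here is a scaled-down sketch. It assumes seq and mktemp are available, and the numbers (25 lines, 10 lines per file, a three-character suffix) and the part_ prefix are chosen only for the demonstration.

```shell
# Work in a scratch directory so the generated files are easy to find.
workdir=$(mktemp -d)

# 25 lines of input piped into split: 10 lines per file, 3-character suffix.
seq 1 25 | split -a 3 -l 10 - "$workdir/part_"

# Expect part_aaa and part_aab with 10 lines each, and part_aac with 5.
wc -l "$workdir"/part_*

# Remove the scratch directory when done:
# rm -r "$workdir"
```

Concatenating the parts in name order (cat part_*) gives back the original input, because split never breaks a line when -l is used.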

OpenStreetMap tiles with custom projection and grid using Mapnik

Previous effort

In a previous post I tried generating OpenStreetMap tiles using GeoServer. It ended when I couldn’t find a good style (SLD) to apply to the OSM layer.

The (new) idea, that also failed miserably

In this post I tried to use Mapnik to generate OSM map tiles in EPSG:25832. It failed, mainly because the Python scripts published by OSM for generating tiles don’t support EPSG:25832 out of the box. Mapnik is, however, an obvious choice for OSM because:

  • Mapnik can read .osm files
  • OSM maintains a comprehensive style for Mapnik

Check if shared library is installed on Mac OS X

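One way to check whether a library is installed is to ask the linker, ld, to find it. The name after -l is the library name without the lib prefix and file extension, so zlib, which is installed as libz, is checked with -lz:

ld -lz

If the library is installed you’ll get something like this (ld found the library and only complains about the missing entry point):

ld: warning: -arch not specified
ld: could not find entry point "start" (perhaps missing crt1.o) for inferred architecture x86_64

If the library is not installed you’ll get something like this:

ld: library not found for -lz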
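The check can be scripted by looking at ld's message instead of reading it by eye. The following is only a sketch: lib_installed is a hypothetical helper, it relies on the "library not found" wording quoted above, and the extra "cannot find -l" pattern is an assumption meant to cover GNU ld. Other linker versions may word the error differently.

```shell
# lib_installed NAME: return 0 if ld does not report libNAME as missing,
# 1 if it does, 2 if the check cannot be run.
# (Hypothetical helper; it depends on the wording of ld's error message.)
lib_installed() {
    command -v ld >/dev/null 2>&1 || return 2   # no linker on this system
    out=$(mktemp) || return 2                   # scratch output file for ld

    # The link is expected to fail either way; only the message matters.
    msg=$(ld -l"$1" -o "$out" 2>&1)
    rm -f "$out"

    case $msg in
        *"library not found"*) return 1 ;;      # message quoted in the post
        *"cannot find -l$1"*)  return 1 ;;      # assumed GNU ld wording
    esac
    return 0
}

if lib_installed z; then
    echo "libz found"
else
    echo "libz not found (or ld unavailable)"
fi
```

Matching on the message rather than the exit status is deliberate: ld exits with an error in both cases, because there is nothing to link.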