Checking for disk overload on Linux

The trick is to use vmstat and look at the “wa” (I/O wait) figure: the percentage of CPU time spent idle while waiting for disk I/O to complete.

vmstat -S m 1 100

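Look under “cpu” at the “wa” column. A persistently high number is bad; on a healthy system it should stay close to zero. The first row reports averages since boot, so judge by the rows that follow it. Sample output from vmstat looks like this: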

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0    406     83   3062    0    0     1     5    1    4  0  0 99  1
 0  0      0    406     83   3062    0    0     0     0 2020   49  0  0 100  0
 0  0      0    406     83   3062    0    0     0     0 2020   41  0  0 100  0
 0  0      0    406     83   3062    0    0     0     0 2018   53  0  0 100  0

In this output, “wa” is the column furthest to the right (newer vmstat versions add an “st” column after it).
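
To avoid reading the column by eye, the check can be scripted. The following is a minimal sketch rather than part of the original recipe: the function name max_iowait is made up for illustration, and it assumes the header row starts with “r” as in the sample above. It finds the “wa” column by its header name and prints the highest value seen.

```shell
# Print the highest "wa" value found in vmstat output read from stdin.
# The column is located by its header name instead of its position,
# because some vmstat versions print more columns after "wa".
# Note: the first data row (averages since boot) is included as well.
max_iowait() {
  awk '
    # Header row: remember which field is labelled "wa".
    $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "wa") col = i; next }
    # Data rows start with a number; track the largest "wa" value.
    col && $1 ~ /^[0-9]+$/ { if ($col + 0 > max) max = $col + 0 }
    END { print max + 0 }
  '
}

# Usage:
#   vmstat 1 10 | max_iowait
```

A result that stays well above zero over several runs suggests the disks are the bottleneck.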

OpenStreetMap tiles in EPSG:25832 projection using GeoServer

Warning: This is a description of how to create an OpenStreetMap WMS with GeoServer. It works fine up to the point where the layers are published as an unstyled WMS. Beyond that I have not been able to produce a good result, for lack of a good Styled Layer Descriptor (SLD). If you have hints about a good SLD, feel welcome to submit a comment!

The idea

The idea is/was to first create a good general purpose OpenStreetMap WMS, and then use GeoWebCache to generate tiles from this WMS source in a custom projection, EPSG:25832 in our case.


Finding the most quoted main author using the Linux command line

I have a text file containing article references. It looks like this:

- Miller HJ (2004) Tobler’s First Law and spatial analysis. Ann Assoc Am Geogr 94:284–289.
 
- Onsrud H, ed (2007) Research and Theory in Advanced Spatial Data Infrastructure Concepts (ESRI Press, Redlands, CA).
 
- Egenhofer M (2002) Toward the geospatial semantic web. Advances in Geographic Information Systems International Symposium, eds Makki Y, Pissinou N (Association for Computing Machinery, McLean, VA), pp 1–4.
 
- Anselin L, Florax R, Rey S, eds (2004) Advances in Spatial Econometrics: Methodology, Tools and Applications (Springer, Berlin). 
 
- Wang S, Armstrong M (2009) A theoretical approach to the use of cyberinfrastructure in geographical analysis. Int J Geogr Inf Sci 23:169–193. 
 
- Wang S (2010) A cyberGIS framework for the synthesis of cyberinfrastructure, GIS, and spatial analysis. Ann Assoc Am Geogr 100:535–557.
 
- Penninga F, Van Oosterom PJM (2008) A simplicial complex-based DBMS approach to 3D topographic data modelling. Int J Geogr Inf Sci 22:751–779. 
 
- Baker KS, Chandler CL (2008) Enabling long-term oceanographic research: Changing data practices, information management strategies and informatics. Deep-Sea Res II 55(18–19):2132–2142.

I wanted to find out who the most common first author is in that long list of articles, and this is what I did:

cat refs-2009+.txt | \
sed -e '/^ *$/d' -e 's/^- //' | \
cut -d"(" -f1 | \
cut -d, -f1 | \
cut -d' ' -f1 | \
sort | \
uniq -c | \
sort -rn > \
sorted-refs.txt

The result is this:

   6 Craglia
   4 Wang
   4 Rajabifard
   4 Onsrud
   4 Masser
   4 Grus
   4 Crompvoets
   3 Yang
   3 Steiniger
   3 Gartner
   3 European
   3 Anselin
   2 Wright
   2 Smits
   2 Sieber
   2 Ramsey
   2 Poore
   2 Miller
   2 Lance
   2 INSPIRE
   2 Helly
   2 Georgiadou
   2 Fox
   2 Foster
   2 Bregt
   1 Zhang
   1 World
...
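
The same pipeline can be wrapped in a function so it is easy to try on a small sample file. This is a sketch under a few assumptions: the name first_authors is made up for illustration, and the final sort orders by count numerically and then by name, which is my choice rather than what the command above does. Because only the first word is kept, multi-word names are cut short, which is why entries such as “European” show up in the list.

```shell
# first_authors FILE
# Count references per first-author surname, most frequent first.
first_authors() {
  sed -e '/^ *$/d' -e 's/^- //' "$1" |  # drop blank lines and the leading "- "
    cut -d'(' -f1 |                     # keep the text before the year
    cut -d, -f1 |                       # keep the first author only
    cut -d' ' -f1 |                     # keep the first word (the surname)
    sort |
    uniq -c |                           # count identical surnames
    sort -k1,1nr -k2                    # count descending, then name ascending
}

# Usage:
#   first_authors refs-2009+.txt > sorted-refs.txt
```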

How to create JSON data from a text file on the internet

The following assumes a Linux command line (or, in my case, the Mac OS X terminal).

I want to wrangle text from the internet, turn it into JSON data, and ultimately stick it in CouchDB. Here I’m trying to turn a random text file containing prime numbers into structured JSON data that looks like this:

[2, 3, 5, 7,...]

The original file is here: http://primes.utm.edu/lists/small/1000.txt. It is fairly structured to begin with, but it’s not JSON.

                         The First 1,000 Primes
                          (the 1,000th is 7919)
         For more information on primes see http://primes.utm.edu/

      2      3      5      7     11     13     17     19     23     29 
     31     37     41     43     47     53     59     61     67     71 
     73     79     83     89     97    101    103    107    109    113 
...
end.

The following command turns it into JSON:

curl http://primes.utm.edu/lists/small/1000.txt | \
tail -n +4 | \
tr -cs "[:digit:]" "," | \
sed -e 's/^,/\[/' -e 's/,$/\]/' \
> primes.json

Let’s look at it with cat to make sure:

$ cat primes.json
[2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,...

Explanation of the command

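curl is used to download the file and print it on standard output in the terminal. With no options it issues an HTTP GET for the given URL, http://primes.utm.edu/lists/small/1000.txt.

The tail step starts output at line 4, so it discards the first three lines (the header). Current versions of tail want this written as tail -n +4; the older tail +4 form is obsolete.

tr -cs "[:digit:]" "," replaces every run of non-digit characters with a single comma (-c selects the complement of the digits, -s squeezes repeats). The new text has a comma before the first digit and a comma after the last one, with no line breaks or spaces: ,2,3,5,7...,7919,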

sed -e 's/^,/\[/' -e 's/,$/\]/' is perhaps a bit hard to read. It replaces the comma before the first digit with '[', and replaces the comma after the last digit with ']'.
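
To try the processing steps without touching the network, they can be wrapped in a function and fed a few lines of sample text. This is only a sketch; the name to_json_array is made up for illustration, and the function assumes a three-line header like the one in the file above.

```shell
# Read text with a three-line header from stdin and print every run of
# digits after the header as one JSON array.
to_json_array() {
  tail -n +4 |                          # start at line 4, skipping the header
    tr -cs '[:digit:]' ',' |            # runs of non-digits become one comma
    sed -e 's/^,/[/' -e 's/,$/]/'       # outer commas become brackets
}

# Usage:
#   curl -s http://primes.utm.edu/lists/small/1000.txt | to_json_array > primes.json
```

The header has to be skipped first because its first two lines contain digits (“1,000” and “7919”) that would otherwise end up in the array.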

Using shp2geocouch to push OSM data into geocouch

Today I installed the utility shp2geocouch on Mac OS X 10.6.

First I needed to update RubyGems…

sudo gem update --system

Then I could install shp2geocouch

sudo gem install shp2geocouch

Next I downloaded OSM data for Copenhagen, Denmark

wget http://download.cloudmade.com/europe/northern_europe/denmark/copenhagen/copenhagen.shapefiles.zip
unzip copenhagen.shapefiles.zip
cd copenhagen.shapefiles

Finally I used shp2geocouch to upload one of the shape files to iriscouch.com (database gd.iriscouch.com/cphosm).

shp2geocouch europe_northern_europe_denmark_copenhagen_highway.shp gd.iriscouch.com/cphosm

This takes a while: after about 10 minutes, 16000 documents had been loaded into iriscouch.com and the job was still running on my MacBook Pro. The final count was 33306 documents.
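
The document count can also be checked from the command line, since CouchDB returns a JSON description of a database, including a doc_count field, when you GET the database URL. The helper below is a sketch with a made-up name (doc_count); it pulls the number out with sed instead of a real JSON parser, which is good enough for a quick check.

```shell
# Extract the "doc_count" value from CouchDB database info JSON on stdin.
doc_count() {
  sed -n 's/.*"doc_count":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Usage (database URL as used above):
#   curl -s http://gd.iriscouch.com/cphosm | doc_count
```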

As a final touch, the script replicates geocouch-utils + map browser and tells me

view your data on a map at http://gd.iriscouch.com/cphosm/_design/geo/_rewrite

Clicking the link gives you a map that uses OSM tiles from cloudmade as background and fetches clickable road data from iriscouch using XHR.