Finding the most-cited first author using the Linux command line

I have a text file containing article references. It looks like this:

- Miller HJ (2004) Tobler’s First Law and spatial analysis. Ann Assoc Am Geogr 94:284–289.

- Onsrud H, ed (2007) Research and Theory in Advanced Spatial Data Infrastructure Concepts (ESRI Press, Redlands, CA).

- Egenhofer M (2002) Toward the geospatial semantic web. Advances in Geographic Information Systems International Symposium, eds Makki Y, Pissinou N (Association for Computing Machinery, McLean, VA), pp 1–4.

- Anselin L, Florax R, Rey S, eds (2004) Advances in Spatial Econometrics: Methodology, Tools and Applications (Springer, Berlin). 

- Wang S, Armstrong M (2009) A theoretical approach to the use of cyberinfrastructure in geographical analysis. Int J Geogr Inf Sci 23:169–193. 

- Wang S (2010) A cyberGIS framework for the synthesis of cyberinfrastructure, GIS, and spatial analysis. Ann Assoc Am Geogr 100:535–557.

- Penninga F, Van Oosterom PJM (2008) A simplicial complex-based DBMS approach to 3D topographic data modelling. Int J Geogr Inf Sci 22:751–779. 

- Baker KS, Chandler CL (2008) Enabling long-term oceanographic research: Changing data practices, information management strategies and informatics. Deep-Sea Res II 55(18–19):2132–2142.

I wanted to find out which first author appears most often in that long list of references, and this is what I did:

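# Drop blank lines and the leading "- ", keep the text before the year,
# then the first author, then only the surname; count and rank the surnames.
sed -e '/^ *$/d' -e 's/^- //' refs-2009+.txt | \
cut -d"(" -f1 | \
cut -d, -f1 | \
cut -d' ' -f1 | \
sort | \
uniq -c | \
sort -rn > \
sorted-refs.txt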
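
For comparison, the same count can be sketched as a single awk pass. This is only an illustration, not the command used above: the function name count_first_authors and the file name sample-refs.txt are made up for the example, and the input is four of the references quoted earlier. Splitting on the first space, comma, or opening parenthesis is meant to keep the same prefix as the three cut calls.

```shell
# Hypothetical sample file holding four of the references quoted earlier.
cat > sample-refs.txt <<'EOF'
- Miller HJ (2004) Tobler’s First Law and spatial analysis. Ann Assoc Am Geogr 94:284–289.

- Wang S, Armstrong M (2009) A theoretical approach to the use of cyberinfrastructure in geographical analysis. Int J Geogr Inf Sci 23:169–193.

- Wang S (2010) A cyberGIS framework for the synthesis of cyberinfrastructure, GIS, and spatial analysis. Ann Assoc Am Geogr 100:535–557.

- Penninga F, Van Oosterom PJM (2008) A simplicial complex-based DBMS approach to 3D topographic data modelling. Int J Geogr Inf Sci 22:751–779.
EOF

# count_first_authors FILE (hypothetical helper)
# Skips blank lines, strips the leading "- ", keeps the text before the
# first space, comma, or "(", and counts how often each surname occurs.
count_first_authors() {
  awk '!/^ *$/ {
         sub(/^- /, "")
         split($0, part, /[ ,(]/)
         n[part[1]]++
       }
       END { for (a in n) print n[a], a }' "$1" |
    sort -k1,1nr -k2,2   # highest count first, ties in alphabetical order
}

count_first_authors sample-refs.txt
# prints:
# 2 Wang
# 1 Miller
# 1 Penninga
```

Unlike the pipeline above, this version lists ties alphabetically and prints the counts without leading padding.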

The result is this:

   6 Craglia
   4 Wang
   4 Rajabifard
   4 Onsrud
   4 Masser
   4 Grus
   4 Crompvoets
   3 Yang
   3 Steiniger
   3 Gartner
   3 European
   3 Anselin
   2 Wright
   2 Smits
   2 Sieber
   2 Ramsey
   2 Poore
   2 Miller
   2 Lance
   2 INSPIRE
   2 Helly
   2 Georgiadou
   2 Fox
   2 Foster
   2 Bregt
   1 Zhang
   1 World
...
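
Entries such as European, INSPIRE and World are probably the first word of a multi-word institutional author, although the matching references are not shown here, so that is an assumption. A possible variant, sketched below, drops the final cut and counts the whole first-author string (surname plus initials, or the full institution name). The function name count_full_first_authors, the file name sample-refs-full.txt, and the "Example Mapping Agency" entry are invented for illustration; the other three entries are quoted from the list above.

```shell
# Hypothetical sample file: three references quoted earlier plus one
# invented placeholder entry with a multi-word institutional author.
cat > sample-refs-full.txt <<'EOF'
- Onsrud H, ed (2007) Research and Theory in Advanced Spatial Data Infrastructure Concepts (ESRI Press, Redlands, CA).

- Wang S, Armstrong M (2009) A theoretical approach to the use of cyberinfrastructure in geographical analysis. Int J Geogr Inf Sci 23:169–193.

- Wang S (2010) A cyberGIS framework for the synthesis of cyberinfrastructure, GIS, and spatial analysis. Ann Assoc Am Geogr 100:535–557.

- Example Mapping Agency (2010) Placeholder title.
EOF

# count_full_first_authors FILE (hypothetical helper)
# Same steps as the original pipeline, but without cut -d' ' -f1, so the
# whole first-author string before the first comma or "(" is counted.
count_full_first_authors() {
  sed -e '/^ *$/d' -e 's/^- //' "$1" |
    cut -d'(' -f1 |
    cut -d, -f1 |
    sed -e 's/ *$//' |   # trim the space left in front of "(year)"
    sort |
    uniq -c |
    sort -rn
}

count_full_first_authors sample-refs-full.txt
```

On this sample the first line of output is the count 2 for "Wang S", and the placeholder institution is counted under its full name instead of just "Example". A side effect is that authors who share a surname but have different initials are counted separately.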
