I have a text file containing article references. It looks like this:
- Miller HJ (2004) Tobler’s First Law and spatial analysis. Ann Assoc Am Geogr 94:284–289.
- Onsrud H, ed (2007) Research and Theory in Advanced Spatial Data Infrastructure Concepts (ESRI Press, Redlands, CA).
- Egenhofer M (2002) Toward the geospatial semantic web. Advances in Geographic Information Systems International Symposium, eds Makki Y, Pissinou N (Association for Computing Machinery, McLean, VA), pp 1–4.
- Anselin L, Florax R, Rey S, eds (2004) Advances in Spatial Econometrics: Methodology, Tools and Applications (Springer, Berlin).
- Wang S, Armstrong M (2009) A theoretical approach to the use of cyberinfrastructure in geographical analysis. Int J Geogr Inf Sci 23:169–193.
- Wang S (2010) A cyberGIS framework for the synthesis of cyberinfrastructure, GIS, and spatial analysis. Ann Assoc Am Geogr 100:535–557.
- Penninga F, Van Oosterom PJM (2008) A simplicial complex-based DBMS approach to 3D topographic data modelling. Int J Geogr Inf Sci 22:751–779.
- Baker KS, Chandler CL (2008) Enabling long-term oceanographic research: Changing data practices, information management strategies and informatics. Deep-Sea Res II 55(18–19):2132–2142.
I wanted to find out who the most common first author in that long list of articles is, so this is what I did:
sed -e '/^ *$/d' -e 's/^- //' refs-2009+.txt | \
cut -d"(" -f1 | \
cut -d, -f1 | \
cut -d' ' -f1 | \
sort | \
uniq -c | \
sort -rn > sorted-refs.txt
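Note that the final `cut -d' ' -f1` keeps only the first word of each entry, so different people who share a surname end up in the same bucket, and a corporate author with a multi-word name is cut down to its first word. If you would rather count "Surname Initials", one option is to cut each line at the first comma or opening parenthesis instead. This is a minimal sketch, assuming every entry follows the `Surname Initials[, ...] (Year)` layout of the sample above; the function name `count_first_authors` is my own, not an existing tool:

```shell
#!/bin/sh
# count_first_authors: hypothetical helper, not an existing command.
# Reads reference lines from the given files (or stdin) and prints
# "count Surname Initials" lines, most frequent first.
# Assumes entries look like "- Surname Initials[, ...] (Year) Title..."
count_first_authors() {
  # 1. drop blank (or all-space) lines
  # 2. strip the leading "- " list marker
  # 3. delete everything from the first comma or "(" onwards
  # 4. trim the trailing spaces left before the "("
  sed -e '/^ *$/d' \
      -e 's/^- //' \
      -e 's/[,(].*//' \
      -e 's/ *$//' "$@" |
    sort | uniq -c | sort -rn
}
```

Usage would be `count_first_authors refs-2009+.txt > sorted-refs.txt`. With this variant, `Wang S` and a hypothetical `Wang X` would be counted separately, while entries such as `Onsrud H, ed` still reduce to `Onsrud H`.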
Running my pipeline on the full file gives this result (truncated):
6 Craglia
4 Wang
4 Rajabifard
4 Onsrud
4 Masser
4 Grus
4 Crompvoets
3 Yang
3 Steiniger
3 Gartner
3 European
3 Anselin
2 Wright
2 Smits
2 Sieber
2 Ramsey
2 Poore
2 Miller
2 Lance
2 INSPIRE
2 Helly
2 Georgiadou
2 Fox
2 Foster
2 Bregt
1 Zhang
1 World
...
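Because the output is sorted by count in descending order, the most common first author is simply the first line (`head -n 1 sorted-refs.txt`). If several authors could tie for first place, one way to report all of them is a small awk filter that keeps every line whose count equals the first line's count. A sketch, assuming the input is a file like `sorted-refs.txt` above; `top_authors` is a hypothetical name:

```shell
#!/bin/sh
# top_authors: hypothetical helper, not an existing command.
# Expects "count name..." lines already sorted by count, highest first
# (as produced by "uniq -c | sort -r"), from the given files or stdin.
# Prints every line whose count equals the count on the first line.
top_authors() {
  awk 'NR == 1 { max = $1 } $1 == max' "$@"
}
```

Usage would be `top_authors sorted-refs.txt`; for the result shown above it prints only the `Craglia` line, since no other author reaches a count of 6.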