I have a text file containing article references. It looks like this:
    - Miller HJ (2004) Tobler’s First Law and spatial analysis. Ann Assoc Am Geogr 94:284–289.
    - Onsrud H, ed (2007) Research and Theory in Advanced Spatial Data Infrastructure Concepts (ESRI Press, Redlands, CA).
    - Egenhofer M (2002) Toward the geospatial semantic web. Advances in Geographic Information Systems International Symposium, eds Makki Y, Pissinou N (Association for Computing Machinery, McLean, VA), pp 1–4.
    - Anselin L, Florax R, Rey S, eds (2004) Advances in Spatial Econometrics: Methodology, Tools and Applications (Springer, Berlin).
    - Wang S, Armstrong M (2009) A theoretical approach to the use of cyberinfrastructure in geographical analysis. Int J Geogr Inf Sci 23:169–193.
    - Wang S (2010) A cyberGIS framework for the synthesis of cyberinfrastructure, GIS, and spatial analysis. Ann Assoc Am Geogr 100:535–557.
    - Penninga F, Van Oosterom PJM (2008) A simplicial complex-based DBMS approach to 3D topographic data modelling. Int J Geogr Inf Sci 22:751–779.
    - Baker KS, Chandler CL (2008) Enabling long-term oceanographic research: Changing data practices, information management strategies and informatics. Deep-Sea Res II 55(18–19):2132–2142.
I wanted to find out what the most common first author is in that long list of articles, and this is what I did:
    cat refs-2009+.txt | \
      sed -e '/^ *$/d' -e 's/^- //' | \
      cut -d"(" -f1 | \
      cut -d, -f1 | \
      cut -d' ' -f1 | \
      sort | \
      uniq -c | \
      sort -r > \
      sorted-refs.txt
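For comparison, the same count can be sketched as a single awk pass instead of the `sed | cut | cut | cut | sort | uniq` chain. This is a minimal sketch, assuming the layout of the sample above (one reference per line, prefixed with `- `) and the same surname rule the three `cut` commands implement: the first author is whatever comes before the first space, comma or `(`. The function name `count_first_authors` is hypothetical.

```shell
# Hypothetical helper: count first-author surnames in one awk pass.
# Assumes one reference per line, each prefixed with "- ", and takes the
# surname to be the text before the first space, comma or "(" (the same
# rule as cut -d"(" | cut -d, | cut -d' ').
count_first_authors() {
    awk '
        /^ *$/ { next }                 # skip empty or space-only lines
        {
            sub(/^- /, "")              # drop the leading "- " marker
            split($0, parts, "[ ,(]")   # parts[1] = text before first delimiter
            count[parts[1]]++
        }
        END { for (name in count) print count[name], name }
    ' "$@" | sort -k1,1nr -k2,2         # highest count first, ties alphabetical
}

# Usage: count_first_authors refs-2009+.txt > sorted-refs.txt
```

The counts should match the pipeline's; the visible differences are that the numbers are not padded the way `uniq -c` pads them, and tied names come out in alphabetical rather than reverse-alphabetical order because of the explicit `-k2,2` key.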
The result is this:
    6 Craglia
    4 Wang
    4 Rajabifard
    4 Onsrud
    4 Masser
    4 Grus
    4 Crompvoets
    3 Yang
    3 Steiniger
    3 Gartner
    3 European
    3 Anselin
    2 Wright
    2 Smits
    2 Sieber
    2 Ramsey
    2 Poore
    2 Miller
    2 Lance
    2 INSPIRE
    2 Helly
    2 Georgiadou
    2 Fox
    2 Foster
    2 Bregt
    1 Zhang
    1 World
    ...
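If only the most common first author is wanted, including any names tied for first place, a short awk filter over the sorted file is enough. A sketch, assuming the input is already sorted by count in descending order, as `sorted-refs.txt` is; `top_authors` is a hypothetical name.

```shell
# Hypothetical helper: print every line whose count equals the count on the
# first line, i.e. the top entry plus any ties. Assumes the input is already
# sorted by count in descending order (as sorted-refs.txt is).
top_authors() {
    awk 'NR == 1 { max = $1 } $1 == max' "$@"
}

# Usage: top_authors sorted-refs.txt
```

awk's default field splitting skips the leading blanks that `uniq -c` adds, so `$1` is the count. On the result shown above this would print only the Craglia line.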