Suppose you have a text file of locations, with one location per line.
For example, here are some school names in Copenhagen, Denmark, stored in schools.csv:
Hyltebjerg Skole
Heibergskolen
Ellebjerg Skole
Katrinedals Skole
Peder Lykke Skolen
Amager Fælled Skole
Tingbjerg Heldagsskole
Øster Farimagsgades Skole
Sankt Annæ Gymnasiums Grundskole
Lykkebo Skole
Randersgades Skole
Strandvejsskolen
Sortedamskolen
Grøndalsvængets Skole
Sølvgades Skole
Skolen ved Sundet
Hanssted Skole
Holbergskolen
Den Classenske Legatskole
Tove Ditlevsens Skole
Lergravsparkens Skole
Vigerslev Allés Skole
Bavnehøj Skole
Ålholm Skole
Langelinieskolen
Guldberg Skole
Husum Skole
Nyboder Skole
Vanløse Skole
Kirkebjerg Skole
Christianshavns Skole
Bellahøj Skole
Kildevældsskolen
Korsager Skole
Nørrebro Park Skole
Utterslev Skole
Skolen på Islands Brygge
Brønshøj Skole
Kirsebærhavens Skole
Rødkilde Skole
Vesterbro Ny Skole
Blågård Skole
Sønderbro Skole
Højdevangens Skole
Oehlenschlægersgades Skole
Vibenshus Skole
Valby Skole
Rådmandsgades Skole
Lundehusskolen
Tagensbo Skole
Here is a script, geocode.py, that attempts to geocode each location in an input stream. It prints delimited output to stdout (using | as the separator rather than a comma) with the fields input_line, input_line_no, result_no, place, latitude, longitude:
    from geopy import geocoders
    import sys
    import time

    # Note: Google's geocoding service may require an API key,
    # which geopy accepts as geocoders.GoogleV3(api_key=...).
    geocoder = geocoders.GoogleV3()

    # Can also use tab. Comma is bad, since the place will most likely contain a comma.
    SEPARATOR = '|'

    # Placeholder result used when the geocoder finds nothing,
    # so that every input line still produces one output row.
    dummy = ['', ['', '']]

    header = ['input_line', 'input_line_no', 'result_no', 'place', 'latitude', 'longitude']
    print(SEPARATOR.join(header))

    i = 0  # input line number
    for line in sys.stdin:
        line = line.strip()
        results = geocoder.geocode(line, exactly_one=False) or [dummy]
        for j, res in enumerate(results):
            place = res[0]
            lat = str(res[1][0])
            lon = str(res[1][1])
            out = SEPARATOR.join([line, str(i), str(j), place, lat, lon])
            print(out)
        time.sleep(0.05)  # short pause between requests
        i += 1
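To see what the output rows look like without calling the geocoding service, the row-formatting step can be tried on its own. The following is only a sketch: format_rows is a hypothetical helper (not part of geocode.py or geopy), and the place name and coordinates are made-up placeholders shaped like the (place, (latitude, longitude)) results the script expects.

```python
SEPARATOR = '|'
DUMMY = ['', ['', '']]  # same placeholder shape as `dummy` in geocode.py


def format_rows(line, line_no, results, sep=SEPARATOR):
    """Return one delimited row per geocoding result (hypothetical helper).

    If `results` is None or empty, a single row with empty place,
    latitude and longitude fields is returned instead.
    """
    rows = []
    for j, res in enumerate(results or [DUMMY]):
        place = res[0]
        lat, lon = res[1]
        rows.append(sep.join([line, str(line_no), str(j), place, str(lat), str(lon)]))
    return rows


# Made-up result for illustration only; these are not real coordinates.
fake_results = [('Example Skole, Example Street 1, Copenhagen', (55.0, 12.0))]

print(format_rows('Example Skole', 0, fake_results))
print(format_rows('No Such Place', 1, None))
```

Note that the place in the first row contains commas, which is why the script uses | rather than a comma as the separator. An input with no match still yields a row, with the last three fields left empty.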
Here is how you might use the script:
    cat schools.csv | python geocode.py
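Once the output has been captured (for instance by redirecting stdout to a file), it can be read back with Python's csv module by setting the delimiter to |. This is a minimal sketch, not part of the original script: read_geocoded is a hypothetical helper, and the sample rows are made up to match the header that geocode.py prints.

```python
import csv
import io


def read_geocoded(stream, sep='|'):
    """Yield each output row as a dict keyed by the header fields."""
    # geocode.py does no quoting, so quote characters are treated as plain text.
    reader = csv.DictReader(stream, delimiter=sep, quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


# Made-up sample standing in for the script's real output.
sample = io.StringIO(
    "input_line|input_line_no|result_no|place|latitude|longitude\n"
    "Example Skole|0|0|Example Skole, Copenhagen|55.0|12.0\n"
    "No Such Place|1|0|||\n"
)

rows = list(read_geocoded(sample))

# Inputs the geocoder could not match have an empty place field.
unmatched = [r['input_line'] for r in rows if not r['place']]
print(unmatched)
```

With a real file, an open file handle can be passed in place of the io.StringIO object. Listing the unmatched inputs is a quick way to find names that need to be cleaned up and geocoded again.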
Tip: you might want to