Category: Spatial stuff

  • How to display a Choropleth map in Jupyter Notebook

    Here is the code:

    %matplotlib inline
    import geopandas as gpd
    import matplotlib as mpl  # make rcParams available (optional)
    mpl.rcParams['figure.dpi']= 144  # increase dpi (optional)
    
    world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
    world = world[world.name != 'Antarctica']  # remove Antarctica (optional)
    world['gdp_per_person'] = world.gdp_md_est / world.pop_est
    g = world.plot(column='gdp_per_person', cmap='OrRd', scheme='quantiles')
    g.set_facecolor('#A8C5DD')  # make the ocean blue (optional)
    

    Here is what the map looks like:

    Dependencies:

    pip install matplotlib
    pip install geopandas
    pip install pysal  # for scheme option
    
  • Python script for geocoding a text file

    Assume that you have a file with some locations as text with one location per line.

    For example, here are some school names in Copenhagen, Denmark, stored in schools.csv:

    Hyltebjerg Skole
    Heibergskolen
    Ellebjerg Skole
    Katrinedals Skole
    Peder Lykke Skolen
    Amager Fælled Skole
    Tingbjerg Heldagsskole
    Øster Farimagsgades Skole
    Sankt Annæ Gymnasiums Grundskole
    Lykkebo Skole
    Randersgades Skole
    Strandvejsskolen
    Sortedamskolen
    Grøndalsvængets Skole
    Sølvgades Skole
    Skolen ved Sundet
    Hanssted Skole
    Holbergskolen
    Den Classenske Legatskole
    Tove Ditlevsens Skole
    Lergravsparkens Skole
    Vigerslev Allés Skole
    Bavnehøj Skole
    Ålholm Skole
    Langelinieskolen
    Guldberg Skole
    Husum Skole
    Nyboder Skole
    Vanløse Skole
    Kirkebjerg Skole
    Christianshavns Skole
    Bellahøj Skole
    Kildevældsskolen
    Korsager Skole
    Nørrebro Park Skole
    Utterslev Skole
    Skolen på Islands Brygge
    Brønshøj Skole
    Kirsebærhavens Skole
    Rødkilde Skole
    Vesterbro Ny Skole
    Blågård Skole
    Sønderbro Skole
    Højdevangens Skole
    Oehlenschlægersgades Skole
    Vibenshus Skole
    Valby Skole
    Rådmandsgades Skole
    Lundehusskolen
    Tagensbo Skole
    

    Here is a script, geocode.py, that will attempt to geocode each location in an input stream. It prints CSV output to stdout with the fields input_line, input_line_no, result_no, place, latitude, longitude:

    from geopy import geocoders
    import sys
    import time
    import pdb
    
    geocoder = geocoders.GoogleV3()
    
    SEPARATOR='|'  # can also use tab. Comma is bad, since the place will most likely contain a comma.
    dummy = ['', ['', '']]
    i = 0
    
    header = ['input_line', 'input_line_no', 'result_no', 'place', 'latitude', 'longitude']
    print(SEPARATOR.join(header))
    
    for line in sys.stdin:
        line = line.strip()
        results = geocoder.geocode(line, exactly_one=False) or [dummy]
        for j, res in enumerate(results):
            place = res[0]
            lat = str(res[1][0])
            lon = str(res[1][1])
            out = SEPARATOR.join([line, str(i), str(j), place, lat, lon])
            print (out)
        time.sleep(0.05)
        i += 1
    

    Here is how you might use the script:

    cat schools.csv | python geocode.py 
    

    Tip: you might want to

  • How to work with spatial data in Amazon Redshift

    While Redshift does not offer native support for spatial data, indexes and functions, there exists a partial workaround. Redshift supports Python UDFs and can also load custom Python libraries. Out of the box, Redshift has numpy, scipy, pandas and many other useful Python libraries. For spatial functionality, one saving grace is the high quality spatial libraries that exist for Python, such as shapely. Of course, the alternative is to simply implement useful spatial functions in Python directly, which we will do here. The drawback is that this does not provide the means for spatial indexes or native spatial types in Redshift. As long as you are working mainly with point data, this should not be a huge obstacle. While polygons and operations on them are useful in many cases, a properly utilized GeoHash can usually do the trick.

    So, let’s get into it! Connect to your Redshift cluster using a client of your choosing. I prefer SQLWorkbench/J. Properly connected, attempt to create the following UDF in Python, which implements the haversine formula using NumPy (thanks to jterrace for the solution).

    CREATE OR REPLACE FUNCTION haversine (lat1 float, lon1 float, lat2 float, lon2 float)
    RETURNS float IMMUTABLE AS
    $$
    
        from math import radians, sin, cos, asin, sqrt, pi, atan2
        import numpy as np
    
        earth_radius_miles = 3956.0
    
        def haversine(lat1, lon1, lat2, lon2):
            """Gives the distance between two points on earth.
            """
            lat1, lon1 = radians(lat1), radians(lon1)
            lat2, lon2 = radians(lat2), radians(lon2)
            dlat, dlon = (lat2 - lat1, lon2 - lon1)
            a = sin(dlat/2.0)**2 + cos(lat1) * cos(lat2) * sin(dlon/2.0)**2
            great_circle_distance = 2 * asin(min(1,sqrt(a)))
            return earth_radius_miles * great_circle_distance
     
        return haversine(lat1, lon1, lat2, lon2)
    $$ LANGUAGE plpythonu;
    

    Now, let’s use our new UDF to calculate the great-circle distance between a pair of points.

    SELECT haversine(37.160316546736745, -78.75, 39.095962936305476, -121.2890625)
    -- 2293.1324218790523
    

    One very big drawback is that it is incredibly slow (an understatement). The following query computes the function just 100 times, which on my cluster took over 17.21 seconds (jeez!):

    SELECT count(haversine(37.160316546736745, -78.75, 39.095962936305476, lon2 % 360 - 180)) FROM generate_series(1, 100) lon2
    

    Because the speed is so slow, I will investigate another way to achieve this goal with Redshift. Expect updates to this post.

  • Geocoding Python function for PostgreSQL

    Gratefully making use of what others have provided, i.e. geopy, Google and plpythonu.

    Type to hold result of geocoding:

    CREATE TYPE geocoding AS (
      place text,
      latitude double precision,
      longitude double precision
    );
    

    Function that does the actual geocoding (to be extended with more vendors. Hint: look at geopy wiki). Takes an (arbitrary) input string to be geocoded:

    CREATE OR REPLACE FUNCTION python_geocode
    (
      input text,
      vendor text DEFAULT 'google'
    ) RETURNS SETOF geocoding AS
    $$
      import time
      from geopy import geocoders
      # https://code.google.com/p/geopy/wiki/GettingStarted
    
      time.sleep(0.2)
      # TODO: Add other available vendors, e.g. Yahoo.
      if vendor.lower() == 'google':
        geocoder = geocoders.GoogleV3()
      else:
        raise ValueError("Invalid geocoder: %s" % vendor)
      try:
        for res in geocoder.geocode(input, exactly_one=False):
          yield {'place': res[0], 'latitude': res[1][0], 'longitude': res[1][1]}
      except:
        pass
    $$ LANGUAGE plpythonu VOLATILE;
    

    Example:

    SELECT place, ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)
    FROM python_geocode('Kostas');
    
  • The new Google Maps: Explorers delight!

    Being interested in maps in general, and specifically vector data, of course I had to take the new Google Maps for a spin. Vector data aside, what fascinated me most was the option to display photos of the current location in the bottom of the screen. Seeing actual photos is incredibly more useful than being shown 100 identical icons indicating that someone took a photo of something here.

    (more…)

  • List of cloud GIS platforms

    This is work in progress…

    http://geocommons.com/
    http://www.giscloud.com/
    http://cartodb.com/
    http://www.google.com/drive/start/apps.html#fusiontables
    http://tables.googlelabs.com/

    CloudMade Landing


    http://mapbox.com/
    http://mapquest.com
    http://maps.google.com

  • Downloading environment data for Denmark

    This post is very Denmark-specific. I was annoyed that downloading environmental data from the ministry of the environment was so cumbersome.

    Long story made short, here are download URLs for environmental data:

    Dataset URL
    Kirkebyggelinjer KIRKEBYGGELINJER_SHAPE.zip
    Skovbyggelinjer SKOVBYGGELINJER_SHAPE.zip
    Åbeskyttelseslinjer AA_BES_LINJER_SHAPE.zip
    Søbeskyttelseslinjer SOE_BES_LINJER_SHAPE.zip
    EF Fuglebeskyttelsesområder EF_FUGLE_BES_OMR_SHAPE.zip
    EF Habitatområder EF_HABITAT_OMR_SHAPE.zip
    RAMSAR-områder RAMSAR_OMR_SHAPE.zip
    Naturvildtreservat NATUR_VILDT_RESERVAT_SHAPE.zip

    There are more. I’ll add them when I have the time. Why there ministry homepage does not provide direct links like this is beyond me.

  • Trying a Python R-tree implementation

    Rtree is a ctypes Python wrapper of libspatialindex that provides a number of advanced spatial indexing features for the spatially curious Python user.

    (more…)

  • A presentation on spatial indexing

    A friend of mine, who is the CEO of a company that develops an embedded database, asked me to do a presentation on spatial indexing. This was an opportunity for me to brush up on R-trees and similar datastructures.

    Download the slides

    The presentation introduces R-trees and spatial indexing to a technical audience, who are however not spatial indexing experts.

  • Having a look at Spacebase

    Spacebase is a spatial datastore that began life as military-grade software, which at least sounds kind of cool. It’s an in-memory database, really, so switch off the cluster and the data is gone. Apparently the same thing was (unknown to the SpaceBase people?) invented in the 90’s by some americans also having the military as their first customer.

    (more…)