Cosine similarity is the normalised dot product of two vectors. It is presumably called "cosine" similarity because the dot product equals the product of the Euclidean magnitudes of the two vectors and the cosine of the angle between them, so the normalised product is exactly cos θ = a·b / (‖a‖‖b‖). You can read more about cosine similarity and dot products on Wikipedia.

Here is how to compute cosine similarity in Python, either manually (well, using numpy) or using a specialised library:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
# vectors
a = np.array([1,2,3])
b = np.array([1,1,4])
# manually compute cosine similarity
dot = np.dot(a, b)
norma = np.linalg.norm(a)
normb = np.linalg.norm(b)
cos = dot / (norma * normb)
# use library, operates on sets of vectors
aa = a.reshape(1,3)
ba = b.reshape(1,3)
cos_lib = cosine_similarity(aa, ba)
print(
    dot,
    norma,
    normb,
    cos,
    cos_lib[0][0]
)

The values may differ slightly in the last few decimal places. On my computer I get:

Now that I work in shipping, I have to learn a bunch of new terms. Shipping is regulated under Admiralty Law, and there are traditional documents and parties involved. Knowing what these are is crucial to understanding shipping.

Legal documents

There are three key documents involved with shipping:

Here is how to sample from a softmax probability vector at different temperatures.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
mpl.rcParams['figure.dpi'] = 144
trials = 1000
softmax = [0.1, 0.3, 0.6]
def sample(softmax, temperature):
    EPSILON = 10e-16  # to avoid taking the log of zero
    preds = (np.asarray(softmax) + EPSILON).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return probas[0]
temperatures = [(t or 1) / 100 for t in range(0, 101, 10)]
probas = [
    np.asarray([sample(softmax, t) for _ in range(trials)]).sum(axis=0) / trials
    for t in temperatures
]
sns.set_style("darkgrid")
plt.plot(temperatures, probas)
plt.show()

Notice how the probabilities change with temperature. The softmax probabilities are [0.1, 0.3, 0.6]. At the lowest temperature, 0.01, the dominant index (value 0.6) is sampled with near 100% probability. At higher temperatures, the selection probabilities approach the softmax values, e.g. 60% probability for the third index.
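This limiting behaviour can also be checked without sampling: dividing the log-probabilities by T and re-normalising is the same as re-normalising p_i^(1/T). A small sketch (the helper name is mine, the [0.1, 0.3, 0.6] vector is the one above):

```python
import numpy as np

def temperature_scale(p, t):
    # softmax(log(p) / t) is equivalent to re-normalising p_i^(1/t)
    scaled = np.asarray(p, dtype='float64') ** (1.0 / t)
    return scaled / scaled.sum()

p = [0.1, 0.3, 0.6]
print(temperature_scale(p, 0.01))  # nearly all mass on the dominant index
print(temperature_scale(p, 1.0))   # the original softmax values
```

At t → 0 the distribution collapses onto the argmax; at t = 1 it is unchanged, matching the sampled curves.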

%matplotlib inline
import geopandas as gpd
import matplotlib as mpl # make rcParams available (optional)
mpl.rcParams['figure.dpi'] = 144 # increase dpi (optional)
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
world = world[world.name != 'Antarctica'] # remove Antarctica (optional)
world['gdp_per_person'] = world.gdp_md_est / world.pop_est
g = world.plot(column='gdp_per_person', cmap='OrRd', scheme='quantiles')
g.set_facecolor('#A8C5DD') # make the ocean blue (optional)

I was looking at aerial photos of north-western Europe in Google Maps when I noticed a big white dot on the map!

I thought, what the hell? To satisfy my curiosity I decided to zoom in for further investigation.

It turns out that the big white dot is a giant surface mine. The 48 km² mine is operated by RWE and used for mining lignite, also known as brown coal.

Fun fact: 50% of Greece's power supply and 27% of Germany's come from burning lignite. Lignite also has innovative uses in farming and drilling.

Isn't the geometric juxtaposition of farmland, urban area and surface mine quite enchanting? To get a sense of the scale, take a look at the size of cars next to the big heavy machine; then try to find the big heavy machine on the zoomed out image.

Here is a video that displays the grotesque beauty of the place...

- Urban morphological zones 2000 (EU): https://www.eea.europa.eu/data-and-maps/data/urban-morphological-zones-2000-2
- Population count (World): http://sedac.ciesin.columbia.edu/data/set/gpw-v4-population-count-rev10/
- Administrative regions (World): http://gadm.org/

The map is European because the "urban" data from the European Environment Agency (EEA) only covers Europe.

Caveats

The UMZ data ended up in PostGIS with SRID 900914. You can use prj2epsg.org to convert the contents of a .prj file to an estimated SRID code. In this case, the UMZ .prj file has the following contents:

The GADM database contains geographical data for administrative regions, e.g. countries, regions and municipalities. As always, once you have the data in the right format, it is easy to import it into a database. The data is available from GADM in several formats. All data uses a coordinate reference system in longitude/latitude with the WGS84 datum.

Step-by-step:

Download data for the whole world or by country. For a change, I will use the GeoPackage format.

Create a PostgreSQL database with PostGIS enabled (assumed to exist in the following)

Import the data with ogr2ogr (see instructions below)
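The import step can be sketched with a single ogr2ogr invocation (the database name gadm and the file name gadm36_DNK.gpkg are assumptions; substitute your own):

```shell
# import a GADM GeoPackage into a PostgreSQL/PostGIS database
ogr2ogr -f PostgreSQL PG:"dbname=gadm" gadm36_DNK.gpkg
```

ogr2ogr creates one table per layer in the GeoPackage, with the geometry in a wkb_geometry column by default.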

As a test, we can query the adm2 table (municipalities) with a coordinate inside the municipality of Copenhagen, Denmark.

SELECT name_2, ST_AsText(wkb_geometry)
FROM dnk_adm2
WHERE ST_Intersects(ST_SetSRID(ST_Point(12.563585, 55.690628), 4326), wkb_geometry)
-- AND ST_Point(12.563585, 55.690628) && wkb_geometry

You can view the selected well-known text (WKT) geometry in an online viewer, such as openstreetmap-wkt-playground. Other viewers are listed on Stack Exchange.

Alternative sources

For this post I really wanted a dataset of populated/urban areas. However, the GADM data I downloaded only contains levels adm0-adm2, which is a tessellation of the land area, i.e. it cannot be used to discriminate between urban and rural areas.

From the rtwilson list, here are some specific datasets that indicate population density and urbanism:

- http://sedac.ciesin.columbia.edu/data/collection/gpw-v4/sets/browse
- https://www.eea.europa.eu/data-and-maps/data/urban-morphological-zones-2000-2
- http://www.worldpop.org.uk/ (does not cover Europe and North America)
- https://nordpil.com/resources/world-database-of-large-cities/

I teach children how to program and do other things with technology in an organisation called Coding Pirates in Denmark, which aims to be a kind of scout movement for geeks. A bestseller among the kids is learning how to hack, and I see this as a unique opportunity to convey some basic human values in relation to something that is potentially harmful.

Yesterday, one of the kids and I played with nmap, the network surveying tool, to investigate our local area network. The aim was to find information about the attached computers, such as the operating system, the system owner's first name (often part of the computer name), and whether any computer had open server ports (SSH, web, etc.). We used nmap in combination with Wireshark.

Tell another person about a fun website (any website will do)

Use Wireshark to detect the IP address (e.g. 192.168.85.116) of any computer that accesses that website

Use nmap to scan the IP address we found: nmap -vS 192.168.85.116

We also learned how to detect that someone has logged into your computer and, for example, kick them out (assuming an Ubuntu host):

# Monitor login attempts
tail -f /var/log/auth.log
# See active sessions
who
# List remote sessions
ps fax | grep 'pts/'
# Kill sessions
kill -9 [pid of bash processes connected to session]

Other tricks

List all hosts (ping scan) on your local area network:

nmap -sP 192.168.1.*

Find computers on your local area network that run an SSH server:
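One way to do this (a sketch; it assumes SSH is running on the default port 22) is to scan only port 22 and list just the hosts where it is open:

```shell
# scan the subnet for hosts with TCP port 22 (SSH) open
nmap -p 22 --open 192.168.1.*
```

The --open flag suppresses hosts whose port 22 is closed or filtered, keeping the output short.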