Cosine similarity is the normalised dot product of two vectors. It is called “cosine” similarity because the dot product equals the product of the two vectors' Euclidean magnitudes and the cosine of the angle between them, so dividing by the magnitudes leaves just the cosine. You can read more about cosine similarity and dot products on Wikipedia.
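Written out, the relationship looks like this. The symbols $\mathbf{a}$, $\mathbf{b}$ and $\theta$ are just my notation for the two vectors and the angle between them, and the numbers are the example vectors [1, 2, 3] and [1, 1, 4] used in the code:

```latex
\mathbf{a}\cdot\mathbf{b} = \lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert\cos\theta
\quad\Longrightarrow\quad
\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}

% Worked example with a = (1, 2, 3) and b = (1, 1, 4):
\mathbf{a}\cdot\mathbf{b} = 1\cdot 1 + 2\cdot 1 + 3\cdot 4 = 15, \qquad
\lVert\mathbf{a}\rVert = \sqrt{14}, \qquad
\lVert\mathbf{b}\rVert = \sqrt{18}

\cos\theta = \frac{15}{\sqrt{14}\,\sqrt{18}} = \frac{15}{\sqrt{252}} \approx 0.9449
```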
Here is how to compute cosine similarity in Python, either manually (well, using numpy) or using a specialised library:

    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity

    # vectors
    a = np.array([1, 2, 3])
    b = np.array([1, 1, 4])

    # manually compute cosine similarity
    dot = np.dot(a, b)
    norma = np.linalg.norm(a)
    normb = np.linalg.norm(b)
    cos = dot / (norma * normb)

    # use library, operates on sets of vectors
    aa = a.reshape(1, 3)
    ba = b.reshape(1, 3)
    cos_lib = cosine_similarity(aa, ba)

    print(dot, norma, normb, cos, cos_lib)

The two values may differ slightly in the last decimal places because of floating-point rounding. On my computer I get:
- 0.9449111825230682 (manual)
- 0.9449111825230683 (library)
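Since the library version works on sets of vectors, here is a rough numpy-only sketch of the same idea, in case it helps to see what happens with more than one pair. The function name `cosine_similarity_matrix` is my own (it is not part of numpy or scikit-learn), and the sketch assumes that no row is an all-zero vector. Because of the rounding differences shown above, it compares results with `np.isclose` rather than `==`:

```python
import numpy as np

def cosine_similarity_matrix(X, Y):
    """Pairwise cosine similarity between the rows of X and the rows of Y.

    Hypothetical helper for illustration; assumes no row is all zeros.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Scale every row to unit Euclidean length.
    X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    Y_unit = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # The dot product of two unit vectors is the cosine of their angle.
    return X_unit @ Y_unit.T

a = np.array([1, 2, 3])
b = np.array([1, 1, 4])

# Entry [i, j] compares row i of the first set with row j of the second.
sims = cosine_similarity_matrix([a, b], [a, b])
manual = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(sims)
# Compare floats with a tolerance instead of exact equality.
print(np.isclose(sims[0, 1], manual))
```

The diagonal entries compare each vector with itself, so they should come out as 1 up to rounding, and the off-diagonal entries should match the manual value computed earlier.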