Cosine similarity in Python

Cosine similarity is the normalised dot product between two vectors. I guess it is called “cosine” similarity because the dot product is the product of Euclidean magnitudes of the two vectors and the cosine of the angle between them. If you want, read more about cosine similarity and dot products on Wikipedia.

Here is how to compute cosine similarity in Python, either manually (well, using numpy) or using a specialised library:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
 
# vectors
a = np.array([1,2,3])
b = np.array([1,1,4])
 
# manually compute cosine similarity
dot = np.dot(a, b)
norma = np.linalg.norm(a)
normb = np.linalg.norm(b)
cos = dot / (norma * normb)
 
# use library, operates on sets of vectors
aa = a.reshape(1,3)
ba = b.reshape(1,3)
cos_lib = cosine_similarity(aa, ba)
 
print(
    dot,
    norma,
    normb,
    cos,
    cos_lib[0][0]
)

The values might differ a slight bit on the smaller decimals. On my computer I get:

  • 0.9449111825230682 (manual)
  • 0.9449111825230683 (library)

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.