Implementing Cosine Similarity in Python: A Step-by-Step Guide

Cosine Similarity: From Theory to Python Implementation

Cosine similarity measures how similar two non-zero vectors are by calculating the cosine of the angle between them. Because it depends only on the vectors' directions and not on their magnitudes, it is commonly used in natural language processing, information retrieval, and machine learning, for example to compare documents of different lengths. Its value ranges from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction).
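
To make the direction-only behaviour concrete, here is a minimal sketch, assuming NumPy is installed. It compares one vector against vectors that point the same way, at a right angle, and the opposite way. The helper below is a bare-bones version written just for this illustration and does no input validation.

```python
import numpy as np

def cosine_similarity(a, b):
    # Dot product divided by the product of the Euclidean norms
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 0.0])
same_direction = np.array([5.0, 0.0])  # same direction, 5x the magnitude
orthogonal = np.array([0.0, 3.0])      # at 90 degrees to a
opposite = np.array([-2.0, 0.0])       # pointing the opposite way

print(cosine_similarity(a, same_direction))  # 1.0
print(cosine_similarity(a, orthogonal))      # 0.0
print(cosine_similarity(a, opposite))        # -1.0
```

Scaling `same_direction` by 5 has no effect on the result, which is why cosine similarity suits comparisons where overall size (such as document length) should not matter.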

Theory of Cosine Similarity:

Formally, the cosine similarity between two vectors A and B is defined as:

    cos(theta) = (A . B) / (||A|| * ||B||)
    

Here A and B are the two vectors, “.” denotes the dot product, and “||.||” denotes the Euclidean (L2) norm, i.e. the magnitude of a vector.
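
Written out component-wise for n-dimensional vectors, and then worked through by hand for the vectors A = (1, 2, 3) and B = (4, 5, 6) used in the Python example, the formula becomes:

```latex
\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}

A \cdot B = 1\cdot 4 + 2\cdot 5 + 3\cdot 6 = 32, \qquad
\|A\| = \sqrt{1 + 4 + 9} = \sqrt{14}, \qquad
\|B\| = \sqrt{16 + 25 + 36} = \sqrt{77}

\cos\theta = \frac{32}{\sqrt{14}\,\sqrt{77}} = \frac{32}{\sqrt{1078}} \approx 0.9746
```

A value this close to 1 means the two vectors point in nearly the same direction.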

Python Implementation:

Now, let’s see how we can implement cosine similarity in Python:

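```python
import numpy as np

def cosine_similarity(A, B):
    # Dot product: the sum of element-wise products of A and B
    dot_product = np.dot(A, B)
    # Euclidean (L2) norms, i.e. the magnitudes of A and B
    norm_A = np.linalg.norm(A)
    norm_B = np.linalg.norm(B)

    # The angle is undefined if either vector has zero magnitude
    if norm_A == 0 or norm_B == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")

    return dot_product / (norm_A * norm_B)

# Example usage
vector_A = np.array([1, 2, 3])
vector_B = np.array([4, 5, 6])

similarity = cosine_similarity(vector_A, vector_B)
print("Cosine Similarity:", similarity)  # approximately 0.9746
```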

In this implementation, we first compute the dot product of the two vectors with NumPy's np.dot() function. We then compute the magnitude (Euclidean norm) of each vector with np.linalg.norm(). Finally, we divide the dot product by the product of the two magnitudes, which gives the cosine similarity.
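
The same three steps can also be written in plain Python using only the standard library. This is useful when NumPy is not available, or when you want to see exactly what np.dot() and np.linalg.norm() compute. The following is a sketch: the name `cosine_similarity_pure` is hypothetical, and the length and zero-vector checks are our own additions.

```python
import math

def cosine_similarity_pure(a, b):
    """Cosine similarity of two equal-length sequences, without NumPy."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Step 1: dot product, the sum of element-wise products
    dot_product = sum(x * y for x, y in zip(a, b))
    # Step 2: Euclidean norm (magnitude) of each vector
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction, so the angle is undefined
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    # Step 3: divide the dot product by the product of the norms
    return dot_product / (norm_a * norm_b)

print(cosine_similarity_pure([1, 2, 3], [4, 5, 6]))  # approximately 0.9746
```

For the example vectors this matches the NumPy version up to floating-point rounding. For large vectors or many comparisons, NumPy's vectorized operations will be considerably faster.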

As shown above, NumPy lets you compute the cosine similarity between two vectors in just a few lines of Python. This is useful in machine learning applications where direction matters more than magnitude, such as comparing text embeddings or TF-IDF document vectors in search and recommendation systems.
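
In practice you often need the similarity between many vectors at once, for example between every pair of documents in a collection. Calling the two-vector function in a loop becomes slow as the collection grows. A common alternative is to normalize every vector to unit length and take a single matrix product. The sketch below assumes the vectors are stored as the rows of a 2-D NumPy array. The function name `pairwise_cosine_similarity` and the toy data are hypothetical.

```python
import numpy as np

def pairwise_cosine_similarity(X):
    """Cosine similarity between every pair of rows in a 2-D array X."""
    X = np.asarray(X, dtype=float)
    # Euclidean norm of each row, kept as a column so it broadcasts across the row
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    if np.any(norms == 0):
        raise ValueError("all rows must be non-zero vectors")
    # Scale each row to unit length; cosine similarity is then a plain dot product
    unit = X / norms
    # Entry (i, j) is the cosine similarity between row i and row j
    return unit @ unit.T

# Hypothetical toy vectors, e.g. term counts for three short documents
docs = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [2, 4, 6],  # same direction as the first row
])

sim = pairwise_cosine_similarity(docs)
print(np.round(sim, 4))
```

Normalizing first means each norm is computed once per row instead of once per pair. If scikit-learn is installed, `sklearn.metrics.pairwise.cosine_similarity` offers a ready-made version of this pairwise computation.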