Creating a Semantic Search Engine using Sentence Transformers in Python

In this tutorial, we will be building a semantic search engine using Sentence Transformers in Python. Sentence Transformers is a library that allows us to encode sentences into high-dimensional vectors, which can then be used for semantic similarity tasks such as searching for similar sentences.

To get started, first make sure you have Python installed on your system. You can download Python from the official website and install it following the instructions provided.

Next, you will need to install the Sentence Transformers library. You can do this by running the following command in your terminal:

pip install sentence-transformers
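
After the install finishes, it can be worth confirming that Python sees the package before going further. The following is a minimal sketch using only the standard library; the helper name installed_version is made up for this example and is not part of Sentence Transformers.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Hypothetical helper; "sentence-transformers" is the name used with pip above.
found = installed_version("sentence-transformers")
print(found if found else "sentence-transformers is not installed")
```

If this reports that the package is missing, the install most likely went into a different Python environment than the one running the script.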

Once the library is installed, we can start building our semantic search engine.

Step 1: Load a pre-trained Sentence Transformer model

First, we need to load a pre-trained Sentence Transformer model. Pre-trained models can be loaded by name with the SentenceTransformer class. For this tutorial, we will use the distilbert-base-nli-stsb-mean-tokens model, a DistilBERT model trained on natural language inference (NLI) data and then fine-tuned on the STS-B benchmark for semantic textual similarity. The first call downloads the model weights, so it needs an internet connection.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')

Step 2: Encode your corpus of text data

Next, we need to encode our corpus of text data using the Sentence Transformer model that we have loaded. This will convert each sentence into a high-dimensional vector representation that captures its semantic meaning.

corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "A stitch in time saves nine.",
    "Actions speak louder than words."
]

corpus_embeddings = model.encode(corpus)
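
Before searching, it helps to confirm that the encoding step produced one vector per sentence and that all vectors have the same length. Here is a small sketch of such a check; describe_embeddings is a hypothetical helper, and the toy vectors below are stand-ins for the output of model.encode (real embeddings have hundreds of dimensions).

```python
def describe_embeddings(sentences, embeddings):
    """Return (number_of_sentences, vector_dimension) after a basic consistency check."""
    if len(sentences) != len(embeddings):
        raise ValueError("expected one embedding per sentence")
    dims = {len(vector) for vector in embeddings}
    if len(dims) != 1:
        raise ValueError("embeddings have inconsistent dimensions")
    return len(embeddings), dims.pop()


# Toy stand-ins (assumption): three sentences with made-up 4-dimensional vectors.
toy_corpus = ["first sentence", "second sentence", "third sentence"]
toy_embeddings = [
    [0.1, 0.3, -0.2, 0.7],
    [0.0, 0.5, 0.4, -0.1],
    [0.9, -0.3, 0.2, 0.0],
]
print(describe_embeddings(toy_corpus, toy_embeddings))  # (3, 4)
```

The helper only relies on len() and iteration, so it should work the same way on the 2D NumPy array that model.encode returns by default.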

Step 3: Create a function to search for similar sentences

Now that we have encoded our corpus of text data, we can create a function that takes a query sentence as input and returns the most similar sentences from our corpus.

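import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def semantic_search(query, corpus_embeddings, model, top_n=5):
    # Encode the query with the same model that produced the corpus embeddings.
    query_embedding = model.encode([query])[0]

    # Cosine similarity between the query and every corpus sentence (higher means more similar).
    similarities = cosine_similarity([query_embedding], corpus_embeddings)[0]
    # Sort indices from most similar to least similar.
    indices = np.argsort(similarities)[::-1]

    # Note: this reads the module-level `corpus` list defined in Step 2.
    return [(corpus[i], float(similarities[i])) for i in indices[:top_n]]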
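
To make the ranking step less of a black box, here is a sketch of what cosine_similarity computes, written in plain Python on toy 2-dimensional vectors. The function names and the vectors are invented for illustration; returning 0.0 for an all-zero vector is a choice made for this sketch.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vector, corpus_vectors, top_n=5):
    """Return (index, score) pairs, highest cosine similarity first."""
    scored = [(i, cosine(query_vector, v)) for i, v in enumerate(corpus_vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors (assumption): real sentence embeddings have hundreds of dimensions.
toy_query = [1.0, 0.0]
toy_corpus_vectors = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
ranking = rank_by_similarity(toy_query, toy_corpus_vectors, top_n=2)
print(ranking)  # index 1 ranks first, then index 2
```

Note that [2.0, 0.0] scores a perfect 1.0 against the query even though it is twice as long: cosine similarity depends only on direction, not on vector length.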

Step 4: Test the search function

Finally, we can test our semantic search function by providing a query sentence and seeing which sentences from our corpus are most similar to it.

query = "The lazy dog jumps over the quick brown fox."
results = semantic_search(query, corpus_embeddings, model)

for result in results:
    print(result)
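
Printing raw tuples works, but a ranked list is easier to read, and with a small corpus the top results will include sentences that are barely related to the query. The sketch below formats (sentence, score) pairs and optionally drops weak matches. The helper format_results and its min_score parameter are hypothetical, and the scores in the sample are placeholders, not real model output.

```python
def format_results(results, min_score=None):
    """Render (sentence, score) pairs as ranked lines, optionally dropping weak matches."""
    lines = []
    for rank, (sentence, score) in enumerate(results, start=1):
        score = float(score)  # also accepts NumPy scalar types
        if min_score is not None and score < min_score:
            continue
        lines.append(f"{rank}. {score:.3f}  {sentence}")
    return "\n".join(lines)


# Placeholder scores for illustration only (assumption), not produced by the model.
sample_results = [
    ("The quick brown fox jumps over the lazy dog.", 0.9),
    ("A stitch in time saves nine.", 0.1),
]
print(format_results(sample_results, min_score=0.5))
```

A sensible threshold depends on the model and the data, so it is best chosen by inspecting scores for a few known good and bad matches.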

That’s it! You have successfully built a semantic search engine using Sentence Transformers in Python. Feel free to experiment with different pre-trained models and datasets to see how well your search engine performs on other text data.
