In this tutorial, we will build a semantic search engine using Sentence Transformers in Python. Sentence Transformers is a library that encodes sentences into dense, high-dimensional vectors (embeddings), which can then be compared for semantic similarity tasks such as finding related sentences.
To get started, first make sure you have Python installed on your system. You can download Python from the official website and install it following the instructions provided.
Next, you will need to install the Sentence Transformers library. You can do this by running the following command in your terminal:
pip install sentence-transformers
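If you want to confirm the installation worked, a quick check (just an illustrative snippet, not part of the original steps) is to import the package and print its version:

import sentence_transformers
# Prints the installed library version, e.g. something like "2.x.x"
print(sentence_transformers.__version__)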
Once the library is installed, we can start building our semantic search engine.
Step 1: Load a pre-trained Sentence Transformer model
First, we need to load a pre-trained Sentence Transformer model that has been trained on a large corpus of text data. These pre-trained models are available through the Sentence Transformers library and can be loaded with the SentenceTransformer class. For this tutorial, we will use the distilbert-base-nli-stsb-mean-tokens model, which was fine-tuned on natural language inference data and the STS benchmark (STS-B) for semantic textual similarity.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')
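As a quick sanity check (this snippet is an addition for illustration), you can encode a single sentence and inspect the size of the resulting vector, which should be 768 for this DistilBERT-based model:

# Encode one sentence and check the embedding dimension (expected: (768,))
embedding = model.encode("Semantic search is fun.")
print(embedding.shape)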
Step 2: Encode your corpus of text data
Next, we need to encode our corpus of text data using the Sentence Transformer model that we have loaded. This will convert each sentence into a high-dimensional vector representation that captures its semantic meaning.
corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "A stitch in time saves nine.",
    "Actions speak louder than words."
]
corpus_embeddings = model.encode(corpus)
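The result is one vector per sentence, so corpus_embeddings has shape (number of sentences, embedding dimension). For larger corpora it can help to encode in batches with a progress bar; the following sketch uses keyword arguments that model.encode supports in current releases:

# corpus_embeddings is a (num_sentences, embedding_dim) NumPy array
print(corpus_embeddings.shape)

# For bigger corpora, batch the encoding and show progress
corpus_embeddings = model.encode(
    corpus,
    batch_size=32,
    show_progress_bar=True,
    convert_to_numpy=True,
)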
Step 3: Create a function to search for similar sentences
Now that we have encoded our corpus of text data, we can create a function that takes a query sentence as input and returns the most similar sentences from our corpus.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def semantic_search(query, corpus, corpus_embeddings, model, top_n=5):
    # Encode the query into the same embedding space as the corpus
    query_embedding = model.encode([query])[0]
    # Cosine similarity between the query and every corpus sentence
    similarities = cosine_similarity([query_embedding], corpus_embeddings)[0]
    # Sort indices from most to least similar
    indices = np.argsort(similarities)[::-1]
    return [(corpus[i], similarities[i]) for i in indices[:top_n]]
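As an aside, the library itself ships a helper for this, so you can skip the scikit-learn dependency if you prefer. A minimal sketch using sentence_transformers.util, assuming the same model and corpus as above, would look roughly like this:

from sentence_transformers import util

# util.semantic_search computes cosine similarity and returns, for each query,
# a list of hits of the form {'corpus_id': ..., 'score': ...}
query_emb = model.encode("The lazy dog jumps over the quick brown fox.", convert_to_tensor=True)
corpus_emb = model.encode(corpus, convert_to_tensor=True)
hits = util.semantic_search(query_emb, corpus_emb, top_k=3)[0]
for hit in hits:
    print(corpus[hit['corpus_id']], hit['score'])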
Step 4: Test the search function
Finally, we can test our semantic search function by providing a query sentence and seeing which sentences from our corpus are most similar to it.
query = "The lazy dog jumps over the quick brown fox."
results = semantic_search(query, corpus, corpus_embeddings, model)
for result in results:
    print(result)
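Since encoding a large corpus can take a while, it is common to compute the embeddings once and persist them. One simple approach is to store them with NumPy (the file name here is just an example):

# Save the corpus embeddings so they do not need to be recomputed on every run
np.save("corpus_embeddings.npy", corpus_embeddings)

# Later, reload them and search without re-encoding the corpus
corpus_embeddings = np.load("corpus_embeddings.npy")
results = semantic_search("a proverb about patience", corpus, corpus_embeddings, model)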
That’s it! You have successfully built a semantic search engine using Sentence Transformers in Python. Feel free to experiment with different pre-trained models and datasets to see how well your search engine performs on other text data.