Creating a Recommender System with PyTorch for Collaborative Filtering

Recommender systems are widely used in e-commerce, social media, and other platforms to provide personalized recommendations to users based on their preferences and behavior. In this tutorial, we will learn how to build a recommender system using PyTorch with collaborative filtering.

Collaborative filtering is a common approach to building recommender systems that uses the past interactions between users and items to make recommendations. Classic neighborhood methods come in two types: user-based collaborative filtering and item-based collaborative filtering. In this tutorial, we will focus on a model-based variant, matrix factorization, which learns an embedding vector for every user and every item.

To build the recommender system, we will use the MovieLens dataset, which contains movie ratings from users. We will use PyTorch to implement the collaborative filtering algorithm and train a model to make movie recommendations to users.

Step 1: Loading the Data
First, we need to load the MovieLens dataset into a Pandas DataFrame. You can download the dataset from here: https://grouplens.org/datasets/movielens/latest/. Make sure to download the ml-latest-small.zip file.

import pandas as pd

# Load the ratings data
ratings = pd.read_csv('ml-latest-small/ratings.csv')

# Display the first few rows of the ratings data
print(ratings.head())
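
Before modelling, it can help to check how many users and movies the data contains and how sparse it is. The sketch below uses a tiny hand-made DataFrame with the same columns as ratings.csv so that it runs without the download. The name toy_ratings and all of its values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data with the same columns as ml-latest-small/ratings.csv
toy_ratings = pd.DataFrame({
    'userId':    [1, 1, 2, 3, 3, 3],
    'movieId':   [10, 50, 10, 10, 50, 260],
    'rating':    [4.0, 3.5, 5.0, 2.0, 4.5, 3.0],
    'timestamp': [0, 1, 2, 3, 4, 5],
})

num_users = toy_ratings['userId'].nunique()
num_movies = toy_ratings['movieId'].nunique()

# Fraction of all possible user-movie pairs that actually have a rating
density = len(toy_ratings) / (num_users * num_movies)

print(f'{num_users} users, {num_movies} movies, density {density:.2f}')
```

The same three expressions can be applied to the real ratings DataFrame once it is loaded.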

Step 2: Preparing the Data
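Next, we need to prepare the data for training the model. We will create a user-item matrix where rows represent users, columns represent items (movies), and values represent ratings. Movies a user has not rated are filled with 0, so a 0 in this matrix means "no rating" rather than a rating of zero.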

# Create a user-item matrix
user_item_matrix = ratings.pivot(index='userId', columns='movieId', values='rating').fillna(0)
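
To see what the pivot produces, here is a minimal sketch on made-up data (toy_ratings and toy_matrix are hypothetical names). It also shows one possible way to turn raw IDs into contiguous positions along the matrix axes, which is useful because raw movieId values have gaps while embedding layers expect indices 0, 1, 2, and so on.

```python
import pandas as pd

# Hypothetical toy ratings; movieId values are deliberately non-contiguous
toy_ratings = pd.DataFrame({
    'userId':  [1, 1, 2, 3, 3, 3],
    'movieId': [10, 50, 10, 10, 50, 260],
    'rating':  [4.0, 3.5, 5.0, 2.0, 4.5, 3.0],
})

# Rows are users, columns are movies, missing ratings become 0
toy_matrix = toy_ratings.pivot(index='userId', columns='movieId', values='rating').fillna(0)
print(toy_matrix)

# Position of each raw ID along the matrix axes
user_pos = toy_matrix.index.get_indexer(toy_ratings['userId'])
movie_pos = toy_matrix.columns.get_indexer(toy_ratings['movieId'])

print(user_pos.tolist())   # [0, 0, 1, 2, 2, 2]
print(movie_pos.tolist())  # [0, 1, 0, 0, 1, 2]
```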

Step 3: Building the Model
Now, we will define a collaborative filtering model using PyTorch. We will use a simple matrix factorization model with embedding layers for users and items.
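
In symbols, a matrix factorization model of this kind is commonly written as a dot product of two learned vectors, trained with a mean squared error over the known ratings. The notation is an assumption introduced here for illustration: p_u is the embedding of user u, q_i the embedding of item i, d the embedding dimension, and K the set of (user, item) pairs with a known rating r_ui.

```latex
\hat{r}_{ui} = \mathbf{p}_u^{\top}\mathbf{q}_i = \sum_{k=1}^{d} p_{uk}\, q_{ik},
\qquad
\mathcal{L} = \frac{1}{|\mathcal{K}|} \sum_{(u,i)\in\mathcal{K}} \left(r_{ui} - \hat{r}_{ui}\right)^2
```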

import torch
import torch.nn as nn
import torch.optim as optim

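class CollaborativeFiltering(nn.Module):
    """Matrix factorization: a rating is the dot product of a user and an item embedding."""

    def __init__(self, num_users, num_items, embedding_dim):
        super().__init__()
        # One learnable vector per user and per item
        self.user_embedding = nn.Embedding(num_users, embedding_dim)
        self.item_embedding = nn.Embedding(num_items, embedding_dim)

    def forward(self, user_ids, item_ids):
        # Both lookups return tensors of shape (batch_size, embedding_dim)
        user_embeds = self.user_embedding(user_ids)
        item_embeds = self.item_embedding(item_ids)
        # Row-wise dot product gives one predicted rating per (user, item) pair
        return torch.sum(user_embeds * item_embeds, dim=1)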
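
As a quick sanity check, the sketch below runs the model on a toy batch and confirms that it returns one score per (user, item) pair. The class is repeated from above so the block runs on its own. The sizes and the name toy_model are made up for illustration.

```python
import torch
import torch.nn as nn

class CollaborativeFiltering(nn.Module):
    def __init__(self, num_users, num_items, embedding_dim):
        super().__init__()
        self.user_embedding = nn.Embedding(num_users, embedding_dim)
        self.item_embedding = nn.Embedding(num_items, embedding_dim)

    def forward(self, user_ids, item_ids):
        user_embeds = self.user_embedding(user_ids)
        item_embeds = self.item_embedding(item_ids)
        return torch.sum(user_embeds * item_embeds, dim=1)

# Hypothetical toy sizes, chosen only to keep the example small
toy_model = CollaborativeFiltering(num_users=5, num_items=7, embedding_dim=4)

# Three (user, item) pairs; indices must lie in [0, num_users) and [0, num_items)
users = torch.tensor([0, 1, 4])
items = torch.tensor([2, 6, 0])

preds = toy_model(users, items)
print(preds.shape)  # torch.Size([3])

# The first prediction equals the dot product of user 0's and item 2's embeddings
manual = torch.dot(toy_model.user_embedding.weight[0], toy_model.item_embedding.weight[2])
print(torch.allclose(preds[0], manual))
```

Passing an index outside the valid range raises an error, which is why raw MovieLens IDs need to be mapped to contiguous positions before they reach the embedding layers.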

Step 4: Training the Model
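Now, we will train the collaborative filtering model on the observed ratings. Embedding layers expect contiguous indices starting at 0, but raw userId and movieId values are not contiguous (userId starts at 1 and movieId has gaps), so we first map each ID to its row or column position in the user-item matrix we created earlier. We will use PyTorch’s DataLoader class to create batches of data for training.

from torch.utils.data import DataLoader, TensorDataset

# Map raw IDs to their row/column positions in the user-item matrix
user_index = {uid: i for i, uid in enumerate(user_item_matrix.index)}
movie_index = {mid: j for j, mid in enumerate(user_item_matrix.columns)}

# Build one training example per observed rating: (user position, item position, rating)
user_ids = torch.tensor(ratings['userId'].map(user_index).values, dtype=torch.long)
item_ids = torch.tensor(ratings['movieId'].map(movie_index).values, dtype=torch.long)
rating_values = torch.tensor(ratings['rating'].values, dtype=torch.float32)

# Create a DataLoader for training
dataset = TensorDataset(user_ids, item_ids, rating_values)
dataloader = DataLoader(dataset, batch_size=64, shuffle=True)

# Initialize the model, loss function, and optimizer
model = CollaborativeFiltering(num_users=len(user_item_matrix), num_items=len(user_item_matrix.columns), embedding_dim=50)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
for epoch in range(10):
    epoch_loss = 0.0
    for user_ids_batch, item_ids_batch, ratings_batch in dataloader:
        ratings_pred = model(user_ids_batch, item_ids_batch)
        loss = criterion(ratings_pred, ratings_batch)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Accumulate the summed squared error so the epoch average is exact
        epoch_loss += loss.item() * len(ratings_batch)

    # Report the mean squared error averaged over all training ratings
    print(f'Epoch {epoch+1}, Loss: {epoch_loss / len(dataset):.4f}')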
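
Training loss alone does not show how well the model generalizes. One common option, which the tutorial itself does not do, is to hold out part of the ratings and report RMSE on them. The sketch below illustrates the idea on randomly generated ratings (all sizes and values are hypothetical), with the model class repeated so the block runs on its own and full-batch updates to keep it short.

```python
import torch
import torch.nn as nn

class CollaborativeFiltering(nn.Module):
    def __init__(self, num_users, num_items, embedding_dim):
        super().__init__()
        self.user_embedding = nn.Embedding(num_users, embedding_dim)
        self.item_embedding = nn.Embedding(num_items, embedding_dim)

    def forward(self, user_ids, item_ids):
        return torch.sum(self.user_embedding(user_ids) * self.item_embedding(item_ids), dim=1)

torch.manual_seed(0)

# Hypothetical synthetic ratings: 200 (user, item, rating) triples
num_users, num_items, num_ratings = 20, 30, 200
users = torch.randint(0, num_users, (num_ratings,))
items = torch.randint(0, num_items, (num_ratings,))
values = torch.randint(1, 11, (num_ratings,)).float() / 2  # 0.5 to 5.0 in half-star steps

# Random 80/20 split into training and validation examples
perm = torch.randperm(num_ratings)
train_idx, val_idx = perm[:160], perm[160:]

toy_model = CollaborativeFiltering(num_users, num_items, embedding_dim=8)
optimizer = torch.optim.Adam(toy_model.parameters(), lr=0.01)
criterion = nn.MSELoss()

def rmse(idx):
    # Root mean squared error on the selected examples, without tracking gradients
    with torch.no_grad():
        preds = toy_model(users[idx], items[idx])
    return torch.sqrt(criterion(preds, values[idx])).item()

train_rmse_before = rmse(train_idx)

for epoch in range(50):
    loss = criterion(toy_model(users[train_idx], items[train_idx]), values[train_idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

train_rmse_after = rmse(train_idx)
val_rmse = rmse(val_idx)
print(f'train RMSE {train_rmse_before:.3f} -> {train_rmse_after:.3f}, validation RMSE {val_rmse:.3f}')
```

Because the synthetic ratings are random, the validation RMSE here carries no meaning beyond showing the mechanics. On real data, a widening gap between training and validation error is a sign of overfitting.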

Step 5: Making Recommendations
Finally, we can make movie recommendations to users using the trained model. We will get the embeddings of users and items and compute the dot product to get the predicted ratings for all movies. We can then recommend the top N movies with the highest predicted ratings.

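user_embeddings = model.user_embedding.weight.detach()
item_embeddings = model.item_embedding.weight.detach()

def recommend_movies(user_id, top_n=10):
    # Translate the raw userId into its row position in the user-item matrix
    user_idx = user_item_matrix.index.get_loc(user_id)
    user_embed = user_embeddings[user_idx]
    # Predicted rating for every movie: dot product with each item embedding
    ratings_pred = torch.matmul(user_embed, item_embeddings.T)
    top_indices = ratings_pred.argsort(descending=True)[:top_n]
    # Translate column positions back to raw movieId values
    top_movies = user_item_matrix.columns[top_indices.numpy()].tolist()
    return top_movies

# Make recommendations for the user with userId 100 (returns a list of movieId values)
print(recommend_movies(100))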
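
In practice you would usually not recommend movies the user has already rated. One possible way to do that, shown here as a sketch on random embeddings rather than the trained model, is to set the scores of seen movies to negative infinity before taking the top N. The seen positions and all sizes are hypothetical.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins for one trained user embedding and six item embeddings
num_items, embedding_dim = 6, 4
user_embed = torch.randn(embedding_dim)
item_embeddings = torch.randn(num_items, embedding_dim)

# One predicted score per movie, shape (num_items,)
scores = item_embeddings @ user_embed

# Hypothetical column positions of movies this user has already rated
seen = torch.tensor([1, 3])

# Mask seen movies so they can never appear in the top N
masked = scores.clone()
masked[seen] = float('-inf')

top_scores, top_indices = torch.topk(masked, k=3)
print(top_indices.tolist())
```

With the real data, the seen positions for a user could be taken from the nonzero entries of that user's row in the user-item matrix.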

That’s it! In this tutorial, we learned how to build a collaborative filtering recommender system using PyTorch. Collaborative filtering is a powerful technique for making personalized recommendations, and PyTorch makes it easy to implement and train models for recommendation systems. Feel free to experiment with different hyperparameters, model architectures, and datasets to improve the performance of your recommender system.

