In this tutorial, we will walk through the process of creating a sentiment classifier using a Recurrent Neural Network (RNN) in PyTorch. We will be using the IMDB movie reviews dataset for this task, where our model will learn to predict the sentiment of movie reviews as positive or negative.
The RNN architecture is well-suited for sequential data like text because it can capture dependencies between words in a sentence. We will use a simple RNN architecture with an embedding layer, an RNN layer, and a fully connected layer for classification.
Let’s get started by importing the necessary libraries:
import random

import torch
import torch.nn as nn
import torch.optim as optim
from torchtext.legacy import data
from torchtext.legacy import datasets
Next, we will define the batch size and seed for reproducibility:
BATCH_SIZE = 64
SEED = 1234
torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True
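If you are running on a GPU, you can also explicitly seed the CUDA random number generators (an optional extra; on recent PyTorch versions torch.manual_seed already seeds all devices, so this is mostly belt-and-braces):

torch.cuda.manual_seed_all(SEED)        # explicitly seed the GPU RNGs as well
torch.backends.cudnn.benchmark = False  # disable autotuning, which can pick non-deterministic kernels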
Now, let’s load the IMDB dataset and define the fields for our text and label:
TEXT = data.Field(tokenize='spacy', tokenizer_language='en_core_web_sm', lower=True)
LABEL = data.LabelField(dtype=torch.float)
train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)
We will split the training data into training and validation sets and build the vocabulary:
train_data, valid_data = train_data.split(random_state=random.seed(SEED))
TEXT.build_vocab(train_data, max_size=25000, vectors="glove.6B.100d", unk_init=torch.Tensor.normal_)
LABEL.build_vocab(train_data)
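As a quick sanity check (optional, not required for the rest of the tutorial), we can inspect the dataset and vocabulary sizes:

print(f'Number of training examples: {len(train_data)}')
print(f'Number of validation examples: {len(valid_data)}')
print(f'Number of testing examples: {len(test_data)}')
print(f'Unique tokens in TEXT vocabulary: {len(TEXT.vocab)}')   # 25,000 + 2 for <unk> and <pad>
print(f'Unique tokens in LABEL vocabulary: {len(LABEL.vocab)}')  # 2: pos and neg
print(TEXT.vocab.freqs.most_common(5))                           # most frequent tokens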
Next, we will create the iterators for the training, validation, and test sets:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, valid_data, test_data),
    batch_size=BATCH_SIZE,
    device=device)
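It can also help to look at a single batch to understand the tensor layout the model will receive (optional; note that with the default batch_first=False, the sequence dimension comes first):

batch = next(iter(train_iterator))
print(batch.text.shape)   # [sequence length, batch size]
print(batch.label.shape)  # [batch size]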
Now, let’s define our RNN model:
class RNN(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, embedding_dim)
        self.rnn = nn.RNN(embedding_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, text):
        # text: [seq len, batch size]
        embedded = self.embedding(text)
        # embedded: [seq len, batch size, embedding dim]
        output, hidden = self.rnn(embedded)
        # output holds the hidden state at every time step; hidden is the final hidden state.
        # For a single-layer, unidirectional RNN the last output equals the final hidden state.
        assert torch.equal(output[-1, :, :], hidden.squeeze(0))
        return self.fc(hidden.squeeze(0))
Now, let’s instantiate the model and define the hyperparameters:
INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
HIDDEN_DIM = 256
OUTPUT_DIM = 1
model = RNN(INPUT_DIM, EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM).to(device)
optimizer = optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()
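One detail worth adding: we loaded the glove.6B.100d vectors when building the vocabulary, but the embedding layer above still starts from random weights. A small optional step copies the pretrained vectors into the embedding layer (zeroing the <unk> and <pad> rows is a common convention, not a requirement):

pretrained_embeddings = TEXT.vocab.vectors
model.embedding.weight.data.copy_(pretrained_embeddings)  # initialize the embedding layer with GloVe

# <unk> and <pad> have no pretrained vectors, so zero them out explicitly.
UNK_IDX = TEXT.vocab.stoi[TEXT.unk_token]
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]
model.embedding.weight.data[UNK_IDX] = torch.zeros(EMBEDDING_DIM, device=device)
model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBEDDING_DIM, device=device)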
Next, we will define functions for training and evaluation:
def train(model, iterator, optimizer, criterion):
    model.train()
    epoch_loss = 0
    for batch in iterator:
        optimizer.zero_grad()
        predictions = model(batch.text).squeeze(1)
        loss = criterion(predictions, batch.label)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    return epoch_loss / len(iterator)
def evaluate(model, iterator, criterion):
    model.eval()
    epoch_loss = 0
    with torch.no_grad():
        for batch in iterator:
            predictions = model(batch.text).squeeze(1)
            loss = criterion(predictions, batch.label)
            epoch_loss += loss.item()
    return epoch_loss / len(iterator)
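The functions above only track the loss. If you also want to report accuracy, a small helper along these lines can be added and called inside train and evaluate (an optional extension; the name binary_accuracy is just illustrative):

def binary_accuracy(preds, y):
    # Round the sigmoid of the logits to 0/1 and compare with the labels.
    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float()
    return correct.sum() / len(correct)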
Now, let’s train the model and evaluate it on the validation set:
N_EPOCHS = 5
best_valid_loss = float('inf')
for epoch in range(N_EPOCHS):
    train_loss = train(model, train_iterator, optimizer, criterion)
    valid_loss = evaluate(model, valid_iterator, criterion)
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'rnn_sentiment_classifier.pth')
    print(f'Epoch: {epoch+1:02} | Train Loss: {train_loss:.3f} | Val. Loss: {valid_loss:.3f}')
Finally, let’s test the model on the test set:
model.load_state_dict(torch.load('rnn_sentiment_classifier.pth'))
test_loss = evaluate(model, test_iterator, criterion)
print(f'Test Loss: {test_loss:.3f}')
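As a final optional step, here is a sketch of how a single review could be scored with the trained model (this assumes the spaCy en_core_web_sm pipeline is installed; predict_sentiment is just an illustrative name):

import spacy
nlp = spacy.load('en_core_web_sm')

def predict_sentiment(model, sentence):
    model.eval()
    tokens = [tok.text.lower() for tok in nlp.tokenizer(sentence)]
    indices = [TEXT.vocab.stoi[t] for t in tokens]               # unknown words map to <unk>
    tensor = torch.LongTensor(indices).unsqueeze(1).to(device)   # shape: [seq len, 1]
    probability = torch.sigmoid(model(tensor))
    return probability.item()  # probability of the class LABEL.vocab.itos[1]

print(predict_sentiment(model, 'This film was absolutely wonderful!'))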
And that’s it! We have successfully trained an RNN sentiment classifier using PyTorch. Feel free to experiment with different hyperparameters, architectures, and datasets to improve the model’s performance. Happy coding!
sm: I believe it means small model.
AttributeError: module 'torchtext' has no attribute 'legacy'
How do I fix this?

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-b9957e880177> in <cell line: 4>()
      2
      3 import spacy
----> 4 TEXT = torchtext.legacy.data.Field(tokenize='spacy', tokenizer_language='en_core_web_sm')

AttributeError: module 'torchtext' has no attribute 'legacy'
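A possible workaround (just a suggestion, not an official fix): torchtext.legacy only exists in torchtext 0.9 to 0.11 and was removed in 0.12, so pinning an older release that still ships it, e.g. pip install torchtext==0.10.0 together with the matching torch 1.9, should let the code above run unchanged. Otherwise the data pipeline has to be rewritten against the current API (torchtext.data.utils.get_tokenizer and torchtext.vocab.build_vocab_from_iterator), which is a larger change.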
Thanks so much for this; I have been looking for examples of RNNs in PyTorch, and this is very clear. Has anybody figured out how to use the new torchtext API? They removed legacy, and the provided migration guide is also broken; it's been a challenge to figure out how to get this to run with the current API.
Great presentation. I have spent a couple of weeks now, every night, going through your videos and hands-on notebooks! And I feel I made a lot more progress than with other, less coding-oriented classes.
Suggestion: define TEXT_COLUMN_NAME, LABEL_COLUMN_NAME as local variables, in all caps, and reference them as variable names everywhere.
This is really awesome stuff 🙂 Do you also have videos on the transformer/BERT architecture, and the code related to that?
Wonderful tutorial. Thanks.
If I am not wrong, is this a single LSTM unit used in the model?
For text preprocessing, you could have used a library like neattext.
Hello Prof. Raschka. What an amazing hands-on tutorial on RNNs!
I have noticed one issue: at 37:26, "packed", the return value of "pack_padded_sequence", is not passed on to the next layer, "self.rnn".
Still, this version is much better than the first one. As far as I've experimented, the reason is that when you enable sorting within a batch, the sequence lengths in each batch are very similar. This way the RNN learns much better instead of learning dummy padding.
Great video! Does the <unk> in the vocabulary indicate words that are not in our vocabulary? So if our LSTM encounters an unknown word, it will be treated as <unk>?
thanks boss!
Does nn.LSTM handle this itself, i.e., the previous output is the input to the next step in the network?
Hi, can I get the PDFs?
thank you for uploading!
I once saw a Deeplearning.ai homework assignment put an LSTM into the transformer's feed-forward layer. I am not sure whether a 1D CNN would be fine or not.
Hello Sebastian. Love your books, just keep it up. As I have said many times, your book, along with Aurélien Géron's, is the best on the subject. I have read the second and third editions and always keep it on my desk, although I've read it page by page… P.S. The pictures for the convolution types ("same" and "valid"), where you explain them in the book, appear to be swapped. It's an insignificant detail, but since it is repeated in the second and third editions, I thought I'd let you know. Best regards.