Building an RNN Sentiment Classifier using PyTorch

In this tutorial, we will walk through the process of creating a sentiment classifier using a Recurrent Neural Network (RNN) in PyTorch. We will be using the IMDB movie reviews dataset for this task, where our model will learn to predict the sentiment of movie reviews as positive or negative.

The RNN architecture is well-suited for sequential data like text because it can capture dependencies between words in a sentence. We will use a simple RNN architecture with an embedding layer, an RNN layer, and a fully connected layer for classification.

Let’s get started by importing the necessary libraries:

import random

import torch
import torch.nn as nn
import torch.optim as optim
from torchtext.legacy import data
from torchtext.legacy import datasets
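
A quick note before moving on: the legacy torchtext API used above was removed in torchtext 0.12, so on recent installs you may see AttributeError: module 'torchtext' has no attribute 'legacy' (as several readers report in the comments below). One workaround is to pin an older release that still includes it, for example torchtext 0.10 alongside PyTorch 1.9, and to download the spaCy English model that the tokenizer needs:

pip install torchtext==0.10.0 spacy
python -m spacy download en_core_web_sm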

Next, we will define the batch size and seed for reproducibility:

BATCH_SIZE = 64
SEED = 1234

torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

Now, let’s load the IMDB dataset and define the fields for our text and label:

TEXT = data.Field(tokenize='spacy', tokenizer_language='en_core_web_sm', lower=True)
LABEL = data.LabelField(dtype=torch.float)
train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)

We will split the training data into training and validation sets and build the vocabulary:

train_data, valid_data = train_data.split(random_state=random.seed(SEED))
TEXT.build_vocab(train_data, max_size=25000, vectors="glove.6B.100d", unk_init=torch.Tensor.normal_)
LABEL.build_vocab(train_data)
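
As a quick sanity check (optional, and not part of the original walkthrough), you can inspect the vocabulary that was just built. The text vocabulary ends up with 25,002 entries, because the <unk> and <pad> special tokens are added on top of the 25,000 most frequent words:

print(f'Tokens in TEXT vocabulary: {len(TEXT.vocab)}')
print(f'Label mapping: {dict(LABEL.vocab.stoi)}')
print(TEXT.vocab.freqs.most_common(10))
print(TEXT.vocab.itos[:5])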

Next, we will create the iterators for the training, validation, and test sets:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, valid_data, test_data),
    batch_size=BATCH_SIZE,
    device=device)
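
If you want to confirm the tensor shapes the model will receive (again optional), you can peek at a single batch. With the default settings, batch.text has shape [sentence length, batch size] and batch.label has shape [batch size]:

batch = next(iter(train_iterator))
print(batch.text.shape)
print(batch.label.shape)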

Now, let’s define our RNN model:

class RNN(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, embedding_dim)
        self.rnn = nn.RNN(embedding_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, text):
        # text: [sentence length, batch size]
        embedded = self.embedding(text)
        # embedded: [sentence length, batch size, embedding dim]
        output, hidden = self.rnn(embedded)
        # output: [sentence length, batch size, hidden dim]
        # hidden: [1, batch size, hidden dim] -- the final hidden state
        # For a single-layer, unidirectional RNN, the last time step of
        # output equals the final hidden state:
        assert torch.equal(output[-1, :, :], hidden.squeeze(0))
        return self.fc(hidden.squeeze(0))

Now, let’s define the hyperparameters and instantiate the model:

INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
HIDDEN_DIM = 256
OUTPUT_DIM = 1

model = RNN(INPUT_DIM, EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM).to(device)
optimizer = optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()
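
Since the vocabulary was built with GloVe vectors, we can optionally copy those pretrained embeddings into the embedding layer so that training does not start from random word vectors. This step is skipped in the code above; the sketch below follows the common torchtext recipe:

pretrained_embeddings = TEXT.vocab.vectors
model.embedding.weight.data.copy_(pretrained_embeddings)

# Zero out the vectors for the special tokens so they start neutral.
UNK_IDX = TEXT.vocab.stoi[TEXT.unk_token]
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]
model.embedding.weight.data[UNK_IDX].zero_()
model.embedding.weight.data[PAD_IDX].zero_()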

Next, we will define functions for training and evaluation:

def train(model, iterator, optimizer, criterion):
    model.train()
    epoch_loss = 0
    for batch in iterator:
        optimizer.zero_grad()
        predictions = model(batch.text).squeeze(1)
        loss = criterion(predictions, batch.label)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    return epoch_loss / len(iterator)

def evaluate(model, iterator, criterion):
    model.eval()
    epoch_loss = 0
    with torch.no_grad():
        for batch in iterator:
            predictions = model(batch.text).squeeze(1)
            loss = criterion(predictions, batch.label)
            epoch_loss += loss.item()
    return epoch_loss / len(iterator)
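
The two loops above only track the loss. If you also want to report accuracy, a small helper that thresholds the sigmoid of the logits is enough; this is a sketch (binary_accuracy is not part of the original code), and you would accumulate its result inside train and evaluate exactly like the loss:

def binary_accuracy(preds, y):
    # Round sigmoid(logits) to 0/1 and compare against the labels.
    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float()
    return correct.sum() / len(correct)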

Now, let’s train the model and evaluate it on the validation set:

N_EPOCHS = 5

best_valid_loss = float('inf')
for epoch in range(N_EPOCHS):
    train_loss = train(model, train_iterator, optimizer, criterion)
    valid_loss = evaluate(model, valid_iterator, criterion)

    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'rnn_sentiment_classifier.pth')

    print(f'Epoch: {epoch+1:02} | Train Loss: {train_loss:.3f} | Val. Loss: {valid_loss:.3f}')

Finally, let’s test the model on the test set:

model.load_state_dict(torch.load('rnn_sentiment_classifier.pth'))
test_loss = evaluate(model, test_iterator, criterion)
print(f'Test Loss: {test_loss:.3f}')
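
To try the classifier on raw text, tokenize a review with spaCy, map the tokens to vocabulary indices, and feed the model a [sentence length, 1] tensor. The helper below is an illustrative sketch (predict_sentiment and the example sentence are not part of the original code); with the usual label mapping ('pos' = 1), outputs near 1 indicate positive sentiment and outputs near 0 indicate negative sentiment:

import spacy
nlp = spacy.load('en_core_web_sm')

def predict_sentiment(model, sentence):
    model.eval()
    tokens = [tok.text.lower() for tok in nlp.tokenizer(sentence)]
    indices = [TEXT.vocab.stoi[t] for t in tokens]
    # Shape [sentence length, batch size = 1], matching the iterators' layout.
    tensor = torch.LongTensor(indices).unsqueeze(1).to(device)
    with torch.no_grad():
        prediction = torch.sigmoid(model(tensor))
    return prediction.item()

print(predict_sentiment(model, 'This film was absolutely wonderful!'))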

And that’s it! We have successfully trained an RNN sentiment classifier using PyTorch. Feel free to experiment with different hyperparameters, architectures, and datasets to improve the model’s performance. Happy coding!

17 Comments
@abderahimmazouz2088
1 month ago

"sm" — I believe it means small model

@debabratasikder9448
1 month ago

AttributeError: module 'torchtext' has no attribute 'legacy'

@sadikaljarif9635
1 month ago

how to fix this??

@sadikaljarif9635
1 month ago

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-b9957e880177> in <cell line: 4>()
      2
      3 import spacy
----> 4 TEXT = torchtext.legacy.data.Field(tokenize='spacy', tokenizer_language='en_core_web_sm')

AttributeError: module 'torchtext' has no attribute 'legacy'

@donatocapitella
1 month ago

Thanks so much for this. I have been looking for examples of RNNs in PyTorch, and this is very clear. Has anybody figured out how to use the new torchtext API? They removed legacy, and the provided migration guide is also broken; it's been a challenge to figure out how to get this to run with the current API.

@bitdribble
1 month ago

Great presentation. Have spent a couple weeks now, every night, doing your videos and hands on notebooks! And I feel I made a lot more progress than with other, less coding-oriented classes.

Suggestion: define TEXT_COLUMN_NAME, LABEL_COLUMN_NAME as local variables, in all caps, and reference them as variable names everywhere.

@madhu1987ful
1 month ago

This is really awesome stuff 🙂 Do you also have videos on transformer/BERT architecture? and the codes related to that?

@vikramsandu6054
1 month ago

Wonderful tutorial. Thanks.

@akashghosh4766
1 month ago

If I am not wrong, is this a single LSTM unit used in the model?

@Rahulsircar94
1 month ago

for text preprocessing you could have used a library like neattext.

@kafaayari
1 month ago

Hello Prof. Raschka. What an amazing hands on tutorial on RNN!
I have seen one issue. At 37:26, "packed", the return value of "pack_padded_sequence", is not passed to the next layer "self.rnn".
But still this version is much better than the first one. As far as I've experimented, the reason is that when you enable sorting within batch, the sequence lengths in batches are very similar. This way RNN learns much better instead of learning dummy paddings.

@randb9378
1 month ago

Great video! Does the <unk> in the vocabulary indicate words that are not in our vocabulary? So in case our LSTM encounters an unknown word, it will be regarded as <unk> ?

@saadouch
1 month ago

thanks boss!

@abubakarali6399
1 month ago

Does nn.LSTM handle it itself, i.e., the previous output is the input to the next step in the network?

@DataTheory92
1 month ago

Hi can I get the pdfs ?

@jonathansum9084
1 month ago

thank you for uploading!
I once saw a Deeplearning.ai homework assignment put an LSTM into the transformer's feed-forward layer. I am not sure whether a 1D CNN is fine or not.

@milanradovanovic3693
1 month ago

Hello Sebastian. Love your books, just keep it up that way. As I said many times, your book along with Aurélien Géron's are the best books on the subject. I have read the second and third editions and I always keep it on my desk, although I've read it page to page… P.S. The convolution-type pictures ("same" and "valid"), where you explain them in the book, are swapped; it's an insignificant detail, but since it is repeated in the second and third editions I thought I'd let you know. Best regards