Example of Bidirectional LSTM in PyTorch

In this tutorial, we will implement a bidirectional Long Short-Term Memory (LSTM) model in PyTorch, using the IMDB text sentiment analysis dataset to demonstrate how to build and train it.

First, make sure you have PyTorch installed on your system. You can install PyTorch by following the instructions on the official PyTorch website (https://pytorch.org/).
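
The data pipeline below also depends on torchtext's legacy API (present in torchtext 0.9–0.11 and removed in 0.12) and on spaCy for tokenization. As a rough sketch, assuming a pip-based environment (and noting that the torchtext version must match your installed PyTorch release), the extra dependencies can be installed with:

pip install "torchtext==0.11.*" spacy
python -m spacy download en_core_web_sm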

Next, let’s import the necessary libraries:

import torch
import torch.nn as nn
import torch.optim as optim
import torchtext
from torchtext.legacy import data
from torchtext.legacy import datasets
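
Note that the legacy namespace only exists in newer torchtext releases; in older versions the same Field and dataset classes live directly under torchtext.data and torchtext.datasets. If you need to support both, a small optional compatibility shim (not part of the original code) could look like:

try:
    from torchtext.legacy import data, datasets   # torchtext >= 0.9
except ImportError:
    from torchtext import data, datasets          # older torchtext releases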

Now, let’s define the fields for our text and labels and load the IMDB sentiment analysis dataset:

TEXT = data.Field(tokenize = 'spacy', batch_first = True)
LABEL = data.LabelField(dtype = torch.float)

train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)
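
Loading IMDB downloads the data on first use. As a quick optional sanity check (not in the original code), you can look at the number of examples and one tokenized review:

print(f'Training examples: {len(train_data)}')
print(f'Test examples: {len(test_data)}')
print(vars(train_data.examples[0])['text'][:20])   # first 20 tokens of the first review
print(vars(train_data.examples[0])['label'])       # 'pos' or 'neg'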

Now, let’s build the vocabulary and create the iterators for the training and testing data:

TEXT.build_vocab(train_data, max_size=25000, vectors="glove.6B.100d")
LABEL.build_vocab(train_data)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, test_data),
    batch_size = 64,
    sort_key = lambda x: len(x.text),
    sort_within_batch=True,
    device = device)
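
As another optional check, you can inspect the vocabulary size, the most frequent tokens, and the label mapping:

print(f'Vocabulary size: {len(TEXT.vocab)}')   # max_size + 2 for the <unk> and <pad> tokens
print(TEXT.vocab.freqs.most_common(5))         # most frequent tokens in the training set
print(LABEL.vocab.stoi)                        # e.g. {'neg': 0, 'pos': 1}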

Next, we will define the Bi-directional LSTM model:

class BiLSTM(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim, n_layers, bidirectional, dropout):
        super().__init__()

        self.embedding = nn.Embedding(input_dim, embedding_dim)

        # batch_first=True so the LSTM expects [batch, seq_len, embedding_dim],
        # matching the batch_first=True setting of the TEXT field above
        self.lstm = nn.LSTM(embedding_dim,
                            hidden_dim,
                            num_layers=n_layers,
                            bidirectional=bidirectional,
                            dropout=dropout,
                            batch_first=True)

        # the two directions are concatenated, so the classifier input is hidden_dim * 2
        self.fc = nn.Linear(hidden_dim * 2, output_dim)

        self.dropout = nn.Dropout(dropout)

    def forward(self, text):

        # text: [batch, seq_len] -> embedded: [batch, seq_len, embedding_dim]
        embedded = self.dropout(self.embedding(text))

        # hidden: [n_layers * num_directions, batch, hidden_dim]
        output, (hidden, cell) = self.lstm(embedded)

        # concatenate the final forward (hidden[-2]) and backward (hidden[-1]) states of the top layer
        hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1))

        return self.fc(hidden)
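
The concatenation in forward relies on how PyTorch lays out the hidden state: hidden has shape [n_layers * num_directions, batch, hidden_dim], with the final forward and backward states of the top layer at indices -2 and -1. A minimal standalone sketch (with made-up sizes) to verify the shapes:

lstm = nn.LSTM(10, 16, num_layers=2, bidirectional=True, batch_first=True)
x = torch.randn(4, 7, 10)   # [batch=4, seq_len=7, input_size=10]
out, (h, c) = lstm(x)
print(out.shape)            # torch.Size([4, 7, 32]) -> hidden_dim * 2 directions
print(h.shape)              # torch.Size([8, 4, 16]) -> n_layers * num_directions, batch, hidden_dim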

Now, let’s instantiate the model, define the optimizer, and set the loss function:

INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
HIDDEN_DIM = 256
OUTPUT_DIM = 1
N_LAYERS = 2
BIDIRECTIONAL = True
DROPOUT = 0.5

model = BiLSTM(INPUT_DIM, EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM, N_LAYERS, BIDIRECTIONAL, DROPOUT)

optimizer = optim.Adam(model.parameters())
criterion = nn.BCEWithLogitsLoss()

model = model.to(device)
criterion = criterion.to(device)
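
The vocabulary was built with pretrained GloVe vectors, but they have not yet been loaded into the model. A common optional step, sketched here (PAD_IDX is a name introduced for illustration), is to copy the vectors into the embedding layer and zero out the padding vector:

pretrained_embeddings = TEXT.vocab.vectors            # [vocab_size, embedding_dim]
model.embedding.weight.data.copy_(pretrained_embeddings)

PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]
model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBEDDING_DIM, device=device)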

Next, let’s define the train and evaluate functions:

def train(model, iterator, optimizer, criterion):

    epoch_loss = 0
    epoch_acc = 0

    model.train()

    for batch in iterator:

        optimizer.zero_grad()

        predictions = model(batch.text).squeeze(1)

        loss = criterion(predictions, batch.label)

        rounded_preds = torch.round(torch.sigmoid(predictions))
        correct = (rounded_preds == batch.label).float() 
        acc = correct.sum() / len(correct)

        loss.backward()

        optimizer.step()

        epoch_loss += loss.item()
        epoch_acc += acc.item()

    return epoch_loss / len(iterator), epoch_acc / len(iterator)

def evaluate(model, iterator, criterion):

    epoch_loss = 0
    epoch_acc = 0

    model.eval()

    with torch.no_grad():

        for batch in iterator:

            predictions = model(batch.text).squeeze(1)

            loss = criterion(predictions, batch.label)

            rounded_preds = torch.round(torch.sigmoid(predictions))
            correct = (rounded_preds == batch.label).float() 
            acc = correct.sum() / len(correct)

            epoch_loss += loss.item()
            epoch_acc += acc.item()

    return epoch_loss / len(iterator), epoch_acc / len(iterator)

Now, let’s train the model:

N_EPOCHS = 5

for epoch in range(N_EPOCHS):

    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    test_loss, test_acc = evaluate(model, test_iterator, criterion)

    print(f'Epoch: {epoch+1:02}')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}%')
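
Once training finishes, you will usually want to persist the weights. A minimal sketch (the filename is arbitrary):

torch.save(model.state_dict(), 'bilstm-imdb.pt')
# later: model.load_state_dict(torch.load('bilstm-imdb.pt'))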

This completes the implementation of the Bi-directional LSTM model using PyTorch for text sentiment analysis. You can further experiment with different hyperparameters, network architectures, and datasets to improve the performance of the model.
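
For completeness, here is a hedged sketch of how you might run the trained model on a new sentence. It is not part of the original tutorial and assumes spaCy's en_core_web_sm model is installed (the same tokenizer used implicitly by tokenize='spacy' above):

import spacy
nlp = spacy.load('en_core_web_sm')

def predict_sentiment(model, sentence):
    model.eval()
    tokens = [tok.text for tok in nlp.tokenizer(sentence)]
    indexed = [TEXT.vocab.stoi[t] for t in tokens]
    tensor = torch.LongTensor(indexed).unsqueeze(0).to(device)   # [1, seq_len], batch_first
    with torch.no_grad():
        prediction = torch.sigmoid(model(tensor))
    return prediction.item()   # close to 1 -> positive, close to 0 -> negative

print(predict_sentiment(model, "This film is fantastic"))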
