In this tutorial, we will implement a bidirectional Long Short-Term Memory (LSTM) model in PyTorch and train it on the IMDB sentiment analysis dataset to demonstrate how the model is built.
First, make sure you have PyTorch installed on your system. You can install PyTorch by following the instructions on the official PyTorch website (https://pytorch.org/).
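This tutorial also relies on torchtext's legacy API and spaCy's tokenizer, so you will need those as well. The exact versions depend on your setup, but something along these lines should work (note that the torchtext.legacy module used below only exists in older torchtext releases, roughly 0.9 to 0.11):
pip install torch torchtext==0.11.0 spacy
python -m spacy download en_core_web_sm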
Next, let’s import the necessary libraries:
import torch
import torch.nn as nn
import torch.optim as optim
import torchtext
from torchtext.legacy import data
from torchtext.legacy import datasets
Now, let’s define the fields for our text and labels and load the IMDB sentiment analysis dataset:
TEXT = data.Field(tokenize = 'spacy', batch_first = True)
LABEL = data.LabelField(dtype = torch.float)
train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)
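At this point it can be helpful to sanity-check what was loaded. The legacy IMDB dataset exposes its examples directly, so you can print the split sizes and inspect one tokenized review (this check is purely diagnostic and not part of the model):
print(f'Number of training examples: {len(train_data)}')
print(f'Number of testing examples: {len(test_data)}')
print(vars(train_data.examples[0]))  # tokenized text and label of the first review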
Now, let’s build the vocabulary and create the iterators for the training and testing data:
TEXT.build_vocab(train_data, max_size=25000, vectors="glove.6B.100d")
LABEL.build_vocab(train_data)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
train_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, test_data),
    batch_size=64,
    sort_key=lambda x: len(x.text),
    sort_within_batch=True,
    device=device)
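If you want to verify the vocabulary, the legacy vocab objects expose their size, token frequencies, and the label-to-index mapping (again, just a diagnostic step):
print(f'Unique tokens in TEXT vocabulary: {len(TEXT.vocab)}')  # 25,000 most frequent words plus <unk> and <pad>
print(TEXT.vocab.freqs.most_common(10))  # most frequent tokens in the training set
print(LABEL.vocab.stoi)  # label-to-index mapping, e.g. {'neg': 0, 'pos': 1}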
Next, we will define the Bi-directional LSTM model:
class BiLSTM(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim, n_layers, bidirectional, dropout):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim,
                            hidden_dim,
                            num_layers=n_layers,
                            bidirectional=bidirectional,
                            dropout=dropout,
                            batch_first=True)  # the Field above uses batch_first=True, so the LSTM must as well
        self.fc = nn.Linear(hidden_dim * 2, output_dim)  # * 2 because forward and backward states are concatenated
        self.dropout = nn.Dropout(dropout)

    def forward(self, text):
        # text: [batch size, sentence length]
        embedded = self.dropout(self.embedding(text))
        # hidden: [n layers * 2, batch size, hidden dim]
        output, (hidden, cell) = self.lstm(embedded)
        # concatenate the final forward (hidden[-2]) and backward (hidden[-1]) hidden states
        hidden = self.dropout(torch.cat((hidden[-2, :, :], hidden[-1, :, :]), dim=1))
        return self.fc(hidden)
Now, let’s instantiate the model, define the optimizer, and set the loss function:
INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
HIDDEN_DIM = 256
OUTPUT_DIM = 1
N_LAYERS = 2
BIDIRECTIONAL = True
DROPOUT = 0.5
model = BiLSTM(INPUT_DIM, EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM, N_LAYERS, BIDIRECTIONAL, DROPOUT)
optimizer = optim.Adam(model.parameters())
criterion = nn.BCEWithLogitsLoss()
model = model.to(device)
criterion = criterion.to(device)
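One detail worth noting: the vocabulary was built with the pre-trained glove.6B.100d vectors, but nothing above copies them into the model, so the embedding layer starts from random weights. If you want to use the pre-trained vectors, a common (optional) way to load them with the legacy API is:
pretrained_embeddings = TEXT.vocab.vectors  # shape: [len(TEXT.vocab), EMBEDDING_DIM]
model.embedding.weight.data.copy_(pretrained_embeddings)
# optionally zero the <unk> and <pad> rows so they start without any pre-trained signal
model.embedding.weight.data[TEXT.vocab.stoi[TEXT.unk_token]].zero_()
model.embedding.weight.data[TEXT.vocab.stoi[TEXT.pad_token]].zero_()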
Next, let’s define the train and evaluate functions:
def train(model, iterator, optimizer, criterion):
    epoch_loss = 0
    epoch_acc = 0
    model.train()
    for batch in iterator:
        optimizer.zero_grad()
        predictions = model(batch.text).squeeze(1)
        loss = criterion(predictions, batch.label)
        # binary accuracy: round the sigmoid output and compare to the labels
        rounded_preds = torch.round(torch.sigmoid(predictions))
        correct = (rounded_preds == batch.label).float()
        acc = correct.sum() / len(correct)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

def evaluate(model, iterator, criterion):
    epoch_loss = 0
    epoch_acc = 0
    model.eval()
    with torch.no_grad():
        for batch in iterator:
            predictions = model(batch.text).squeeze(1)
            loss = criterion(predictions, batch.label)
            rounded_preds = torch.round(torch.sigmoid(predictions))
            correct = (rounded_preds == batch.label).float()
            acc = correct.sum() / len(correct)
            epoch_loss += loss.item()
            epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
Now, let’s train the model:
N_EPOCHS = 5
for epoch in range(N_EPOCHS):
    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    test_loss, test_acc = evaluate(model, test_iterator, criterion)
    print(f'Epoch: {epoch+1:02}')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}%')
This completes the implementation of the Bi-directional LSTM model using PyTorch for text sentiment analysis. You can further experiment with different hyperparameters, network architectures, and datasets to improve the performance of the model.
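If you want to try the trained model on your own sentences, a minimal inference sketch could look like the following. It assumes spaCy's en_core_web_sm model is installed and reuses the TEXT vocabulary and device defined above; predict_sentiment is a helper name introduced here, not part of the code above. The returned value is the sigmoid output, so scores near 1 correspond to whichever label LABEL.vocab.stoi maps to 1.
import spacy
nlp = spacy.load('en_core_web_sm')

def predict_sentiment(model, sentence):
    model.eval()
    tokens = [tok.text for tok in nlp.tokenizer(sentence)]
    indexed = [TEXT.vocab.stoi[t] for t in tokens]  # unknown tokens map to <unk>
    tensor = torch.LongTensor(indexed).unsqueeze(0).to(device)  # add a batch dimension: [1, sentence length]
    prediction = torch.sigmoid(model(tensor))
    return prediction.item()

print(predict_sentiment(model, "This film was absolutely wonderful!"))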