A Step-by-Step Guide to Classifying Multivariate Time Series with LSTM using PyTorch, PyTorch Lightning, and Python

In this tutorial, we will walk through multivariate time series classification with Long Short-Term Memory (LSTM) neural networks, using PyTorch together with the PyTorch Lightning framework. We will cover the entire process: preprocessing the data, building the model, training it, and evaluating it.

Setting up the environment

First, make sure PyTorch and PyTorch Lightning are installed in your Python environment, along with scikit-learn, pandas, and NumPy, which the code below also uses. You can install them with pip:

pip install torch pytorch-lightning scikit-learn pandas numpy

Next, import the necessary libraries:

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd

Data Preparation

For this tutorial, we will use a sample multivariate time series dataset. You can use any dataset of your choice, as long as it is in CSV format with one row per time step, the feature columns first, and the target variable in the last column (the code below relies on that layout).

Load the dataset into a Pandas DataFrame:

data = pd.read_csv('your_dataset.csv')

Next, convert the DataFrame into a NumPy array:

data = data.to_numpy()

Now, split the data into input features and target variable:

X = data[:, :-1] # input features
y = data[:, -1] # target variable
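
To make the expected layout concrete, here is a minimal sketch that reads a tiny in-memory CSV instead of a file. The column names and values are invented for illustration; only the "features first, label last" layout matters.

```python
import io
import pandas as pd

# A tiny stand-in for 'your_dataset.csv': one row per time step,
# feature columns first, integer class label in the last column.
# (Column names and values are hypothetical.)
csv_text = """temperature,pressure,vibration,label
20.1,1.01,0.02,0
20.4,1.00,0.03,0
25.9,0.97,0.40,1
26.3,0.96,0.45,1
"""

frame = pd.read_csv(io.StringIO(csv_text))
values = frame.to_numpy()

features = values[:, :-1]  # every column except the last
target = values[:, -1]     # the last column

print(features.shape)  # (4, 3)
print(target)          # [0. 0. 1. 1.]
```

Note that the labels come back as floats because `to_numpy()` picks one common dtype for the whole array, which is why they are cast to `torch.long` before training.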

Data Preprocessing

Normalize the input features using StandardScaler:

scaler = StandardScaler()
X = scaler.fit_transform(X)
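
Fitting the scaler on the full dataset lets statistics from the test period leak into training. One common alternative is sketched below, assuming the rows are ordered in time and using an 80/20 chronological split. The split ratio, the toy data, and names such as `X_raw` and `split_idx` are assumptions made for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the feature matrix: 200 time steps, 3 features
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=5.0, scale=2.0, size=(200, 3))

# Chronological split: the first 80% of rows for training, the rest for testing
split_idx = int(len(X_raw) * 0.8)
X_train_raw, X_test_raw = X_raw[:split_idx], X_raw[split_idx:]

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train_raw)  # learn mean/std from training rows only
X_test = scaler.transform(X_test_raw)        # reuse the training statistics

print(X_train.shape, X_test.shape)  # (160, 3) (40, 3)
```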

Create DataLoader

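An LSTM expects each sample to be a sequence of shape (seq_len, num_features), so first group consecutive rows into fixed-length windows, then wrap the windows in a PyTorch DataLoader. The code below assumes the rows form one continuous recording and labels each window with the target at its last time step:

seq_len = 30  # window length in time steps (assumed value; tune it for your data)
# Stack consecutive rows into overlapping windows; label each with its last step's target
windows = np.stack([X[i:i + seq_len] for i in range(len(X) - seq_len + 1)])
labels = y[seq_len - 1:]

X_seq = torch.tensor(windows, dtype=torch.float)  # (num_windows, seq_len, num_features)
y_seq = torch.tensor(labels, dtype=torch.long)

dataset = TensorDataset(X_seq, y_seq)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)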
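
An LSTM created with `batch_first=True` expects input of shape (batch, seq_len, num_features). The sketch below uses random toy data and a hypothetical helper, `make_windows`, to confirm that batches drawn from a DataLoader have that shape. The labeling rule (the target at the last step of each window) is an assumption; other schemes are possible.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_windows(features, targets, seq_len):
    """Slice a (num_steps, num_features) array into overlapping windows.

    Each window is labeled with the target at its last time step
    (an assumption made for this sketch).
    """
    num_windows = len(features) - seq_len + 1
    windows = np.stack([features[i:i + seq_len] for i in range(num_windows)])
    labels = targets[seq_len - 1:]
    return windows, labels


# Toy data: 100 time steps, 4 features, binary labels
rng = np.random.default_rng(0)
toy_X = rng.normal(size=(100, 4))
toy_y = rng.integers(0, 2, size=100)

windows, labels = make_windows(toy_X, toy_y, seq_len=10)
toy_dataset = TensorDataset(
    torch.tensor(windows, dtype=torch.float),
    torch.tensor(labels, dtype=torch.long),
)
toy_loader = DataLoader(toy_dataset, batch_size=32, shuffle=True)

xb, yb = next(iter(toy_loader))
print(xb.shape)  # torch.Size([32, 10, 4])
print(yb.shape)  # torch.Size([32])
```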

Building the LSTM model

Now, let’s build the LSTM model using PyTorch. Here is a simple implementation:

class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(LSTM, self).__init__()

        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

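    def forward(self, x):
        # x: (batch, seq_len, input_size); start from zero hidden and cell states
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)

        out, _ = self.lstm(x, (h0, c0))  # out: (batch, seq_len, hidden_size)

        out = self.fc(out[:, -1, :])  # classify from the last time step's output

        return out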
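
Before training, it can help to push a dummy batch through the network and check the output shape. The sketch below uses a condensed copy of the model that relies on `nn.LSTM` defaulting to zero initial states when none are passed. The sizes (4 features, 10 time steps, 2 classes) are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn


class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.lstm(x)          # hidden/cell states default to zeros
        return self.fc(out[:, -1, :])  # classify from the last time step


model = LSTM(input_size=4, hidden_size=16, num_layers=1, output_size=2)
dummy = torch.randn(8, 10, 4)  # (batch, seq_len, num_features)
logits = model(dummy)
print(logits.shape)  # torch.Size([8, 2])
```

The output has one row of raw class scores (logits) per sequence, which is the form `nn.CrossEntropyLoss` expects.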

Training the model with PyTorch Lightning

We will train the model using PyTorch Lightning, which takes care of the training loop for us. Here is sample code to train the model:

from pytorch_lightning import LightningModule, Trainer

class LSTMClassifier(LightningModule):
    def __init__(self, input_size, hidden_size, num_layers, output_size, lr=0.001):
        super(LSTMClassifier, self).__init__()

        self.model = LSTM(input_size, hidden_size, num_layers, output_size)
        self.criterion = nn.CrossEntropyLoss()
        self.lr = lr

    def forward(self, x):
        return self.model(x)

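    def training_step(self, batch, batch_idx):
        x, y = batch  # x: (batch, seq_len, input_size), y: (batch,) integer class labels
        y_hat = self(x)  # raw logits; CrossEntropyLoss applies softmax internally
        loss = self.criterion(y_hat, y)
        self.log("train_loss", loss)  # record the loss so progress can be monitored
        return loss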

    def configure_optimizers(self):
        return optim.Adam(self.parameters(), lr=self.lr)

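# input_size is the number of features per time step; output_size=2 assumes two classes
model = LSTMClassifier(input_size=X.shape[1], hidden_size=64, num_layers=1, output_size=2)
# accelerator="auto" uses a GPU when one is available and falls back to the CPU otherwise
# (the older gpus=1 argument was removed in PyTorch Lightning 2.0)
trainer = Trainer(max_epochs=10, accelerator="auto", devices=1)
trainer.fit(model, dataloader)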


After training, evaluate the model on a separate test dataset that it never saw during training. Accuracy alone can be misleading when the classes are imbalanced, so it is worth also reporting metrics such as precision and recall.
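
A minimal sketch of such an evaluation follows, using scikit-learn's metric functions. The `evaluate` helper, the stand-in `TinyLSTM` model, the random test data, and the choice of macro averaging are all assumptions made for illustration. In practice you would pass your trained model and a DataLoader built from held-out data.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.metrics import accuracy_score, precision_score, recall_score


@torch.no_grad()
def evaluate(model, loader):
    """Collect predictions over a DataLoader and compute basic metrics."""
    model.eval()  # switch off training-only behavior such as dropout
    preds, targets = [], []
    for x, y in loader:
        logits = model(x)                   # (batch, num_classes)
        preds.append(logits.argmax(dim=1))  # predicted class per sequence
        targets.append(y)
    preds = torch.cat(preds).cpu().numpy()
    targets = torch.cat(targets).cpu().numpy()
    return {
        "accuracy": accuracy_score(targets, preds),
        "precision": precision_score(targets, preds, average="macro", zero_division=0),
        "recall": recall_score(targets, preds, average="macro", zero_division=0),
    }


# Stand-in model and random test data, used only to exercise the function
class TinyLSTM(nn.Module):
    def __init__(self, input_size=4, hidden_size=8, output_size=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])


torch.manual_seed(0)
X_test = torch.randn(64, 10, 4)  # (num_windows, seq_len, num_features)
y_test = torch.randint(0, 2, (64,))
test_loader = DataLoader(TensorDataset(X_test, y_test), batch_size=16)

metrics = evaluate(TinyLSTM(), test_loader)
print(metrics)
```

Because the stand-in model is untrained and the labels are random, the printed numbers carry no meaning here; the point is the shape of the evaluation loop. If the model lives on a GPU, move each batch to the same device before calling it.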

That’s it! You have successfully built a multivariate time series classification model using LSTM in PyTorch and PyTorch Lightning. Feel free to experiment with different architectures, hyperparameters, and datasets to improve the model’s performance.

