Tuning Hyperparameters in PyTorch: A Guide to Coding

Hyperparameter tuning is a crucial step in optimizing a neural network model. PyTorch provides a powerful framework for building and training deep learning models, and its flexibility makes it straightforward to experiment with different hyperparameter settings.

In this tutorial, we will walk through the process of hyperparameter tuning in PyTorch, focusing on how to optimize the learning rate for a simple neural network. We will use the famous MNIST dataset for this demonstration.

Step 1: Setting up your environment

Before we begin, make sure you have PyTorch installed on your system. You can install PyTorch using pip:

pip install torch torchvision
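
To confirm the installation worked (and to see whether a CUDA GPU is available), you can optionally run a quick check from a Python shell:

import torch
print(torch.__version__)          # prints the installed PyTorch version
print(torch.cuda.is_available())  # True if a CUDA-capable GPU can be used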

Next, import the necessary libraries:

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
from torchvision import datasets, transforms

Step 2: Loading the dataset

Load the MNIST dataset using torchvision. We will normalize the data and create data loaders for the training and test sets:

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)

test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)
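
As an optional sanity check, you can pull a single batch from the training loader and inspect its shapes; each MNIST image is a 1x28x28 tensor and each label is a class index from 0 to 9:

images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([64, 1, 28, 28])
print(labels.shape)  # torch.Size([64])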

Step 3: Define the neural network

Next, define a simple neural network architecture. For this tutorial, we will use a fully connected network with one hidden layer: each 28x28 image is flattened into a 784-dimensional vector, passed through a 128-unit hidden layer with a sigmoid activation, and mapped to 10 class logits:

class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 784)
        x = torch.sigmoid(self.fc1(x))
        x = self.fc2(x)
        return x
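
Before training, it can help to verify the architecture with a dummy forward pass; this is just a quick check, not part of the training code:

model = SimpleNN()
dummy = torch.randn(4, 1, 28, 28)  # a fake batch of 4 MNIST-sized images
print(model(dummy).shape)          # torch.Size([4, 10]) -- one logit per class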

Step 4: Define the training loop

Now, define a training function that takes the learning rate as an argument. Inside this function, instantiate the model, loss criterion, and optimizer, then run the training loop for a few epochs:

def train_model(lr):
    model = SimpleNN()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=lr)

    for epoch in range(5):
        model.train()
        for i, (inputs, targets) in enumerate(train_loader):
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()

        # Report the loss of the last batch in this epoch (not an epoch average)
        print(f'Epoch {epoch+1}, Loss: {loss.item():.4f}')

    return model
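
To make sure the loop runs end to end before sweeping several values, you can do a single run with one learning rate, for example:

model = train_model(0.01)  # 5 epochs of SGD with lr=0.01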

Step 5: Hyperparameter tuning

To tune the learning rate, define a list of learning rates to try:

learning_rates = [0.001, 0.01, 0.1, 1]

Loop through the list of learning rates, train a model with each one, and evaluate it on the test set (which serves as our validation set here):

for lr in learning_rates:
    model = train_model(lr)

    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, targets in test_loader:
            outputs = model(inputs)
            _, predicted = torch.max(outputs.data, 1)
            total += targets.size(0)
            correct += (predicted == targets).sum().item()

    print(f'Learning Rate: {lr}, Accuracy: {100 * correct / total:.2f}%')
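
If you would rather have the script report the best setting automatically, a small variation (a sketch that reuses the same evaluation code) is to store each accuracy in a dictionary and pick the maximum at the end:

results = {}  # maps learning rate -> test accuracy
for lr in learning_rates:
    model = train_model(lr)
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, targets in test_loader:
            outputs = model(inputs)
            _, predicted = torch.max(outputs.data, 1)
            total += targets.size(0)
            correct += (predicted == targets).sum().item()
    results[lr] = 100 * correct / total

best_lr = max(results, key=results.get)
print(f'Best learning rate: {best_lr} ({results[best_lr]:.2f}% accuracy)')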

By examining the accuracy results for each learning rate, you can identify the optimal learning rate for your model. You can also explore other hyperparameters like batch size, number of hidden units, and number of layers using a similar approach.
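
As a concrete illustration, here is one possible way to extend the sweep to the hidden-layer size as well. The ConfigurableNN class and the nested loop below are additions for illustration, not part of the code above:

import itertools

class ConfigurableNN(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.fc1 = nn.Linear(784, hidden_size)
        self.fc2 = nn.Linear(hidden_size, 10)

    def forward(self, x):
        x = x.view(-1, 784)
        x = torch.sigmoid(self.fc1(x))
        return self.fc2(x)

for lr, hidden in itertools.product([0.01, 0.1], [64, 128, 256]):
    model = ConfigurableNN(hidden)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=lr)
    for epoch in range(5):
        model.train()
        for inputs, targets in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
    # Evaluate on test_loader exactly as in Step 5 and compare accuracies
    print(f'Finished lr={lr}, hidden_size={hidden}')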

In this tutorial, we have demonstrated how to perform hyperparameter tuning in PyTorch using the learning rate as an example. Experiment with different hyperparameters and neural network architectures to optimize your models for better performance. Happy coding!
