In PyTorch, saving and loading models is an essential part of the machine learning workflow, as it allows you to reuse trained models for inference, evaluation, or further training without having to retrain them from scratch. In this tutorial, we will explore how to save and load models in PyTorch using the torch.save() and torch.load() functions.
- Saving a model in PyTorch:
To save a trained PyTorch model, you can use the torch.save() function. It takes two arguments: the object you want to save (typically the model’s state dictionary) and the file path to save it to. Here’s an example of how to save a model:
# Save the model
torch.save(model.state_dict(), 'model.pth')
In the example above, we save the state dictionary of the model to a file called ‘model.pth’. The state dictionary contains all of the model’s learnable parameters, but not the architecture itself; it is what you need to restore the model’s weights during loading.
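As a concrete sketch (the toy network below is hypothetical, standing in for a real trained model), the state dictionary is just an ordered mapping from parameter names to weight tensors:

```python
import torch
import torch.nn as nn

# Hypothetical toy network standing in for a trained model
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# The state dict maps parameter names to weight tensors
state_dict = model.state_dict()
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
# 0.weight (8, 4)
# 0.bias (8,)
# 2.weight (2, 8)
# 2.bias (2,)

# Save just the parameters to disk
torch.save(state_dict, 'model.pth')
```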
- Loading a model in PyTorch:
To load a saved PyTorch model, you can use the torch.load() function. The torch.load() function takes a file path as an argument and returns the state dictionary of the saved model. Here’s an example of how to load a model:
# Load the model
state_dict = torch.load('model.pth')
model.load_state_dict(state_dict)
In the example above, we first load the state dictionary of the saved model from the file ‘model.pth’ using the torch.load() function. Then, we load the state dictionary into the model using the load_state_dict() method, which fills the model’s parameters with the saved weights. Note that model must already be an instance of the same architecture that was saved; the file contains only the weights.
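A minimal round-trip sketch, again using a hypothetical toy network: the model object has to be rebuilt with the same architecture before load_state_dict() can fill in the weights.

```python
import torch
import torch.nn as nn

# Hypothetical toy network standing in for your real model
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
torch.save(model.state_dict(), 'model.pth')

# The same architecture must be instantiated before loading:
# the file holds weights only, not the model class
restored = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
restored.load_state_dict(torch.load('model.pth'))
restored.eval()  # inference mode (matters for dropout/batch norm layers)

# Both models now compute identical outputs
x = torch.randn(1, 4)
assert torch.equal(model(x), restored(x))
```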
- Saving and loading the entire model:
In some cases, you may want to save and load the entire model, including its architecture and parameters. To do this, you can save and load the entire model object instead of just the state dictionary. Here’s an example of how to save and load the entire model:
# Save the entire model
torch.save(model, 'model.pth')
# Load the entire model
model = torch.load('model.pth')
In the example above, we save the entire model object to a file called ‘model.pth’ using the torch.save() function, and load it back with the torch.load() function. Because this approach pickles the whole object, the model’s class definition must still be importable from the same location when you load the file.
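One caveat worth noting: recent PyTorch releases default torch.load() to a safer weights-only mode, so unpickling a whole model object requires passing weights_only=False (on older releases the argument may not exist). A sketch with a hypothetical toy network:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Pickles the whole object: architecture and weights together
torch.save(model, 'full_model.pth')

# weights_only=False is needed on recent PyTorch versions, which
# default to a safer weights-only loader; unpickling can execute
# arbitrary code, so only load files you trust
restored = torch.load('full_model.pth', weights_only=False)

x = torch.randn(1, 4)
assert torch.equal(model(x), restored(x))
```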
- Saving and loading models across devices:
When saving and loading models in PyTorch, it’s important to consider the device on which the model was trained and the device on which you want to load it. Tensors are saved together with the device they were on, so a model saved from a GPU will, by default, try to load back onto a GPU. The map_location argument to torch.load() lets you redirect the load to whatever device is actually available. Here’s an example of how to load a model onto the best available device:
# Load the model on GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torch.load('model.pth', map_location=device)
model.to(device)
In the example above, we first check whether a GPU is available, then load the model onto that device using the map_location argument of the torch.load() function. The final to() call is a safeguard that moves any remaining parameters to the desired device; it is a no-op if map_location already placed everything there.
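The resume-training questions further down rely on a checkpoint dictionary that bundles model and optimizer state together. A hedged sketch of that pattern (the key names, filename, and toy model here are illustrative, not a fixed PyTorch convention):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

# Bundle everything needed to resume training into one dictionary
checkpoint = {
    'epoch': 3,
    'state_dict': model.state_dict(),
    'optimizer': optimizer.state_dict(),
}
torch.save(checkpoint, 'checkpoint.pth.tar')

# Later: restore the pieces and continue where training stopped
checkpoint = torch.load('checkpoint.pth.tar')
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
start_epoch = checkpoint['epoch'] + 1  # resume from epoch 4
```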
By following the steps outlined in this tutorial, you can easily save and load trained models in PyTorch for various machine learning tasks. Saving and loading models allows you to reuse your trained models, share them with others, or deploy them in production environments.
If I save my model and optimizer state (whose learning rate is now zero after cosine annealing for some epochs) and want to continue training the same model later, but with a fresh cosine annealing learning rate scheduler, which of the following two code snippets will work?
Code 1:
learning_rate = 3e-4
checkpoint_last = torch.load("./enet1_checklast.pth.tar")
model.load_state_dict(checkpoint_last["state_dict"])
model.to(device)
optimizer.load_state_dict(checkpoint_last["optimizer"])
optimizer.param_groups[0]["lr"] = learning_rate  # Does this work?
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, 29)
Code 2:
# Here I am not loading the optimizer state because the current learning rate of the optimizer is zero. I am not sure what the optimizer's state will be after running this code.
learning_rate = 3e-4
checkpoint_last = torch.load("./enet1_checklast.pth.tar")
model.load_state_dict(checkpoint_last["state_dict"])
model.to(device)
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, 29)
Thank you in advance.
Hi, thanks for the video. I have a quick question: if I want to save the model after training, to be used only for inference in the future and not further training, do I do it the same way as shown here?
May I know if I can do something like this: I train my model for 3 of 10 epochs, then interrupt the training process and save the model. Then I keep training my model the next day, starting from epoch 4?