In Lecture 2 of Dive Into Deep Learning, we will focus on PyTorch’s automatic differentiation capabilities using the torch.autograd module. PyTorch provides automatic differentiation for all operations on Tensors, which allows us to compute gradients of our loss function with respect to our model parameters. This is essential for optimizing our models using techniques like stochastic gradient descent.
Let’s start by importing the necessary libraries:
import torch
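Before building the regression example, here is a minimal warm-up (a sketch of my own, not part of the lecture's code) showing how autograd tracks a single tensor; the variable a below is purely illustrative:
a = torch.tensor(3.0, requires_grad=True)
b = a ** 2       # b = a^2
b.backward()     # compute db/da
print(a.grad)    # tensor(6.) because db/da = 2a = 6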
Next, let’s create a simple example to demonstrate automatic differentiation in PyTorch. We will define a linear regression model and compute the gradients of the loss function with respect to the model parameters.
# Define the input data and targets
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = torch.tensor([2.0, 4.0, 6.0, 8.0])
# Initialize the trainable weight
w = torch.tensor([0.0], requires_grad=True)
# Define the linear regression model
def linear_model(x, w):
    return x * w

# Define the loss function (mean squared error)
def loss_function(y_pred, y):
    return torch.mean((y_pred - y)**2)
# Run the model
y_pred = linear_model(x, w)
loss = loss_function(y_pred, y)
print(f'Initial loss: {loss.item()}')
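With w initialized to 0, the predictions are all zeros, so the initial loss works out to mean([4, 16, 36, 64]) = 30.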
Next, we will compute the gradients of the loss function with respect to the model parameters using PyTorch’s automatic differentiation capabilities. We do this by calling the backward() method on the loss tensor.
loss.backward()
After calling backward(), the gradient of the loss with respect to each parameter tensor is stored in that tensor's grad attribute, which we can read directly.
print(f'Gradient of the loss with respect to w: {w.grad.item()}')
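As a sanity check (my own addition, not part of the lecture), we can compare this against the hand-derived gradient of the mean squared error, d/dw mean((x*w - y)**2) = mean(2 * x * (x*w - y)):
with torch.no_grad():
    manual_grad = torch.mean(2 * x * (x * w - y))
print(f'Analytic gradient: {manual_grad.item()}')  # should match w.grad (-30.0 here)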
Finally, we can update the model parameters using gradient descent by manually adjusting each parameter in the direction opposite its gradient, scaled by the learning rate. Because this update itself should not be tracked by autograd, we wrap it in a torch.no_grad() block.
learning_rate = 0.01
with torch.no_grad():
    w -= learning_rate * w.grad

# Reset the gradients
w.grad.zero_()
# Compute the new loss
y_pred = linear_model(x, w)
loss = loss_function(y_pred, y)
print(f'Updated loss: {loss.item()}')
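Repeating this forward/backward/update cycle drives the loss toward zero. Here is a rough sketch (assuming the same x, y, w, and learning_rate defined above):
for step in range(100):
    y_pred = linear_model(x, w)
    loss = loss_function(y_pred, y)
    loss.backward()
    with torch.no_grad():
        w -= learning_rate * w.grad
    w.grad.zero_()
print(f'Loss after 100 steps: {loss.item()}, w: {w.item()}')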
This is a small example, but it demonstrates the power of PyTorch’s automatic differentiation: the gradients needed for optimization come directly from the operations we already wrote, with no manual calculus.
In summary, PyTorch’s torch.autograd module automatically computes gradients of the loss function with respect to the model parameters, which is what makes optimization techniques like stochastic gradient descent practical. Try experimenting with different models and loss functions to further explore PyTorch’s automatic differentiation capabilities.
thanks man, i was finding it extremely difficult to understand the maths behind backward and detach (although i have done it in high school) because no one was explaining them in this depth… love you 😍😍
thanks my friend – in the last section, what is z.sum? What is the sum() function for? Why did you put the sum there?
way better than my prof…
Thank you
Thank you sooo much! This finally clicked for me.
This is what a proper tutorial should be. Thanks a lot. Subscribed
This video deserves more view.
Excellent videos and the textbook, deeply admire your contributions
29:49 – As far as I understand, 'a.grad' should turn out to be [12., 18.]
14:50 x.grad contains the values of ∂y/∂x
17:50 x.grad.zero_()
25:00 gradient for multiple inputs -> multiple outputs. Since the Jacobian is a matrix, we need to pass a 1-D tensor into backward() to get a valid vector output. => But our loss function has been a scalar, which is why I am not accustomed to this form.
34:10 explaining .detach(). => treat those as constants, not a variable that we differentiate w.r.t.
Please do cover a playlist on Graph Neural Networks (at least discuss all the basics and methods of GNNs). The internet lacks quality content on this topic.
Truly thankful to you. To the point without confusion. Thank you once again
Fixed your broken web page that forces us to bend to your flaws.
No!
Thanks a lot.
Thanks for the video, it really cleared up Pytorch autograd, now I will be making notes on this gold nugget
Awesome video! Thank you!
The way you define y is different from what I was thinking.
In the first example, I thought we should define y as x**2, and in the second one y as x. But if I define y like that, I get an error saying it needs just one number, not a sequence of numbers.
First of all, I want to express my gratitude to you for the work you have done. There is one thing I want to ask you: why do we write the partial derivatives of a scalar function as a column? Following the logic of the Jacobian matrix, it should be a row. Thanks in advance!
thank you very much for uploading this video! very helpful!