Exploring Deep Learning with PyTorch: Lecture 2 on Automatic Differentiation using torch.autograd and backward

In Lecture 2 of Dive Into Deep Learning, we focus on PyTorch’s automatic differentiation, provided by the torch.autograd module. PyTorch records the differentiable operations applied to tensors created with requires_grad=True, which lets us compute the gradient of a loss function with respect to the model parameters without deriving it by hand. These gradients are what optimization methods such as stochastic gradient descent use to update the parameters.

Let’s start by importing PyTorch:

import torch

Next, let’s build a small example to demonstrate automatic differentiation. We define a one-parameter linear regression model, y_pred = x * w, and compute the gradient of the loss with respect to w. The targets below satisfy y = 2 * x, so the best weight is 2, and w starts at 0.

# Define the input data, the targets, and the initial weight
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = torch.tensor([2.0, 4.0, 6.0, 8.0])
w = torch.tensor([0.0], requires_grad=True)

# Define the linear regression model
def linear_model(x, w):
    return x * w

# Define the loss function (mean squared error)
def loss_function(y_pred, y):
    return torch.mean((y_pred - y)**2)

# Run the model
y_pred = linear_model(x, w)
loss = loss_function(y_pred, y)

print(f'Initial loss: {loss.item()}')
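Before computing any gradients, it can help to look at what autograd has recorded so far. The following is a minimal sketch that rebuilds the same tensors and inspects a few standard tensor attributes (is_leaf, grad_fn, requires_grad, grad); the inline expressions stand in for linear_model and loss_function above.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = torch.tensor([2.0, 4.0, 6.0, 8.0])
w = torch.tensor([0.0], requires_grad=True)

y_pred = x * w
loss = torch.mean((y_pred - y) ** 2)

# w was created directly by the user, so it is a leaf with no grad_fn.
print(w.is_leaf, w.grad_fn)

# Tensors computed from w are tracked and carry a grad_fn node.
print(y_pred.requires_grad, y_pred.grad_fn is not None)
print(loss.requires_grad, loss.grad_fn is not None)

# x and y were created without requires_grad, so they are not tracked.
print(x.requires_grad, y.requires_grad)

# No backward pass has run yet, so w.grad is still None.
print(w.grad)
```

Only w is tracked here because it is the only tensor created with requires_grad=True; the data tensors are treated as constants.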

Next, we compute the gradient of the loss with respect to w by calling the backward() method on the loss tensor. Because the loss is a scalar, backward() needs no arguments.

loss.backward()

After calling backward(), the gradient of the loss with respect to each leaf tensor created with requires_grad=True is stored in that tensor’s grad attribute. Here, w.grad holds the derivative of the loss with respect to w.

print(f'Gradient of the loss with respect to w: {w.grad.item()}')
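The value autograd reports can be checked by hand. For the mean squared error with y_pred = x * w, the derivative is dL/dw = (2/n) * sum((w * x_i - y_i) * x_i). With w = 0 and the data above this gives (2/4) * -(2 + 8 + 18 + 32) = -30. The sketch below compares that hand-derived formula with autograd; the name manual_grad is only illustrative.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = torch.tensor([2.0, 4.0, 6.0, 8.0])
w = torch.tensor([0.0], requires_grad=True)

# Gradient from autograd.
loss = torch.mean((x * w - y) ** 2)
loss.backward()

# Hand-derived gradient of the mean squared error, computed without autograd.
with torch.no_grad():
    manual_grad = torch.mean(2 * (x * w - y) * x)

print(w.grad.item())       # -30.0
print(manual_grad.item())  # -30.0
```

The negative sign says that increasing w lowers the loss, which matches the fact that the best weight (2) lies above the starting value (0).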

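Finally, we update w with one step of gradient descent: subtract the gradient scaled by the learning rate. The update runs inside torch.no_grad() so that it is not itself recorded by autograd. Afterwards we reset the gradient with zero_(), because backward() adds to the grad attribute rather than overwriting it.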

learning_rate = 0.01
with torch.no_grad():
    w -= learning_rate * w.grad

# Reset the gradients
w.grad.zero_()

# Compute the new loss
y_pred = linear_model(x, w)
loss = loss_function(y_pred, y)

print(f'Updated loss: {loss.item()}')
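The reset step matters because each call to backward() accumulates into grad. Here is a small sketch of that behavior using the same data; the helper name mse is hypothetical and simply combines the model and the loss in one function.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = torch.tensor([2.0, 4.0, 6.0, 8.0])
w = torch.tensor([0.0], requires_grad=True)

def mse(w):
    # Same model and loss as above, written in one expression.
    return torch.mean((x * w - y) ** 2)

mse(w).backward()
first = w.grad.clone()

# A second backward pass without zeroing adds to the stored gradient.
mse(w).backward()
accumulated = w.grad.clone()

# After zeroing, the next backward pass gives the plain gradient again.
w.grad.zero_()
mse(w).backward()
after_reset = w.grad.clone()

print(first.item(), accumulated.item(), after_reset.item())  # -30.0 -60.0 -30.0
```

If the reset were skipped in a training loop, every update would use the sum of all gradients computed so far instead of the current one.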

This is a simple example, but it shows the whole workflow: build the loss from tensors that require gradients, call backward(), read the gradients from the grad attribute, and update the parameters.

In summary, PyTorch’s torch.autograd module computes the gradients of a loss function with respect to the model parameters automatically, which is what makes gradient-based optimization such as stochastic gradient descent practical. Try experimenting with different models and loss functions to explore it further.
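As one such experiment, the manual update can be handed to an optimizer and repeated. The sketch below uses torch.optim.SGD with the same learning rate; the choice of 200 steps is an arbitrary assumption, not something from the lecture. Because every step uses all four data points, this is plain full-batch gradient descent; a stochastic variant would sample a subset of the data at each step.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = torch.tensor([2.0, 4.0, 6.0, 8.0])
w = torch.tensor([0.0], requires_grad=True)

# The optimizer takes over the "w -= learning_rate * w.grad" update.
optimizer = torch.optim.SGD([w], lr=0.01)

for step in range(200):
    optimizer.zero_grad()                # reset gradients from the previous step
    loss = torch.mean((x * w - y) ** 2)  # forward pass
    loss.backward()                      # compute d(loss)/d(w)
    optimizer.step()                     # apply the gradient descent update

print(f'Learned w: {w.item()}, final loss: {loss.item()}')
```

Since the targets satisfy y = 2 * x, w should approach 2 and the loss should approach 0.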

26 Comments
@nitinsrivastav3541
2 months ago

thanks man i was finding it extremely difficult to understand the maths behind backward and detach (although i have doen it in my high school) because no one was explaining them in this depth………love you 😍😍

@ramincybran
2 months ago

thnks myfriend – in last section – whats is the z.sum ? what is the the (SUM) function for ? why yyou put the sum ?

@user-wz3np2hf3g
2 months ago

way better than my prof…

@azharhussian4326
2 months ago

Thank you

@asfandiyar5829
2 months ago

Thank you sooo much! This finally clicked for me.

@sktdebnath
2 months ago

This is what a proper tutorial should be. Thanks a lot. Subscribed

@sadpotato5111
2 months ago

This video deserves more view.

@atanudasgupta
2 months ago

Excellent videos and the textbook, deeply admire your contributions

@greender644
2 months ago

29:49 – As far as I understand in 'a.grad' should turn out [12., 18.]

@user-vm9hl3gl5h
2 months ago

14:50 x.grad contains the valuesd of partial{y} / partial{x}
17:50 x.grad.zero_()
25:00 gradient for multiple inputs -> multiple outputs. Since the Jacobian is a matrix, we need to input a 1-d tensor to get a valid vector-output. => But our loss function has been a scalar, so this is why I am not accustomed to this form.
34:10 explaining .detach(). => treat those as constants, not a variable that we differentiate w.r.t.

@callpie2398
2 months ago

Please Do cover a playlist on Graph Neural Networks (at least discuss all the basics and methods of GNNs). The internet world lacks quality contents on this topic

@reddysekhar3765
2 months ago

Truly thankful to you. To the point without confusion. Thank you once again

@vtrandal
2 months ago

Fixed your broken web page that forces us to bend to your flaws.

@vtrandal
2 months ago

No!

@md.zahidulislam3548
2 months ago

Thanks a lot.

@azzyfreeman
2 months ago

Thanks for the video, it really cleared up Pytorch autograd, now I will be making notes on this gold nugget

@AJ-et3vf
2 months ago

Awesome video! Thank you!

@mahdiamrollahi8456
2 months ago

The way you define the y is different from what i am thinking.
In first example, I thought we should define y as x**2 or in second one y as x. But if i define the y like this, I will get an error which say that it needs just one number not sequence of numbers.

@Gibson-xn8xk
2 months ago

First of all, I want to express my gratitude to you for the work you have done. There is one thing i want you to ask: why do we write the partial derivatives of the scalar function in the form of column? Whereas, following the logic of Jacobian matrix, it should be a row. Thanks in advance!

@rohithpeesapati8840
2 months ago

thank you very much for uploading this video! very helpful!