Coding Llama 3 from scratch in PyTorch – Part 2
Welcome to Part 2 of our Coding Llama 3 from scratch in PyTorch series! In this tutorial, we will continue building our neural network model in PyTorch, focusing on defining the loss function and training loop.
Defining the Loss Function
Now that we have our model architecture defined in Part 1, the next step is to define the loss function for our neural network. The loss function measures how far the model's predictions are from the targets, and its gradients drive the parameter updates during training.
In PyTorch, we can easily define the loss function using built-in modules. One common loss function for classification tasks is CrossEntropyLoss, which also fits a language model like Llama 3, since predicting the next token is a multi-class classification over the vocabulary. We can define the loss function as follows:
import torch.nn as nn

# Cross-entropy over the classes: combines log-softmax and negative log-likelihood.
loss_function = nn.CrossEntropyLoss()
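As a quick sanity check, here is a minimal usage sketch of CrossEntropyLoss with made-up tensors; the shapes (a batch of 4 examples scored over 10 classes) are illustrative assumptions, not values taken from the model in Part 1.

import torch
import torch.nn as nn

loss_function = nn.CrossEntropyLoss()

# Illustrative shapes only: a batch of 4 examples scored over 10 classes.
logits = torch.randn(4, 10)            # raw, unnormalized model outputs
targets = torch.tensor([1, 0, 7, 3])   # integer class indices, one per example

# CrossEntropyLoss applies log-softmax internally, so the logits are passed in raw.
loss = loss_function(logits, targets)
print(loss.item())  # a single scalar, averaged over the batch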
Training the Model
Now that we have our model architecture and loss function defined, we can move on to training our neural network. Training a neural network involves feeding the training data through the model, calculating the loss, and updating the model parameters using an optimization algorithm like Stochastic Gradient Descent (SGD).
To train our model in PyTorch, we will define a training loop that iterates over the training data for a certain number of epochs. During each iteration, we will calculate the loss and update the model parameters using the optimizer. Here is an example training loop:
import torch.optim as optim

# Stochastic Gradient Descent over all trainable parameters.
optimizer = optim.SGD(model.parameters(), lr=0.001)

num_epochs = 10  # number of passes over the training data

for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        optimizer.zero_grad()                    # clear gradients from the previous step
        outputs = model(inputs)                  # forward pass
        loss = loss_function(outputs, labels)    # compute the loss
        loss.backward()                          # backpropagate to compute gradients
        optimizer.step()                         # update the model parameters
In this training loop, we first initialize the optimizer with the model parameters and a learning rate. We then iterate over the training data for a specified number of epochs; on every batch we reset the gradients, run a forward pass, compute the loss, backpropagate, and let the optimizer update the model parameters.
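Because Llama 3 is a causal language model, the labels in practice are the input tokens shifted by one position, and the logits have to be flattened before they go into CrossEntropyLoss. The sketch below illustrates a single training step under those assumptions; the vocab_size parameter and the assumption that the model returns logits of shape (batch, seq_len, vocab_size) are illustrative, not code from Part 1.

import torch

def training_step(model, input_ids, loss_function, optimizer, vocab_size):
    # Assumed shapes: input_ids is (batch, seq_len) and the model returns
    # logits of shape (batch, seq_len, vocab_size).
    optimizer.zero_grad()
    logits = model(input_ids)

    # Next-token prediction: position t predicts token t + 1, so drop the
    # last logit and the first label to align predictions with targets.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = input_ids[:, 1:].contiguous()

    loss = loss_function(shift_logits.view(-1, vocab_size), shift_labels.view(-1))
    loss.backward()
    optimizer.step()
    return loss.item()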
Conclusion
In this tutorial, we continued building our Coding Llama 3 neural network model in PyTorch, focusing on defining the loss function and training loop. By defining the loss function and training the model, we are one step closer to training a fully functional neural network for our classification task.
Stay tuned for Part 3, where we will evaluate the performance of our model on a validation set and make predictions on unseen data!
Why do you use a 32-bit paged optimizer when the model is being fine-tuned with QLoRA? Surely QLoRA stores the weights in 8-bit double-quantized form, so using a 32-bit optimizer makes no difference, and the weight updates need to be converted back to 8-bit anyway? Please help me understand this.
Your English is nice
cool
Thanks for your commitment to open source and for educating people on cutting-edge knowledge.
Very good, can’t wait to see updates to it.