Solve Gymnasium FrozenLake-v1 with Deep Q-Learning (DQL/DQN) | PyTorch/Python Reinforcement Learning
Reinforcement learning is a type of machine learning where an agent learns to make decisions by interacting with an environment. One popular reinforcement learning algorithm is Deep Q-Learning (DQL/DQN), which uses a deep neural network to approximate the Q-function in order to make decisions.
In this article, we will use PyTorch and Python to implement a Deep Q-Learning algorithm to solve the FrozenLake-v1 environment in the Gymnasium library. The FrozenLake-v1 environment is a grid-world game where the agent must navigate from a start state to a goal state while avoiding holes in the ice. The agent can move in 4 directions (up, down, left, right) and receives a reward of 1 upon reaching the goal state and a reward of 0 otherwise.
Setting up the environment
First, we will need to install the Gymnasium library and import the necessary modules for our implementation.
pip install gymnasium
pip install torch
import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
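Before building the model, it helps to confirm the sizes of the environment's spaces (a quick sketch; on the default 4x4 map the agent also slips, since is_slippery is enabled by default):
env = gym.make('FrozenLake-v1')
print(env.observation_space.n)  # 16 discrete states, one per grid cell
print(env.action_space.n)       # 4 actions: 0=left, 1=down, 2=right, 3=up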
Defining the Deep Q-Network (DQN)
Next, we will define the deep neural network that will approximate the Q-function. This network has an input layer corresponding to the state space of the environment (the states are discrete, so we feed the network a one-hot encoding of the current state), a hidden layer with ReLU activation, and an output layer with one Q-value per action.
class DQN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(DQN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)   # one-hot state -> hidden features
        self.fc2 = nn.Linear(hidden_size, output_size)  # hidden features -> one Q-value per action

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x
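As a quick sanity check (a minimal sketch, assuming the one-hot state encoding used in the training loop below), the network maps a 16-dimensional one-hot state vector to one Q-value per action:
state = 0  # the start cell of the 4x4 map
x = torch.tensor(np.eye(16)[state], dtype=torch.float)
print(DQN(16, 128, 4)(x).shape)  # torch.Size([4]) -- one Q-value per action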
Training the DQN
We will then define the training loop for our DQN. In each step the agent interacts with the environment using an epsilon-greedy policy, computes a target Q-value from the Q-learning update rule, and optimizes the network parameters by minimizing the mean squared error between the predicted and target Q-values.
# Initialize environment and DQN
env = gym.make('FrozenLake-v1')
input_size = env.observation_space.n   # 16 discrete states
output_size = env.action_space.n       # 4 discrete actions
hidden_size = 128
dqn = DQN(input_size, hidden_size, output_size)
criterion = nn.MSELoss()
optimizer = optim.Adam(dqn.parameters())

def one_hot(state):
    # Encode a discrete state index as a one-hot float tensor
    return torch.tensor(np.eye(input_size)[state], dtype=torch.float)

# Training loop
num_episodes = 1000
gamma = 0.99    # discount factor
epsilon = 0.1   # exploration rate
for episode in range(num_episodes):
    state, _ = env.reset()   # Gymnasium's reset() returns (observation, info)
    done = False
    while not done:
        # Determine action using epsilon-greedy policy
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            with torch.no_grad():
                q_values = dqn(one_hot(state))
            action = torch.argmax(q_values).item()
        # Interact with environment; Gymnasium's step() returns terminated and truncated separately
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Calculate target Q-value; do not bootstrap from terminal states
        with torch.no_grad():
            if terminated:
                q_target = torch.tensor(float(reward))
            else:
                q_target = reward + gamma * torch.max(dqn(one_hot(next_state)))
        # Update Q-function: move only the chosen action's Q-value toward the target
        q_values = dqn(one_hot(state))
        target = q_values.detach().clone()
        target[action] = q_target
        loss = criterion(q_values, target)
        # Optimize the model
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        state = next_state
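A common refinement that the loop above omits is decaying epsilon over time, so the agent explores heavily at first and exploits its learned Q-values later. A minimal sketch of such a schedule (the start, end, and decay values here are illustrative assumptions, not taken from the code above):
epsilon_start, epsilon_end, decay = 1.0, 0.01, 0.995  # hypothetical schedule parameters
epsilon = epsilon_start
for episode in range(num_episodes):
    # ... run one training episode with the current epsilon ...
    epsilon = max(epsilon_end, epsilon * decay)  # shrink epsilon, but never below epsilon_end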
Evaluating the DQN
Finally, we can evaluate the performance of our trained DQN by running it greedily (no exploration) in the environment and measuring its success rate.
# Testing the DQN
num_tests = 100
successes = 0
for _ in range(num_tests):
    state, _ = env.reset()
    done = False
    while not done:
        # Act greedily with the trained network (no exploration)
        with torch.no_grad():
            q_values = dqn(one_hot(state))
        action = torch.argmax(q_values).item()
        state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
    # In FrozenLake the final reward is 1 only if the goal was reached
    if reward == 1:
        successes += 1
print(f'Success rate: {successes / num_tests * 100}%')
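To watch the trained agent play (an optional sketch, assuming a local display is available for the pygame window), the environment can be recreated with render_mode='human':
env = gym.make('FrozenLake-v1', render_mode='human')
state, _ = env.reset()
done = False
while not done:
    with torch.no_grad():
        action = torch.argmax(dqn(one_hot(state))).item()
    state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
env.close()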
By following these steps, we have implemented a Deep Q-Learning algorithm using PyTorch and Python to solve the FrozenLake-v1 environment in Gymnasium. This is just one example of how reinforcement learning can be applied, and there are many other environments and algorithms to explore in the field of reinforcement learning.