Using Deep Q-Learning in PyTorch/Python to Solve the FrozenLake-v1 Gymnasium Environment

Reinforcement learning is a type of machine learning where an agent learns to make decisions by interacting with an environment. One popular reinforcement learning algorithm is Deep Q-Learning (DQL/DQN), which uses a deep neural network to approximate the Q-function in order to make decisions.

In this article, we will use PyTorch and Python to implement a Deep Q-Learning algorithm to solve the FrozenLake-v1 environment in the Gymnasium library. FrozenLake-v1 is a grid-world game where the agent must navigate from a start state to a goal state while avoiding holes in the ice. On the default 4x4 map there are 16 states, numbered 0 to 15. The agent can move in 4 directions (left, down, right, up) and receives a reward of 1 upon reaching the goal state and a reward of 0 otherwise. By default the ice is slippery, so a move does not always go in the chosen direction.
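
To make the state numbering concrete, here is a minimal pure-Python sketch of the default 4x4 map. The names `DEFAULT_MAP`, `state_to_cell`, and `tile_at` are illustrative helpers, not part of Gymnasium, and the layout is assumed to be the library's default 4x4 map.

```python
# Default 4x4 FrozenLake layout: S = start, F = frozen, H = hole, G = goal.
DEFAULT_MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
N_COLS = len(DEFAULT_MAP[0])

def state_to_cell(state):
    """Convert a state index (0..15) to a (row, col) pair."""
    return divmod(state, N_COLS)

def tile_at(state):
    """Return the map character for a state index."""
    row, col = state_to_cell(state)
    return DEFAULT_MAP[row][col]

# Collect the hole states and the goal state by scanning all 16 indices.
holes = [s for s in range(16) if tile_at(s) == "H"]
goal = next(s for s in range(16) if tile_at(s) == "G")
print("holes:", holes, "goal:", goal)
```

States are numbered row by row, so the goal in the bottom-right corner is the last index.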

Setting up the environment

First, we will need to install the Gymnasium library and import the necessary modules for our implementation.

    
      
      # Run these two commands in a shell, not in Python:
      #   pip install gymnasium
      #   pip install torch
      import gymnasium as gym
      import numpy as np
      import torch
      import torch.nn as nn
      import torch.optim as optim
    
  

Defining the Deep Q-Network (DQN)

Next, we will define the deep neural network that will approximate the Q-function. This network will have an input layer corresponding to the state space of the environment, a hidden layer with ReLU activation, and an output layer corresponding to the action space of the environment.
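
FrozenLake reports the state as a single integer, so it has to be turned into a vector before it reaches the input layer. A common choice is one-hot encoding, which is why the input size equals the number of states. The sketch below is a plain-Python illustration; `one_hot` is a hypothetical helper, not a library function.

```python
def one_hot(state, num_states):
    """Return a list of num_states floats with a 1.0 at index `state`."""
    if not 0 <= state < num_states:
        raise ValueError(f"state {state} out of range for {num_states} states")
    vec = [0.0] * num_states
    vec[state] = 1.0
    return vec

# State 5 on a 16-state map becomes a length-16 vector with one non-zero entry.
encoded = one_hot(5, 16)
print(len(encoded), encoded.index(1.0))
```

With a one-hot input, the first linear layer effectively selects one column of its weight matrix per state.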

    
      class DQN(nn.Module):
          def __init__(self, input_size, hidden_size, output_size):
              super().__init__()
              # Input layer: one-hot state vector -> hidden features
              self.fc1 = nn.Linear(input_size, hidden_size)
              # Output layer: hidden features -> one Q-value per action
              self.fc2 = nn.Linear(hidden_size, output_size)

          def forward(self, x):
              x = torch.relu(self.fc1(x))
              x = self.fc2(x)
              return x
    
  

Training the DQN

We will then define the training loop for our DQN. This loop will consist of interacting with the environment, updating the Q-function, and optimizing the neural network parameters using the Q-learning update rule.
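
The Q-learning update rule moves the predicted value for the chosen action toward a one-step target: the reward plus the discounted best Q-value of the next state. A minimal sketch of that target in plain Python follows; `td_target` is a hypothetical helper, and `terminated` is assumed to be true when the episode ended in a hole or at the goal.

```python
def td_target(reward, gamma, next_q_values, terminated):
    """One-step Q-learning target: r + gamma * max over a' of Q(s', a')."""
    if terminated:
        # No future reward is possible after a terminal state.
        return reward
    return reward + gamma * max(next_q_values)

# Non-terminal step: the target bootstraps from the best next-state value.
print(td_target(0.0, 0.99, [0.1, 0.5, 0.2, 0.0], terminated=False))
# Terminal step at the goal: the target is just the reward.
print(td_target(1.0, 0.99, [0.3, 0.3, 0.3, 0.3], terminated=True))
```

Skipping the bootstrap term on terminal transitions matters in FrozenLake, because otherwise the network's arbitrary estimates for hole and goal states leak into the targets.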

    
      # Initialize environment and DQN
      env = gym.make('FrozenLake-v1')
      input_size = int(env.observation_space.n)
      output_size = int(env.action_space.n)
      hidden_size = 128
      dqn = DQN(input_size, hidden_size, output_size)
      criterion = nn.MSELoss()
      optimizer = optim.Adam(dqn.parameters())

      # Training loop
      num_episodes = 1000
      gamma = 0.99
      epsilon = 0.1
      for episode in range(num_episodes):
          # reset() returns (observation, info) in Gymnasium
          state, _ = env.reset()
          done = False
          while not done:
              # One-hot encode the current state
              state_tensor = torch.tensor(np.eye(input_size)[state], dtype=torch.float)

              # Determine action using epsilon-greedy policy
              if np.random.rand() < epsilon:
                  action = env.action_space.sample()
              else:
                  with torch.no_grad():
                      action = torch.argmax(dqn(state_tensor)).item()

              # Interact with environment; step() returns five values in Gymnasium
              next_state, reward, terminated, truncated, _ = env.step(action)
              done = terminated or truncated

              # Calculate target Q-value without tracking gradients
              with torch.no_grad():
                  next_tensor = torch.tensor(np.eye(input_size)[next_state], dtype=torch.float)
                  # Do not bootstrap from a terminal state (hole or goal)
                  max_next_q = 0.0 if terminated else torch.max(dqn(next_tensor)).item()
              q_target = reward + gamma * max_next_q

              # Update Q-function: only the taken action's entry differs from the prediction
              q_values = dqn(state_tensor)
              target = q_values.detach().clone()
              target[action] = q_target
              loss = criterion(q_values, target)

              # Optimize the model
              optimizer.zero_grad()
              loss.backward()
              optimizer.step()

              state = next_state
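
The epsilon-greedy step used in the loop above can be isolated and checked on its own. The sketch below uses only the standard library; `epsilon_greedy` is a hypothetical helper, and the Q-values are made-up numbers chosen for illustration.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With epsilon = 0.1, most picks should be the greedy action (index 2 here).
rng = random.Random(0)
picks = [epsilon_greedy([0.0, 0.2, 0.9, 0.1], 0.1, rng) for _ in range(1000)]
greedy_share = picks.count(2) / len(picks)
print(f"greedy action chosen in {greedy_share:.1%} of picks")
```

A fixed epsilon of 0.1 keeps a small amount of exploration for the whole run; decaying it over episodes is a common alternative, though the article's loop does not do that.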
    
  

Evaluating the DQN

Finally, we can evaluate the performance of our trained DQN by testing it in the environment and observing its success rate.

    
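      # Testing the DQN
      num_tests = 100
      successes = 0
      for _ in range(num_tests):
          # reset() returns (observation, info) in Gymnasium
          state, _ = env.reset()
          done = False
          while not done:
              # Act greedily; no gradients are needed during evaluation
              with torch.no_grad():
                  q_values = dqn(torch.tensor(np.eye(input_size)[state], dtype=torch.float))
              action = torch.argmax(q_values).item()
              state, reward, terminated, truncated, _ = env.step(action)
              done = terminated or truncated
          # A reward of 1 is only given for reaching the goal (state 15 on the 4x4 map)
          if reward == 1:
              successes += 1
      print(f'Success rate: {successes / num_tests * 100}%')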
    
  

By following these steps, we have implemented a Deep Q-Learning algorithm using PyTorch and Python for the FrozenLake-v1 environment in Gymnasium. FrozenLake is a small toy problem, but the same pattern of environment, network, training loop, and evaluation carries over to larger tasks, and there are many other environments and algorithms to explore in the field of reinforcement learning.
