Recognizing Handwritten Words Using PyTorch: A Step-by-Step Guide


PyTorch is a popular open-source machine learning library originally developed by Facebook’s AI Research lab. It provides tools and modules for building and training deep learning models. In this tutorial, we will use PyTorch for handwritten word recognition: we will build a model that takes an image of a handwritten word and predicts which word it shows.

Step 1: Install PyTorch
Before we can start building our model, we need to install PyTorch and torchvision. Both can be installed with pip by running the following command:

pip install torch torchvision
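
To confirm that the installation worked, one option is to check that the interpreter can find both packages. The sketch below uses only the Python standard library; the helper name is_installed is our own and is not part of PyTorch.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found in this environment."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Report whether each package needed by this tutorial is available.
    for name in ("torch", "torchvision"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If either package is reported as missing, the pip command above was probably run in a different Python environment than the one executing the script.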

Step 2: Prepare the Dataset
For this tutorial, we will use the IAM Handwriting Database, which contains images of handwritten English words from many different writers. You can download the dataset from the official website of the IAM Handwriting Database. Once you have downloaded it, extract it to a directory of your choice.
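
The word images come with a text file that maps each image to its transcription. The sketch below shows one way to read it. It assumes a layout we have not verified against the current download: comment lines start with "#", the first field is the word ID, the second field is a segmentation flag ("ok" or "err"), the last field is the transcription, and images live under words/<folder>/<form>/<word_id>.png. The function name parse_iam_words is hypothetical.

```python
def parse_iam_words(lines):
    """Yield (relative_image_path, transcription) pairs from label-file lines.

    Assumed row layout: "<word_id> <ok|err> ... <transcription>".
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split()
        if len(fields) < 3 or fields[1] == "err":
            continue  # skip malformed rows and badly segmented words
        word_id = fields[0]                 # e.g. "a01-000u-00-00"
        parts = word_id.split("-")
        folder = parts[0]                   # e.g. "a01"
        form = f"{parts[0]}-{parts[1]}"     # e.g. "a01-000u"
        path = f"words/{folder}/{form}/{word_id}.png"
        yield path, fields[-1]


if __name__ == "__main__":
    sample = [
        "# header comment",
        "a01-000u-00-00 ok 154 408 768 27 51 AT A",
        "a01-000u-00-01 err 154 507 766 213 48 NN MOVE",
    ]
    for path, text in parse_iam_words(sample):
        print(path, text)
```

In practice you would pass an open file handle instead of the sample list, and check a few rows by eye before relying on the assumed field positions.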

Step 3: Preprocess the Dataset
Next, we need to preprocess the dataset. We will use the torchvision.transforms module to resize every image to 32 x 128 pixels (height x width), convert it to grayscale, and turn it into a tensor. Here is the code to preprocess the dataset:

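import torchvision.transforms as transforms
from torchvision import datasets

# Applied to every image as it is loaded.
transform = transforms.Compose([
    transforms.Resize((32, 128)),  # (height, width); does not preserve aspect ratio
    transforms.Grayscale(),        # one channel instead of three
    transforms.ToTensor()          # float tensor of shape (1, 32, 128), values in [0, 1]
])

# ImageFolder treats each sub-folder of root as one class, so the images must be
# arranged as root/<label>/<image>.png for the labels to be meaningful.
train_dataset = datasets.ImageFolder(root='path/to/iam_database', transform=transform)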
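
Because Resize((32, 128)) stretches every image to the same shape, short words are widened and long words are squeezed. A common alternative is to scale each image so that it fits inside the target box and pad the remainder. The sketch below only computes the sizes involved; fit_within is a hypothetical helper, and applying the result to real images is left out.

```python
def fit_within(width, height, target_w=128, target_h=32):
    """Scale (width, height) to fit inside the target box, keeping aspect ratio.

    Returns (new_w, new_h, pad_w, pad_h), where the pads are the pixels still
    missing to reach the full target size.
    """
    scale = min(target_w / width, target_h / height)
    new_w = max(1, min(target_w, round(width * scale)))
    new_h = max(1, min(target_h, round(height * scale)))
    return new_w, new_h, target_w - new_w, target_h - new_h


if __name__ == "__main__":
    # A short word: height is the limiting side, so width needs padding.
    print(fit_within(100, 50))  # (64, 32, 64, 0)
    # A long word: width is the limiting side, so height needs padding.
    print(fit_within(400, 50))  # (128, 16, 0, 16)
```

Whether padding helps depends on the data, so it is worth comparing both variants on a validation set.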

Step 4: Build the Model
Now, we can build our handwritten word recognition model. We will create a simple convolutional neural network (CNN) that treats recognition as a classification task: each image is assigned to one class from a fixed set of labels. Here is the code to build the model:

import torch
import torch.nn as nn
import torch.nn.functional as F

class HandwritingRecognitionModel(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)
        # A 1 x 32 x 128 input becomes 64 x 6 x 30 after two conv + pool stages.
        self.fc1 = nn.Linear(64 * 6 * 30, 128)
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))  # -> (N, 32, 15, 63)
        x = F.relu(F.max_pool2d(self.conv2(x), 2))  # -> (N, 64, 6, 30)
        x = torch.flatten(x, 1)                     # -> (N, 11520)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)  # raw class scores (logits)
        return x

# One output per class found by ImageFolder (one class per sub-folder).
model = HandwritingRecognitionModel(num_classes=len(train_dataset.classes))
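
The input size of the first fully connected layer has to match the flattened output of the last pooling step, and a mismatch only shows up as a shape error at run time. The arithmetic can be checked by hand with the standard output-size formulas for an unpadded convolution and for max pooling. The helper names below are our own.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Output length of a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_out(size, kernel_size):
    """Output length of max pooling with stride equal to the kernel size."""
    return size // kernel_size


def flattened_features(height, width, channels=64):
    """Trace an input through conv(3) -> pool(2) -> conv(3) -> pool(2)."""
    for _ in range(2):
        height = pool_out(conv_out(height, 3), 2)
        width = pool_out(conv_out(width, 3), 2)
    return channels * height * width, (height, width)


if __name__ == "__main__":
    # Height: 32 -> 30 -> 15 -> 13 -> 6, width: 128 -> 126 -> 63 -> 61 -> 30.
    print(flattened_features(32, 128))  # (11520, (6, 30))
```

For a 32 x 128 input, the flattened size is therefore 64 * 6 * 30 = 11520. If you change the resize dimensions in Step 3, rerun the calculation and update the first linear layer accordingly.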

Step 5: Train the Model
Next, we will train the model on the dataset. We will use the torch.optim module to define an optimizer and the torch.utils.data module to create data loaders. Here is the code to train the model:

import torch.optim as optim
from torch.utils.data import DataLoader

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()  # expects raw logits and integer class labels

model.train()
for epoch in range(10):
    for i, (inputs, targets) in enumerate(train_loader):
        optimizer.zero_grad()              # clear gradients from the previous step
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()                    # backpropagate
        optimizer.step()                   # update the weights

        # Report the loss of the current batch every 100 iterations.
        if i % 100 == 0:
            print(f'Epoch {epoch}, Iteration {i}, Loss: {loss.item():.4f}')
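
The loss of a single batch is noisy, so it can be easier to judge progress from an average over the whole epoch. The sketch below is one minimal way to track that; RunningAverage is a hypothetical helper, not a PyTorch class.

```python
class RunningAverage:
    """Weighted running mean, e.g. of per-batch losses."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value, n=1):
        # `value` is a mean over `n` samples, so weight it by `n`.
        self.total += value * n
        self.count += n

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


if __name__ == "__main__":
    avg = RunningAverage()
    avg.update(2.0, n=2)  # a batch of 2 samples with mean loss 2.0
    avg.update(5.0, n=1)  # a batch of 1 sample with loss 5.0
    print(avg.mean)       # 3.0
```

Inside the training loop you would create one instance per epoch, call avg.update(loss.item(), inputs.size(0)) after each step, and print avg.mean when the epoch ends.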

Step 6: Test the Model
Finally, we can evaluate the model on held-out test images and report its classification accuracy, to see how well it recognizes handwritten words it has not seen during training. Here is the code to test the model:

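# The test folder must contain the same class sub-folders as the training set,
# so that ImageFolder assigns the same index to each label.
test_dataset = datasets.ImageFolder(root='path/to/test_dataset', transform=transform)
test_loader = DataLoader(test_dataset, batch_size=32)

correct = 0
total = 0

model.eval()           # switch to inference mode
with torch.no_grad():  # gradients are not needed for evaluation
    for inputs, targets in test_loader:
        outputs = model(inputs)
        predicted = outputs.argmax(dim=1)  # index of the highest score
        total += targets.size(0)
        correct += (predicted == targets).sum().item()

print(f'Accuracy: {100 * correct / total:.2f}%')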
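
Accuracy counts a prediction as either right or wrong. When predictions are available as text, a complementary measure is the character error rate (CER): the edit distance between the predicted and the true word, divided by the length of the true word. The sketch below is a plain-Python version for illustration; the function names are our own, and the handling of an empty label is an assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(prediction, label):
    """Edit distance normalized by the length of the true label."""
    if not label:
        # Assumption: an empty label counts as fully wrong unless
        # the prediction is empty too.
        return 0.0 if not prediction else 1.0
    return edit_distance(prediction, label) / len(label)


if __name__ == "__main__":
    print(character_error_rate("with", "with"))           # 0.0
    print(character_error_rate("barboonic", "barbaric"))  # 0.375
```

A prediction that is off by a few characters gets a small CER instead of being counted as a complete miss.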

And that’s it! You have built a handwritten word recognition model using PyTorch. You can improve its performance further by experimenting with different architectures, hyperparameters, and training strategies. Happy coding!

28 Comments
@m_e771
2 hours ago

Hello, my dear friend! Honestly, words can't express how impressed I am by the amazing content you create. Your way of explaining AI model development is truly inspiring and reflects your extensive expertise😍😍. I have a dream of designing a model that can extract handwritten Arabic text, but I feel a bit lost on where to start. Could you kindly guide me with your great advice or provide some initial steps to get started? I'd be so grateful for your support, and thank you for all that you share!

@SaiGaneshNeerumalla-s7o
2 hours ago

@PyLessons data_preprocessors=[ImageReader()],

^^^^^^^^^^^^^

TypeError: ImageReader.__init__() missing 1 required positional argument: 'image_class'
I got this error. How can I solve it? I am unable to find a fix.

@AyenTorres-we9gp
2 hours ago

how can I use the model to use my camera to scan a handwritten word?

@afrididanial
2 hours ago

Hi from Pakistan. I need a dataset of handwritten medical prescriptions from doctors, in Urdu, for machine learning.

@enzowu7768
2 hours ago

can I use this to recognize numbers?

@bomxacalaka2033
2 hours ago

what are your specs?

@hollybollyentertainer8097
2 hours ago

Hello👋 can you please attach the links of latest datasets that are available. It would be a great help because i have project deadline within a week😅

@sharkieislive
2 hours ago

Hi, I am getting this error:

Image: Datasets/IAM_Words/words/k04/k04-126/k04-126-02-09.png, Label: with, Prediction: with, CER: 0.0
Image: Datasets/IAM_Words/words/k04/k04-054/k04-054-03-06.png, Label: barbaric, Prediction: barboonic, CER: 0.375
62%|| 5948/9646 [00:12<00:07, 477.91it/s]
Traceback (most recent call last):
  File "c:UsersshubhJupyterNBmltu-mainTutorials8_handwriting_recognition_torchinferenceModel.py", line 37, in <module>
    cer = get_cer(prediction_text, label)
  File "D:applicationanacondaenvstflibsite-packagesmltuutilstext_utils.py", line 79, in get_cer
    for pred_tokens, tgt_tokens in zip(preds, target):
TypeError: 'float' object is not iterable

@pasinduminiruwan4990
2 hours ago

Hello, thank you very much for your content. Can I use this code to identify handwritten text on a full page?

@nareshmalviya3100
2 hours ago

@PyLessons When I try to execute the fit method, I get this error:
UnboundLocalError: cannot access local variable 'loss_info' where it is not associated with a value

@rishabh2906
2 hours ago

Hey, how can I use Nougat to make this work more efficiently with maths and other things? Any ideas?

@ramsuryainnovik
2 hours ago

Hi. Can I get your trained model by any chance?

@aspboss1973
2 hours ago

Great video !
Question: what if we want to extract text from an image that is not handwritten? Will the same model work?

@upppvr4280
2 hours ago

I need your help. How can we contact you?

@jahstinarguedas799
2 hours ago

Hello. I am trying to do the very first step, setting up the minimum needed to start with the code, which is importing the libraries, but I get error after error. Did you already have those libraries installed, or did you install them for this video?

@ekchills6948
2 hours ago

Thank you so much, but can you please tell me how I can use my own inputs to test it? I have already trained it with a different dataset.

@science3605
2 hours ago

Thank you so much! Very well explained. But when I try to download the dataset, it shows the error "HTTPerror: Bad Gateway".
Please help me with this if possible.

@ruckydelmoro2500
2 hours ago

How can I modify the code to process the data once?

@adamofucci4558
2 hours ago

Thank you.

@arifzanko
2 hours ago

ModuleNotFoundError: No module named 'mltu.torch.losses'

I already installed mltu==1.0.1, but it still doesn't work.
