Recognizing Handwritten Digits in Python with scikit-learn

Handwritten digit recognition is a popular problem in the field of machine learning. The goal is to correctly identify digits that have been written by hand, typically as an image. In this tutorial, we will be using the scikit-learn library in Python to create a handwritten digit recognition model.

  1. Install Required Libraries:
    Before we start, make sure scikit-learn and matplotlib are installed (NumPy is pulled in automatically as a scikit-learn dependency). If they are missing, run the following command in your terminal:
pip install scikit-learn matplotlib
  1. Import Required Libraries:
    Once scikit-learn is installed, we can start by importing the necessary libraries in our Python script. The three main libraries we will be using are scikit-learn, numpy, and matplotlib.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import svm
import numpy as np
import matplotlib.pyplot as plt
  1. Load the Dataset:
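    We will be using scikit-learn's built-in digits dataset, a small relative of the famous MNIST dataset. It consists of 1,797 handwritten digits stored as 8×8 grayscale images, with pixel values ranging from 0 to 16. (The full MNIST dataset, with 70,000 images of 28×28 pixels, is not what load_digits returns.) We can load the dataset using the datasets module in scikit-learn.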
digits = datasets.load_digits()
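As a quick sanity check, it can help to print what `load_digits` actually returns before going further. The sketch below only reads the `images`, `data`, and `target` attributes of the loaded object; nothing here is specific to this tutorial beyond the variable name `digits`.

```python
from sklearn import datasets

digits = datasets.load_digits()

# images: one 8x8 array per sample; data: the same pixels flattened to 64 features
print(digits.images.shape)  # (1797, 8, 8)
print(digits.data.shape)    # (1797, 64)

# target holds the digit label for each sample
print(sorted(set(digits.target.tolist())))

# Pixel intensities are small numbers, not 0-255
print(digits.images.min(), digits.images.max())
```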
  1. Visualize the Dataset:
    Before we start building our model, let’s visualize some of the digits in the dataset to get an idea of what we are working with.
fig, axes = plt.subplots(4, 4, figsize=(8, 8))

for i, ax in enumerate(axes.flat):
    ax.imshow(digits.images[i], cmap='gray')
    ax.axis('off')
    ax.set_title(digits.target[i])

plt.show()
  1. Preprocess the Data:
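    Before we can feed the data into our model, we need to preprocess it. We flatten each 2D image into a 1D array, because the classifier expects one row of features per sample, and divide by 16, the maximum pixel value in this dataset, so that every value lies between 0 and 1.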
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
data = data / 16.0  # Normalize pixel values
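The flattening step can be checked against the pre-flattened copy that the dataset already provides in `digits.data`. This is a small sketch for verification only; the name `flattened_matches` is made up for the example.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

n_samples = len(digits.images)
flat = digits.images.reshape((n_samples, -1))

# digits.data should hold the same pixels, already flattened to 64 columns
flattened_matches = np.array_equal(flat, digits.data)
print(flattened_matches)

# After scaling, every feature should lie in [0, 1]
data = flat / 16.0
print(data.shape, data.min(), data.max())
```

If the check passes, `digits.data / 16.0` is an equivalent shortcut for the reshape-and-scale step.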
  1. Split the Data:
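    Next, we split the data into training and testing sets using the train_test_split function from scikit-learn. Setting test_size=0.2 holds out 20% of the samples for testing, and fixing random_state makes the split reproducible.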
X_train, X_test, y_train, y_test = train_test_split(data, digits.target, test_size=0.2, random_state=42)
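One optional variation, not used in the tutorial itself, is to pass `stratify` so that each digit appears in the training and test sets in roughly the same proportion. The sketch below assumes the same 80/20 split and prints the per-class counts so the effect is visible.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
data = digits.data / 16.0  # flattened pixels scaled to [0, 1]

# stratify keeps the class proportions similar in both splits
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2, random_state=42, stratify=digits.target
)

print(X_train.shape, X_test.shape)

# Number of samples per digit (index 0-9) in each split
print(np.bincount(y_train))
print(np.bincount(y_test))
```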
  1. Train the Model:
    Now, we can create and train our model. For this tutorial, we will be using a Support Vector Machine (SVM) classifier via the svm.SVC class from scikit-learn. Called with no arguments, SVC uses its default settings, including an RBF kernel.
model = svm.SVC()
model.fit(X_train, y_train)
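The defaults work well here, but the main SVC settings (`C` and `gamma`) can also be searched over. The following is a minimal sketch using `GridSearchCV`; the candidate values in `param_grid` are illustrative assumptions, not tuned recommendations.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, train_test_split

digits = datasets.load_digits()
data = digits.data / 16.0
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2, random_state=42
)

# Illustrative candidate values only
param_grid = {"C": [1, 10], "gamma": ["scale", 0.01]}

# 3-fold cross-validation on the training set for each combination
search = GridSearchCV(svm.SVC(kernel="rbf"), param_grid, cv=3)
search.fit(X_train, y_train)

print(search.best_params_)
print(f"Cross-validated accuracy: {search.best_score_:.3f}")
print(f"Test accuracy: {search.score(X_test, y_test):.3f}")
```

Because the search only ever sees the training set, the test accuracy printed at the end remains an honest estimate.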
  1. Evaluate the Model:
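    After training the model, we can evaluate its performance on the held-out test set. For a classifier, score returns the mean accuracy, that is, the fraction of test images labeled correctly.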
accuracy = model.score(X_test, y_test)
print(f"Model accuracy: {accuracy}")
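A single accuracy number hides which digits get confused with which. As a sketch of a more detailed evaluation, `classification_report` and `confusion_matrix` from `sklearn.metrics` break the result down per class; the setup lines simply repeat the earlier steps so the block runs on its own.

```python
from sklearn import datasets, svm
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
data = digits.data / 16.0
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2, random_state=42
)

model = svm.SVC()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Precision, recall, and F1 score for each digit
print(classification_report(y_test, y_pred))

# Rows are true digits, columns are predicted digits
cm = confusion_matrix(y_test, y_pred, labels=model.classes_)
print(cm)
```

Off-diagonal entries in the matrix point to the specific digit pairs the model mixes up.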
  1. Make Predictions:
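    Finally, we can use the trained model to make predictions. Here we predict the first ten images of the test set. Any genuinely new image must go through the same preprocessing first: the same image size, flattened, and scaled to between 0 and 1.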
predictions = model.predict(X_test[:10])
print(f"Predictions: {predictions}")
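To make the preprocessing requirement concrete, here is a hedged sketch of predicting a single image. `prepare_image` is a hypothetical helper written for this example, not part of scikit-learn, and it assumes the input is already an 8×8 array with values from 0 to 16; `digits.images[0]` merely stands in for a new image.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
data = digits.data / 16.0
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2, random_state=42
)
model = svm.SVC().fit(X_train, y_train)

# Compare predictions with the true labels for the first ten test images
predictions = model.predict(X_test[:10])
print("Predicted:", predictions)
print("Actual:   ", y_test[:10])


def prepare_image(image):
    """Hypothetical helper: turn one 8x8 image (values 0-16) into model input."""
    arr = np.asarray(image, dtype=float)
    if arr.shape != (8, 8):
        raise ValueError(f"expected an 8x8 image, got shape {arr.shape}")
    # predict expects a 2D array: one row per sample
    return arr.reshape(1, -1) / 16.0


single = prepare_image(digits.images[0])
single_prediction = model.predict(single)
print("Single image prediction:", single_prediction)
```

A photo or scan of your own handwriting would first need to be converted to grayscale, resized to 8×8, and rescaled to the 0 to 16 range before a helper like this could be applied.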
  1. Conclusion:
    In this tutorial, we built a handwritten digit recognition model using scikit-learn in Python. We loaded scikit-learn's built-in digits dataset, preprocessed the data, trained an SVM classifier, evaluated its accuracy on a held-out test set, and made predictions. Handwritten digit recognition is a classic machine learning problem with many practical applications, such as optical character recognition and digitized document processing. From here, you can explore and improve upon the model by experimenting with different classifiers, hyperparameters, and preprocessing techniques.
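As one possible starting point for trying different classifiers, the sketch below compares three of them with 5-fold cross-validation. The choice of models and their settings (`n_neighbors=3`, `max_iter=1000`) are assumptions made for illustration.

```python
from sklearn import datasets, svm
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
data = digits.data / 16.0

# Illustrative candidates with mostly default settings
candidates = {
    "SVC": svm.SVC(),
    "k-NN": KNeighborsClassifier(n_neighbors=3),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

results = {}
for name, clf in candidates.items():
    # Mean accuracy over 5 cross-validation folds
    scores = cross_val_score(clf, data, digits.target, cv=5)
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```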
27 Comments
@husseinalsajer4381
2 months ago

please , can you share this code ?

@aakritibector5619
2 months ago

There is error coming at matplotlib.pylot

@sadafmehdi2991
2 months ago

can you share code file?

@joencesimpson
2 months ago

How do I add my own image inside the database?

@sayantanpaul9967
2 months ago

cannot able to import the file from director….as you did like pd.read_csv("datasets/train.csv")…what to do ?? and how to write a file location if it is downloaded in some arbitrary file or folder??

@utsavphuyal5060
2 months ago

Did he say ' hello to this people of cruel world ' …?

@mallamgurudeep1341
2 months ago

AttributeError: 'DataFrame' object has no attribute 'as_matrix'
getting this error at line 6

@codingwithcrystalhill1568
2 months ago

i had to use iloc to get it to work

@snehalmishra5429
2 months ago

I have downloaded the train.csv file in Downloads folder so what path should I specify?

@SurajKumar-bw9oi
2 months ago

Great, Keep it up!

@sarahvaz4326
2 months ago

nice vid! can someone tell me the applications of it ?

@logixquest6710
2 months ago

showing error at xtrain

@skmusic_2022
2 months ago

Is it possible to detect slashes using this technique? Like a Handwritten date?

@sonuiyer
2 months ago

Thank You.
Amazing Video!

@davidlevi4855
2 months ago

as_matrix() on line 6 isn't working for me. What could be the problem

@anaghanil9603
2 months ago

Bro how you giving input

@aer0n1x30
2 months ago

The .as_matrix() part is showing error

@jpaldama9963
2 months ago

Excellent! It worked like a charm. I advise anyone having trouble with .as_matrix() to switch that to .values. .as_matrix() is deprecated.

@ApPillon
2 months ago

I learned something! That's new.

@simon_jakobsson
2 months ago

I can't get this to work – the error message I'm getting is:
AttributeError: 'DataFrame' object has no attribute 'as_matrix'
I haven't been able to find a solution on Google, do you know how to fix it? Thanks!