Recognizing Handwritten Digits in Python with scikit-learn

Handwritten digit recognition is a popular problem in the field of machine learning. The goal is to correctly identify digits that have been written by hand, typically as an image. In this tutorial, we will be using the scikit-learn library in Python to create a handwritten digit recognition model.

  1. Install Required Libraries:
    Before we can start working on the project, we need to make sure that scikit-learn and matplotlib are installed (numpy is installed automatically as a dependency of scikit-learn). If you do not have them installed, you can do so by running the following command in your terminal:
pip install scikit-learn matplotlib
  1. Import Required Libraries:
    Once scikit-learn is installed, we can start by importing the necessary libraries in our Python script. The three main libraries we will be using are scikit-learn, numpy, and matplotlib.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import svm
import numpy as np
import matplotlib.pyplot as plt
  1. Load the Dataset:
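    We will be using the handwritten digits dataset that ships with scikit-learn, which consists of 1,797 grayscale images of size 8×8 pixels, with pixel intensities ranging from 0 to 16. Note that this is not the larger MNIST dataset (70,000 images of 28×28 pixels); load_digits returns a much smaller dataset that is convenient for a quick tutorial. We can load it using the datasets module in scikit-learn.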
digits = datasets.load_digits()
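    To confirm what was loaded, a quick inspection of the returned object can help. The following is a minimal sketch that only prints shapes, the pixel range, and the labels present; nothing here changes the data.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# `images` holds the 8x8 pictures; `data` holds the same pixels flattened to 64 values.
print(digits.images.shape)
print(digits.data.shape)

# Pixel intensities are small integers stored as floats.
print(digits.images.min(), digits.images.max())

# The distinct class labels in the dataset.
print(np.unique(digits.target))
```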
  1. Visualize the Dataset:
    Before we start building our model, let’s visualize some of the digits in the dataset to get an idea of what we are working with.
fig, axes = plt.subplots(4, 4, figsize=(8, 8))

for i, ax in enumerate(axes.flat):
    ax.imshow(digits.images[i], cmap='gray')
    ax.axis('off')
    ax.set_title(digits.target[i])

plt.show()
  1. Preprocess the Data:
    Before we can feed the data into our model, we need to preprocess it. We will flatten each 2D 8×8 image into a 1D array of 64 values and normalize the pixel values to lie between 0 and 1 by dividing by 16, the maximum intensity in this dataset.
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
data = data / 16.0  # Normalize pixel values
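    As a sanity check on the preprocessing step, the sketch below verifies the shape and value range of the result. It also shows that the flattened images are the same values scikit-learn already exposes as digits.data, so the reshape is mainly illustrative.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
n_samples = len(digits.images)

# Flatten each 8x8 image into a row of 64 values.
flat = digits.images.reshape((n_samples, -1))

# Scale intensities from the 0-16 range down to 0-1.
data = flat / 16.0

print(data.shape)
print(data.min(), data.max())

# The flattened images match the ready-made `digits.data` array.
print(np.array_equal(flat, digits.data))
```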
  1. Split the Data:
    Next, we split the data into training and testing sets using the train_test_split function from scikit-learn.
X_train, X_test, y_train, y_test = train_test_split(data, digits.target, test_size=0.2, random_state=42)
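    One optional variation, not used in the rest of this tutorial, is to pass stratify so that each digit appears in the training and test sets in roughly the same proportion. The sketch below assumes the same 80/20 split and random seed as above.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
data = digits.images.reshape((len(digits.images), -1)) / 16.0

# stratify keeps the class balance similar in both subsets (an optional addition).
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2, random_state=42, stratify=digits.target
)

print(X_train.shape, X_test.shape)

# Number of test samples per digit, 0 through 9.
print(np.bincount(y_test))
```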
  1. Train the Model:
    Now, we can create and train our model. For this tutorial, we will be using a Support Vector Machine (SVM) classifier. We will use the svm.SVC class from scikit-learn.
model = svm.SVC()
model.fit(X_train, y_train)
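    Since svm.SVC() is created without arguments, it runs with the library defaults. The sketch below trains the same model and prints the settings it ended up with, along with the classes it learned and how many training samples became support vectors.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
data = digits.images.reshape((len(digits.images), -1)) / 16.0
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2, random_state=42
)

model = svm.SVC()
model.fit(X_train, y_train)

# Inspect the default hyperparameters the classifier was built with.
params = model.get_params()
print(params["kernel"], params["C"], params["gamma"])

# Labels seen during training and the total number of support vectors.
print(model.classes_)
print(model.n_support_.sum(), "support vectors out of", len(X_train))
```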
  1. Evaluate the Model:
    After training the model, we can evaluate its performance on the test set.
accuracy = model.score(X_test, y_test)
print(f"Model accuracy: {accuracy}")
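    A single accuracy number hides which digits get confused with each other. As a possible extension, the sketch below uses classification_report and confusion_matrix from sklearn.metrics to break the result down per digit.

```python
from sklearn import datasets, svm
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
data = digits.images.reshape((len(digits.images), -1)) / 16.0
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2, random_state=42
)

model = svm.SVC()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Precision, recall, and F1 score for each digit.
print(classification_report(y_test, y_pred))

# Rows are true digits, columns are predicted digits.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```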
  1. Make Predictions:
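    Finally, we can use the trained model to make predictions. Here we predict the first ten images of the test set, which the model has not seen during training, as a stand-in for new digit images.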
predictions = model.predict(X_test[:10])
print(f"Predictions: {predictions}")
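    To see how the predictions compare with the true labels, and how a single new image might be passed to the model, consider the sketch below. The prepare_image helper is hypothetical and not part of scikit-learn; it assumes your image has already been resized to 8×8 pixels with intensities from 0 to 16, like the dataset images. A dataset image is reused here as a stand-in for your own.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
data = digits.images.reshape((len(digits.images), -1)) / 16.0
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2, random_state=42
)
model = svm.SVC().fit(X_train, y_train)

# Print each prediction next to the true label.
predictions = model.predict(X_test[:10])
for predicted, actual in zip(predictions, y_test[:10]):
    print(f"predicted={predicted} actual={actual}")


def prepare_image(image_8x8):
    """Hypothetical helper: flatten an 8x8 image (0-16 intensities) and scale to 0-1."""
    image = np.asarray(image_8x8, dtype=float)
    if image.shape != (8, 8):
        raise ValueError("expected an 8x8 image")
    return image.reshape(1, -1) / 16.0


# Stand-in for your own image: one picture taken from the dataset.
new_image = digits.images[0]
single_prediction = model.predict(prepare_image(new_image))
print(single_prediction)
```

    The model only accepts inputs preprocessed the same way as the training data, so a photo or scan would need to be converted to grayscale, resized to 8×8, and rescaled before calling the helper.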
  1. Conclusion:
    In this tutorial, we have built a handwritten digit recognition model using scikit-learn in Python. We loaded scikit-learn's built-in digits dataset, preprocessed the data, trained an SVM classifier, evaluated its performance on a held-out test set, and made predictions. Handwritten digit recognition is a classic machine learning problem with many practical applications, such as optical character recognition and digitized document processing. With the skills learned in this tutorial, you can explore further and improve the model by experimenting with different classifiers, hyperparameters, and preprocessing techniques.
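    As one possible starting point for the hyperparameter experiments mentioned above, the sketch below runs a small grid search with GridSearchCV. The candidate values for C and gamma are arbitrary choices for illustration, not tuned recommendations.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, train_test_split

digits = datasets.load_digits()
data = digits.images.reshape((len(digits.images), -1)) / 16.0
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2, random_state=42
)

# Illustrative grid only; widen or change it for a real search.
param_grid = {"C": [1, 10], "gamma": ["scale", 0.01]}

# 3-fold cross-validation on the training set for every combination.
search = GridSearchCV(svm.SVC(), param_grid, cv=3)
search.fit(X_train, y_train)

print(search.best_params_)

# The best model is refit on the full training set and scored on the test set.
test_accuracy = search.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```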
@husseinalsajer4381
1 month ago

please , can you share this code ?

@aakritibector5619
1 month ago

There is error coming at matplotlib.pylot

@sadafmehdi2991
1 month ago

can you share code file?

@joencesimpson
1 month ago

How do I add my own image inside the database?

@sayantanpaul9967
1 month ago

cannot able to import the file from director….as you did like pd.read_csv("datasets/train.csv")…what to do ?? and how to write a file location if it is downloaded in some arbitrary file or folder??

@utsavphuyal5060
1 month ago

Did he say ' hello to this people of cruel world ' …?

@mallamgurudeep1341
1 month ago

AttributeError: 'DataFrame' object has no attribute 'as_matrix'
getting this error at line 6

@codingwithcrystalhill1568
1 month ago

i had to use iloc to get it to work

@snehalmishra5429
1 month ago

I have downloaded the train.csv file in Downloads folder so what path should I specify?

@SurajKumar-bw9oi
1 month ago

Great, Keep it up!

@sarahvaz4326
1 month ago

nice vid! can someone tell me the applications of it ?

@logixquest6710
1 month ago

showing error at xtrain

@skmusic_2022
1 month ago

Is it possible to detect slashes using this technique? Like a Handwritten date?

@sonuiyer
1 month ago

Thank You.
Amazing Video!

@davidlevi4855
1 month ago

as_matrix() on line 6 isn't working for me. What could be the problem

@anaghanil9603
1 month ago

Bro how you giving input

@aer0n1x30
1 month ago

The .as_matrix() part is showing error

@jpaldama9963
1 month ago

Excellent! It worked like a charm. I advise anyone having trouble with .as_matrix() to switch that to .values. .as_matrix() is deprecated.

@ApPillon
1 month ago

I learned something! That's new.

@simon_jakobsson
1 month ago

I can't get this to work – the error message I'm getting is:
AttributeError: 'DataFrame' object has no attribute 'as_matrix'
I haven't been able to find a solution on Google, do you know how to fix it? Thanks!