Recognizing Handwritten Digits in Python with scikit-learn

Handwritten digit recognition is a popular problem in the field of machine learning. The goal is to correctly identify digits that have been written by hand, typically as an image. In this tutorial, we will be using the scikit-learn library in Python to create a handwritten digit recognition model.

  1. Install Required Libraries:
    Before we start, we need scikit-learn, which installs NumPy as a dependency, and matplotlib, which we use for plotting and which is not installed automatically with scikit-learn. If you do not have them, install both by running the following command in your terminal:
pip install scikit-learn matplotlib
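    To confirm the installation worked, a quick check is to import each library the tutorial relies on and print its version. The version numbers you see will depend on your environment.

```python
# Import each library the tutorial relies on and report its installed version.
import matplotlib
import numpy as np
import sklearn

for name, module in [("scikit-learn", sklearn), ("numpy", np), ("matplotlib", matplotlib)]:
    print(f"{name}: {module.__version__}")
```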
  2. Import Required Libraries:
    Once scikit-learn is installed, we can start by importing the necessary libraries in our Python script. The three main libraries we will be using are scikit-learn, numpy, and matplotlib.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import svm
import numpy as np
import matplotlib.pyplot as plt
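  3. Load the Dataset:
    We will use the digits dataset that ships with scikit-learn, loaded with the datasets module. It contains 1,797 grayscale images of handwritten digits (0 to 9), each 8×8 pixels with integer intensities from 0 to 16. Note that this is not the larger MNIST dataset (70,000 images of 28×28 pixels); because the digits dataset is bundled with the library, it loads instantly without a download.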
digits = datasets.load_digits()
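    A quick way to confirm what load_digits() returns is to inspect the shapes and value ranges of the arrays in the returned object. The sketch below prints them; with the bundled dataset you should see 1,797 samples of 8×8 pixels, 64 flattened features per sample, and ten classes.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# images: one 8x8 array per sample; data: the same pixels flattened to 64 values.
print("images:", digits.images.shape)   # (n_samples, 8, 8)
print("data:  ", digits.data.shape)     # (n_samples, 64)
print("target:", digits.target.shape)   # one label per sample

# Pixel intensities are whole numbers stored as floats, ranging from 0 to 16.
print("pixel range:", digits.images.min(), "to", digits.images.max())
print("classes:", np.unique(digits.target))
```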
  4. Visualize the Dataset:
    Before we start building our model, let’s visualize some of the digits in the dataset to get an idea of what we are working with.
fig, axes = plt.subplots(4, 4, figsize=(8, 8))

# Show the first 16 images, each titled with its true label.
for i, ax in enumerate(axes.flat):
    ax.imshow(digits.images[i], cmap='gray')
    ax.axis('off')
    ax.set_title(digits.target[i])

plt.tight_layout()
plt.show()
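    Besides looking at individual images, it helps to check how many examples there are of each digit, since a heavily imbalanced dataset would make plain accuracy misleading. A minimal sketch using np.bincount, which counts how often each label occurs:

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# np.bincount returns, at index k, the number of samples labelled k.
counts = np.bincount(digits.target)
for label, count in enumerate(counts):
    print(f"digit {label}: {count} samples")
```

    In this dataset the classes are roughly balanced, with close to 180 samples per digit, so accuracy is a reasonable first metric.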
  5. Preprocess the Data:
    Before we can feed the data into our model, we need to preprocess it. We will flatten each 8×8 image into a 1D array of 64 values and normalize the pixel values, which range from 0 to 16, to lie between 0 and 1.
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))  # shape: (n_samples, 64)
data = data / 16.0  # Scale pixel values from [0, 16] to [0, 1]
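    To see what the reshape does, we can compare the flattened array with the data attribute that load_digits() already provides: each 8×8 image becomes a row of 64 values, read row by row. The sketch below checks that the two agree and that dividing by 16 puts every value in [0, 1].

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
n_samples = len(digits.images)

# Flatten each 8x8 image into a 64-value row (row-major order).
data = digits.images.reshape((n_samples, -1))

# The flattened images match the pre-flattened copy that ships with the dataset.
print("matches digits.data:", np.array_equal(data, digits.data))

# Row 0, columns 0-7 are the first row of pixels of image 0.
print("first pixel row:", data[0, :8], "==", digits.images[0, 0])

# Dividing by the maximum intensity (16) rescales every pixel into [0, 1].
scaled = data / 16.0
print("scaled range:", scaled.min(), "to", scaled.max())
```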
  6. Split the Data:
    Next, we split the data into training and testing sets using the train_test_split function from scikit-learn.
X_train, X_test, y_train, y_test = train_test_split(data, digits.target, test_size=0.2, random_state=42)
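    With test_size=0.2, scikit-learn rounds the test set size up, so 360 of the 1,797 samples are held out and 1,437 remain for training. If you also want each digit to appear in the same proportion in both sets, train_test_split accepts a stratify argument. The sketch below is a variant of the split above that uses it; it is an optional change, not what the rest of this tutorial assumes.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
data = digits.images.reshape((len(digits.images), -1)) / 16.0

# stratify=digits.target keeps the class proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2, random_state=42, stratify=digits.target
)

print("train:", X_train.shape, "test:", X_test.shape)
print("test samples per digit:", np.bincount(y_test))
```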
  7. Train the Model:
    Now, we can create and train our model. For this tutorial, we will use a Support Vector Machine (SVM) classifier, implemented by the svm.SVC class in scikit-learn. Called with no arguments, SVC uses a radial basis function (RBF) kernel.
model = svm.SVC()
model.fit(X_train, y_train)
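    svm.SVC() with no arguments uses kernel="rbf", C=1.0 and gamma="scale" (the gamma default in recent scikit-learn releases). Writing these out makes the model easier to tune later. The sketch below builds a model equivalent to the default one above and reads the settings back with get_params().

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
data = digits.images.reshape((len(digits.images), -1)) / 16.0
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2, random_state=42
)

# The same classifier as svm.SVC(), with its main hyperparameters written out:
# kernel: the similarity function between samples,
# C: inverse regularization strength (smaller C = stronger regularization),
# gamma: RBF kernel width; 'scale' derives it from the number of features
#        and the variance of the training data.
model = svm.SVC(kernel="rbf", C=1.0, gamma="scale")
model.fit(X_train, y_train)

params = model.get_params()
print({key: params[key] for key in ("kernel", "C", "gamma")})
print("support vectors per class:", model.n_support_)
```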
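  8. Evaluate the Model:
    After training the model, we can evaluate its performance on the held-out test set. For classifiers, model.score returns the mean accuracy, i.e. the fraction of test images whose digit is predicted correctly.
accuracy = model.score(X_test, y_test)
print(f"Model accuracy: {accuracy:.3f}")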
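    A single accuracy number hides which digits the model confuses with each other. sklearn.metrics provides a per-class report and a confusion matrix; the sketch below retrains the same model so that it runs on its own.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
data = digits.images.reshape((len(digits.images), -1)) / 16.0
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2, random_state=42
)
model = svm.SVC().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Precision, recall and F1 score for each digit, plus overall averages.
print(classification_report(y_test, y_pred))

# Row i, column j counts test images of digit i that were predicted as digit j;
# correct predictions sit on the diagonal.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print("accuracy from the diagonal:", cm.trace() / cm.sum())
print("accuracy_score:            ", accuracy_score(y_test, y_pred))
```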
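  9. Make Predictions:
    Finally, we can use the trained model to predict the digit shown in images it was not trained on. Here we use the first ten test images and print their true labels next to the predictions for comparison. Any new image you want to classify must first be converted to the same format as the training data: 8×8 pixels with intensities from 0 to 16, flattened into 64 values and divided by 16.
predictions = model.predict(X_test[:10])
print(f"Predictions: {predictions}")
print(f"True labels: {y_test[:10]}")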
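    According to the dataset's description in scikit-learn, the original 32×32 bitmaps were reduced to 8×8 by counting the "on" pixels in each non-overlapping 4×4 block, which gives values from 0 to 16. The sketch below mimics that step for your own 32×32 grayscale array with values in [0, 1]. The helper name to_digits_features, the 0.5 threshold and the ink_is_dark flag are illustrative assumptions, not part of scikit-learn; loading and resizing a real photo to 32×32 is left to an image library of your choice, and real handwriting may also need cropping and centering before predictions on it can be trusted.

```python
import numpy as np
from sklearn import datasets, svm


def to_digits_features(image32, threshold=0.5, ink_is_dark=True):
    """Hypothetical helper: turn a 32x32 grayscale array with values in [0, 1]
    into one 64-value feature row in the same format as the training data."""
    image32 = np.asarray(image32, dtype=float)
    if image32.shape != (32, 32):
        raise ValueError("expected a 32x32 array")
    if ink_is_dark:
        image32 = 1.0 - image32  # make ink bright, as in the digits dataset
    on = image32 > threshold  # binarize: True where there is ink
    # Count 'on' pixels in each 4x4 block -> 8x8 array with values 0..16.
    counts = on.reshape(8, 4, 8, 4).sum(axis=(1, 3))
    # Flatten to one row and scale exactly like the training data.
    return counts.reshape(1, -1) / 16.0


# Train on the full digits dataset so this example runs on its own.
digits = datasets.load_digits()
model = svm.SVC().fit(digits.data / 16.0, digits.target)

# A synthetic "1": a dark vertical stroke on a white 32x32 canvas.
canvas = np.ones((32, 32))
canvas[4:28, 14:18] = 0.0
features = to_digits_features(canvas)
print("feature shape:", features.shape)
print("predicted digit:", model.predict(features)[0])
```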
  10. Conclusion:
    In this tutorial, we built a handwritten digit recognition model using scikit-learn in Python. We loaded scikit-learn's built-in digits dataset, preprocessed the data, trained an SVM classifier, evaluated its performance on a held-out test set, and made predictions. Handwritten digit recognition is a classic machine learning problem with practical applications such as optical character recognition and processing digitized documents. With the skills learned in this tutorial, you can further explore and improve the model by experimenting with different classifiers, hyperparameters, and preprocessing techniques.
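    As one example of hyperparameter experimentation, GridSearchCV can try several values of C and gamma using cross-validation on the training set only, keeping the test set for a final check. The grid below is a small, arbitrary starting point, not a recommended setting.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, train_test_split

digits = datasets.load_digits()
data = digits.images.reshape((len(digits.images), -1)) / 16.0
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2, random_state=42
)

# Illustrative grid: every combination is scored with 5-fold cross-validation.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(svm.SVC(), param_grid, cv=5)
search.fit(X_train, y_train)  # then refits the best combination on all of X_train

print("best parameters:", search.best_params_)
print("mean cross-validation accuracy:", round(search.best_score_, 3))
print("test accuracy:", round(search.score(X_test, y_test), 3))
```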