Recognizing Handwritten Digits with Scikit-Learn: A Project in Machine Learning

Handwritten digit recognition is a classic problem in machine learning. In this tutorial, we will use the Scikit-Learn library in Python to build a small handwritten digit recognition system. The project works as a simple, end-to-end example of a supervised classification workflow: load labeled data, split it, train a model, evaluate it, and make predictions.

To get started with this project, you will need a basic understanding of the Python programming language, core machine learning concepts, and some familiarity with the Scikit-Learn library. If you are new to machine learning, it is recommended that you first go through some introductory tutorials on supervised learning, classification algorithms, and Scikit-Learn itself.

Step 1: Importing Required Libraries

The first step is to import the required libraries for our project. We need NumPy for numerical operations, Matplotlib for displaying images, and Scikit-Learn for the dataset, the train/test split, the model, and the accuracy metric.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

Step 2: Loading and Preprocessing the Dataset

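Scikit-Learn provides a built-in dataset called ‘digits’, which contains 1,797 grayscale images of handwritten digits, each 8×8 pixels, along with their corresponding labels (0 through 9). We will load this dataset, take the flattened 64-pixel feature vectors as X and the labels as y, and hold out 20% of the samples for testing.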

digits = datasets.load_digits()

X = digits.data
y = digits.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
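Before training, it can help to check what the loader and the split actually return. The following is a minimal sketch that only inspects shapes and value ranges; it reuses the same split parameters as above and adds nothing beyond standard Scikit-Learn attributes.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# digits.images holds the 8x8 pixel grids; digits.data holds the same
# pixels flattened into 64 features per sample.
print("images:", digits.images.shape)
print("data:  ", digits.data.shape)

# Pixel intensities in this dataset are integers stored as floats, from 0 to 16.
print("pixel range:", digits.data.min(), "to", digits.data.max())

X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

# With test_size=0.2, roughly one fifth of the samples are held out.
print("train:", X_train.shape, "test:", X_test.shape)
```

Because random_state is fixed, the same samples land in the test set on every run, which keeps the accuracy reported later reproducible.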

Step 3: Training the Machine Learning Model

For this project, we will train a logistic regression classifier. Despite its name, logistic regression is a classification algorithm, and Scikit-Learn's implementation handles the ten digit classes directly, so no extra multiclass setup is needed.

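# Raise max_iter: the default of 100 can stop before the solver converges on this data
model = LogisticRegression(max_iter=10000)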

model.fit(X_train, y_train)
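One optional variation, not part of the original walkthrough, is to standardize the pixel features before fitting, which often helps the solver converge in fewer iterations. The sketch below assumes a StandardScaler placed in front of the classifier via make_pipeline; the name scaled_model and the max_iter value are illustrative choices, not tuned settings.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

# The pipeline fits the scaler on the training data only, then applies the
# same scaling to anything passed to predict() or score().
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_train, y_train)

# score() returns the mean accuracy on the given data.
print("Test accuracy with scaling:", scaled_model.score(X_test, y_test))
```

Wrapping both steps in one pipeline avoids leaking test-set statistics into the scaler, since the scaler never sees X_test during fitting.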

Step 4: Evaluating the Model Performance

Once the model is trained, we evaluate it on the test set, which it has not seen during training. The accuracy score is the fraction of test images whose predicted digit matches the true label.

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
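A single accuracy number hides which digits get confused with which. As a rough sketch of a more detailed evaluation, the example below uses Scikit-Learn's confusion_matrix and classification_report on the same split, and checks that the accuracy equals the diagonal of the confusion matrix divided by the total count. The max_iter value is an assumption chosen to give the solver room to converge.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true digits, columns are predicted digits.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Per-digit precision, recall, and F1 score.
print(classification_report(y_test, y_pred))

# Correct predictions sit on the diagonal, so accuracy is trace / total.
manual_accuracy = cm.trace() / cm.sum()
print("Accuracy from confusion matrix:", manual_accuracy)
print("Accuracy from accuracy_score:  ", accuracy_score(y_test, y_pred))
```

Off-diagonal entries point to specific confusions, which is usually a more useful starting point for improving the model than the overall accuracy alone.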

Step 5: Making Predictions

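Now that our model is trained and evaluated, we can use it to make predictions on individual images. Here we pick a random image from the test set as a stand-in for a new input, display it, and print the predicted label. Note that predict expects a 2D array of samples, which is why the single image is wrapped in a list.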

# Randomly select an image from the test set
random_idx = np.random.randint(0, len(X_test))
input_img = X_test[random_idx].reshape(8, 8)

# Plot the input image
plt.imshow(input_img, cmap='gray')
plt.show()

# Make a prediction on the input image
prediction = model.predict([X_test[random_idx]])
print("Predicted digit:", prediction[0])
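Beyond the single predicted label, the classifier can also report how confident it is in each digit. The sketch below assumes the same data split and uses predict_proba on the first test image (a fixed index chosen here only for reproducibility) to list the estimated probability for every class.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# predict_proba, like predict, expects a 2D array: one row per sample.
sample = X_test[0].reshape(1, -1)
probs = model.predict_proba(sample)[0]

# model.classes_ gives the label that corresponds to each probability column.
for label, p in zip(model.classes_, probs):
    print(f"digit {label}: {p:.4f}")

predicted = model.classes_[np.argmax(probs)]
print("Most likely digit:", predicted, "| true label:", y_test[0])
```

A low top probability, or two digits with similar probabilities, is a hint that the image is ambiguous and the prediction should be treated with caution.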

Conclusion

In this tutorial, we built a simple handwritten digit recognition system using the Scikit-Learn library in Python. We loaded the built-in digits dataset, trained a logistic regression model on 80% of it, measured its accuracy on the remaining 20%, and used it to predict the label of an individual image. This project is a good starting point for beginners who want hands-on experience with machine learning algorithms and with how they can be applied to real-world problems.

30 Comments
@srinusabbavarapu6509
1 month ago

I want to predict double digits. Is that possible, sir? Not just 0-9, but numbers like 10, 11, 12 and above.

@himanshubhawnani401
1 month ago

It was very confusing, didn't understand it properly.

@anwaydeepnath5017
1 month ago

Sir which IDE is this???

@kalotheo
1 month ago

Do you know why in get_dataset in line 40 the compiler show this error? TypeError: a bytes-like object is required, not 'str'. I'm using Python 3.8.

@mistershort10
1 month ago

This is totally outdated. Total waste of time.

@heratpatel6284
1 month ago

How much time does it take to train an SVM on the MNIST dataset? It is taking too long to run. I tried Google Colab and used its GPU also, but I'm not sure whether it was actually being used. Any help would be appreciated. Thank you in advance.

@minkiaggarwall5529
1 month ago

plz give step 3 of svm_starter

@maeeshameem1578
1 month ago

I run the gen_dataset code, and no error showed. but still the dataset is not generated. can plz someone tell the reason?? plz plz :(((((

@mkarthik4768
1 month ago

Hello, how do we run this in jupyter Notebook instead of sublime

@danishtasheikh8341
1 month ago

Is Python 3.8.1 compatible with these library versions?

@jayeshkulkarni9602
1 month ago

Trying to unpickle estimator SVC from version 0.19.2 when using version 0.20.4. This might lead to breaking code or invalid results…Want to get rid of this error

@forampattha6183
1 month ago

can you please explain why you have to used temp folder ?

@rahulsolankib
1 month ago

Can anyone tell me how to do the same for handwritten digits? I tried above solution but there every pixel value becomes 1 that means it is not able to classify it from image sample

@milanpatel8034
1 month ago

Is the file named "svm_starter.py" used to create the model which is further used to predict/test the model?

@krishanbhadana5308
1 month ago

Thankyou

@swathimparamesh8366
1 month ago

Unable to print the dataframe..what to do?

@harshinisewani800
1 month ago

AttributeError: '_csv.writer' object has no attribute 'writerrows'
How to solve this error ?

@mayanktripathi4u
1 month ago

Hi Sir,
I am using PIL to load the image, and facing issue. Please help.

I am loading the image , and reshaping it to 28*28.. where as when converting it to numpy array at-time it convert into 28*28*3 and at-times into 28*28*4… how to standardize it.
Below is the code.

`from PIL import Image

import numpy as np

size = 28, 28
img = Image.open("handwritten_image_256x256.png")

img
img = img.resize(size, Image.ANTIALIAS)

display(img.size)

img
img_array = np.array(img)

display(img_array.shape, img_array)`

@vinaypalnati8117
1 month ago

after forking the content ,i cannot able to clone to my local repo ,what might be the problem?

@palaksharma4334
1 month ago

how can i download the dataset?