Handwritten digit recognition is a popular problem in the field of machine learning. The goal is to correctly identify digits that have been written by hand, typically as an image. In this tutorial, we will be using the scikit-learn library in Python to create a handwritten digit recognition model.
- Install Required Libraries:
Before we can start working on the project, we need to make sure that scikit-learn and matplotlib are installed (NumPy is pulled in automatically as a scikit-learn dependency). If you do not have them installed, you can do so by running the following command in your terminal:
pip install scikit-learn matplotlib
- Import Required Libraries:
Once scikit-learn is installed, we can start by importing the necessary libraries in our Python script. The three main libraries we will be using are scikit-learn, numpy, and matplotlib.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import svm
import numpy as np
import matplotlib.pyplot as plt
- Load the Dataset:
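We will be using the handwritten digits dataset that ships with scikit-learn, which consists of 1,797 grayscale images of size 8×8 pixels, with pixel values ranging from 0 to 16. It is a much smaller relative of the famous MNIST dataset (70,000 images of 28×28 pixels), which makes it quick to train on. We can load the dataset using the datasets module in scikit-learn.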
digits = datasets.load_digits()
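To get a feel for what was just loaded, a quick inspection like the following can help. This is only a sketch; it relies on the `images`, `data`, and `target` attributes of the object returned by `load_digits`.

```python
from sklearn import datasets

digits = datasets.load_digits()

# images holds one 8x8 array per sample; data holds the same pixels
# already flattened into 64 features per sample
print(digits.images.shape)  # (1797, 8, 8)
print(digits.data.shape)    # (1797, 64)

# Pixel intensities are stored on a 0..16 scale, not 0..255
print(digits.images.min(), digits.images.max())

# The labels are the digits 0 through 9
print(sorted(set(digits.target.tolist())))
```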
- Visualize the Dataset:
Before we start building our model, let’s visualize some of the digits in the dataset to get an idea of what we are working with.
fig, axes = plt.subplots(4, 4, figsize=(8, 8))
for i, ax in enumerate(axes.flat):
    ax.imshow(digits.images[i], cmap='gray')
    ax.axis('off')
    ax.set_title(digits.target[i])
plt.show()
- Preprocess the Data:
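Before we can feed the data into our model, we need to preprocess it. We will flatten each 2D image into a 1D array of 64 values and normalize the pixel values to be between 0 and 1. Since the pixel values in this dataset range from 0 to 16, dividing by 16 does the job.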
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
data = data / 16.0 # Normalize pixel values
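As a sanity check on the preprocessing step, the sketch below confirms that the manual flattening matches the ready-made `digits.data` array and that the scaled values land in the range 0 to 1.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
n_samples = len(digits.images)

# Flatten each 8x8 image into a row of 64 values
flat = digits.images.reshape((n_samples, -1))

# The flattened array is identical to the digits.data attribute
print(np.array_equal(flat, digits.data))

# Scale from the 0..16 range down to 0..1
data = flat / 16.0
print(data.shape, data.min(), data.max())
```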
- Split the Data:
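Next, we split the data into training and testing sets using the train_test_split function from scikit-learn. Setting test_size=0.2 holds out 20% of the samples for testing, and fixing random_state makes the split reproducible.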
X_train, X_test, y_train, y_test = train_test_split(data, digits.target, test_size=0.2, random_state=42)
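It can be worth checking what the split actually produced. The sketch below repeats the split and prints the size of each part along with the number of samples per digit; it uses `digits.data` as a shortcut for the flattened images.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
data = digits.data / 16.0  # already-flattened pixels, scaled to 0..1

X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2, random_state=42
)

# Number of samples and features in each part
print(X_train.shape, X_test.shape)

# How many examples of each digit (0..9) ended up in each part
print(np.bincount(y_train))
print(np.bincount(y_test))
```

If the class counts look lopsided, `train_test_split` also accepts a `stratify` argument that keeps the class proportions equal in both parts.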
- Train the Model:
Now, we can create and train our model. For this tutorial, we will be using a Support Vector Machine (SVM) classifier. We will use the svm.SVC class from scikit-learn.
model = svm.SVC()
model.fit(X_train, y_train)
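Calling `svm.SVC()` with no arguments relies on the library defaults. The sketch below prints the main ones so it is clear what is being trained; the values shown assume a reasonably recent scikit-learn (0.22 or later), where `gamma` defaults to `"scale"`.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
data = digits.data / 16.0
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2, random_state=42
)

model = svm.SVC()

# Inspect the hyperparameters the model will use
params = model.get_params()
print(params["kernel"], params["C"], params["gamma"])

model.fit(X_train, y_train)

# After fitting, the model knows which classes it has seen
print(model.classes_)
```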
- Evaluate the Model:
After training the model, we can evaluate its performance on the test set.
accuracy = model.score(X_test, y_test)
print(f"Model accuracy: {accuracy}")
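A single accuracy number hides which digits get confused with each other. As a possible next step, the sketch below uses `classification_report` and `confusion_matrix` from `sklearn.metrics` to break the result down per digit.

```python
from sklearn import datasets, svm
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
data = digits.data / 16.0
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2, random_state=42
)

model = svm.SVC()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Precision, recall and F1 score for each digit
print(classification_report(y_test, y_pred))

# Rows are true digits, columns are predicted digits;
# off-diagonal entries are misclassifications
cm = confusion_matrix(y_test, y_pred)
print(cm)
```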
- Make Predictions:
Finally, we can use the trained model to make predictions on digit images. Here the first ten test images stand in for new, unseen digits.
predictions = model.predict(X_test[:10])
print(f"Predictions: {predictions}")
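To see how the predictions line up with the true labels, and how a single image would be prepared for prediction, something like the following could be used. The variable `new_image` is hypothetical: here it is just the first image of the dataset standing in for your own 8×8 grayscale image with values from 0 to 16.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
data = digits.data / 16.0
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2, random_state=42
)

model = svm.SVC()
model.fit(X_train, y_train)

# Compare the first ten predictions with the true labels
predictions = model.predict(X_test[:10])
print("Predicted:", predictions)
print("Actual:   ", y_test[:10])

# Hypothetical stand-in for a new image: any 8x8 array on a 0..16 scale
new_image = digits.images[0]

# Apply the same preprocessing as for the training data:
# scale to 0..1 and flatten into a single row of 64 features
sample = (new_image / 16.0).reshape(1, -1)
single_prediction = model.predict(sample)
print("Prediction for the single image:", single_prediction[0])
```

An image from another source would first have to be resized to 8×8 and rescaled to the same intensity range, since the model only knows inputs of that form.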
- Conclusion:
In this tutorial, we have successfully built a handwritten digit recognition model using scikit-learn in Python. We loaded scikit-learn's built-in digits dataset, preprocessed the data, trained the model using an SVM classifier, evaluated its performance, and made predictions. Handwritten digit recognition is a classic machine learning problem with many practical applications, such as optical character recognition and digitized document processing. With the skills learned in this tutorial, you can further explore and improve upon the model by experimenting with different classifiers, hyperparameters, and preprocessing techniques.
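As one way to start experimenting with hyperparameters, the sketch below runs a small grid search with `GridSearchCV`. The candidate values for `C` and `gamma` are arbitrary choices for illustration, not recommendations.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, train_test_split

digits = datasets.load_digits()
data = digits.data / 16.0
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.2, random_state=42
)

# Illustrative grid: every combination of C and gamma is tried
param_grid = {"C": [1, 10], "gamma": ["scale", 0.01]}

# 3-fold cross-validation on the training set only
search = GridSearchCV(svm.SVC(), param_grid, cv=3)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best cross-validation accuracy:", search.best_score_)

# The search refits the best model on the full training set,
# so it can be scored on the held-out test set directly
print("Test accuracy:", search.score(X_test, y_test))
```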
please , can you share this code ?
There is error coming at matplotlib.pylot
can you share code file?
How do I add my own image inside the database?
cannot able to import the file from director….as you did like pd.read_csv("datasets/train.csv")…what to do ?? and how to write a file location if it is downloaded in some arbitrary file or folder??
Did he say ' hello to this people of cruel world ' …?
AttributeError: 'DataFrame' object has no attribute 'as_matrix'
getting this error at line 6
i had to use iloc to get it to work
I have downloaded the train.csv file in Downloads folder so what path should I specify?
Great, Keep it up!
nice vid! can someone tell me the applications of it ?
showing error at xtrain
Is it possible to detect slashes using this technique? Like a Handwritten date?
Thank You.
Amazing Video!
as_matrix() on line 6 isn't working for me. What could be the problem
Bro how you giving input
The .as_matrix() part is showing error
Excellent! It worked like a charm. I advise anyone having trouble with .as_matrix() to switch that to .values. .as_matrix() is deprecated.
I learned something! That's new.
I can't get this to work – the error message I'm getting is:
AttributeError: 'DataFrame' object has no attribute 'as_matrix'
I haven't been able to find a solution on Google, do you know how to fix it? Thanks!