Working with a Confusion Matrix in Machine Learning Using Python [scikit-learn]

A confusion matrix is a table that summarizes the performance of a classification model by counting how often each actual class was predicted as each class. In this tutorial, we will learn how to create a confusion matrix in Python using the scikit-learn library.

First, let’s start by importing the necessary libraries:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

Next, let’s create some dummy data for our classification problem:

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 1, 1, 0, 0])

Now, we can create a confusion matrix using scikit-learn:

cm = confusion_matrix(y_true, y_pred)
print(cm)

The output will look like this:

[[2 3]
 [2 3]]

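The confusion matrix is a 2×2 matrix where the rows represent the actual classes and the columns represent the predicted classes. In this case, the rows correspond to classes 0 and 1, and the columns also correspond to classes 0 and 1. Treating class 1 as the positive class, the top-left cell holds the true negatives (2), the top-right the false positives (3), the bottom-left the false negatives (2), and the bottom-right the true positives (3). The diagonal elements represent the number of correct predictions, while the off-diagonal elements represent the number of incorrect predictions, so here the model got 2 + 3 = 5 of the 10 samples right.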
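For a binary problem, the four cells can be unpacked into named counts, which makes the row and column layout easy to check. The following is a minimal sketch that reuses the dummy data from this tutorial; the variable names `tn`, `fp`, `fn`, `tp`, and `accuracy` are illustrative choices, and class 1 is assumed to be the positive class.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 1, 1, 0, 0])

cm = confusion_matrix(y_true, y_pred)

# ravel() flattens the matrix row by row:
# [cm[0, 0], cm[0, 1], cm[1, 0], cm[1, 1]]
tn, fp, fn, tp = cm.ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")  # TN=2, FP=3, FN=2, TP=3

# Accuracy is the sum of the diagonal divided by the total count
accuracy = (tn + tp) / cm.sum()
print(accuracy)  # 0.5
```

This unpacking only works for a 2×2 matrix; with more than two classes, index the matrix directly instead.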

To visualize the confusion matrix, we can use matplotlib:

plt.matshow(cm, cmap='viridis')
plt.colorbar()
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()

This will generate a heatmap that visualizes the confusion matrix. The color of each cell encodes the number of samples in it, and the colorbar shows the scale.
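Colors alone can be hard to read, so it often helps to print the count inside each cell. The sketch below is one way to do that with plain matplotlib, using the same dummy data. The text-color threshold is an assumption chosen to keep the numbers readable on the viridis colormap, not a required setting.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 1, 1, 0, 0])
cm = confusion_matrix(y_true, y_pred)

fig, ax = plt.subplots()
im = ax.matshow(cm, cmap='viridis')
fig.colorbar(im)

# viridis runs from dark (low) to bright (high), so use dark text on bright cells
threshold = (cm.max() + cm.min()) / 2
for (row, col), count in np.ndenumerate(cm):
    color = 'black' if count > threshold else 'white'
    # x is the column index (predicted), y is the row index (actual)
    ax.text(col, row, str(count), ha='center', va='center', color=color)

ax.set_xticks([0, 1])
ax.set_yticks([0, 1])
ax.set_xlabel('Predicted')
ax.set_ylabel('Actual')
plt.show()
```

Recent versions of scikit-learn also provide `ConfusionMatrixDisplay`, which produces a similar annotated plot with less code.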

In addition to the confusion matrix, we can also calculate other metrics such as precision, recall, and F1-score using scikit-learn:

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))

This will output a detailed report that includes precision, recall, F1-score, and support for each class, along with the overall accuracy and the macro and weighted averages.
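These metrics come directly from the cells of the confusion matrix. As a rough sketch, assuming class 1 is the positive class, the values for that class can be computed by hand and compared with scikit-learn's `precision_score`, `recall_score`, and `f1_score`. The variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, f1_score)

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 1, 1, 0, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # of the samples predicted as 1, how many were 1
recall = tp / (tp + fn)     # of the samples that were 1, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(precision, recall, f1)

# The scikit-learn helpers default to the positive class (label 1)
print(np.isclose(precision, precision_score(y_true, y_pred)))
print(np.isclose(recall, recall_score(y_true, y_pred)))
print(np.isclose(f1, f1_score(y_true, y_pred)))
```

With this data, precision is 3/6 and recall is 3/5, which should match the row for class 1 in the classification report.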

In summary, a confusion matrix is a powerful tool for evaluating the performance of a classification algorithm. By visualizing the performance of the model, we can gain insights into its strengths and weaknesses. Using scikit-learn, we can easily create and analyze a confusion matrix in Python.