Generate Confusion Matrix with scikit-learn for Machine Learning in Python

In machine learning, a confusion matrix is a useful tool for evaluating the performance of a classification model. For a binary classifier, it shows how many true positives, true negatives, false positives, and false negatives the model produces. For a multiclass problem, each cell counts how often samples of one true class were predicted as a given class. This breakdown shows where your model makes mistakes and can guide you in making improvements.
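
As a quick illustration of the four binary counts, here is a minimal sketch on a tiny hand-made example. The label lists and the names `y_true_demo`, `y_pred_demo`, and `cm_binary` are made up for illustration and are not part of the tutorial's dataset.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration only: 1 = positive, 0 = negative
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary 0/1 labels, scikit-learn lays the matrix out as [[TN, FP], [FN, TP]]
cm_binary = confusion_matrix(y_true_demo, y_pred_demo)

# ravel() flattens the 2x2 matrix row by row into the four counts
tn, fp, fn, tp = cm_binary.ravel()
print(cm_binary)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")  # TN=3, FP=1, FN=1, TP=3
```

Note that the first row corresponds to the negative class, so the top-left cell is the true negatives, not the true positives.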

In this tutorial, we will use the scikit-learn library to create a confusion matrix for a classification model. Scikit-learn is a popular Python machine learning library that provides tools and algorithms for building and evaluating models.

To get started, make sure you have scikit-learn installed, along with matplotlib and seaborn, which are used for the heatmap at the end. You can install them using pip by running the following command:

pip install scikit-learn matplotlib seaborn

Once you have scikit-learn installed, you can start by importing the necessary libraries and loading your data. For this tutorial, we will use a simple example dataset included in scikit-learn called the Iris dataset, which contains 150 samples from three species of iris flowers, each described by four measurements (sepal length, sepal width, petal length, and petal width).

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Next, create and train your classification model. For this tutorial, we will use a simple logistic regression model, but you can use any classification algorithm of your choice.

from sklearn.linear_model import LogisticRegression

# Create and train the model
# max_iter is raised above the default of 100 to give the solver room to converge
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

Now, use the trained model to make predictions on the test data and create a confusion matrix to evaluate its performance.

# Make predictions on the test data
y_pred = model.predict(X_test)

# Create the confusion matrix
cm = confusion_matrix(y_test, y_pred)
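
Before plotting, it can help to look at the raw matrix. The following is a minimal, self-contained sketch that repeats the steps above and prints one row per true class. The `labels=[0, 1, 2]` argument and the `max_iter=200` setting are choices made for this sketch, not requirements.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Same setup as in the tutorial, repeated so this snippet runs on its own
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = LogisticRegression(max_iter=200).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Passing labels explicitly fixes the row/column order to classes 0, 1, 2
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])

# Row i holds the samples whose true class is i; column j is the predicted class
for name, row in zip(iris.target_names, cm):
    print(f"{name:>10}: {row}")
```

The entries always sum to the number of test samples, which is a quick sanity check that nothing was dropped.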

Finally, you can visualize the confusion matrix using a heatmap to make it easier to interpret.

import matplotlib.pyplot as plt
import seaborn as sns

# Create a heatmap of the confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, cmap="Blues", fmt="d", xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()

The heatmap places the true labels on the y-axis and the predicted labels on the x-axis. The diagonal cells count the correct predictions, while the off-diagonal cells count the errors: the value in row i, column j is the number of samples of class i that the model predicted as class j.
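
Because the diagonal holds the correct predictions, common metrics can be read straight off the matrix. Here is a minimal sketch, assuming the same data split and model as above (with `max_iter=200` as an assumption of this sketch), that derives accuracy, per-class recall, and per-class precision with NumPy.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Same setup as in the tutorial, repeated so this snippet runs on its own
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = LogisticRegression(max_iter=200).fit(X_train, y_train)
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

# Accuracy: correct predictions (the diagonal) divided by all predictions
accuracy = np.trace(cm) / cm.sum()

# Recall per class: diagonal divided by row sums (number of true samples per class)
recall = np.diag(cm) / cm.sum(axis=1)

# Precision per class: diagonal divided by column sums (number of predictions per class).
# A class that is never predicted has a zero column sum and yields nan here.
precision = np.diag(cm) / cm.sum(axis=0)

print(f"accuracy from matrix: {accuracy:.3f}")
print(f"accuracy_score:       {accuracy_score(y_test, y_pred):.3f}")
for name, r, p in zip(iris.target_names, recall, precision):
    print(f"{name:>10}: recall={r:.3f}, precision={p:.3f}")
```

The two accuracy lines should agree, which confirms that the matrix and scikit-learn's own metric describe the same predictions.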

By analyzing the confusion matrix, you can spot patterns in the errors made by your model, for example which pairs of classes it mixes up most often, and then make targeted adjustments to improve its performance.
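
One simple way to look for such patterns is to zero out the diagonal and find the largest remaining cell. The sketch below does this under the same setup as above. The variable name `errors` and the `max_iter=200` setting are assumptions of this sketch, and on an easy split the model may make no errors at all, which the code handles.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Same setup as in the tutorial, repeated so this snippet runs on its own
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = LogisticRegression(max_iter=200).fit(X_train, y_train)
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

# Keep only the misclassifications by clearing the diagonal on a copy
errors = cm.copy()
np.fill_diagonal(errors, 0)

if errors.sum() == 0:
    print("No misclassifications on this test split.")
else:
    # Locate the (true class, predicted class) cell with the most errors
    i, j = np.unravel_index(errors.argmax(), errors.shape)
    print(
        f"Most common error: true '{iris.target_names[i]}' "
        f"predicted as '{iris.target_names[j]}' ({errors[i, j]} times)"
    )
```

On larger or harder datasets, the same idea extends to listing the top few off-diagonal cells rather than only the largest one.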

I hope this tutorial was helpful in showing you how to plot a confusion matrix using scikit-learn in Python. If you have any questions or need further assistance, feel free to reach out to me. Happy coding!