Learning Machine Learning with Scikit-Learn in Python: Understanding ROC and AUC


Machine Learning is a powerful tool that allows computers to learn patterns from data without being explicitly programmed. One of the most popular libraries for Machine Learning in Python is scikit-learn. In this tutorial, we focus on Receiver Operating Characteristic (ROC) curves and the Area Under the ROC Curve (AUC), and on how to compute both with scikit-learn.

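ROC curves and AUC are used to evaluate binary classifiers that output a score or probability rather than only a class label. An ROC curve plots the true positive rate (TPR, or sensitivity: TP / (TP + FN)) against the false positive rate (FPR, or 1 - specificity: FP / (FP + TN)) as the decision threshold is lowered from "predict nothing positive" to "predict everything positive". The AUC is the area under that curve: a single number between 0 and 1 that summarizes performance across all thresholds. An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation; equivalently, the AUC is the probability that the classifier scores a randomly chosen positive example higher than a randomly chosen negative one.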
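To make these definitions concrete, here is a minimal from-scratch sketch in NumPy. It is an illustration only, not how scikit-learn implements things internally: the helper names `roc_points`, `trapezoid_auc` and `pairwise_auc` are hypothetical, and the four-sample toy data is made up. It sweeps every distinct score as a threshold (predicting positive when score >= threshold), integrates the resulting curve with the trapezoidal rule, and cross-checks the area against the ranking view of AUC, i.e. the fraction of (positive, negative) pairs in which the positive example gets the higher score.

```python
import numpy as np


def roc_points(y_true, scores):
    """Sweep each distinct score as a threshold; return (fpr, tpr, thresholds)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    # Start above the highest score so the curve begins at (0, 0),
    # then lower the threshold one distinct score at a time.
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    fpr, tpr = [], []
    for t in thresholds:
        predicted_pos = scores >= t
        tp = np.sum(predicted_pos & (y_true == 1))  # true positives
        fp = np.sum(predicted_pos & (y_true == 0))  # false positives
        tpr.append(tp / n_pos)  # sensitivity
        fpr.append(fp / n_neg)  # 1 - specificity
    return np.array(fpr), np.array(tpr), thresholds


def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))


def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    diff = scores[y_true == 1][:, None] - scores[y_true == 0][None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


# Toy data (made up): two negatives and two positives.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, thresholds = roc_points(y_true, scores)
print(trapezoid_auc(fpr, tpr))       # 0.75
print(pairwise_auc(y_true, scores))  # 0.75
```

Three of the four positive/negative pairs are ordered correctly (the positive scored 0.35 loses to the negative scored 0.4), so both views give 0.75. The loop is O(n²) for readability; scikit-learn's `roc_curve` instead sorts the scores once and uses cumulative sums.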

Let’s walk through an example using scikit-learn to understand how to plot ROC curves and calculate AUC.

First, make sure scikit-learn is installed in your environment, along with matplotlib, which we use for plotting. If not, you can install both using pip:

pip install scikit-learn matplotlib

Now, let’s import the necessary libraries:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

Next, let’s generate some synthetic data for our classification task:

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

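Now, let’s train a Logistic Regression model on the training data and compute, for each test sample, the predicted probability of the positive class. roc_curve needs these continuous scores rather than hard 0/1 predictions, because it builds the curve by varying the threshold applied to them: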

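lr = LogisticRegression()
lr.fit(X_train, y_train)
# Probability of the positive class (column 1), used as the ranking score
y_pred = lr.predict_proba(X_test)[:, 1]
# False positive rates, true positive rates, and the thresholds that produce them
fpr, tpr, thresholds = roc_curve(y_test, y_pred)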
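Before plotting, it can help to inspect what `roc_curve` returned. The sketch below regenerates the same data so that it runs on its own (the names `y_score`, `y_hard`, `fpr_hard` and `tpr_hard` are our own). It checks three things: the thresholds come back sorted from high to low, `sklearn.metrics.auc` applied to `(fpr, tpr)` gives the same number as `roc_auc_score`, and passing hard 0/1 labels from `predict()` instead of probabilities collapses the curve to at most three points, which is why probabilities are used here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Same synthetic setup as in the tutorial.
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
lr = LogisticRegression().fit(X_train, y_train)

# Continuous scores: probability of the positive class.
y_score = lr.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_score)

# One ROC point per retained threshold; thresholds are sorted high to low.
print(len(thresholds), thresholds[:3])
print(np.all(np.diff(thresholds) <= 0))

# Trapezoidal area under (fpr, tpr) matches roc_auc_score.
print(auc(fpr, tpr), roc_auc_score(y_test, y_score))

# Hard labels leave a single operating point between (0, 0) and (1, 1).
y_hard = lr.predict(X_test)
fpr_hard, tpr_hard, _ = roc_curve(y_test, y_hard)
print(len(fpr_hard))
```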

Now, we can plot the ROC curve using matplotlib:

roc_auc = roc_auc_score(y_test, y_pred)  # area under the ROC curve

plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, label='ROC curve (AUC = %0.2f)' % roc_auc)
# Diagonal = a classifier that guesses at random (AUC = 0.5)
plt.plot([0, 1], [0, 1], linestyle='--', color='grey', label='Random guess')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend()
plt.show()

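This code plots the ROC curve, with the AUC score shown in the legend, against a dashed diagonal that represents random guessing. The AUC condenses the whole curve into a single number: the closer it is to 1.0, the better the classifier ranks positive examples above negative ones, while a value near 0.5 means it does no better than chance. Because it summarizes all thresholds at once, the AUC does not tell you which threshold to use in practice, nor whether the predicted probabilities are well calibrated.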
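The arrays returned by `roc_curve` can also help when you eventually need a single decision threshold. One common heuristic, which this tutorial does not otherwise use, is Youden's J statistic: choose the threshold that maximizes TPR - FPR, i.e. the ROC point farthest above the diagonal. A minimal sketch, assuming the same data and model as above (the names `j`, `best`, `best_threshold` and `y_custom` are ours):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
lr = LogisticRegression().fit(X_train, y_train)
y_score = lr.predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, y_score)

# Youden's J = TPR - FPR; its maximum is the ROC point farthest above the diagonal.
j = tpr - fpr
best = int(np.argmax(j))
best_threshold = thresholds[best]
print(f"threshold={best_threshold:.3f}  TPR={tpr[best]:.3f}  FPR={fpr[best]:.3f}")

# Apply the chosen threshold instead of the 0.5 probability cut-off
# that predict() effectively uses for logistic regression.
y_custom = (y_score >= best_threshold).astype(int)
```

Youden's J weights false positives and false negatives equally; if one kind of error is costlier in your application, pick the point on the curve that reflects that trade-off instead.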

In conclusion, ROC curves and AUC are important tools for evaluating the performance of binary classification algorithms. In this tutorial, we discussed how to use scikit-learn in Python to plot ROC curves and calculate AUC scores. By understanding these concepts and practicing with real data, you’ll be able to evaluate and compare different models effectively in your Machine Learning projects.
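To compare models in this way, compute the AUC of each one on the same held-out split. The sketch below assumes three example classifiers of our own choosing (logistic regression, a random forest and an SVM). Models that expose `predict_proba` can use the positive-class probability; models such as `SVC` with its default `probability=False` expose `decision_function` instead, and those unthresholded scores work just as well because ROC AUC depends only on how the samples are ranked.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Example models chosen for illustration.
models = {
    "Logistic Regression": LogisticRegression(),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVC": SVC(),  # no predict_proba unless probability=True
}

aucs = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    if hasattr(model, "predict_proba"):
        scores = model.predict_proba(X_test)[:, 1]  # positive-class probability
    else:
        scores = model.decision_function(X_test)  # signed distance to the boundary
    aucs[name] = roc_auc_score(y_test, scores)
    print(f"{name}: AUC = {aucs[name]:.3f}")
```

An AUC measured on a single 200-sample test split is noisy; cross-validation, for example `cross_val_score(model, X, y, scoring="roc_auc")`, gives a more stable basis for comparison.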
