Machine Learning is a powerful tool that allows computers to learn patterns from data without being explicitly programmed. One of the most popular libraries for Machine Learning in Python is scikit-learn. In this tutorial, we will focus on Receiver Operating Characteristic (ROC) curves and Area Under the Curve (AUC) in scikit-learn.
ROC curves and AUC are used to evaluate the performance of binary classification algorithms. ROC curves plot the true positive rate (sensitivity) against the false positive rate (1-specificity) at various threshold settings, and the AUC measures the area under the ROC curve, which gives an overall measure of the classifier’s performance.
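To make the two rates concrete, here is a minimal sketch that computes them by hand at a single threshold. The helper name `tpr_fpr_at_threshold` and the four-sample toy data are assumptions made for illustration; they are not part of scikit-learn or of the example that follows.

```python
import numpy as np

def tpr_fpr_at_threshold(y_true, y_score, threshold):
    """Return (TPR, FPR) when scores >= threshold are predicted positive."""
    y_true = np.asarray(y_true)
    y_hat = np.asarray(y_score) >= threshold
    tp = np.sum(y_hat & (y_true == 1))    # positives correctly flagged
    fp = np.sum(y_hat & (y_true == 0))    # negatives wrongly flagged
    fn = np.sum(~y_hat & (y_true == 1))   # positives missed
    tn = np.sum(~y_hat & (y_true == 0))   # negatives correctly rejected
    return float(tp / (tp + fn)), float(fp / (fp + tn))

# Hypothetical toy labels and scores
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

for t in (0.3, 0.5):
    tpr, fpr = tpr_fpr_at_threshold(y_true, y_score, t)
    print(f"threshold={t}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

Each threshold yields one (FPR, TPR) point. Sweeping the threshold from high to low traces out the ROC curve.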
Let’s walk through an example using scikit-learn to understand how to plot ROC curves and calculate AUC.
First, make sure scikit-learn and matplotlib (used for the plot below) are installed in your environment. If not, you can install them using pip:
pip install scikit-learn matplotlib
Now, let’s import the necessary libraries:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
Next, let’s generate some synthetic data for our classification task:
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Now, let’s train a Logistic Regression model on the training data, predict the probability of the positive class for each test sample, and compute the ROC curve from those scores:
lr = LogisticRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred)
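The three arrays returned by `roc_curve` are aligned: `thresholds[i]` is the score cutoff that produces the point `(fpr[i], tpr[i])`. The sketch below rebuilds the same data and model so that it runs on its own, then looks up the threshold that maximizes TPR minus FPR (Youden's J statistic). Using Youden's J is an assumption chosen for illustration; it is one common heuristic for picking an operating point, not the only one.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Same synthetic data and model as in the tutorial
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scores = LogisticRegression().fit(X_train, y_train).predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, scores)

# Youden's J = TPR - FPR; the largest value marks the point
# farthest above the chance diagonal.
j = tpr - fpr
best = int(np.argmax(j))
print(f"threshold={thresholds[best]:.3f}  TPR={tpr[best]:.3f}  FPR={fpr[best]:.3f}")
```

The first entry of `thresholds` is a sentinel above every score (it corresponds to predicting no positives), so it should not be read as a real probability cutoff.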
Now, we can plot the ROC curve using matplotlib:
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, label='ROC curve (AUC = %0.2f)' % roc_auc_score(y_test, y_pred))
plt.plot([0, 1], [0, 1], linestyle='--', color='grey')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend()
plt.show()
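This code plots the ROC curve with the AUC score shown in the legend. The dashed grey diagonal marks the performance of random guessing, which has an AUC of 0.5. The AUC summarizes the classifier's performance across all thresholds in a single number: 1.0 means every positive sample is scored above every negative one, 0.5 is no better than chance, and higher is better.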
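One way to read the AUC is as a ranking measure: it equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The sketch below checks this on a small hand-made example. The helper `pairwise_auc` and the toy data are assumptions for illustration, not part of scikit-learn.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pairwise_auc(y_true, y_score):
    """AUC as the share of positive/negative pairs ranked correctly."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Compare every positive score with every negative score
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (len(pos) * len(neg)))

# Hypothetical toy labels and scores
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

print(pairwise_auc(y_true, y_score))   # 3 of 4 pairs ranked correctly
print(roc_auc_score(y_true, y_score))  # same value from scikit-learn
```

The pairwise version is quadratic in the number of samples, so it is useful for building intuition rather than as a replacement for `roc_auc_score`.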
In conclusion, ROC curves and AUC are important tools for evaluating the performance of binary classification algorithms. In this tutorial, we discussed how to use scikit-learn in Python to plot ROC curves and calculate AUC scores. By understanding these concepts and practicing with real data, you’ll be able to evaluate and compare different models effectively in your Machine Learning projects.