Learning Logistic Regression and the ROC Curve with Scikit-Learn in ML10

ML10: Supervised Learning With Scikit-Learn – Logistic regression and the ROC curve

This module covers supervised learning with logistic regression in Scikit-Learn and shows how to evaluate the resulting model with the ROC curve. Logistic regression is a widely used classification algorithm, most often applied to binary outcomes.

Logistic Regression

Logistic regression is a statistical model that predicts the probability of a binary outcome. It passes a linear combination of the input features through the logistic (sigmoid) function, which maps any real number to a value between 0 and 1, and interprets the result as the probability that the input belongs to the positive class. This module shows how to implement logistic regression with the Scikit-Learn library in Python.
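
To make the role of the logistic function concrete, here is a minimal sketch in plain NumPy. The weights, intercept, and input values are hypothetical numbers chosen for illustration; they do not come from a fitted model.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and intercept for a two-feature model
weights = np.array([0.8, -0.5])
intercept = 0.1

# One hypothetical input sample
x = np.array([2.0, 1.0])

z = np.dot(weights, x) + intercept  # linear score
p = sigmoid(z)                      # estimated probability of the positive class
predicted_class = int(p >= 0.5)     # default decision threshold of 0.5

print(f"score={z:.2f}  probability={p:.3f}  class={predicted_class}")
```

A score of 0 maps to a probability of exactly 0.5, which is why the sign of the linear score decides the predicted class at the default threshold.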

ROC Curve

The ROC curve, short for Receiver Operating Characteristic curve, is a graphical representation of the performance of a binary classification model. It plots the true positive rate (sensitivity) against the false positive rate (1 – specificity) as the decision threshold varies. The area under the ROC curve (AUC) summarizes the curve in a single number: an AUC of 0.5 corresponds to random guessing, and an AUC of 1.0 means the model ranks every positive sample above every negative one.
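
Each point on the ROC curve comes from a single threshold. The sketch below computes the two rates by hand for a few thresholds. The labels and scores are made-up values for illustration, and rates_at_threshold is a hypothetical helper, not a Scikit-Learn function.

```python
import numpy as np

# Hypothetical true labels and predicted probabilities for eight samples
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])

def rates_at_threshold(y_true, y_score, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    y_pred = (y_score >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return fp / (fp + tn), tp / (tp + fn)

# Lower thresholds catch more positives but also raise more false alarms
for t in (0.3, 0.5, 0.75):
    fpr_t, tpr_t = rates_at_threshold(y_true, y_score, t)
    print(f"threshold={t:.2f}  FPR={fpr_t:.2f}  TPR={tpr_t:.2f}")
```

Sweeping the threshold across all distinct scores and connecting the resulting points gives the full curve.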

Implementing Logistic Regression with Scikit-Learn

To implement logistic regression with Scikit-Learn, first import the necessary libraries:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

Next, load your dataset and split it into training and testing sets with the train_test_split function:

# Load the dataset
data = pd.read_csv('your_dataset.csv')

# Split the data into features and target variable
X = data.drop('target', axis=1)
y = data['target']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
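
When the classes are imbalanced, a purely random split can leave the test set with very few positives. One option, not used in the code above, is the stratify parameter of train_test_split. The sketch below uses a hypothetical 80/20 toy dataset to show its effect.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 80 negatives and 20 positives
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)

# stratify=y keeps the 80/20 class ratio in both the training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("positives in train:", y_train.sum(), "of", len(y_train))
print("positives in test: ", y_test.sum(), "of", len(y_test))
```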

Finally, fit a logistic regression model to the training data and generate the ROC curve:

# Fit a logistic regression model
lr = LogisticRegression()
lr.fit(X_train, y_train)

# Predicted probabilities of the positive class (column 1) for the test data
y_pred_prob = lr.predict_proba(X_test)[:, 1]

# Generate the ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)
roc_auc = auc(fpr, tpr)

# Plot the ROC curve
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (area = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic curve')
plt.legend(loc="lower right")
plt.show()
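
Since 'your_dataset.csv' is a placeholder, the steps above can be tried end to end on synthetic data. This is a sketch under stated assumptions: make_classification stands in for a real dataset, the sample and feature counts are arbitrary, and the plot is omitted. It also cross-checks the area from auc against roc_auc_score, which computes the same quantity directly from labels and scores.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real dataset (an assumption for illustration)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the model and score the test set
lr = LogisticRegression()
lr.fit(X_train, y_train)
y_pred_prob = lr.predict_proba(X_test)[:, 1]

# Area under the curve, computed two ways
fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)
roc_auc = auc(fpr, tpr)
auc_direct = roc_auc_score(y_test, y_pred_prob)

print(f"AUC from curve: {roc_auc:.3f}  AUC from roc_auc_score: {auc_direct:.3f}")
```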

By following these steps, you can fit a logistic regression model with Scikit-Learn and evaluate it visually with the ROC curve and numerically with the AUC.