Welcome to ML Using Python (MLUP-101) Session 11! In this session, we will be discussing the concept of regularization in machine learning, specifically focusing on the L1 and L2 regularization methods. Regularization is a technique used to prevent overfitting in machine learning models by adding a penalty term to the loss function. This penalty term discourages the model from learning complex patterns that may not generalize well to new, unseen data.
L1 and L2 regularization are two common methods used in machine learning to prevent overfitting. L1 regularization adds a penalty term proportional to the sum of the absolute values of the model weights, while L2 regularization adds a penalty term proportional to the sum of the squared values of the model weights. In this tutorial, we will walk through the implementation of L1 and L2 regularization in Python using the scikit-learn library.
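To make the two penalties concrete, here is a minimal sketch that computes each one by hand with NumPy; the weight vector is a hypothetical example, not taken from a fitted model:

```python
import numpy as np

# Hypothetical weight vector for illustration only
w = np.array([0.5, -1.0, 0.0, 2.0])

# L1 penalty: sum of absolute values of the weights
l1_penalty = np.sum(np.abs(w))   # 0.5 + 1.0 + 0.0 + 2.0 = 3.5

# L2 penalty: sum of squared weights
l2_penalty = np.sum(w ** 2)      # 0.25 + 1.0 + 0.0 + 4.0 = 5.25

print('L1 penalty:', l1_penalty)
print('L2 penalty:', l2_penalty)
```

Notice that the L2 penalty grows quadratically with large weights (the 2.0 entry contributes 4.0), which is why L2 regularization discourages large weights especially strongly, while L1 penalizes all nonzero weights at a constant rate.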
To begin, make sure you have the scikit-learn library installed in your Python environment. You can install it using pip by running the following command:
pip install scikit-learn
Next, we will import the necessary libraries and load a sample dataset for demonstration purposes. In this tutorial, we will use the breast cancer dataset from scikit-learn, which is a commonly used dataset for classification tasks. Here’s how you can load the dataset and split it into training and testing sets:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
data = load_breast_cancer()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Now that we have our data loaded and split into training and testing sets, we can proceed to implement L1 and L2 regularization in our machine learning model. We will use the LogisticRegression class from scikit-learn, which allows us to specify the penalty term for regularization. Note that L1 regularization requires a solver that supports it, such as liblinear or saga, which is why we set the solver explicitly. Here’s how you can create a logistic regression model with L1 regularization:
from sklearn.linear_model import LogisticRegression
model_l1 = LogisticRegression(penalty='l1', solver='liblinear')
model_l1.fit(X_train, y_train)
Similarly, you can create a logistic regression model with L2 regularization by setting the penalty parameter to 'l2' (this is also the default). The default lbfgs solver may not converge within its default iteration limit on unscaled data such as this, so we raise max_iter:
model_l2 = LogisticRegression(penalty='l2', max_iter=1000)
model_l2.fit(X_train, y_train)
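Regularized models are sensitive to the scale of the input features, since the penalty treats all weights uniformly. A common practice is to standardize the features first. The sketch below assumes a pipeline with StandardScaler; the C value shown is an arbitrary illustration (in scikit-learn, C is the inverse of the regularization strength, so smaller C means a stronger penalty):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

# Standardize features, then fit an L2-regularized logistic regression.
# C=1.0 is the scikit-learn default; smaller values regularize more.
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty='l2', C=1.0, max_iter=1000))
pipe.fit(X_train, y_train)

print('Test accuracy:', pipe.score(X_test, y_test))
```

With scaled inputs, the solver typically converges quickly and the penalty affects every feature on an equal footing, which usually makes the choice of C easier to tune.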
Once you have trained your models with L1 and L2 regularization, you can evaluate their performance on the test set by calculating the accuracy score. Here’s how you can do this:
from sklearn.metrics import accuracy_score
y_pred_l1 = model_l1.predict(X_test)
accuracy_l1 = accuracy_score(y_test, y_pred_l1)
y_pred_l2 = model_l2.predict(X_test)
accuracy_l2 = accuracy_score(y_test, y_pred_l2)
print('Accuracy with L1 regularization:', accuracy_l1)
print('Accuracy with L2 regularization:', accuracy_l2)
By comparing the accuracies of the models with L1 and L2 regularization, you can determine which method works better for your dataset. Keep in mind that the choice between L1 and L2 regularization often depends on the nature of the problem: L1 tends to drive many weights exactly to zero, producing a sparse model that effectively performs feature selection, while L2 shrinks all weights smoothly toward zero without eliminating them.
In conclusion, L1 and L2 regularization are important techniques in machine learning for preventing overfitting and improving the generalization of models. In this tutorial, we have demonstrated how to implement L1 and L2 regularization in Python using the scikit-learn library. Feel free to experiment with different parameters and datasets to further explore the benefits of regularization in machine learning.
Thank you for joining ML Using Python (MLUP-101) Session 11, and happy modeling!