Understanding Hyperparameters in Logistic Regression using Scikit Learn: Part 2
In part 1 of this series, we discussed the basics of logistic regression and how it is used in machine learning. In this article, we will delve deeper into the concept of hyperparameters in logistic regression and how they can be tuned using Scikit Learn in Python.
Hyperparameters in Logistic Regression
Hyperparameters are parameters that are not learned during training but are set before the learning process begins. In logistic regression, the hyperparameters most often tuned include the regularization parameter (C, which is the inverse of regularization strength, so smaller values mean stronger regularization), the type of penalty ('l1' or 'l2'), and the solver algorithm.
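Before any tuning, it helps to see where these hyperparameters live. As a minimal sketch (the variable name clf is just for this illustration), they are passed directly to the LogisticRegression constructor:
from sklearn.linear_model import LogisticRegression
# C is the inverse of regularization strength: smaller values mean stronger regularization.
# 'liblinear' is one of the solvers that supports both the 'l1' and 'l2' penalties.
clf = LogisticRegression(C=0.1, penalty='l1', solver='liblinear')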
Tuning Hyperparameters using Scikit Learn
Scikit Learn is a powerful machine learning library in Python that provides tools for hyperparameter tuning. Let’s take a look at how we can tune the hyperparameters in logistic regression using Scikit Learn.
Grid Search Cross-Validation
One way to tune the hyperparameters is through grid search cross-validation. This involves defining a grid of hyperparameters and then evaluating the model performance for each combination of hyperparameters. Scikit Learn provides the GridSearchCV class for this purpose.
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
# Define the hyperparameter grid
param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100],
              'penalty': ['l1', 'l2'],
              'solver': ['liblinear', 'saga']}
# Create the logistic regression model (max_iter raised so the saga solver has room to converge)
model = LogisticRegression(max_iter=5000)
# Perform grid search cross-validation with 5 folds
# (X_train and y_train are assumed to be your training features and labels)
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X_train, y_train)
# Print the best hyperparameters
print("Best hyperparameters:", grid_search.best_params_)
Randomized Search Cross-Validation
Another approach to hyperparameter tuning is randomized search cross-validation. Instead of trying every combination, it samples a fixed number of hyperparameter settings (n_iter) from specified distributions, which is often much cheaper than an exhaustive grid search. Scikit Learn provides the RandomizedSearchCV class for this purpose.
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform
# Define the hyperparameter distributions
param_dist = {'C': uniform(loc=0, scale=4),  # sample C uniformly from [0, 4]
              'penalty': ['l1', 'l2'],
              'solver': ['liblinear', 'saga']}
# Perform randomized search cross-validation
# (random_state fixes the sampled combinations so results are reproducible)
random_search = RandomizedSearchCV(model, param_dist, n_iter=100, cv=5, random_state=42)
random_search.fit(X_train, y_train)
# Print the best hyperparameters
print("Best hyperparameters:", random_search.best_params_)
Conclusion
Hyperparameter tuning is an important step in building accurate machine learning models. In this article, we discussed the concept of hyperparameters in logistic regression and how they can be tuned using Scikit Learn in Python. By using techniques such as grid search and randomized search cross-validation, we can find the best set of hyperparameters for our logistic regression model.