Part 2: Exploring Hyperparameters in Logistic Regression with Scikit Learn

In Part 1 of this series, we discussed the basics of logistic regression and how it is used in machine learning. In this article, we will delve deeper into the concept of hyperparameters in logistic regression and how they can be tuned using Scikit Learn in Python.

Hyperparameters in Logistic Regression

Hyperparameters are parameters that are not learned during training; they are set before the learning process begins. In logistic regression, the hyperparameters most often tuned include the inverse regularization strength (C), the penalty type (l1 or l2), and the solver algorithm.
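
For example, these hyperparameters are fixed when the model is constructed, and only the coefficients are learned when the model is fit. A minimal sketch (the values shown are Scikit Learn's defaults, apart from the solver choice):

	from sklearn.linear_model import LogisticRegression

	# C, penalty, and solver are fixed before fit() is called;
	# only the model coefficients are learned from the data
	model = LogisticRegression(C=1.0, penalty='l2', solver='liblinear')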

Tuning Hyperparameters using Scikit Learn

Scikit Learn is a powerful machine learning library in Python that provides tools for hyperparameter tuning. Let’s take a look at how we can tune the hyperparameters in logistic regression using Scikit Learn.

Grid Search Cross-Validation

One way to tune the hyperparameters is through grid search cross-validation. This involves defining a grid of candidate values and evaluating the model's cross-validated performance for every combination in the grid. Scikit Learn provides the GridSearchCV class for this purpose.

	from sklearn.datasets import load_breast_cancer
	from sklearn.model_selection import GridSearchCV, train_test_split
	from sklearn.linear_model import LogisticRegression

	# Load an example dataset and split it into training and test sets
	X, y = load_breast_cancer(return_X_y=True)
	X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

	# Define the hyperparameter grid
	param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100],
	              'penalty': ['l1', 'l2'],
	              'solver': ['liblinear', 'saga']}

	# Create the logistic regression model (a higher max_iter helps saga converge)
	model = LogisticRegression(max_iter=5000)

	# Evaluate every combination in the grid with 5-fold cross-validation
	grid_search = GridSearchCV(model, param_grid, cv=5)
	grid_search.fit(X_train, y_train)

	# Print the best hyperparameters
	print("Best hyperparameters:", grid_search.best_params_)
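
Once the search finishes, the fitted GridSearchCV object also exposes the best cross-validated score and, by default, a copy of the best model refit on the full training set. A short sketch of how you might reuse the result:

	# Best mean cross-validated score found during the search
	print("Best CV score:", grid_search.best_score_)

	# best_estimator_ is refit on all of X_train by default,
	# so it can be used directly on held-out data
	best_model = grid_search.best_estimator_
	print("Test accuracy:", best_model.score(X_test, y_test))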

Randomized Search Cross-Validation

Another approach to hyperparameter tuning is randomized search cross-validation. Rather than trying every combination, it samples a fixed number of hyperparameter settings at random from specified distributions (or lists), which scales much better when the search space is large. Scikit Learn provides the RandomizedSearchCV class for this purpose.

	from sklearn.model_selection import RandomizedSearchCV
	from scipy.stats import uniform

	# Define the hyperparameter distributions; parameters given as
	# lists are sampled uniformly from the list
	param_dist = {'C': uniform(loc=0, scale=4),
	              'penalty': ['l1', 'l2'],
	              'solver': ['liblinear', 'saga']}

	# Sample 100 random combinations, reusing the model and training
	# data from the grid search example above
	random_search = RandomizedSearchCV(model, param_dist, n_iter=100,
	                                   cv=5, random_state=42)
	random_search.fit(X_train, y_train)

	# Print the best hyperparameters
	print("Best hyperparameters:", random_search.best_params_)
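
One refinement worth noting: because useful values of C often span several orders of magnitude, a log-uniform distribution usually covers the range better than the uniform distribution above. A small variation, assuming SciPy 1.4 or later where scipy.stats.loguniform is available:

	from scipy.stats import loguniform

	# Sample C log-uniformly between 1e-3 and 1e2 so that small and
	# large regularization strengths are explored equally often
	param_dist = {'C': loguniform(1e-3, 1e2),
	              'penalty': ['l1', 'l2'],
	              'solver': ['liblinear', 'saga']}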

Conclusion

Hyperparameter tuning is an important step in building accurate machine learning models. In this article, we discussed the concept of hyperparameters in logistic regression and how they can be tuned using Scikit Learn in Python. By using techniques such as grid search and randomized search cross-validation, we can systematically search for the best-performing set of hyperparameters for our logistic regression model.