Implementing Multinomial Logistic Regression using Scikit-Learn


Multinomial logistic regression is a classification method used when the dependent variable is categorical with more than two levels. It extends binary logistic regression, which handles a dependent variable with only two levels. Rather than fitting a separate binary model for each category (the one-vs-rest approach), the multinomial model learns one set of coefficients per class and applies the softmax function to the resulting scores, producing probabilities across all classes that sum to one.
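
The multinomial model turns one linear score per class into a probability using the softmax function. The following is a minimal NumPy sketch of that step only, not scikit-learn's internal implementation; the `softmax` helper and the example scores are made up for illustration.

```python
import numpy as np

def softmax(scores):
    """Convert a vector of class scores into probabilities that sum to 1."""
    # Subtracting the maximum improves numerical stability
    # and does not change the resulting probabilities.
    shifted = scores - np.max(scores)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical linear scores (one per class) for a single sample
scores = np.array([2.0, 1.0, 0.1])
probabilities = softmax(scores)

print(probabilities)        # one probability per class
print(probabilities.sum())  # the probabilities sum to 1 (up to rounding)
```

The class with the largest score always receives the largest probability, which is why the predicted label is simply the class with the highest score.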

In this tutorial, we will use scikit-learn, a popular machine learning library for Python, to implement multinomial logistic regression.

Step 1: Loading the Data
The first step is to load the data for the analysis. For this tutorial, we will use the well-known Iris dataset, which contains four measurements (sepal length, sepal width, petal length, and petal width) for 150 iris flowers, each labeled with one of three species.

from sklearn import datasets

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
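
Before going further, it can help to check what was loaded. The sketch below prints the array shape, the feature and class names, and the number of samples per class; it assumes nothing beyond the `load_iris` call used above.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (150, 4): 150 samples, 4 features
print(iris.feature_names)  # names of the four measurements
print(iris.target_names)   # names of the three species
print(np.bincount(y))      # number of samples in each class
```

The classes are perfectly balanced (50 samples each), which makes accuracy a reasonable first metric for this dataset.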

Step 2: Splitting the Data
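Next, we need to split the data into training and testing sets. Holding out a test set lets us evaluate the model on data it did not see during training. Here, 20% of the samples are reserved for testing, and random_state=42 makes the split reproducible.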

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
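
For a multi-class problem, it is often worth keeping the class proportions the same in both sets. The variant below is an optional sketch, not part of the original steps: it passes `stratify=y` to `train_test_split` and prints the resulting class counts.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

# stratify=y keeps the share of each class equal in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # training and test set sizes
print(np.bincount(y_train))         # class counts in the training set
print(np.bincount(y_test))          # class counts in the test set
```

Without stratification, a small test set can end up with noticeably fewer samples of one class, which makes per-class metrics less reliable.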

Step 3: Training the Model
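Now, we can train the multinomial logistic regression model using scikit-learn’s LogisticRegression class with the multi_class='multinomial' parameter. Note that multi_class is deprecated as of scikit-learn 1.5: with the lbfgs solver, the estimator already uses the multinomial formulation for problems with more than two classes, so on recent versions the parameter can be omitted to avoid a warning.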

from sklearn.linear_model import LogisticRegression

# Create an instance of the logistic regression model
model = LogisticRegression(multi_class='multinomial', solver='lbfgs')

# Fit the model on the training data
model.fit(X_train, y_train)
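
To see what the fitted model contains, the sketch below inspects the learned parameters and checks that the predicted probabilities are the softmax of the per-class linear scores. It assumes a scikit-learn version in which the default lbfgs solver fits a multinomial model for multi-class data, so `multi_class` is left out; `max_iter=1000` is an arbitrary choice to give the solver room to converge.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# One row of coefficients and one intercept per class
print(model.coef_.shape)       # (n_classes, n_features)
print(model.intercept_.shape)  # (n_classes,)

# Recompute the probabilities by hand: softmax over the linear scores
scores = model.decision_function(X_test)
scores -= scores.max(axis=1, keepdims=True)  # numerical stability
manual_proba = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

print(np.allclose(manual_proba, model.predict_proba(X_test)))
```

If the final check prints True, the model is using the multinomial (softmax) formulation rather than a set of independent one-vs-rest classifiers.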

Step 4: Making Predictions
Once the model is trained, we can make predictions on the test data to evaluate its performance.

# Make predictions on the test data
predictions = model.predict(X_test)
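
Besides hard labels, the model can report how confident it is in each class. The sketch below refits the same kind of model so that it runs on its own, then prints the class probabilities for the first five test samples; the choice of five samples and the rounding are only for display.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Each row holds one probability per class, in the order of model.classes_
probabilities = model.predict_proba(X_test[:5])
predicted = model.predict(X_test[:5])

for probs, label in zip(probabilities, predicted):
    print(iris.target_names[label], np.round(probs, 3))
```

The predicted label is always the class with the highest probability, so `predict` and the row-wise maximum of `predict_proba` agree.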

Step 5: Evaluating the Model
To evaluate the performance of the model, we can use metrics such as accuracy, precision, recall, and F1 score.

from sklearn.metrics import accuracy_score, classification_report

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, predictions)
print('Accuracy:', accuracy)

# Generate a classification report
print(classification_report(y_test, predictions))

This will give us a detailed report showing the precision, recall, F1 score, and support for each class.
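
A confusion matrix complements the report by showing which classes get mistaken for which. The sketch below is self-contained, so it repeats the earlier loading, splitting, and training steps before calling `confusion_matrix`.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, predictions)
print(cm)
```

Entries on the diagonal count correct predictions; any off-diagonal entry shows a true class (row) that was predicted as a different class (column).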

That’s it! You have now implemented multinomial logistic regression using scikit-learn. This tutorial covers the basic steps of loading data, splitting it into training and testing sets, training the model, making predictions, and evaluating the performance. Feel free to experiment with different datasets and parameters to improve the model’s performance.
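
As one possible starting point for that experimentation, the sketch below compares a few values of the regularization strength `C` using 5-fold cross-validation. The feature scaling step, the candidate values, and the number of folds are all illustrative assumptions rather than recommendations.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)

results = {}
for C in (0.01, 1.0, 100.0):
    # Scaling inside the pipeline is refit on each training fold,
    # so no information leaks from the validation fold.
    pipeline = make_pipeline(
        StandardScaler(),
        LogisticRegression(C=C, max_iter=1000),
    )
    scores = cross_val_score(pipeline, X, y, cv=5)
    results[C] = scores.mean()

for C, mean_accuracy in results.items():
    print(f"C={C}: mean accuracy = {mean_accuracy:.3f}")
```

Smaller values of `C` mean stronger regularization, so comparing the mean accuracies gives a rough sense of how much regularization suits the data.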