Tutorial 31: Introduction to Logistic Regression Machine Learning Method with Scikit Learn and Pandas in Python

In this tutorial, we will cover how to use logistic regression in Python with scikit-learn and pandas. Logistic regression is a classification algorithm that models the probability of an outcome from one or more independent variables. In its basic form the outcome is binary; scikit-learn's LogisticRegression also handles targets with more than two classes, which is what we rely on here.
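
To make the "probability of an outcome" idea concrete, here is a minimal sketch of the binary case: a weighted sum of the features is passed through the logistic (sigmoid) function, which squashes it into the range 0 to 1. The weights and bias below are made-up values for illustration only, not anything fitted to the iris data.

```python
import math

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    """Probability of the positive class for one sample."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)

# Hypothetical weights and bias, chosen only for illustration.
weights = [0.8, -0.4]
bias = -0.2

p = predict_probability([1.0, 2.0], weights, bias)
print(round(p, 3))
```

Training a logistic regression model amounts to finding the weights and bias that make these probabilities match the observed labels as closely as possible.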

We will use the iris dataset for this tutorial, which is a commonly used dataset in machine learning. The iris dataset contains 150 samples of iris flowers, each with four features: sepal length, sepal width, petal length, and petal width. The target variable is the species of the iris flower, which can be one of three classes: setosa, versicolor, or virginica.

Let’s get started by importing the necessary libraries:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

Next, we will load the iris dataset into a Pandas DataFrame and split it into features and target variables:

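# iris.data has no header row, so header=None labels the columns 0-4
iris = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
X = iris.iloc[:, :-1]  # the four measurement columns
y = iris.iloc[:, -1]   # the species label in the last column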
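
Because the file has no header row, the columns end up labeled with integers. If you prefer readable names, one option is to pass them to read_csv yourself. The sketch below uses three sample rows in the same layout instead of downloading the file, and the column names are our own choice (an assumption), since the file does not define any.

```python
import io
import pandas as pd

# Three sample rows in the same layout as iris.data (no header row).
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

# Hypothetical column names; the file itself does not provide them.
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
iris_named = pd.read_csv(sample, header=None, names=columns)

X_named = iris_named.drop(columns="species")  # feature columns
y_named = iris_named["species"]               # target column

print(X_named.dtypes)
print(y_named.value_counts())
```

Selecting the target by name instead of by position makes the code a little more robust if the column order ever changes.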

Now, we will split the data into training and testing sets using the train_test_split function from Scikit Learn:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
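
With a small dataset, a purely random split can leave the test set with uneven class counts. As a hedged variation on the split above, the stratify argument of train_test_split keeps the class proportions the same in both parts. This sketch loads iris from sklearn.datasets instead of the CSV so that it runs offline; that swap is our assumption, not part of the original steps.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Bundled copy of iris (an assumption: used here instead of the CSV download).
X_demo, y_demo = load_iris(return_X_y=True)

# stratify=y_demo preserves the class proportions in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print(Counter(y_te.tolist()))  # class counts in the test set
```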

Next, we will create an instance of the Logistic Regression model and fit it to the training data:

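# Default settings; if scikit-learn reports a ConvergenceWarning, try a larger max_iter, e.g. LogisticRegression(max_iter=1000)
model = LogisticRegression()
model.fit(X_train, y_train)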
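
Once fitted, the model exposes more than hard class predictions. The sketch below, again using the bundled sklearn.datasets copy of iris as an assumption so it runs offline, shows the learned classes, the shape of the coefficient matrix (one row per class, one column per feature), and the per-class probabilities from predict_proba. The max_iter=1000 setting is our choice to give the solver room to converge.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# max_iter raised from the default as a precaution against convergence warnings.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_tr, y_tr)

print(clf.classes_)      # class labels, in the order used by predict_proba
print(clf.coef_.shape)   # (number of classes, number of features)

# One row per sample, one column per class; each row sums to 1.
proba = clf.predict_proba(X_te[:3])
print(np.round(proba, 3))
```

The class returned by predict is simply the one with the highest probability in each row.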

Now that our model is trained, we can make predictions on the test set and evaluate its performance using the classification_report and confusion_matrix functions:

y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

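The classification_report function displays precision, recall, F1-score, and support for each class in the target variable. The confusion_matrix function returns a table of counts in which each row is a true class and each column is a predicted class: the diagonal holds the correct predictions, and every off-diagonal entry counts one kind of misclassification. With three iris species, that is a 3x3 matrix.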
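
To see how the two outputs relate, here is a small hand-made example (the labels and predictions are invented for illustration, not model output). It builds a confusion matrix and derives per-class recall and precision from it, which are the same quantities classification_report prints.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["setosa", "versicolor", "virginica"]

# Invented true labels and predictions, two samples per class.
y_true_demo = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred_demo = ["setosa", "setosa", "versicolor", "virginica", "virginica", "versicolor"]

# Rows are true classes, columns are predicted classes, in the order of `labels`.
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=labels)
print(cm)

# Recall: correct predictions divided by the number of true samples per class (row sums).
recall = cm.diagonal() / cm.sum(axis=1)
# Precision: correct predictions divided by the number of predictions per class (column sums).
precision = cm.diagonal() / cm.sum(axis=0)
print(recall)
print(precision)
```

Here setosa is classified perfectly, while one versicolor and one virginica sample are confused with each other, so both of those classes end up with recall and precision of 0.5.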

That’s it! You have trained and evaluated a logistic regression classifier using scikit-learn and pandas in Python. Logistic regression is a simple, widely used baseline for classification tasks, whether binary or multiclass. Feel free to experiment with different datasets and hyperparameters to improve the performance of your model.
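
If you want a starting point for experimenting with hyperparameters, one common approach is a cross-validated grid search over the regularization strength C. The sketch below is one possible setup, not the only way to do it: the grid values, the StandardScaler step, and the use of the bundled sklearn.datasets copy of iris are all our assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

# Scale the features, then fit logistic regression; make_pipeline names the
# final step "logisticregression", which is the prefix used in the grid below.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Candidate values for C (smaller C means stronger regularization).
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0, 100.0]}

# 5-fold cross-validation on the training set only; the test set stays untouched.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_tr, y_tr)

print(search.best_params_)
print(search.best_score_)          # mean cross-validated accuracy of the best C
print(search.score(X_te, y_te))    # accuracy on the held-out test set
```

Keeping the test set out of the search means the final score is still an honest estimate of how the tuned model performs on unseen data.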

18 Comments
@user-pw9wz2fo3o
25 days ago

BTW its spelt as logistic not logestic.

@danielngigi2608
25 days ago

Great video 👍

@haoyuan92
25 days ago

hello, to perform logistic regression, can my predictor variable be binary (0 or 1) or categorical (1,2,3,etc.) ?

@zoombiz3365
25 days ago

Sir can we predict the gender
If so what would be the changes that we have to make, like how to solve the value error
Anyone please help

@Majaroshimi
25 days ago

Thank you very much sir!!

@Drkalaamarab
25 days ago

Great share 👍, I will work on this data. What is the best way to contact you?

@7singar7
25 days ago

How do I find the best hyperparameters for logistic regression in Python?

@nehasheth3680
25 days ago

Sir, could you please share the link of the customer dataset?

@amioza73
25 days ago

can i run this directly on python

@kirank1923
25 days ago

Hi, how do I tune hyperparameters?

@naveenkumargandla5386
25 days ago

Why didn't you split the data into train and test? Whatever error metrics you checked are for the data that was used to fit the model.

@danielsloan20
25 days ago

Amazing video thanks!!!

@electrology
25 days ago

I am unable to get anything working in the data set that I am using to build the logistic regression model after the step "Deploying and evaluating your model". I get the following error.

---------------------------------------------------------------------------
NotFittedError                            Traceback (most recent call last)
<ipython-input-14-e080ceade2d3> in <module>()
----> 1 y_pred = LogReg.predict(X)
      2 from sklearn.metrics import classification_report
      3 print(classification_report(Y, y_pred))

~\Anaconda3\lib\site-packages\sklearn\linear_model\base.py in predict(self, X)
    322             Predicted class label per sample.
    323         """
--> 324         scores = self.decision_function(X)
    325         if len(scores.shape) == 1:
    326             indices = (scores > 0).astype(np.int)

~\Anaconda3\lib\site-packages\sklearn\linear_model\base.py in decision_function(self, X)
    296         if not hasattr(self, 'coef_') or self.coef_ is None:
    297             raise NotFittedError("This %(name)s instance is not fitted "
--> 298                                  "yet" % {'name': type(self).__name__})
    299
    300         X = check_array(X, accept_sparse='csr')

NotFittedError: This LogisticRegression instance is not fitted yet

Not sure what is going on. I imported all the necessary dependencies etc. and followed the instructions step by step but this is not working. In my data set, the X (independent variables) are in float64 formats. The Y (binary dependent variable) is in int64 format. Is there anything going wrong with the formats? CAN YOU PLEASE HELP!

@NadyaPena-01
25 days ago

Thank you for this video. Very helpful and clear. However, please share the files via Github or some other platform that's not 4Shared if possible. 4Shared is pretty inconvenient because they won't let me download the data unless I sign up with them and even after I did that, they made me wait on their page to download (navigating away from the page also stopped their download timer). Totally put off by that.

@joedandantech
25 days ago

Can you put your files public on GitHub? no one likes to download from links…

@datascienceds7965
25 days ago

Thanks for the video. It was very well explained. How can we plot the values of precision, recall and support?

@battlemoose1594
25 days ago

Great video, explains exactly how to do it. Doesn't discuss any of the theory though.

@ankitadwivedi1183
25 days ago

Hello sir,

In the classification report, I am getting 0.00 for precision, recall, and F1-score for the true values (row 1).
Please help me find where I am going wrong.
