Python Income Prediction Model Using Machine Learning

Income Prediction Machine Learning Project in Python

In this tutorial, we will build a machine learning model to predict income using Python. We will use a dataset containing various features such as age, education level, occupation, etc., to train a model that can predict income level based on these features.

Step 1: Import Libraries

First, we import the necessary libraries. We will use pandas to read and manipulate the data, NumPy to represent missing values, scikit-learn to build and train the model, and matplotlib to visualize the results.

<code>
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
</code>

Step 2: Load Data

Next, we load the dataset for the project. We will use the Adult Income dataset from the UCI Machine Learning Repository, which contains census records for individuals, each labeled by whether annual income exceeds $50K. The file has no header row, so we assign the column names ourselves.

<code>
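# Fields are separated by a comma plus a space; skipinitialspace drops that padding
data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data', header=None, skipinitialspace=True)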
data.columns = ['age', 'workclass', 'fnlwgt', 'education', 'education_num', 'marital_status', 'occupation', 'relationship', 'race', 'sex', 'capital_gain', 'capital_loss', 'hours_per_week', 'native_country', 'income']
</code>
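The raw file separates fields with a comma followed by a space and marks unknown values with `?`. As a minimal sketch of how `read_csv` options can handle both at load time, the example below parses a three-row inline sample instead of the real download. The sample rows and the shortened column list are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical three-row sample in the same layout as adult.data
# (comma + space separators, '?' for unknown values).
sample = io.StringIO(
    "39, State-gov, Bachelors, <=50K\n"
    "50, ?, HS-grad, >50K\n"
    "38, Private, ?, <=50K\n"
)

columns = ['age', 'workclass', 'education', 'income']
df = pd.read_csv(
    sample,
    header=None,
    names=columns,
    skipinitialspace=True,  # drop the space after each comma
    na_values='?',          # read '?' as a missing value (NaN)
)

print(df)
print(df.isna().sum())               # missing values per column
print(df['income'].value_counts())   # class balance of the target
```

Checking the missing-value counts and the class balance right after loading is a cheap way to catch parsing problems before any model is trained.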

Step 3: Preprocess Data

Before we can train our machine learning model, we need to preprocess the data. This includes handling missing values, encoding categorical variables, and splitting the data into training and testing sets.

<code>
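# Handle missing values: the dataset marks them with '?' (possibly padded with spaces)
data = data.replace(r'^\s*\?\s*$', np.nan, regex=True)

# Split data into features and target variable
X = data.drop('income', axis=1)
y = data['income']

# Encode categorical features only; encoding the whole frame would also
# replace the 'income' column with dummy columns
X = pd.get_dummies(X)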

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
</code>
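To make the encoding step concrete, here is a small sketch of what `pd.get_dummies` does to a categorical column. The toy values are invented for illustration. In recent pandas versions the new columns are boolean by default; passing `dtype=int` gives 0/1 columns instead.

```python
import pandas as pd

# Toy frame; the values are made up for illustration.
toy = pd.DataFrame({
    'age': [25, 40, 31],
    'occupation': ['Sales', 'Tech-support', 'Sales'],
    'income': ['<=50K', '>50K', '<=50K'],
})

# Separate the target first so it is not one-hot encoded.
X_toy = toy.drop('income', axis=1)
y_toy = toy['income']

# dtype=int asks for 0/1 columns instead of booleans.
X_encoded = pd.get_dummies(X_toy, dtype=int)

print(X_encoded.columns.tolist())
print(X_encoded)
```

Numeric columns such as `age` pass through unchanged, while `occupation` becomes one column per category.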

Step 4: Train Machine Learning Model

Now that we have preprocessed the data, we can train our machine learning model. We will be using a Random Forest Classifier, which is a popular algorithm for classification tasks.

<code>
# Initialize and train the model
model = RandomForestClassifier(n_estimators=100, random_state=42)  # fixed seed for reproducible results
model.fit(X_train, y_train)
</code>
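The value `n_estimators=100` is only one possible setting. As a rough sketch of how a few alternatives could be compared with cross-validation, the example below runs scikit-learn's `GridSearchCV` on synthetic data so that it works on its own. The grid values are assumptions chosen to keep the run short, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for X_train / y_train so the sketch runs on its own.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)

# Illustrative grid; these values are assumptions, not recommendations.
param_grid = {
    'n_estimators': [25, 50],
    'max_depth': [3, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,                # 3-fold cross-validation for each combination
    scoring='accuracy',
)
search.fit(X_demo, y_demo)

print(search.best_params_)            # best combination found in the grid
print(round(search.best_score_, 3))   # its mean cross-validated accuracy
```

With the real data, the same search would be fitted on `X_train` and `y_train`, leaving the test set untouched for the final evaluation.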

Step 5: Make Predictions

After training the model, we can make predictions on the test set and evaluate the model’s performance using the accuracy score, which is the fraction of test records classified correctly.

<code>
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: ", accuracy)
</code>
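Accuracy on its own can be misleading when one class dominates, and roughly three quarters of the records in the Adult dataset fall in the <=50K class. The sketch below, run on synthetic data with a similar imbalance, compares a random forest against a baseline that always predicts the majority class and then prints per-class metrics. The data and the variable names are stand-ins for illustration, not results from the tutorial's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (about 75% of samples in one class).
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, weights=[0.75, 0.25], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# Baseline that always predicts the most frequent class.
baseline = DummyClassifier(strategy='most_frequent').fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

baseline_acc = accuracy_score(y_te, baseline.predict(X_te))
forest_acc = accuracy_score(y_te, forest.predict(X_te))
print("Baseline accuracy:", baseline_acc)
print("Forest accuracy:  ", forest_acc)

# Per-class precision and recall show what a single accuracy number hides.
print(classification_report(y_te, forest.predict(X_te)))
```

A model is only informative to the extent that it beats the baseline, so it helps to report both numbers side by side.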

Step 6: Visualize Results

Lastly, we can visualize the results using a bar chart to show the feature importances in predicting income.

<code>
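# One-hot encoding produces many columns, so plot only the 15 most important
importances = pd.Series(model.feature_importances_, index=X.columns)
top_features = importances.sort_values().tail(15)
plt.barh(top_features.index, top_features.values)
plt.xlabel('Feature importance')
plt.tight_layout()
plt.show()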
</code>

By following these steps, you can build a machine learning model to predict income using Python. This project can be extended by trying different algorithms, tuning hyperparameters, and experimenting with different features to improve the model’s performance.
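Once a model is trained, the usual next step is to predict the income bracket for a record it has not seen. The sketch below shows one way to do that with the `get_dummies` approach: encode the new record, then align its columns to those used in training. The tiny training set and the new person are invented so the example runs on its own.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Tiny made-up training set standing in for the Adult data.
train = pd.DataFrame({
    'age': [25, 52, 38, 46, 23, 60],
    'hours_per_week': [40, 60, 45, 50, 20, 55],
    'occupation': ['Sales', 'Exec-managerial', 'Tech-support',
                   'Exec-managerial', 'Sales', 'Tech-support'],
    'income': ['<=50K', '>50K', '<=50K', '>50K', '<=50K', '>50K'],
})

X_small = pd.get_dummies(train.drop('income', axis=1), dtype=int)
y_small = train['income']
clf = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_small, y_small)

# A new, unseen person (hypothetical values).
new_person = pd.DataFrame([{
    'age': 45, 'hours_per_week': 50, 'occupation': 'Exec-managerial',
}])

# Encode the same way, then align to the training columns:
# categories absent from the new row become 0-filled columns.
new_encoded = pd.get_dummies(new_person, dtype=int).reindex(
    columns=X_small.columns, fill_value=0
)

print(clf.predict(new_encoded))        # predicted income bracket
print(clf.predict_proba(new_encoded))  # probabilities, ordered as clf.classes_
```

The `reindex` call matters because a single record contains only one category per column, so its dummy columns would otherwise not match the ones the model was trained on. In a larger project, a scikit-learn `Pipeline` with `OneHotEncoder(handle_unknown='ignore')` is a common alternative that keeps the encoding and the model together.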

46 Comments
@qwerty.mnbdudeehfurhfnvur
1 month ago

anything can cause you to be any gender how dare you say income cant cause gender

@qwerty.mnbdudeehfurhfnvur
1 month ago

there we have it ladies and gentleman 2 genders all puns intended

@andyn6053
1 month ago

Why don't you just use:

X_train, X_test, y_train, y_test = train_test_split(df, test_size = 0.2)?

@lrgamito
1 month ago

Really awesome. I just wanted to see you actually predicting new inputs; that would be really useful for those who are taking their first steps into AI.

@israelakinola-elewode3833
1 month ago

What is the name of this theme?

@fun4allization
1 month ago

Need some help with the "Pandas" section. Inputting the line "pd.get_dummies(df.occupation)" outputs all the values as boolean for me. Any assistance would be greatly appreciated!

@hosseinrezazadeh9011
1 month ago

Thank you.

Really, your content is always excellent, the practical projects that you give as examples clarify the topics

@southafricangamer7174
1 month ago

Hey brother man, is there an alternative to forest.feature_names_in_ section? Either I'm outdated or it doesn't work. Cheers.

@RevistaGiro360
1 month ago

great

@nerualbrain
1 month ago

Thanks for this amazing tutorial

@user-oq7ju6vp7j
1 month ago

Isn't "?" equal to None (a missing value)?

@HamzaKhan-zu9zl
1 month ago

Which OS is he using?

@motishreepatel107
1 month ago

Really useful, this helped me clear up many ML concepts. I look forward to becoming an expert in Python like you. It was amazing to see how fast you were writing the Python code.

@hpforthewin469
1 month ago

18:23

@NavyTriedCode28
1 month ago

where is the video tutorial on installation?

@hoangha6680
1 month ago

can you tell me the shortcut keys you used to open the new window to install the packages at 3:11?

@963seeker
1 month ago

21:38 Just a side note: the reason gender == 'Male' has a strong correlation is that there are about twice as many males as females in the dataset. That is statistically significant, so we might need to tune the model.

@daqa290885
1 month ago

Excellent video, bro. A very original way to find the values and clean the DF. In my case I use df[column_name].unique(), which gives me a unique list of the data, but your value_counts() is an option too. Thanks for your dedication 👨‍💻👍

@technical9871
1 month ago

Awesome tutorial 👍

@frakfeem
1 month ago

Really good walk through, thanks! One thing I feel like I don't understand about ML is, you say at the end that we have a machine learning model that predicts people's incomes, but what is the next step? How do you actually predict it? I feel like I only ever see this part in videos and I never see what you actually do with this model afterwards, like how is it useful besides drawing some conclusions as you did?