Creating Your First Decision Tree in Python Using scikit-learn

How to Build Your First Decision Tree in Python (scikit-learn)

If you’re new to machine learning, this tutorial walks you through building your first decision tree in Python with scikit-learn. Decision trees are a popular algorithm for both classification and regression tasks, and scikit-learn provides ready-made implementations of both. This walkthrough uses the classifier.

Step 1: Install scikit-learn

The first step is to make sure you have scikit-learn installed in your Python environment. You can do this using pip with the following command:

pip install -U scikit-learn
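To confirm the installation worked, a quick check like the following can help. It is only a sketch: it imports the package and prints whichever version pip installed.

```python
# Confirm that scikit-learn is importable and report its version.
import sklearn

version = sklearn.__version__
print(f"scikit-learn version: {version}")
```

If the import fails, the package was probably installed into a different Python environment than the one running the script.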

Step 2: Import the Necessary Libraries

Once scikit-learn is installed, you can import it along with other necessary libraries in your Python script:


import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

Step 3: Load the Data

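For this example, let’s use a simple dataset that contains information about various types of fruits and their attributes. The code below expects a file named fruits.csv in your working directory, with the fruit type stored in a column called fruit_label. You can load the dataset using pandas: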


# Load the dataset
data = pd.read_csv('fruits.csv')
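If you don’t have a fruits.csv file at hand, a small in-memory table can stand in for it. The sketch below is an assumption, not the article’s dataset: every column name except fruit_label (mass, width, height) and all of the values are made up for illustration. It also runs a few sanity checks that are worth doing on any freshly loaded table.

```python
import pandas as pd

# Hypothetical stand-in for fruits.csv. Only 'fruit_label' comes from the
# tutorial; the other column names and all values are invented.
data = pd.DataFrame({
    "mass":   [150, 170, 160, 120, 130, 125, 10, 12, 11],
    "width":  [7.0, 7.5, 7.2, 6.0, 6.2, 6.1, 2.0, 2.2, 2.1],
    "height": [7.5, 7.8, 7.6, 9.0, 9.5, 9.2, 2.0, 2.1, 2.0],
    "fruit_label": ["apple"] * 3 + ["pear"] * 3 + ["cherry"] * 3,
})

# Quick sanity checks before modelling
print(data.head())                          # first few rows
print(data.shape)                           # (rows, columns)
print(data["fruit_label"].value_counts())   # samples per class
print(data.isna().sum())                    # missing values per column
```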

Step 4: Preprocess the Data

Before building the decision tree, you’ll need to preprocess the data by separating the input features from the target variable:


# Separate the input features and the target variable
X = data.drop('fruit_label', axis=1)
y = data['fruit_label']
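scikit-learn’s decision trees expect numeric feature values, so a text column in X needs to be encoded first. The sketch below assumes a hypothetical color column (not part of the tutorial’s data) and one-hot encodes it with pandas before modelling.

```python
import pandas as pd

# Hypothetical data: 'color' is an assumed text column used for illustration.
data = pd.DataFrame({
    "mass": [150, 120, 10, 160],
    "color": ["red", "green", "red", "green"],
    "fruit_label": ["apple", "pear", "cherry", "apple"],
})

# Separate the input features and the target variable
X = data.drop("fruit_label", axis=1)
y = data["fruit_label"]

# One-hot encode the text column so that every feature is numeric
X_encoded = pd.get_dummies(X, columns=["color"])
print(X_encoded.columns.tolist())
```

The target y can stay as text labels; DecisionTreeClassifier accepts string class names.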

Step 5: Split the Data

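It’s important to split your data into a training set and a testing set so the decision tree is evaluated on rows it has not seen during training. Here, test_size=0.2 holds out 20% of the rows for testing, and random_state=42 makes the split reproducible: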


# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
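With a small dataset, a purely random split can leave one class under-represented in the test set. One option is the stratify argument of train_test_split, which keeps class proportions the same in both parts. The sketch below uses made-up data (20 rows, two classes) to show the effect.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data: 10 apples and 10 pears with a single numeric feature.
X = pd.DataFrame({"mass": list(range(100, 120))})
y = pd.Series(["apple"] * 10 + ["pear"] * 10, name="fruit_label")

# stratify=y keeps the apple/pear ratio identical in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(y_train.value_counts())
print(y_test.value_counts())
```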

Step 6: Build the Decision Tree

Now it’s time to build the decision tree model using scikit-learn’s DecisionTreeClassifier:


# Create the decision tree model
clf = DecisionTreeClassifier()

# Fit the model to the training data
clf.fit(X_train, y_train)
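A fitted tree can be inspected as plain-text rules, which makes it easier to see what the model learned. The sketch below uses a tiny made-up training set and two optional settings that the tutorial’s code leaves at their defaults: max_depth=3 (an arbitrary cap chosen here to limit tree size) and random_state=42 (so repeated runs build the same tree).

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up training data; column names and values are illustrative only.
X_train = pd.DataFrame({
    "mass":   [150, 170, 160, 120, 130, 125, 10, 12, 11],
    "height": [7.5, 7.8, 7.6, 9.0, 9.5, 9.2, 2.0, 2.1, 2.0],
})
y_train = ["apple"] * 3 + ["pear"] * 3 + ["cherry"] * 3

# Cap the depth to limit overfitting and fix the seed for reproducibility
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)

# Print the learned if/else rules as text
rules = export_text(clf, feature_names=list(X_train.columns))
print(rules)
print("Tree depth:", clf.get_depth())
```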

Step 7: Make Predictions

Once the model is trained, you can use it to make predictions on the testing set:


# Make predictions on the testing set
y_pred = clf.predict(X_test)
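The same predict call also works for new, unlabeled data, including a single sample. The sketch below, built on made-up training data, passes a one-row DataFrame with the same feature columns the model was trained on. No label (y) is needed for prediction.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up training data; column names and values are illustrative only.
X_train = pd.DataFrame({
    "mass":   [150, 170, 160, 120, 130, 125, 10, 12, 11],
    "height": [7.5, 7.8, 7.6, 9.0, 9.5, 9.2, 2.0, 2.1, 2.0],
})
y_train = ["apple"] * 3 + ["pear"] * 3 + ["cherry"] * 3

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# One new fruit: a single row with the same columns in the same order
new_fruit = pd.DataFrame({"mass": [155], "height": [7.6]})

predicted = clf.predict(new_fruit)
print("Predicted label:", predicted[0])

# Class probabilities for the same row, keyed by class name
proba = clf.predict_proba(new_fruit)
print(dict(zip(clf.classes_, proba[0])))
```

New rows can come from a CSV file or be typed in directly, as long as they contain the same feature columns used for training.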

Step 8: Evaluate the Model

Finally, you can evaluate the decision tree with the accuracy score, which is the fraction of test samples the model labeled correctly:


# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
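Accuracy alone does not show which classes get confused with each other. As a complement, the sketch below prints a confusion matrix and a per-class report. The data is synthetic (randomly generated apples and pears with assumed mass and height columns), so the numbers it prints say nothing about a real fruit dataset.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: 30 apples and 30 pears drawn from overlapping distributions.
rng = np.random.default_rng(0)
n = 30
data = pd.DataFrame({
    "mass": np.concatenate([rng.normal(160, 15, n), rng.normal(130, 15, n)]),
    "height": np.concatenate([rng.normal(7.5, 0.5, n), rng.normal(9.0, 0.5, n)]),
    "fruit_label": ["apple"] * n + ["pear"] * n,
})

X = data.drop("fruit_label", axis=1)
y = data["fruit_label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes (order: clf.classes_)
cm = confusion_matrix(y_test, y_pred, labels=clf.classes_)
print(cm)

# Precision, recall, and F1 for each class
print(classification_report(y_test, y_pred))
```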

And that’s it! You’ve successfully built and evaluated your first decision tree in Python using scikit-learn. Congratulations!

2 Comments
@michaelangelomerza3966
10 months ago

Hi. I'm still learning Python, so may I ask: how would you add new data to that? For example, I want to predict whether a new player will be among the HOF. My input will be only one. Should I import a new CSV file containing that data and then put it in X_test and y_test? Thank you.

@abdullahal-montasheri5798
10 months ago

Can you share the notebook file?