Creating a Decision Tree Model with scikit-learn and Google Colab


Decision trees are a popular family of models in data science that can be used for both classification and regression tasks. They work by repeatedly splitting a dataset into smaller subsets, each time choosing the feature and threshold that best separate the target values. This continues until a decision can be made about which class a data point belongs to or what value the target variable takes.
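To make "best separate" concrete: scikit-learn's DecisionTreeClassifier scores candidate splits with Gini impurity by default (criterion="gini"). The sketch below computes it by hand on a toy one-feature dataset. The helpers gini and split_impurity are hypothetical names written for illustration; they are not scikit-learn functions.

```python
import numpy as np

def gini(labels):
    """Gini impurity of an array of class labels: 1 - sum over classes of p_k^2."""
    if len(labels) == 0:
        return 0.0  # an empty subset contributes nothing
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_impurity(x, y, threshold):
    """Weighted Gini impurity of the two subsets produced by the rule x <= threshold."""
    left, right = y[x <= threshold], y[x > threshold]
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Toy data: one feature, two classes that separate cleanly between 2 and 3
x = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0, 0, 0, 1, 1, 1])

print(gini(y))                    # impurity before splitting: 0.5
print(split_impurity(x, y, 2.5))  # perfect split: 0.0
print(split_impurity(x, y, 1.5))  # a worse split leaves mixed subsets
```

A tree-growing algorithm tries many thresholds like these and keeps the one with the lowest weighted impurity, then repeats the process inside each subset.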

In this tutorial, I will show you how to create a decision tree model using the scikit-learn library in Google Colab, a cloud-based platform that allows you to write and execute Python code in a Jupyter notebook environment.

Step 1: Setting up Google Colab

First, open Google Colab at https://colab.research.google.com/. Sign in with your Google account and create a new notebook; Colab notebooks run Python 3 by default.

Step 2: Installing scikit-learn

scikit-learn comes pre-installed in Google Colab, so you do not need to install it separately. To confirm, run the following in a code cell:

!pip show scikit-learn

If scikit-learn is installed, the output lists the package's name, version, and install location. If it is missing, install it by running the following in a code cell:

!pip install scikit-learn
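You can also check from Python itself, which confirms that the notebook's kernel can actually import the package. This is a minimal sketch using the standard __version__ attribute.

```python
import sklearn

# Version of scikit-learn seen by this notebook's Python kernel
print(sklearn.__version__)
```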

Step 3: Importing the necessary libraries

Next, you will need to import the necessary libraries for creating a decision tree model. In a code cell, run the following code:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

Step 4: Loading the dataset

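For this tutorial, we will use the classic Iris dataset, which ships with scikit-learn, so no download is needed. It contains 150 flower samples, each described by four measurements and labeled with one of three species. To load the dataset, run the following code in a code cell: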

from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
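Before modeling, it helps to look at what X and y contain. The sketch below puts the features into a pandas DataFrame (pandas was imported in Step 3) and adds a species column; the column name "species" is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Features as a table, with the integer labels 0/1/2 mapped to species names
df = pd.DataFrame(X, columns=iris.feature_names)
df["species"] = iris.target_names[y]

print(X.shape)                       # (150, 4): 150 flowers, 4 measurements each
print(df["species"].value_counts())  # 50 samples of each species
print(df.head())
```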

Step 5: Splitting the dataset into training and testing sets

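Before building a decision tree model, we need to split the dataset into training and testing sets, so that the model can be evaluated on samples it did not see during training. This can be done with the train_test_split function from scikit-learn: test_size=0.2 holds out 20% of the samples (30 of the 150) for testing, and random_state=42 makes the split reproducible. Run the following code in a code cell: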

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
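The sketch below checks the resulting sizes and shows an optional variant: passing stratify=y keeps the class proportions identical in the training and testing sets. Stratifying is an addition here, not part of the original steps.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same split as in the tutorial
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)

# Optional: stratify on y so each species makes up one third of the test set
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
print(np.bincount(ys_test))  # [10 10 10]
```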

Step 6: Creating and training the decision tree model

Now that we have split the dataset, we can create a decision tree model and train it on the training set. Run the following code in a code cell:

clf = DecisionTreeClassifier(random_state=42)  # fixed seed so the same tree is built on every run
clf.fit(X_train, y_train)
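Once fitted, the tree can be inspected directly. With the default settings (no max_depth), it keeps splitting until its leaves are pure or cannot be split further. The sketch below repeats the split and fit so it runs on its own, assuming random_state=42 on the classifier, and prints the depth, the number of leaves, and how much each feature contributed to the splits.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

print("Depth:", clf.get_depth())
print("Leaves:", clf.get_n_leaves())
# Normalized impurity reduction attributed to each feature (values sum to 1)
for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.3f}")
```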

Step 7: Making predictions and evaluating the model

Once the model has been trained, we can make predictions on the testing set and evaluate its performance. Run the following code in a code cell:

y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

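The accuracy score is the fraction of test samples the model classified correctly. In the confusion matrix, each row corresponds to a true class and each column to a predicted class, so correct predictions lie on the diagonal and any off-diagonal entries show which species the model confuses with each other.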
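Because correct predictions sit on the diagonal, accuracy can be read straight off the confusion matrix as its trace divided by its total. The sketch below, which assumes random_state=42 on the classifier, checks that relationship and adds scikit-learn's classification_report for per-class precision, recall, and F1.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
# Diagonal entries are correct predictions, so accuracy = trace / total count
acc_from_cm = np.trace(cm) / cm.sum()
print(acc_from_cm, accuracy_score(y_test, y_pred))  # the two values match

# Per-class precision, recall and F1 on the test set
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```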

Step 8: Visualizing the decision tree

Finally, you can visualize the trained decision tree with the plot_tree function from scikit-learn. Run the following code in a code cell:

from sklearn.tree import plot_tree
import matplotlib.pyplot as plt
plt.figure(figsize=(20,20))
plot_tree(clf, filled=True, feature_names=iris.feature_names, class_names=iris.target_names)
plt.show()

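This code draws the tree. Each node shows its split condition, Gini impurity, number of training samples, per-class sample counts (value), and majority class; with filled=True, nodes are colored by their majority class, and the color gets more intense as the node becomes purer.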
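Large trees can be hard to read as a figure. scikit-learn's export_text function prints the same splits as indented if/else-style rules. In this sketch the tree is limited to max_depth=2 and fitted on the full dataset purely to keep the output short; both choices are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(iris.data, iris.target)

# Text rendering of the split thresholds and leaf classes
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```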

And that’s it! You have now created, trained, evaluated, and visualized a decision tree model using scikit-learn in Google Colab. From here, experiment with different datasets and with hyperparameters such as max_depth or min_samples_leaf to see how they change the shape of the tree and its accuracy.
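As a starting point for that experimentation, the sketch below compares several max_depth values. It swaps the single train/test split for 5-fold cross-validation (cross_val_score), which averages accuracy over five different held-out folds and so gives a steadier comparison; the list of depths tried is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

results = {}
for depth in [1, 2, 3, 4, None]:  # None = no depth limit (the default)
    clf = DecisionTreeClassifier(max_depth=depth, random_state=42)
    scores = cross_val_score(clf, X, y, cv=5)  # accuracy on each of 5 folds
    results[depth] = scores.mean()
    print(f"max_depth={depth}: mean accuracy {scores.mean():.3f}")
```

A depth-1 tree (a single split) cannot tell all three species apart, so its accuracy is clearly lower; deeper trees fit the training data more closely, and cross-validation shows whether that extra depth actually helps on held-out data.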
