Feature importance is a crucial concept in machine learning, as it helps to identify which features or variables have the most impact on the target variable. In decision tree models, feature importance is computed from how much each feature reduces impurity (for example, Gini impurity or entropy) across the splits in which it is used. By understanding feature importance, you can gain insights into the relevant variables and make more informed decisions when tuning your model.
In this tutorial, we will cover how to calculate and visualize feature importance in decision tree models using Sklearn, a popular machine learning library in Python.
- Installing Required Libraries
Before we start, make sure you have Sklearn and other required libraries installed. You can install Sklearn using the following command:
pip install -U scikit-learn
- Loading Dataset
For this tutorial, we will use the famous Iris dataset, which is included in the Sklearn library. You can load the dataset as follows:
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
- Building a Decision Tree Model
Next, we will build a decision tree classifier using the Sklearn library:
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier()
dt.fit(X, y)
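The classifier above is fit on the full dataset. A common refinement, sketched below, is to hold out a test set so you can check that the tree generalizes before interpreting its importances; the `random_state` value and 25% split size are illustrative choices, not part of the original recipe:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Hold out 25% of the rows for evaluation; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42
)

dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)

# Accuracy on unseen data gives some confidence the importances are meaningful.
print(f'Test accuracy: {dt.score(X_test, y_test):.3f}')
```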
- Calculating Feature Importance
Now, we can calculate the feature importance using the feature_importances_ attribute of the decision tree classifier:
importances = dt.feature_importances_
The importances variable now contains the importance of each feature in the dataset. You can print the feature importances as follows:
for i, importance in enumerate(importances):
    print(f'Feature {i}: {importance}')
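The integer indices above are hard to read. One option, sketched here using the `feature_names` attribute that `load_iris` provides, is to pair each importance with its feature name and sort from most to least important:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
dt = DecisionTreeClassifier(random_state=0)
dt.fit(iris.data, iris.target)

# Pair names with importances and sort in descending order of importance.
ranked = sorted(
    zip(iris.feature_names, dt.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f'{name}: {importance:.4f}')
```

The importances are normalized, so they sum to 1 across all features.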
- Visualizing Feature Importance
To visualize the feature importance, we can plot a bar chart using the Matplotlib library:
import matplotlib.pyplot as plt
features = range(X.shape[1])
plt.bar(features, importances)
plt.xlabel('Feature')
plt.ylabel('Importance')
plt.title('Feature Importance in Decision Tree')
plt.xticks(features)
plt.show()
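If you prefer named axes, a variant of the chart above labels each bar with its feature name; the sorting and horizontal layout here are stylistic choices, and the non-interactive backend is used only so the sketch runs headless:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this to display the window
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
dt = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Sort so the most important feature appears at the top of the chart.
order = np.argsort(dt.feature_importances_)
names = np.array(iris.feature_names)[order]

fig, ax = plt.subplots()
ax.barh(names, dt.feature_importances_[order])
ax.set_xlabel('Importance')
ax.set_title('Feature Importance in Decision Tree')
fig.tight_layout()
fig.savefig('feature_importance.png')
```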
The bar chart will show the importance of each feature in the dataset. Features with higher importance values contribute more to the impurity reduction in the tree's splits, and therefore influence the model's predictions more strongly.
- Interpreting Feature Importance
By looking at the feature importance values, you can gain insights into the most relevant features in your dataset. You can use this information to:
- Identify key features that have a significant impact on the target variable.
- Select important features for model training, which can improve model accuracy and reduce overfitting.
- Understand the relationships between features and target variables in your dataset.
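The second point above, selecting features by importance, can be sketched with scikit-learn's SelectFromModel; the `threshold='median'` setting here is an illustrative choice that keeps only features at or above the median importance:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
dt = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# prefit=True tells SelectFromModel to reuse the already-fitted tree.
selector = SelectFromModel(dt, threshold='median', prefit=True)
X_selected = selector.transform(iris.data)

print('Original shape:', iris.data.shape)
print('Reduced shape: ', X_selected.shape)
```

The reduced matrix can then be fed back into a fresh classifier to see whether accuracy holds up with fewer features.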
In conclusion, feature importance is a valuable tool in machine learning that can help you understand the underlying relationships in your data and make informed decisions when building and tuning your models. By following this tutorial, you can easily calculate and visualize feature importance in decision tree models using Sklearn in Python.