Ensemble learning is a technique that combines the predictions of multiple models to produce a better predictor than any single model on its own. Bootstrap aggregating (bagging) and boosting are two popular ways of building ensembles. In this tutorial, we will explore how to use both techniques with scikit-learn, a powerful machine learning library in Python.
- Bagging:
Bagging is an ensemble learning technique in which multiple models are trained independently on different bootstrap samples of the training data (random subsets drawn with replacement), and their predictions are combined by majority voting (or averaging, for regression). This helps reduce variance and overfitting and improves the overall performance of the model.
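To make "different subsets" concrete, here is a minimal sketch (using NumPy purely for illustration) of how a single bootstrap sample is drawn: row indices are sampled with replacement, so some rows appear several times and others not at all.
import numpy as np
# One bootstrap sample: draw n row indices with replacement from the n training rows
rng = np.random.default_rng(42)
n_samples = 100  # pretend the training set has 100 rows
bootstrap_indices = rng.integers(0, n_samples, size=n_samples)
# X[bootstrap_indices], y[bootstrap_indices] would be the training subset for one base model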
To implement bagging in scikit-learn, we can use the BaggingClassifier class. Here’s a simple example of how to use bagging with a decision tree classifier:
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
# Load the dataset (load_dataset() is a placeholder for your own data-loading code)
X, y = load_dataset()
# Hold out a test set so we can evaluate on unseen data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a decision tree classifier to use as the base estimator
base_classifier = DecisionTreeClassifier()
# Create a bagging classifier with 10 base estimators
# (in scikit-learn versions before 1.2 this parameter was called base_estimator)
bagging_classifier = BaggingClassifier(estimator=base_classifier, n_estimators=10)
# Train the bagging classifier on the training data
bagging_classifier.fit(X_train, y_train)
# Make predictions on the held-out test data
predictions = bagging_classifier.predict(X_test)
In this example, we load the dataset, split it into training and test sets, and create a decision tree classifier as the base estimator. We then create a BaggingClassifier with that base estimator and the number of estimators (in this case, 10). Finally, we train the bagging classifier on the training data and make predictions on the held-out test data.
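To check how well the bagged ensemble generalizes, we can score its predictions against the held-out labels (this assumes the y_test produced by the train/test split above):
from sklearn.metrics import accuracy_score
# Compare the ensemble's predictions with the true test labels
accuracy = accuracy_score(y_test, predictions)
print(f"Bagging test accuracy: {accuracy:.3f}")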
- Boosting:
Boosting is another popular ensemble learning technique in which multiple weak learners are combined to create a strong learner. The models are trained sequentially, and the weights of misclassified instances are increased at each step so that later models focus on the difficult examples.
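Before turning to the scikit-learn implementation, here is a minimal sketch of the reweighting idea behind AdaBoost for a binary problem. It assumes the labels are encoded as -1/+1 NumPy arrays and that X_train, y_train, and X_test come from a split like the one in the bagging example above; it is an illustration of the algorithm, not production code.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
n = len(y_train)
weights = np.full(n, 1.0 / n)  # start with uniform sample weights
stumps, alphas = [], []
for _ in range(50):
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X_train, y_train, sample_weight=weights)
    pred = stump.predict(X_train)
    err = np.sum(weights[pred != y_train])            # weighted error rate
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))   # this learner's vote weight
    weights *= np.exp(-alpha * y_train * pred)        # increase the weight of misclassified points
    weights /= weights.sum()                          # renormalize
    stumps.append(stump)
    alphas.append(alpha)
# The final prediction is the sign of the weighted vote of all weak learners
ensemble_pred = np.sign(sum(a * s.predict(X_test) for a, s in zip(alphas, stumps)))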
To implement boosting in scikit-learn, we can use the AdaBoostClassifier class. Here’s an example showing how to use boosting with shallow decision trees (decision stumps) as the weak learners:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
# Load the dataset (load_dataset() is a placeholder for your own data-loading code)
X, y = load_dataset()
# Hold out a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a weak learner: a decision stump (a tree limited to max_depth=1)
base_classifier = DecisionTreeClassifier(max_depth=1)
# Create a boosting classifier with 50 weak learners and a learning rate of 0.1
# (in scikit-learn versions before 1.2 this parameter was called base_estimator)
boosting_classifier = AdaBoostClassifier(estimator=base_classifier, n_estimators=50, learning_rate=0.1)
# Train the boosting classifier on the training data
boosting_classifier.fit(X_train, y_train)
# Make predictions on the held-out test data
predictions = boosting_classifier.predict(X_test)
In this example, we again load the dataset and split it, but the base estimator is a decision stump (a tree limited to max_depth=1), which is the kind of weak learner AdaBoost is designed to combine. We then create an AdaBoostClassifier with that base estimator, the number of estimators (50), and the learning rate (0.1). Finally, we train the boosting classifier on the training data and make predictions on the test data.
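If you want to see how the ensemble improves as weak learners are added, AdaBoostClassifier provides staged_predict, which yields the ensemble's predictions after each boosting iteration (this sketch assumes the y_test produced by the split above):
from sklearn.metrics import accuracy_score
# Print the test accuracy every 10 boosting iterations
for i, staged_pred in enumerate(boosting_classifier.staged_predict(X_test), start=1):
    if i % 10 == 0:
        print(f"After {i} estimators: test accuracy = {accuracy_score(y_test, staged_pred):.3f}")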
Overall, bagging and boosting are powerful techniques in ensemble learning that can significantly improve the performance of machine learning models. By implementing these techniques with scikit-learn, you can create more accurate and robust predictive models for various machine learning tasks.
One caveat on boosting is worth emphasizing: it is designed to combine weak learners, such as decision stumps (trees with max_depth=1). If you do not limit max_depth on the base decision tree, AdaBoost will grow every base tree fully and the ensemble is likely to overfit, which is why the example above uses a stump as the base estimator.
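To see the effect for yourself, the sketch below compares AdaBoost built on stumps with AdaBoost built on fully grown trees, using cross-validation on a synthetic dataset (make_classification is used purely for illustration, and the estimator parameter requires scikit-learn 1.2 or later):
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
# Synthetic data, used only to illustrate the comparison
X_demo, y_demo = make_classification(n_samples=1000, n_features=20, random_state=42)
for depth in (1, None):  # decision stumps vs. fully grown trees
    model = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=depth),
                               n_estimators=50, learning_rate=0.1)
    scores = cross_val_score(model, X_demo, y_demo, cv=5)
    print(f"max_depth={depth}: mean CV accuracy = {scores.mean():.3f}")
Comparing the two scores on your own data is a quick way to check whether deeper base trees are actually helping or just overfitting.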