Understanding Bagging & Boosting in Machine Learning
Are you preparing for a data science interview? If you are, then this blog will introduce you to one of the most commonly asked interview questions, with a detailed explanation.
You might have heard or read this question very often.
By this time, you would have guessed already. Yes, it is ‘Bagging and Boosting’, the two ensemble methods in machine learning.
This blog explains ‘Bagging and Boosting’ as simply and briefly as possible. But let us first understand some important terms that are used later in the main content.
Let’s start with an example. If we want to predict the ‘sales’ of a particular company based on certain features, then many algorithms, such as Linear Regression and Decision Tree Regressor, can be used. But these algorithms will make different predictions. Why is that? One of the key factors is how much bias and variance each of them produces.
Cool, but what if we don’t know anything about Bias and Variance? So let’s jump to Bias and Variance first.
Bias and Variance
Bias: When we train our model on the given data, it makes certain assumptions, some correct and some incorrect. When the model is then given testing or validation data, the incorrect assumptions show up as a gap between its predictions and the actual values.
So, to conclude from this: “Bias is the difference between the model’s average prediction and the true value it is trying to predict.”
A model with a high bias error is too simple to capture the variation in the data. Since it does not learn enough from the training data, this is called Underfitting.
Variance: the error that occurs when the model captures random fluctuations or noise in the training data.
To explain further, the model learns too much from the training data, so when it is given new testing data, it is unable to predict the result accurately.
A model with a high variance error is too specific to the data it was trained on. This is called Overfitting.
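To make these two errors concrete, here is a small simulation sketch. Everything in it is an assumption chosen for illustration: the true relationship `true_f`, the noise level, the test point `x0`, and the two toy models (one that always predicts the training mean, and one that copies the nearest training point). It retrains each model on many fresh training sets and measures how far the average prediction is from the truth (bias) and how much the predictions scatter (variance).

```python
import random
import statistics


def true_f(x):
    # Assumed ground-truth relationship for this toy example
    return x ** 2


def make_training_set(rng, n=20, noise_sd=0.2):
    # Draw n noisy observations of true_f on the interval [0, 1)
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def predict_constant(xs, ys, x0):
    # Very rigid model: ignores x0 and always predicts the training mean
    return statistics.fmean(ys)


def predict_nearest(xs, ys, x0):
    # Very flexible model: copies the target of the closest training point
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]


def bias_and_variance(predict, x0=0.9, n_trials=2000, seed=0):
    # Retrain on many independent training sets and collect predictions at x0
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trials):
        xs, ys = make_training_set(rng)
        preds.append(predict(xs, ys, x0))
    bias = statistics.fmean(preds) - true_f(x0)
    variance = statistics.pvariance(preds)
    return bias, variance


bias_c, var_c = bias_and_variance(predict_constant)
bias_n, var_n = bias_and_variance(predict_nearest)
print(f"constant model:      bias={bias_c:+.3f}, variance={var_c:.4f}")
print(f"1-nearest neighbour: bias={bias_n:+.3f}, variance={var_n:.4f}")
```

In this setup the constant model should show the larger bias (it underfits), while the nearest-neighbour model should show the larger variance (it follows the noise in each training set).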
Let’s take an example. If you want to purchase a new laptop, you would browse different shopping websites/apps, check the price, features, and reviews/ratings, and then compare them. You would also visit some local stores and ask your friends or relatives for their opinion. In the end, you make a decision considering all these factors.
Ensemble models in machine learning work on a similar idea.
Ensemble methods create multiple models (called base learners or weak learners) and combine/aggregate them into one final predictive model to decrease the errors (variance or bias). This approach typically gives better and more accurate predictive performance than a single model.
Ensemble methods can be divided into two groups:
- Parallel ensemble methods: In this method the base learners are generated in parallel, which encourages independence between them. Averaging their predictions then reduces the overall error.
- Sequential ensemble methods: In this method the base learners are generated sequentially, so each base learner depends on the ones before it. The overall performance of the model is then increased by assigning higher weights to the instances that earlier learners mislabeled/mispredicted.
Boosting and bagging are the two most popularly used ensemble methods in machine learning. Now that we have covered the prerequisites, let’s jump to this blog’s main content.
Bagging stands for Bootstrap Aggregating or simply Bootstrapping + Aggregating.
- Bootstrapping in Bagging refers to a technique where multiple subsets are drawn from the whole dataset by sampling with replacement.
- Aggregation in Bagging refers to a technique that combines the predictions of all the models into a single outcome, by averaging or by voting.
Hence many weak models are combined to form a better model.
Bagging is a Parallel ensemble method, where every model is constructed independently.
Bagging is used when the aim is to reduce variance.
So, now let’s see how bagging is performed.
How is Bagging performed?
The whole process of Bagging can be explained in just a few steps. Please refer to the diagram below for a clearer understanding and visualization.
1. ‘n’ data subsets (d1, d2, d3, …, dn) are generated randomly with replacement from the original dataset ‘D’: this is Bootstrapping.
2. These sub-datasets are used to train multiple models (called ‘Weak Learners’): m1, m2, m3, …, mn.
3. The final prediction (the Ensemble model) is based on the aggregation of predictions from all the weak models: this is Aggregating.
- In the case of a Regressor: the average/mean of these predictions is taken as the final prediction.
- In the case of a Classifier: the class that wins the majority vote is taken as the final prediction.
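The three steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the toy dataset `D`, the helper names, and the choice of a nearest-neighbour rule as the weak learner are all made up for this example and are not part of any library.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    # Step 1: draw len(data) items from data, with replacement
    return [rng.choice(data) for _ in data]


def train_nearest_neighbour(sample):
    # Step 2: a deliberately simple base learner (an assumption for this
    # sketch): predict the target of the closest point in its own sample
    def model(x):
        return min(sample, key=lambda point: abs(point[0] - x))[1]
    return model


def bagging_regressor(data, n_models, seed=0):
    # Train n_models base learners, each on its own bootstrap sample
    rng = random.Random(seed)
    return [train_nearest_neighbour(bootstrap_sample(data, rng))
            for _ in range(n_models)]


def predict_mean(models, x):
    # Step 3 for regression: average the predictions of all models
    preds = [m(x) for m in models]
    return sum(preds) / len(preds)


def majority_vote(labels):
    # Step 3 for classification: return the most common predicted label
    return Counter(labels).most_common(1)[0][0]


# Toy dataset D of (x, y) pairs, roughly y = 2x with a little noise
D = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1), (6, 12.2)]
models = bagging_regressor(D, n_models=25)
bagged_prediction = predict_mean(models, 3.4)
print(bagged_prediction)

print(majority_vote(["spam", "ham", "spam"]))  # spam
```

Each model sees a slightly different version of `D`, so their individual predictions differ. Averaging them smooths out those differences, which is where the variance reduction comes from.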
Because “random sampling with replacement” is used here, every element has the same probability of appearing in a new training sub-dataset, and some observations may be repeated. The ensemble model built from these weak models is much more robust and has lower variance.
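A quick way to see the effect of sampling with replacement is to count how many distinct original rows end up in one bootstrap sample. For a sample of the same size as the dataset, the expected share is about 1 - 1/e, roughly 63%, so the rest of the sample consists of repeats. The dataset size and seed below are arbitrary choices for this sketch.

```python
import random

rng = random.Random(42)
n = 10_000
original = list(range(n))

# One bootstrap sample: n draws with replacement from the original data
sample = rng.choices(original, k=n)

unique_fraction = len(set(sample)) / n
print(f"distinct original rows in the sample: {unique_fraction:.3f}")
print(f"original rows left out of the sample: {1 - unique_fraction:.3f}")
```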
Implementation of Bagging
The Random Forest algorithm uses the concept of Bagging.
Here is sample code showing how Bagging can be implemented in practice. Remember, it’s just sample code, meant to introduce you to the required library.
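A minimal sketch of what such a sample might look like, assuming scikit-learn is the library in question and is installed. The synthetic dataset, the decision tree base learner, and the parameter values are choices made for this example only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only so that the example is self-contained
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The base learner is passed positionally because its keyword name
# differs between scikit-learn versions
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
bagging.fit(X_train, y_train)

accuracy = bagging.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Here `n_estimators` is the number of weak learners, each trained on its own bootstrap sample of the training data.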
Boosting is a Sequential ensemble method, where each consecutive model attempts to correct the errors of the previous model.
If an instance is misclassified by one weak model, its weight is increased so that the next base learner focuses on classifying it correctly.
Since the results of one base learner feed into the next, every model depends on its previous model.
Boosting is used when the aim is to reduce bias.
So now let’s see how boosting is performed.
How is Boosting performed?
The whole process of Boosting can be explained in just a few steps. Please refer to the diagram below for a clearer understanding and visualization.
1. Let ‘d1’ be the data-subset, which is generated randomly with replacement from the original dataset ‘D’.
2. This subset is used to train the model ‘m1’ (which is called a weak learner).
3. This model is then used to make predictions on the original (complete) dataset. Instances that are misclassified/mispredicted by this model are given more weight when choosing the next data-subset.
4. Let ‘d2’ be the data-subset, which is generated randomly with replacement from the dataset ‘D’ (which is now updated with weights). In this step, instances with more weight (as determined in the previous step) are more likely to be chosen.
5. This subset is again used to train the model ‘m2’ (which is called a weak learner).
6. The above steps are repeated ‘n’ times, to get ‘n’ such models (m1, m2, m3, …, mn).
7. The results of these ‘n’ weak models are combined to make a final prediction.
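The loop described above can be sketched in plain Python. Please treat it as an illustration under assumptions: the steps do not say exactly how the weights are changed or how the models are combined, so this sketch borrows the model weight `alpha` and the exponential weight update from AdaBoost. The toy dataset `D`, the one-threshold "stump" weak learner, and all helper names are invented for this example.

```python
import math
import random


def train_stump(sample):
    # Weak learner: the best single-threshold rule on a 1-D feature.
    # A stump (t, polarity) predicts polarity if x < t, otherwise -polarity.
    xs = sorted({x for x, _ in sample})
    thresholds = [xs[0] - 0.5] + [x + 0.5 for x in xs]
    best, best_errors = None, None
    for t in thresholds:
        for polarity in (1, -1):
            errors = sum(1 for x, y in sample
                         if (polarity if x < t else -polarity) != y)
            if best_errors is None or errors < best_errors:
                best, best_errors = (t, polarity), errors
    return best


def stump_predict(stump, x):
    t, polarity = stump
    return polarity if x < t else -polarity


def boost(data, n_rounds, seed=0):
    rng = random.Random(seed)
    weights = [1 / len(data)] * len(data)  # start with equal weights
    models = []
    for _ in range(n_rounds):
        # Draw a data-subset with replacement; heavier instances are likelier
        subset = rng.choices(data, weights=weights, k=len(data))
        stump = train_stump(subset)
        # Evaluate the new weak learner on the complete dataset
        missed = [stump_predict(stump, x) != y for x, y in data]
        err = sum(w for w, m in zip(weights, missed) if m)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # AdaBoost-style model weight
        # For a better-than-chance stump (alpha > 0) this raises the weight of
        # misclassified instances and lowers the weight of the others
        weights = [w * math.exp(alpha if m else -alpha)
                   for w, m in zip(weights, missed)]
        total = sum(weights)
        weights = [w / total for w in weights]
        models.append((alpha, stump))
    return models, weights


def boosted_predict(models, x):
    # Final prediction: weighted vote of all weak learners
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in models)
    return 1 if score >= 0 else -1


# Toy dataset D of (x, label) pairs that no single stump can classify perfectly
D = [(0, 1), (1, 1), (2, 1), (3, -1), (4, -1),
     (5, -1), (6, 1), (7, 1), (8, 1), (9, -1)]
models, final_weights = boost(D, n_rounds=10)
train_accuracy = sum(boosted_predict(models, x) == y for x, y in D) / len(D)
print(f"training accuracy of the ensemble: {train_accuracy:.2f}")
```

Unlike the bagging loop, each round here depends on the previous one through `weights`, which is why boosting cannot be trained in parallel.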
- Gradient boosting
These algorithms apply Boosting in different ways, which we will see in detail in the next blog.
Here is sample code showing how boosting can be implemented in practice with the AdaBoost algorithm. Remember, it’s just sample code, meant to introduce you to the required library.
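A minimal sketch of what such a sample might look like, again assuming scikit-learn is the library in question and is installed. The synthetic dataset, the depth-1 decision tree used as the weak learner, and the parameter values are choices made for this example only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only so that the example is self-contained
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A depth-1 tree (a "stump") is a typical weak learner for AdaBoost.
# It is passed positionally because its keyword name differs between
# scikit-learn versions.
stump = DecisionTreeClassifier(max_depth=1)
boosting = AdaBoostClassifier(stump, n_estimators=50, learning_rate=1.0, random_state=0)
boosting.fit(X_train, y_train)

accuracy = boosting.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Here `n_estimators` is the maximum number of sequential weak learners, and `learning_rate` scales how much each one contributes to the final prediction.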
Note: The above techniques are explained in general terms, not for one or two specific algorithms. You might see a few differences when implementing these techniques in different machine learning algorithms. But the basic concept or idea remains the same.
Similarities Between Bagging and Boosting
1. Both of them are ensemble methods that build N learners from one base learner.
2. Both of them generate several sub-datasets for training by random sampling.
3. Both of them make the final decision by averaging the N learners (or by Majority Voting).
4. Both of them are good at providing higher stability.
Bagging Vs Boosting
1. The main goal of Bagging is to decrease variance, not bias. The main goal of Boosting is to decrease bias, not variance.
2. In Bagging, multiple training data-subsets are drawn randomly with replacement from the original dataset. In Boosting, new sub-datasets are drawn randomly with replacement from the weighted (updated) dataset.
3. In Bagging, every model is constructed independently. In Boosting, new models are dependent on the performance of the previous model.
4. In Bagging, every model receives an equal weight. In Boosting, models are weighted by their performance.
5. In Bagging, the final prediction is a simple (unweighted) average. In Boosting, the final prediction is a weighted average.
6. Bagging is usually applied where the classifier is unstable and has a high variance. Boosting is usually applied where the classifier is stable and has a high bias.
7. Bagging is used for combining predictions of the same type. Boosting is used for combining predictions that are of different types.
8. Bagging is an ensemble method of type Parallel. Boosting is an ensemble method of type Sequential.
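The difference between equal and performance-based model weights (points 4 and 5) is easy to see on a tiny example. The three predictions and the model weights below are hypothetical numbers chosen so that the two schemes disagree.

```python
def simple_vote(predictions):
    # Bagging-style: every model counts equally
    return 1 if sum(predictions) >= 0 else -1


def weighted_vote(predictions, model_weights):
    # Boosting-style: each model's vote is scaled by its weight
    score = sum(w * p for w, p in zip(model_weights, predictions))
    return 1 if score >= 0 else -1


predictions = [+1, -1, -1]        # outputs of three models for one instance
model_weights = [2.0, 0.5, 0.5]   # hypothetical: the first model performed best

print(simple_vote(predictions))                   # -1
print(weighted_vote(predictions, model_weights))  # 1
```

With equal weights the two weaker models outvote the first one. With performance-based weights, the single strong model decides the outcome.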
You might have expected this blog to tell you “which is better: Bagging or Boosting?”. The answer is that neither is better in general. It depends on the circumstances of the given problem. You need to analyze the error first and then choose the better option for your specific problem.
If your problem is under-fitting, then Boosting can give you better results. In contrast, if your problem is over-fitting, then Bagging would be a better choice.
Hence, Bagging and Boosting are both equally important in machine learning.
Thanks for reading!