Understanding Light Gradient Boosting Machine

In this article, I will walk you through Light GBM. It is another variation of gradient boosting, and "light" stands for a lighter version, one designed to make the model faster, more efficient, and often a bit more accurate. I hope this article gives you clarity about using LGBM in machine learning.

In life, we want everything to be fast and accurate. Similarly, software developers want their models to train quickly and show results in very little time. LGBM came into existence to make that possible. In this article, we are going to learn what the LGBM framework is, what its advantages are in machine learning, and how it has changed technology for the better.

In the previous article, we discussed XGBoost, which stands for Extreme Gradient Boosting. To learn the fundamentals of XGBoost, see Understanding XGBoost a basic overview.

Light GBM

Light gradient boosting machine, LGBM for short, is a framework and a variant of gradient boosting. Like other gradient boosting frameworks, Light GBM is based on decision tree algorithms. With Light GBM, we can reduce memory usage and increase training efficiency.

The main difference between Light GBM and most other gradient boosting frameworks is that Light GBM grows its trees leaf-wise (vertically), while the others usually grow level-wise (horizontally). At each step, Light GBM splits the leaf that gives the largest reduction in loss. For the same number of leaves, this usually reduces the error more than level-wise growth does, although it can produce deeper trees.
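The difference between the two growth strategies can be sketched in plain Python. This is a toy illustration, not Light GBM's actual implementation: the split table `SPLITS`, its gain numbers, and the helper names `grow_leaf_wise` and `grow_level_wise` are all made up for the example.

```python
import heapq

# Hypothetical split table: leaf name -> (gain if split, names of its two children).
# Leaves missing from the table cannot be split further.
SPLITS = {
    "root": (10.0, ("A", "B")),
    "A": (8.0, ("A1", "A2")),
    "B": (1.0, ("B1", "B2")),
    "A1": (5.0, ("A1a", "A1b")),
    "A2": (0.5, ("A2a", "A2b")),
}


def grow_leaf_wise(max_leaves):
    """Always split the current leaf with the largest gain."""
    leaves = ["root"]
    heap = [(-SPLITS["root"][0], "root")]  # negate the gain: heapq is a min-heap
    order = []
    while heap and len(leaves) < max_leaves:
        _, leaf = heapq.heappop(heap)
        _, children = SPLITS[leaf]
        leaves.remove(leaf)
        leaves.extend(children)
        order.append(leaf)
        for child in children:
            if child in SPLITS:
                heapq.heappush(heap, (-SPLITS[child][0], child))
    return order, leaves


def grow_level_wise(max_leaves):
    """Split every splittable leaf on one level before moving to the next level."""
    level = ["root"]
    leaves = ["root"]
    order = []
    while level and len(leaves) < max_leaves:
        next_level = []
        for leaf in level:
            if leaf not in SPLITS or len(leaves) >= max_leaves:
                continue
            _, children = SPLITS[leaf]
            leaves.remove(leaf)
            leaves.extend(children)
            order.append(leaf)
            next_level.extend(children)
        level = next_level
    return order, leaves


leaf_order, _ = grow_leaf_wise(max_leaves=4)
level_order, _ = grow_level_wise(max_leaves=4)
leaf_gain = sum(SPLITS[name][0] for name in leaf_order)
level_gain = sum(SPLITS[name][0] for name in level_order)
```

With a budget of four leaves, the leaf-wise version splits root, A, and A1 (total gain 23 in this toy table), while the level-wise version splits root, A, and B (total gain 19). The leaf-wise tree is also one level deeper, which is why leaf-wise growth is usually paired with a depth limit.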

Light GBM has become popular because traditional algorithms often struggle to deliver good accuracy quickly on large datasets. Since the amount of data is increasing on a daily basis, we need a model that is faster and more efficient, and that is why Light GBM came into existence. We call it "Light" because of its high training speed. Light GBM handles large amounts of data while consuming relatively little memory. Developers often use Light GBM in hackathons because it provides good accuracy and fast results, and it supports GPU training. It comes in handy for data scientists. Give it a try.

When to use Light Gradient Boosting Machine?

Light GBM is not a good choice for small datasets because it is vulnerable to overfitting and can easily overfit a small amount of data. As a rule of thumb, it shows good results when the dataset has more than 10,000 rows. Use LGBM when you are training on a large amount of data and require high accuracy.

Implementation

Light GBM is easy to understand and implement. The most complicated and most important part of implementing Light GBM is parameter tuning, since the framework exposes around a hundred parameters. But don’t worry, you don’t need to remember all of them. I am here to help you by explaining a few important ones. Tuned well, these parameters make Light GBM one of the most powerful frameworks. Let’s see a few parameters.

Parameters

It is necessary to know about the parameters we are using in our algorithm. There are different types of parameters in Light GBM. Let’s see them.

Control Parameters

  • Max-depth: It sets the maximum depth of a tree. We use max-depth to handle overfitting. If the model is starting to overfit or has already overfitted, reduce the value of max-depth.
  • Min-data-in-leaf: It sets the minimum number of records a leaf must contain. The default value is 20. Again, we use this to deal with overfitting.
  • Feature-fraction: It sets the fraction of features Light GBM randomly selects for each tree. If the value is 0.7, Light GBM selects 70 percent of the features at random before building each tree. It is commonly used when boosting is random forest, but it applies to the other boosting types as well.
  • Bagging-fraction: It sets the fraction of data used in each iteration. We use it to reduce overfitting and to speed up training. It takes effect only when bagging-freq is set to a non-zero value.
  • Early-stopping-round: This parameter helps speed up training. The model stops training if the validation metric has not improved in the last early-stopping-round rounds. Using this, we can avoid unnecessary iterations.
  • Lambda: It specifies the regularization strength (lambda_l1 for L1 and lambda_l2 for L2). The value must not be negative, and typical values lie between 0 and 1.
  • Min-gain-to-split: It sets the minimum gain required to make a split. Using this, we can control the number of splits in a tree.
  • Max-cat-group: When a categorical feature has a large number of categories, finding the split point on it can easily lead to overfitting. To avoid this, Light GBM merges the categories into at most max-cat-group groups and looks for split points on the group boundaries.
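The early-stopping rule from the list above can be sketched in a few lines of plain Python. This is only an illustration of the logic, not Light GBM's own code: the function name `best_iteration` and the loss values are made up, and lower validation loss is assumed to be better.

```python
def best_iteration(valid_losses, early_stopping_round):
    """Return (best_index, stopped_at) for a list of per-iteration validation losses.

    Training stops once the loss has not improved for `early_stopping_round`
    consecutive iterations. Lower loss is assumed to be better.
    """
    best_loss = float("inf")
    best_index = -1
    for i, loss in enumerate(valid_losses):
        if loss < best_loss:
            best_loss, best_index = loss, i
        elif i - best_index >= early_stopping_round:
            return best_index, i  # stop: no improvement for N rounds
    return best_index, len(valid_losses) - 1  # ran through every iteration


# Made-up validation losses: the metric stalls after iteration 2.
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
stopped = best_iteration(losses, early_stopping_round=3)
patient = best_iteration(losses, early_stopping_round=10)
```

With a patience of 3, training stops at iteration 5 and keeps iteration 2 as the best one, so the later improvement at iteration 6 is never seen. A larger patience finds it, at the cost of more iterations. That trade-off is what early-stopping-round controls.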

Core Parameters

  • Task: It specifies the task we are going to perform on the data. The task can be training or prediction.
  • Application: This is the most important parameter. It specifies whether the model performs regression or classification. The default is regression. The application can be:
    • Binary: We use this for binary classification
    • Multiclass: We use this for multiclass classification
    • Regression: We use this for regression
  • Boosting: It specifies the boosting algorithm to use. The default is gbdt.
    • Rf: Random forest
    • Goss: Gradient-based one-side sampling
    • Gbdt: Traditional gradient boosting decision tree
  • Num-boost-round: It sets the number of boosting iterations.
  • Learning rate: Light GBM starts with an initial estimate and updates it with the output of each tree. The learning rate scales each tree's contribution, so it controls how much the estimate changes per iteration. The default value is 0.1.
  • Num-leaves: It sets the maximum number of leaves in a tree. The default value is 31.
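Put together, a parameter dictionary for the Light GBM Python package might look like the sketch below. The values are placeholders for illustration, not tuned recommendations. The keys use the underscore spellings that the package accepts, and `objective` is the package's name for what this article calls the application. The training call at the end is left as a comment because `X_train` and `y_train` are hypothetical data that this article does not provide.

```python
# Illustrative values only; tune them for your own data.
params = {
    # Core parameters
    "objective": "binary",       # the "application": binary, multiclass, regression
    "boosting": "gbdt",          # gbdt, rf, or goss
    "learning_rate": 0.1,
    "num_leaves": 31,
    "metric": "binary_logloss",
    # Control parameters
    "max_depth": 6,
    "min_data_in_leaf": 20,
    "feature_fraction": 0.7,
    "bagging_fraction": 0.8,
    "bagging_freq": 1,           # bagging only takes effect when this is non-zero
    "lambda_l2": 0.1,
    "min_gain_to_split": 0.0,
}

# A tree of depth d can have at most 2**d leaves, so num_leaves should not
# exceed that bound when max_depth is set.
assert params["num_leaves"] <= 2 ** params["max_depth"]

# With the lightgbm package installed, training would look roughly like this
# (X_train and y_train are hypothetical arrays of features and labels):
#   import lightgbm as lgb
#   train_set = lgb.Dataset(X_train, label=y_train)
#   booster = lgb.train(params, train_set, num_boost_round=100)
```

Because Light GBM grows leaf-wise, num_leaves is usually the main knob for model complexity, with max_depth acting as a safety limit on how deep a single branch can go.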

Metric Parameter

Metric: This parameter specifies the loss that is evaluated while the model is being built. These are a few losses for classification and regression.

  • Mse: Mean squared error
  • Mae: Mean absolute error
  • Binary-logloss: Binary classification loss
  • Multi-logloss: Multiclass classification loss
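To make these four losses concrete, here is a minimal plain-Python sketch of how each one is computed. The function names simply mirror the metric names above; this is the textbook definition of each loss, not Light GBM's internal code, and the `eps` clipping is an assumption added to avoid taking the logarithm of zero.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_logloss(y_true, p_pred, eps=1e-15):
    """y_true holds 0/1 labels; p_pred holds predicted probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def multi_logloss(y_true, probs, eps=1e-15):
    """y_true holds class indices; probs holds one list of class probabilities per row."""
    total = 0.0
    for t, row in zip(y_true, probs):
        total += -math.log(max(row[t], eps))
    return total / len(y_true)
```

For example, `mse([1, 2, 3], [1, 2, 5])` averages the squared errors 0, 0, and 4, while `mae` on the same data averages the absolute errors 0, 0, and 2, so a single large error affects Mse more than Mae.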

Finally, we tune these parameters. This is usually done by the data scientist, who tries different values and compares the results on validation data.
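One simple way to tune is a grid search: try every combination from a small set of candidate values and keep the best one. The sketch below is a generic illustration and not part of Light GBM. The helper `grid_search` is hypothetical, and `fake_evaluate` is a stand-in so the example runs on its own; in practice it would train a model with the given parameters and return its validation loss.

```python
from itertools import product


def grid_search(grid, evaluate):
    """Try every combination in `grid` and return (best_params, best_score).

    `evaluate` is a caller-supplied function that takes a parameter dict and
    returns a validation loss (lower is better).
    """
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        candidate = dict(zip(names, values))
        score = evaluate(candidate)
        if score < best_score:
            best_params, best_score = candidate, score
    return best_params, best_score


# Stand-in for a real train-and-validate step, so the sketch runs on its own.
def fake_evaluate(p):
    return abs(p["num_leaves"] - 31) / 100 + abs(p["learning_rate"] - 0.05)


grid = {"num_leaves": [15, 31, 63], "learning_rate": [0.01, 0.05, 0.1]}
best_params, best_score = grid_search(grid, fake_evaluate)
```

The number of combinations grows quickly with every parameter added to the grid, so it helps to start with the few parameters that matter most, such as num-leaves and the learning rate.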

Conclusion

Light GBM is one of the most useful and fastest algorithms in data science. In this article, I gave an overview of LGBM and the basic idea behind this algorithm.

Thanks for reading!
