Understanding Decision Trees in Machine Learning

Decision tree algorithms use attribute selection measures (ASMs), such as information gain and the Gini index, to decide how to split each node.
Source: Markus Winkler on Unsplash


Decision trees are supervised machine learning algorithms that can be used for both regression and classification, although they are mainly used for classification. A decision tree consists of a series of if-else conditions that classify the data and can be visualized as a tree.

Put another way, a decision tree progressively divides the dataset into smaller groups based on descriptive features, until each group can be assigned a label.

A decision tree can be viewed as a flowchart-like tree structure in which each internal node represents a test on an attribute, each branch represents a decision rule (an outcome of that test), and each leaf node represents a decision outcome.
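To make the flowchart picture concrete, here is a minimal hand-written sketch of a tree as nested if-else tests. The fruit features, thresholds, and labels are hypothetical and chosen only for illustration; a real tree would learn them from data.

```python
def classify_fruit(color, diameter_cm):
    """A tiny hand-built decision tree (hypothetical rules, not learned)."""
    # Root node: test on the "color" attribute.
    if color == "green":
        # Internal node: test on the "diameter_cm" attribute.
        if diameter_cm >= 10:
            return "watermelon"  # leaf node
        return "apple"           # leaf node
    # Branch for every other color.
    if diameter_cm >= 6:
        return "orange"          # leaf node
    return "lemon"               # leaf node


print(classify_fruit("green", 25))
print(classify_fruit("yellow", 5))
```

Each `if` is an internal node, each outcome of the test is a branch, and each `return` is a leaf.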

The tree learns by partitioning the data according to attribute values. This partitioning is recursive: each resulting subset is partitioned again in the same way.

Because the tree is built by recursive partitioning starting from the root, the approach is called top-down. The resulting flowchart makes the decision tree easy to inspect and analyze. Each node in the tree is a test on one of the features.

Terminologies Required

1. Root Node: The root node is the topmost decision node of the tree. It represents the entire dataset and is divided into two or more subtrees or decision nodes.

2. Splitting: The process of dividing a node into two or more sub-nodes is called splitting.

3. Decision Node: When a sub-node is split into further sub-nodes, it is called a decision node.

4. Leaf Node: A node that is not split any further is called a leaf node.

5. Pruning: The process of removing nodes from a decision tree is called pruning. It is the opposite of splitting.

6. Sub-Tree: A section of the decision tree is called a sub-tree.

7. Parent Node and Child Node: A node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are called its child nodes.
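The terms above can be sketched with a small tree data structure. The `Node` class and the helper names below are hypothetical, introduced only to illustrate the vocabulary; they are not part of any library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A tree node; a parent node holds its child nodes in `children`."""
    name: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self):
        # A leaf node has not been split, so it has no children.
        return not self.children


def count_leaves(node):
    """Count the leaf nodes in the sub-tree rooted at `node`."""
    if node.is_leaf():
        return 1
    return sum(count_leaves(child) for child in node.children)


def prune(node):
    """Pruning: remove everything below `node`, turning it into a leaf."""
    node.children.clear()


# The root node splits into a decision node and a leaf node.
decision = Node("decision node", [Node("leaf A"), Node("leaf B")])
root = Node("root node", [decision, Node("leaf C")])

print(count_leaves(root))  # 3
prune(decision)            # the decision node becomes a leaf
print(count_leaves(root))  # 2
```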

How does a decision tree work?

The following steps will give us a good idea of how the decision tree functions:

1. The first step is to find the best attribute on which to split the data, using an attribute selection measure (ASM).

2. The next step is to make that attribute a decision node and break the dataset into smaller subsets.

3. The final step is to repeat this process recursively for each child node, until the nodes can no longer be usefully split.
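The three steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not a production implementation: it assumes numeric features, binary splits of the form `feature <= threshold`, and the Gini index as the selection measure. All function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(rows, labels):
    """Step 1: find the (feature, threshold) with the lowest weighted Gini."""
    best = None  # (score, feature_index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # this threshold does not separate anything
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels):
    """Steps 2 and 3: make a decision node, then recurse on each subset."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return majority  # pure node: becomes a leaf
    split = best_split(rows, labels)
    if split is None:
        return majority  # no usable split: becomes a leaf
    _, f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left]),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right]),
    }


def predict(tree, row):
    """Walk from the root to a leaf, following one branch per test."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical toy data: two numeric features, two classes.
rows = [[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [4.0, 8.0]]
labels = ["a", "a", "b", "b"]
tree = build_tree(rows, labels)
print(tree)
print(predict(tree, [1.5, 4.5]))
```

Leaves are stored as plain labels and decision nodes as dictionaries, which keeps the recursion easy to follow at the cost of generality.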

Attribute Selection Measures

An attribute selection measure (ASM) is a heuristic for choosing the splitting criterion that divides the data in the best way possible. Because it determines how the data at a node is split, it is also known as a splitting rule, and it also determines the split points. Commonly used ASMs include information gain, gain ratio, and the Gini index.

Information Gain

Information gain is a statistical property that measures how well a given attribute separates the training examples.

Information gain estimates how much information an attribute provides about the class, based on the notion of entropy. Entropy measures uncertainty or disorder or, put simply, impurity. A good split reduces entropy, so the goal is for entropy to decrease from the top (root node) to the bottom (leaf nodes) of the tree.
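The formula figure from the original post is not reproduced here. As a reference, the standard textbook definitions of entropy and information gain can be written as follows, where T and S_i are the observation counts described in this section. The class proportion p_c and the attribute name A are symbols assumed here for illustration.

```latex
\mathrm{Entropy}(T) = -\sum_{c} p_c \log_2 p_c
\qquad
\mathrm{Gain}(T, A) = \mathrm{Entropy}(T) - \sum_{i} \frac{S_i}{T}\,\mathrm{Entropy}(S_i)
```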

The variables in the information gain formula are:

1. T: The target population before the split, that is, the total number of observations before the split.

2. Entropy(T): The amount of disorder, or level of uncertainty, in the values before the split.

3. S_i: The number of observations in the i-th split.

4. Entropy(S_i): The amount of disorder in the target variable within split S_i.
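As a rough sketch of how these quantities fit together, the snippet below computes entropy and information gain for a toy split. The function names and the label data are hypothetical; the calculation follows the standard definition (entropy before the split minus the size-weighted entropy of each split).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(parent_labels, splits):
    """Entropy(T) minus the size-weighted entropy of each split S_i."""
    total = len(parent_labels)
    weighted = sum(len(s) / total * entropy(s) for s in splits)
    return entropy(parent_labels) - weighted


# Hypothetical toy data: 4 "yes" and 4 "no" labels before the split.
parent = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

print(information_gain(parent, perfect_split))  # 1.0
print(information_gain(parent, useless_split))  # 0.0
```

A split that separates the classes completely removes all of the entropy, while a split that leaves each subset as mixed as the parent gains nothing.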

Gini Index

The Gini index, also known as Gini impurity, measures the probability that a randomly selected element would be misclassified if it were labeled at random according to the class distribution of the node. If all the elements belong to a single class, the node is called pure.

The way the Gini index acts as the deciding factor for a split is simple: the split with the lowest Gini index is preferred.

The Gini index takes values between 0 and 1. A value of 0 indicates purity, meaning that all the elements belong to a single class. Values approaching 1 indicate that the elements are distributed randomly across many classes. A value of 0.5 indicates an equal distribution of elements across two classes, which is the maximum when there are only two classes.

The value of the Gini index is determined by subtracting the sum of the squared class probabilities from one.
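The formula image from the original post is not reproduced here. The standard definition, which matches the description above, is given below; n, the number of classes, is a symbol assumed here.

```latex
\mathrm{Gini} = 1 - \sum_{i=1}^{n} P_i^{2}
```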

Here P_i denotes the probability of an element belonging to class i.
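A minimal sketch of this calculation, using hypothetical label lists, shows the values discussed in this section: 0 for a pure node and 0.5 for two equally represented classes.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


print(gini(["a"] * 6))                # 0.0  (pure node)
print(gini(["a"] * 3 + ["b"] * 3))    # 0.5  (two classes, equal shares)
print(gini(["a", "b", "c", "d"]))     # 0.75 (four classes, equal shares)
```

With more classes the maximum moves closer to 1, which is why 0.5 is the upper bound only in the two-class case.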

Steps to make a simple decision tree: 

1. To begin with, we make the necessary imports.

2. The next step is to implement the decision tree classifier using the scikit-learn class “sklearn.tree.DecisionTreeClassifier”.

3. The last step is to evaluate our decision tree model.
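The code from the original post is not reproduced here, so the following is a minimal sketch of the three steps rather than the author's exact code. It assumes scikit-learn is installed and uses the bundled Iris dataset as a stand-in; the 70/30 split and the `random_state` values are arbitrary choices for illustration.

```python
# Step 1: the necessary imports.
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the Iris dataset stands in for the original data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Step 2: create and fit the classifier.
# criterion can be "gini" (the default) or "entropy".
clf = DecisionTreeClassifier(criterion="gini", random_state=42)
clf.fit(X_train, y_train)

# Step 3: evaluate the model on the held-out test data.
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
```

Switching `criterion` to `"entropy"` makes the classifier choose splits by information gain instead of the Gini index.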

