Classification Algorithms Explained in 30 Minutes
A Quick Review Guide for Classification in Machine Learning, Along with Some of the Most Used Classification Algorithms, All Explained in Under 30 Minutes.
In Machine Learning terminology, Classification can be defined as a supervised learning task that aims at categorizing a set of data into different classes. In other words, if we think of a dataset as a set of data instances, and each data instance as a set of features, then Classification is the process of predicting the particular class that an individual data instance belongs to, based on its features.
Unlike regression where the target variable (i.e., the predicted value) belongs to a continuous distribution, in case of classification, the target variable is discrete. It can only be one of the various target classes in a given problem.
For example, let’s say you are working on a cat-vs-dog classifier model that predicts whether the animal in a given image is a cat or a dog. The target classes for this problem, as you can probably already see, will be ‘cat’ and ‘dog’. Hence, for every data instance (an image in this case), the prediction can only belong to one of the two classes.
Now, classification problems can be further divided into two major categories-
- Binary Classification – In binary classification, the target variable has two possible outcomes. For example, the cat-and-dog classifier that we discussed above falls under the category of binary classification, as there are only two possible predicted outcomes in that problem.
- Multi-Class Classification – As the name suggests, in the case of multi-class classification, the target variable can take any one of more than two possible discrete outcomes. For example, when predicting the quality of a wine on an integer scale of 1-10, a given data instance representing a wine sample can belong to any one of the 10 classes. (This is distinct from multi-label classification, where a single instance can carry several labels at once.)
Now that we know what Classification is, let us have a look at some of the most used and popular classification algorithms in Machine Learning. We will understand each of these classification algorithms one by one, looking at their strengths and weaknesses, as well as their implementation using Python and the popular PyData library Scikit-Learn.
So, let’s get started.
Logistic Regression
While the “regression” in the name might suggest that the algorithm belongs to the category of regression algorithms, Logistic Regression is one of the most popular classification-based machine learning algorithms. Logistic regression can be used for both binary (binary logistic regression) as well as multi-class classification (multinomial or softmax regression) problems. Logistic regression assumes a linear relationship between the features (independent variables) and the log-odds of the target (dependent variable). It uses a combination of a linear function (z = Wx + b) and a normalizing function (the sigmoid function for binary problems, or the softmax function for multi-class problems; ŷ = σ(z)) to predict the probability of the target variable belonging to each possible outcome category.
The following are the advantages and disadvantages of logistic regression-
- Logistic regression works well when the data is linearly separable, i.e., if all the data instances are plotted on a scatter plot, there exists a line that divides the data in such a way that data instances belonging to the same class end up together on the same side of the line.
- Logistic regression is somewhat less prone to over-fitting, especially when the dimensionality of the data is low.
- It is one of the most easy-to-implement classification algorithms.
- As the dimensionality of data increases, i.e., the number of features used to train the classifier increases, the chances of over-fitting increase.
- Logistic regression assumes that the data being used is linearly separable. Real-world data, however, might contain a lot of outliers and overlapping classes. Logistic regression is quite sensitive to such outliers, and is overall a poor choice as a classification algorithm if your data is not linearly separable.
Now that we know the advantages and disadvantages of logistic regression, we have a rough idea regarding when and when not to use logistic regression for solving classification problems. Let us now see how we can implement a logistic regression classifier in Python.
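As a minimal sketch using Scikit-Learn (the built-in Iris dataset and the split ratio are purely illustrative choices):

```python
# Minimal logistic regression classifier with scikit-learn.
# The Iris dataset and the 75/25 split are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# max_iter is raised so the solver converges on this dataset
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {acc:.3f}")
```

Since Iris has three classes, Scikit-Learn automatically handles the multi-class case here; for a two-class dataset the same code performs binary logistic regression.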
K-Nearest Neighbors (KNN)
The KNN algorithm is very flexible in the sense that it can be used as both a regression as well as a classification algorithm. However, we are more interested in the classifier variant of the algorithm here. KNN uses what is known as ‘feature similarity’ in order to predict the class label of a given data instance. It is a non-parametric, instance-based method, which means that a KNN model doesn’t have to train and optimize a set of weights and biases to be able to make predictions. Rather, the model simply stores the feature values of all the training data instances. During the inference phase, the model compares the new data instance with the ‘K’ training data instances that are nearest to it, and assigns the new point the class that occurs most frequently among those K neighbors.
The following are the advantages and disadvantages of the KNN classification algorithm.
- Since KNN is a non-parametric algorithm, it doesn’t require several training cycles in order to optimize its parameters; ‘training’ amounts to storing the data. As a result, it has one of the shortest training times, which makes it a convenient choice for systems that have to be refreshed frequently with the latest data.
- It is very easy to implement and understand.
- KNN algorithm works for both binary as well as multi-class classification problems.
- The inference can be compute-intensive. This is because in order to find the k-nearest training data points for the given new data, the distance from each training instance has to be calculated every single time.
- Accuracy tends to degrade as the dimensionality of the dataset increases. The higher the dimensionality, the less informative distances between points become (the so-called curse of dimensionality), and the more likely the accuracy of the model is to decrease.
- In order to get optimal performance, you need to find the best value for the hyperparameter ‘k’, which determines the number of neighbors to consider in order to make the predictions. Hyperparameter optimization (i.e., choosing the set of hyperparameter values for which the model performs best) can be a tricky task and requires a lot of experimentation.
The following is the implementation of KNN classifier in Python.
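A minimal sketch of a KNN classifier (k=5 is an illustrative starting value, not a tuned one; the Iris dataset is again an arbitrary example):

```python
# KNN classifier sketch with scikit-learn.
# n_neighbors=5 is an illustrative, untuned value.
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# "Training" only stores the data; distance computations happen at predict time
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)

acc = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {acc:.3f}")
```

In practice, the value of k would be chosen via cross-validation rather than fixed up front.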
Decision Trees Classifier
A decision tree classification algorithm, as the name suggests, builds a tree-like structure. The algorithm uses recursive splitting of the data (according to some decision-making rules) in order to predict the outcome class for a given data point. Each internal node of the tree represents a decision rule, the branches represent the splits made according to that rule, and the leaves represent the final outcomes of the splitting. Once a new data point is fed to the model for inference, the tree predicts its target class by following the decision rules of the internal nodes.
The following are the advantages and disadvantages of the decision tree classifier.
- Unlike some other classifiers that use some complex mathematical formulae to make the predictions, the decision tree classifiers are easy to understand as the rules used for making the splits are not at all complex.
- Also, you can view the tree, including the decision rules at each node using various visualization methods.
- Decision tree classifiers are highly prone to overfitting, as a result of which a single decision tree tends to have a lower prediction accuracy on average.
- Decision trees do not work with non-numerical data directly. So, the non-numerical features need to be converted into a numerical form.
- As the number of features (i.e., the dimensionality of data) increases, the complexity of the tree increases. This results in increased computational requirements.
The following code block shows how you can implement a decision tree classifier.
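A minimal sketch (max_depth=3 is an illustrative cap to limit overfitting), including a plain-text dump of the learned rules to show how inspectable the model is:

```python
# Decision tree classifier sketch with scikit-learn.
# max_depth=3 is an illustrative cap chosen to limit overfitting.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42
)

clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)

acc = clf.score(X_test, y_test)
print(f"Test accuracy: {acc:.3f}")

# The learned decision rules can be inspected as plain text
print(export_text(clf, feature_names=iris.feature_names))
```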
Random Forest Classifier
The random forest classifier is a type of ensemble algorithm. An ensemble algorithm in machine learning is one that combines several base models in order to create a single model for making predictions. The random forest classifier, as the name suggests, is a combination of several decision tree classifiers, and is in fact a huge improvement over a regular decision tree classifier. The algorithm trains each tree on a random bootstrap sample of the training data, considering a random subset of the features at each split. At the time of inference, each of the trees makes a prediction, and the class with the highest vote count is returned as the final outcome.
The following are the advantages and disadvantages of the random forest classifier.
- Unlike a regular decision tree classifier, the random forest classification algorithm is far less prone to overfitting, since averaging over many de-correlated trees reduces variance.
- It works well even with very high dimensional data.
- The data does not require much preprocessing, as the model is very flexible.
- On an average, it has a better accuracy as compared to most other classification algorithms.
- The model architecture is very complex, and it can be very tough to implement one from scratch.
- Random forest classifiers are highly compute-intensive.
Here’s how to implement a random forest classifier in Python using Scikit-Learn.
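A minimal random forest sketch (the Wine dataset is an illustrative choice; n_estimators=100 is Scikit-Learn's default, written out explicitly):

```python
# Random forest classifier sketch with scikit-learn.
# The Wine dataset is an illustrative choice; n_estimators=100 is the default.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Each tree sees a bootstrap sample and a random feature subset at every split
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

acc = clf.score(X_test, y_test)
print(f"Test accuracy: {acc:.3f}")
```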
Support Vector Machines (SVM)
The SVM algorithm can be used for both regression as well as classification-type problems. Unlike most of the other classification algorithms, however, SVM is primarily a binary classifier. If we consider each data instance as a point in an N-dimensional space, then the aim of the SVM algorithm is to find the hyperplane that most clearly separates the two classes (i.e., the one with the maximum margin).
NOTE- For multi-class classification involving N classes, we need to use an ensemble of multiple binary classifiers: N(N-1)/2 classifiers with the one-vs-one strategy, or N classifiers with the one-vs-rest strategy.
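As a sketch of that idea, Scikit-Learn's OneVsOneClassifier can wrap any binary classifier (a linear SVM here, as an illustrative choice) and build one pairwise classifier per pair of classes:

```python
# One-vs-one ensemble of binary SVMs for a 3-class problem (illustrative).
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# 3 classes -> 3 * (3 - 1) / 2 = 3 pairwise binary classifiers
ovo = OneVsOneClassifier(LinearSVC(max_iter=10000))
ovo.fit(X, y)
print(len(ovo.estimators_))  # 3
```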
The following are the advantages and disadvantages of SVM classifier-
- Inference with an SVM model is very fast.
- SVM works well with high dimensional data.
- As the scale of the data (i.e., the number of data instances in the dataset) increases, training becomes much slower, and it becomes tougher to cleanly separate the two classes with a hyperplane. As a result, performance starts to deteriorate as the size of the dataset increases.
- The SVM algorithm responds poorly to outliers, and can give noticeably worse accuracy if the training data contains a lot of them.
- Good for binary classification, but the complexity of the model grows quickly as the number of target classes increases (quadratically under the one-vs-one strategy).
The following code block shows the implementation of the SVM classifier.
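A minimal binary SVM sketch (the breast cancer dataset is an illustrative choice; a StandardScaler is included because SVMs are sensitive to feature scale):

```python
# Binary SVM classifier sketch with scikit-learn.
# The breast cancer dataset is an illustrative choice; feature scaling
# matters for SVMs, so a StandardScaler is included in the pipeline.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)

acc = clf.score(X_test, y_test)
print(f"Test accuracy: {acc:.3f}")
```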
Naïve Bayes Classifier
The Naïve Bayes classification algorithm is based on the well-known Bayes’ Theorem. The algorithm works with the ‘naïve’ assumption of strong independence among the feature variables, i.e., it assumes that, given the class, no two features in the training data are related to each other in any way and that the presence of one feature does not affect the presence of another.
The following are the advantages and disadvantages of the Naïve Bayes Classification algorithm.
- Works well even if the scale of the data is very large. Hence, it has a high scalability as compared to many other classifiers.
- Is robust to outliers, hence gives good accuracy even if the dataset has a lot of outliers.
- Works well with both continuous and discrete data, and hence is very flexible.
- The model assumes that each feature in the dataset is independent. However, in real-world problems, the features in a dataset are often dependent on each other.
The following code block shows how to implement a naïve bayes classifier.
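A minimal sketch using the Gaussian variant, which suits continuous features (the Iris dataset is, once more, an illustrative choice):

```python
# Gaussian Naïve Bayes classifier sketch with scikit-learn.
# The Iris dataset is an illustrative choice of continuous-feature data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Assumes features are conditionally independent given the class
clf = GaussianNB()
clf.fit(X_train, y_train)

acc = clf.score(X_test, y_test)
print(f"Test accuracy: {acc:.3f}")
```

For discrete features (e.g., word counts), Scikit-Learn also provides MultinomialNB and BernoulliNB variants.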
With this, we reach the end of this tutorial. To sum things up, we learned what classification is, followed by a quick rundown and implementation of some of the most popular classification algorithms used in traditional machine learning.
If this article helped you in any way, then share it with your friends and other machine learning enthusiasts. Happy learning!