Machine Learning: Classification Algorithms Step-by-Step Comparison



A Project-Based Machine Learning Guide Where We Pit Different Classification Algorithms Against Each Other, Comparing Their Accuracy and the Time Taken for Training and Inference.

In the last part of the classification algorithms series, we looked at what classification means in machine learning terminology. In the same article, we also had a brief overview of some of the most commonly used classification algorithms in traditional machine learning.

This part is a continuation of the last article. In this one, we will work on a binary classification problem and try to devise a model-based solution for it using each of the six algorithms that we discussed in the last part. So, let’s get started.

First, we will have a look at the problem we are aiming to solve via this project.

Problem Statement

The motivation behind this project is to create a machine learning model that is capable of predicting whether a given breast tumor is malignant (cancerous) or benign (non-cancerous). This is a binary classification problem, where the possible target outcomes are 0 (malignant) and 1 (benign).

The dataset we will be using is scikit-learn’s built-in breast cancer dataset. It contains 569 samples with 30 numeric features each, and it has no null values.

Importing Project Dependencies

Let us begin by importing all the dependencies required for the project.
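A minimal sketch of the imports, assuming NumPy, pandas, Matplotlib and scikit-learn. The exact SVM and Naïve Bayes variants are assumptions: `SVC` (kernel SVM) and `GaussianNB` (Gaussian Naïve Bayes, a natural fit for continuous features).

```python
import time

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# The six classifiers compared in this article.
# Assumption: SVC and GaussianNB are the SVM / Naive Bayes variants used.
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
```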

Now, let us load the dataset. As discussed earlier, the dataset has no null values, though we might still have to apply some sort of scaling (standardization or normalization) to the data if required.
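One way to load the data is sketched below, assuming scikit-learn 0.23 or later (needed for `as_frame=True`). It also confirms the shape of the feature set and that there are no missing values.

```python
from sklearn.datasets import load_breast_cancer

# as_frame=True returns the features as a pandas DataFrame and the target as a Series
data = load_breast_cancer(as_frame=True)
X = data.data    # one row per tumor sample, one column per feature
y = data.target  # 0 = malignant, 1 = benign

print(X.shape)                      # (569, 30)
print(X.isnull().sum().sum())       # 0, i.e. no missing values anywhere
print(data.target_names.tolist())   # ['malignant', 'benign']
```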

Now, let us visualize the number of malignant and benign samples we have in our dataset.
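A simple bar chart is enough for this. The sketch below uses Matplotlib; the colors and labels are arbitrary choices, not taken from the original plot.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)

# Map the integer labels to readable names before counting
labels = data.target.map({0: "malignant", 1: "benign"})
counts = labels.value_counts()
print(f"benign: {counts['benign']}, malignant: {counts['malignant']}")  # benign: 357, malignant: 212

# One bar per class
plt.bar(counts.index, counts.values, color=["tab:green", "tab:red"])
plt.ylabel("Number of samples")
plt.title("Class distribution in the breast cancer dataset")
plt.show()
```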

As you can see from the above plot, there is a slight sampling bias (class imbalance) in the data: the dataset contains 357 benign and 212 malignant samples. However, the difference between the two classes is not very large, so this imbalance is unlikely to have a major effect on the inference performance of the model.

Check out our article on sampling bias, where the topic is covered in detail, to see how it might affect the performance of your model.

If you look carefully, our feature set has 30 features. One criterion for comparing the different classification algorithms is how they respond to high-dimensional data. While 30 features don’t exactly count as high dimensionality, they might still give us a rough idea of how the performance of the different models is affected.

Before we go any further, we have to standardize our data, because the values in different columns of our feature set are on considerably different scales. For example, the values in the ‘mean area’ column range from about 143 to 2501 (of the order of 10^2 to 10^3), while the values in the ‘smoothness error’ column range from about 0.002 to 0.031 (of the order of 10^-3 to 10^-2). Standardization rescales every column to zero mean and unit variance. Let us see how we can apply it to our dataset.
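A minimal sketch using scikit-learn’s `StandardScaler`, applied to the whole feature set as described above. It prints the ranges of the two example columns before scaling and checks the column statistics afterwards.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer(as_frame=True)
X = data.data

# Before scaling, these two columns live on very different scales
print(X[["mean area", "smoothness error"]].agg(["min", "max"]))

# StandardScaler subtracts each column's mean and divides by its standard deviation
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # NumPy array with the same shape as X

# After scaling, every column has (approximately) mean 0 and standard deviation 1
print(np.abs(X_scaled.mean(axis=0)).max())
print(X_scaled.std(axis=0).round(3))
```

Strictly speaking, fitting the scaler only on the training split, and then applying it to the validation split, avoids leaking validation statistics into training; on a dataset this size the practical difference is probably small.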

Let us now split our dataset into training and validation sets in an 85:15 ratio.
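The split can be sketched with `train_test_split`. The `stratify=y` and `random_state=42` arguments are assumptions, not necessarily what the original code used: stratifying keeps the benign/malignant ratio similar in both splits, and a fixed seed makes the split reproducible.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out 15% of the samples for validation.
# Assumption: stratify and random_state are illustrative choices.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42
)

print(X_train.shape, X_val.shape)  # (483, 30) (86, 30)
```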

As we can see, we have 483 instances in our training set and 86 instances in our validation/testing set.

Now, let us move on to the next stage where we will try to model a solution for our cancer classification problem.


In this section, we will train and evaluate models based on each of the algorithms that we considered in the last part of the Classification series: Logistic Regression, KNN, Decision Tree Classifier, Random Forest Classifier, SVM, and Naïve Bayes.

The following will be the criteria for comparing the algorithms:

  • Training time
  • Inference time
  • Inference accuracy (F1 score)

So, let’s get started. 

  1. Logistic Regression
    1. Training the model
    2. Evaluating the model
  2. K-Nearest Neighbors
    1. Training the model
    2. Evaluating the model
  3. Decision Tree Classifier
    1. Training the model
    2. Evaluating the model
  4. Random Forest Classifier
    1. Training the model
    2. Evaluating the model
  5. Support Vector Machines (SVM)
    1. Training the model
    2. Evaluating the model
  6. Naïve Bayes Classifier
    1. Training the model
    2. Evaluating the model
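The six train-and-evaluate steps above follow the same pattern, so they can be sketched as one loop that records training time, inference time and validation F1 score for each model. This is a sketch under assumptions: default hyperparameters, `SVC` and `GaussianNB` as the SVM and Naïve Bayes variants, a stratified split with a fixed seed, and `random_state=42` on the tree-based models only to make runs repeatable. Timings depend on the machine, so they will not match the tables below exactly. Note that `f1_score` treats label 1 (benign) as the positive class by default.

```python
import time

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42
)
# Standardize using statistics from the training split only
scaler = StandardScaler().fit(X_train)
X_train, X_val = scaler.transform(X_train), scaler.transform(X_val)

# Assumption: scikit-learn defaults, plus fixed seeds for the tree-based models
models = {
    "Logistic Regression": LogisticRegression(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(),
    "Naive Bayes": GaussianNB(),
}

results = {}
for name, model in models.items():
    # Training time
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_ms = (time.perf_counter() - start) * 1000

    # Inference time on the validation split
    start = time.perf_counter()
    y_pred = model.predict(X_val)
    infer_ms = (time.perf_counter() - start) * 1000

    results[name] = (train_ms, infer_ms, f1_score(y_val, y_pred))

for name, (train_ms, infer_ms, f1) in results.items():
    print(f"{name:<20} train {train_ms:8.2f} ms  infer {infer_ms:7.2f} ms  F1 {f1:.4f}")
```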


Now that we have trained and evaluated each of the models, we can draw some conclusions from the observations above.

  • Training time

Rank   Algorithm             Training time (ms)
2      Naïve Bayes           8.96
4      Decision Tree         14.6
5      Logistic Regression   20.1
6      Random Forest         214

As we saw in the first part of the Classification series, KNN is a non-parametric, instance-based (“lazy”) algorithm. It has no real training phase: fitting mostly amounts to storing the training data in memory (scikit-learn may additionally build a KD-tree or ball tree to speed up later neighbor searches). That’s why the KNN model took the least time to train.

Now let us move on to the algorithm with the slowest training time: Random Forest. As we know, this is an ensemble method, which means it combines several base models, in this case decision trees. Since we trained our random forest classifier with default parameters, the model trained 100 individual decision trees in the background (scikit-learn’s default value of n_estimators is 100). That is why the random forest classifier took the longest time to train.
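To see the effect of the ensemble size, the sketch below checks the default `n_estimators` (100 since scikit-learn 0.22) and times the fit for a few forest sizes. Because each tree is trained independently, fitting time should grow roughly in proportion to the number of trees; the exact numbers will vary by machine.

```python
import time

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

forest = RandomForestClassifier(random_state=0)
print(forest.n_estimators)  # 100 (the default since scikit-learn 0.22)

# Time the fit for forests of different sizes
for n_trees in (10, 50, 100):
    model = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    start = time.perf_counter()
    model.fit(X, y)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{n_trees:>3} trees: {len(model.estimators_)} fitted, {elapsed_ms:.1f} ms")
```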

  • Inference time

Rank   Algorithm             Inference time (ms)
1      Naïve Bayes           3.32
2      Logistic Regression   3.6
4      Decision Tree         8.18
6      Random Forest         20.9

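Once again, the random forest is the slowest of all the algorithms when it comes to inference. This is because, to make a prediction, the forest has to query every one of its trees and then combine their outputs. In the classic formulation, each tree votes and the class with the highest frequency is chosen as the final result; scikit-learn’s implementation instead averages the class probabilities predicted by the trees and picks the class with the highest average.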
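The combination step can be checked directly: querying every fitted tree in `estimators_` and averaging their `predict_proba` outputs should reproduce the forest’s own probabilities. This is a sketch on the full dataset with an arbitrary seed; it also makes visible why inference cost grows with the number of trees.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# One prediction pass per tree: shape (n_trees, n_samples, n_classes)
per_tree = np.stack([tree.predict_proba(X) for tree in forest.estimators_])
averaged = per_tree.mean(axis=0)

# The forest's probabilities are the average over its trees
print(np.allclose(averaged, forest.predict_proba(X)))  # True

# The predicted class is the one with the highest averaged probability
manual_pred = forest.classes_[averaged.argmax(axis=1)]
print(np.array_equal(manual_pred, forest.predict(X)))
```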

KNN is also very slow when it comes to making predictions on the validation data. This is because, with a brute-force search, the algorithm has to calculate the distance from each validation data point to every single training data point. Only then can it decide which K training points are the nearest neighbors of the new data point.
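The cost asymmetry of KNN is easiest to see in a from-scratch version. The hypothetical `BruteForceKNN` class below is a minimal sketch, not scikit-learn’s implementation: `fit` only stores the data, while `predict` computes the Euclidean distance from each query point to every training point and takes a majority vote among the `k` nearest labels. (scikit-learn’s `KNeighborsClassifier` can use a KD-tree or ball tree to avoid checking every point.)

```python
from collections import Counter

import numpy as np


class BruteForceKNN:
    """Hypothetical minimal k-NN classifier using an exhaustive distance search."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        # "Training" only stores the data; no parameters are learned
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)
        return self

    def predict(self, X):
        predictions = []
        for x in np.asarray(X, dtype=float):
            # Euclidean distance from this query point to EVERY training point
            distances = np.linalg.norm(self.X_train - x, axis=1)
            nearest = np.argsort(distances)[: self.k]
            # Majority vote among the labels of the k nearest neighbors
            label, _ = Counter(self.y_train[nearest]).most_common(1)[0]
            predictions.append(label)
        return np.array(predictions)


# Toy usage: two well-separated clusters
X_toy = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [6, 5]])
y_toy = np.array([0, 0, 1, 1, 1])
knn = BruteForceKNN(k=3).fit(X_toy, y_toy)
print(knn.predict([[0, 0.5], [5.5, 5.5]]))  # [0 1]
```

Each prediction costs one distance computation per training sample, so inference time grows with the size of the training set, while `fit` stays almost free.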

  • Accuracy (based on F1 score)

Rank   Algorithm             F1 score
3      Logistic Regression   0.9794
4      Random Forest         0.9680
5      Naïve Bayes           0.9600
6      Decision Tree         0.9200

Here, we notice that the decision tree model gave the lowest F1 score of all the models. A likely reason is that decision trees are very prone to overfitting the training data, which can result in poor performance on unseen data. Since the rest of the classifiers reach F1 scores above 0.96, it seems likely that this is what happened here, hence the lowest score.
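One way to check the overfitting explanation is to compare training and validation F1 scores. The sketch below contrasts a fully grown tree (`max_depth=None`, the default) with a depth-limited one; `max_depth=3`, the split settings and the seed are arbitrary illustrative choices. A near-perfect training score combined with a noticeably lower validation score is the usual sign of overfitting. Whether the shallower tree actually generalizes better can vary from split to split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42
)

scores = {}
# max_depth=None (the default) keeps splitting until the leaves are pure
for depth in (None, 3):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(X_train, y_train)
    train_f1 = f1_score(y_train, tree.predict(X_train))
    val_f1 = f1_score(y_val, tree.predict(X_val))
    scores[depth] = (train_f1, val_f1)
    print(f"max_depth={depth}: train F1 {train_f1:.4f}, validation F1 {val_f1:.4f}")
```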

This brings us to the end of this tutorial and analysis, in which we not only learned how to build a breast cancer classifier, but also compared the performance of different classification algorithms.

In the end, we conclude that no single algorithm is perfect for all your needs. The right choice depends largely on the dimensionality and scale of the data. Therefore, to get the best possible solution for your problem, you have to experiment with different algorithms and multiple combinations of hyperparameters until you find the model that best suits your needs.
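Experimenting with hyperparameters can be partly automated. The sketch below uses scikit-learn’s `GridSearchCV` to try several KNN settings with 5-fold cross-validation, scored by F1. The model and the grid values are illustrative assumptions; putting the scaler inside a `Pipeline` means it is re-fitted on each training fold, so no validation data leaks into the scaling.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The scaler is part of the pipeline, so it is fitted only on each training fold
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Assumption: an illustrative grid; parameter names use "<step>__<param>"
param_grid = {
    "knn__n_neighbors": [3, 5, 7, 9, 11],
    "knn__weights": ["uniform", "distance"],
}

search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
search.fit(X, y)

print(search.best_params_)
print(f"Best mean cross-validated F1: {search.best_score_:.4f}")
```

The same pattern works for any of the other five classifiers: swap the `"knn"` step for another estimator and adjust the parameter grid accordingly.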

If you liked this article, make sure that you share it with your friends. Happy learning!

