Top 6 AutoML Frameworks
Have you ever been frustrated by the tedious extra tasks that come with building machine learning models? Most of you probably have. In this era of automation, why not automate these machine learning pipelines to save time and effort? This is where AutoML frameworks come into the picture. This article introduces six of the most popular AutoML frameworks.
H2O AutoML is the automated machine learning component of H2O, an open-source, distributed, in-memory ML platform with linear scalability developed by H2O.ai. It supports widely used machine learning algorithms, including gradient boosted machines, generalized linear models, deep learning, and many more.
H2O AutoML supports both R and Python. Its interface is deliberately simple, with few parameters: the user only needs to point to their dataset, identify the target column, and optionally specify a limit on the total number of models trained or a time constraint.
H2O uses its own algorithms to create pipelines. It provides a simple wrapper function that performs a large number of modeling-related tasks that would otherwise require many lines of code, which saves a lot of time. It searches over feature engineering methods and model hyper-parameters to optimize pipelines. It also provides machine learning interpretability (MLI) and automatic visualization. This makes it a helpful tool for advanced users and novices alike.
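As a rough illustration of how little code that interface needs, here is a minimal Python sketch. The helper names `automl_budget` and `run_automl` are made up for this article, and the sketch assumes a classification problem stored in a CSV file; the `h2o` calls themselves (`h2o.init`, `h2o.import_file`, `H2OAutoML`, `train`, `leaderboard`, `leader`) come from the H2O Python package.

```python
def automl_budget(max_models=None, max_runtime_secs=None):
    """Collect whichever stopping constraints the user supplied (hypothetical helper)."""
    budget = {}
    if max_models is not None:
        budget["max_models"] = max_models
    if max_runtime_secs is not None:
        budget["max_runtime_secs"] = max_runtime_secs
    return budget


def run_automl(csv_path, target, **limits):
    """Train H2O AutoML on a CSV file and return (leaderboard, best model)."""
    # Imported lazily so this file can be loaded without h2o installed.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()                                  # start or connect to a local H2O cluster
    frame = h2o.import_file(csv_path)           # point H2O at the dataset
    frame[target] = frame[target].asfactor()    # assumption: the target is categorical
    aml = H2OAutoML(seed=1, **automl_budget(**limits))
    aml.train(y=target, training_frame=frame)   # identify the target column and train
    return aml.leaderboard, aml.leader
```

A call such as `run_automl("train.csv", "label", max_models=10)` would cap the run at ten models; the file name and column name here are placeholders.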
Auto-Sklearn is an open-source project introduced in 2015 by Matthias Feurer and colleagues in the paper “Efficient and Robust Automated Machine Learning”. As the name suggests, it is built around the scikit-learn library.
It is an automated machine learning toolkit built on three phases: meta-learning (shrinking the search space by starting from models that performed well on similar datasets), Bayesian optimization (fitting Bayesian models to find the best pipeline), and ensemble construction (building an ensemble that reuses the most accurate models found during the search). Its search space includes 15 classification algorithms, 14 feature preprocessing methods, and 4 data preprocessing methods, from which it selects an algorithm and tunes its hyper-parameters, reportedly reaching a precision above 0.98.
Thus, it frees machine learning users from many tedious tasks and lets them focus on the real problem. It has been observed to work well on small and medium datasets, but it becomes difficult to use on large ones.
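The ensemble construction phase is the easiest of the three to picture in code. Below is a small pure-Python sketch of greedy ensemble selection, where models are added one at a time (repeats allowed) whenever they give the best averaged validation accuracy. It is a simplified stand-in written for this article, not Auto-Sklearn's own code; the function names and the toy predictions are assumptions.

```python
from statistics import mean


def accuracy(probs, labels):
    """Share of examples where the thresholded probability matches the 0/1 label."""
    return mean(1.0 if (p >= 0.5) == bool(y) else 0.0 for p, y in zip(probs, labels))


def average(preds_list):
    """Average several models' probability lists example by example."""
    return [mean(column) for column in zip(*preds_list)]


def greedy_ensemble(model_preds, labels, rounds=5):
    """model_preds maps a model name to its validation probabilities.

    In each round, add the model (with replacement) whose inclusion gives
    the highest accuracy for the averaged ensemble. Ties go to the first name
    in alphabetical order.
    """
    chosen = []
    for _ in range(rounds):
        best_name, best_score = None, -1.0
        for name in sorted(model_preds):
            candidate = [model_preds[n] for n in chosen + [name]]
            score = accuracy(average(candidate), labels)
            if score > best_score:
                best_name, best_score = name, score
        chosen.append(best_name)
    return chosen


# Toy validation data (made up for illustration).
labels = [1, 0, 1, 0]
model_preds = {
    "a": [0.9, 0.6, 0.8, 0.2],
    "b": [0.4, 0.1, 0.7, 0.3],
    "c": [0.6, 0.7, 0.4, 0.6],
}
print(greedy_ensemble(model_preds, labels, rounds=2))
```

Neither model "a" nor model "b" is perfect alone here, but their average classifies every toy example correctly, which is the point of reusing several good models instead of keeping only the single best one.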
TransmogrifAI, an open-source AutoML library from Salesforce, was released in 2018. Written in Scala, it is an end-to-end AutoML library for structured data that runs on top of Apache Spark. TransmogrifAI can automate feature analysis, feature selection, feature validation, model selection, and more. The platform was designed to improve machine learning developer productivity not just through automation but also through an API that enforces compile-time type safety, reuse, and modularity. It has been reported to achieve accuracy close to that of hand-tuned machine learning models with an almost 100-fold reduction in time.
TransmogrifAI is a good fit when we need to build reusable, strongly typed ML workflows. It needs less time and minimal manual adjustment to train a model, and it can simply be added as a regular dependency to an existing project.
TransmogrifAI is also flexible: it lets users specify all the features to be extracted and the algorithms to be applied in the ML pipeline.
TPOT (Tree-based Pipeline Optimization Tool) was developed by Dr. Randal Olson. It is a Python AutoML tool that optimizes ML pipelines using genetic programming. It was one of the very first AutoML methods to be released as an open-source package for the machine learning and data science community.
The package aims to automate the building of ML pipelines by combining a flexible expression-tree representation of pipelines with stochastic search algorithms, maximizing classification accuracy on a supervised classification task. It uses the Python-based scikit-learn library for data transformation, feature decomposition, feature selection, and model selection.
There are four main operator types: Preprocessor, Decomposition, Feature Selection, and Model, each built from scikit-learn components. The dataset flows through the tree, where the features are transformed operator by operator, and the model is produced by the final operator. An optimization procedure then identifies the best-performing tree structure for a given dataset.
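To give a feel for the evolutionary search, here is a toy pure-Python sketch that evolves a four-stage pipeline (one slot per operator type named above). It is heavily simplified and is not TPOT's code: the operator lists, the `toy_fitness` scores, and all function names are invented for illustration, only mutation is used, and a real run would score each candidate by actually training the pipeline rather than looking up a made-up number.

```python
import random

# Hypothetical operator choices, one list per stage of the pipeline.
OPERATORS = {
    "preprocessor": ["none", "standard_scaler", "min_max_scaler"],
    "decomposition": ["none", "pca"],
    "feature_selection": ["none", "select_k_best"],
    "model": ["decision_tree", "random_forest", "logistic_regression"],
}
STAGES = list(OPERATORS)


def random_pipeline(rng):
    """Pick one operator per stage at random."""
    return tuple(rng.choice(OPERATORS[stage]) for stage in STAGES)


def mutate(pipeline, rng):
    """Re-draw the operator at one randomly chosen stage."""
    i = rng.randrange(len(STAGES))
    genes = list(pipeline)
    genes[i] = rng.choice(OPERATORS[STAGES[i]])
    return tuple(genes)


def toy_fitness(pipeline):
    """Made-up score standing in for a real validation accuracy."""
    score = sum(0.1 for op in pipeline[:3] if op != "none")
    if pipeline[3] == "random_forest":
        score += 0.5
    return score


def evolve(fitness, generations=10, population_size=8, seed=0):
    """Keep the better half each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: population_size // 2]
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=fitness), history


best, history = evolve(toy_fitness)
print(best, history)
```

Because the best pipelines always survive into the next generation, the best score in `history` can never go down from one generation to the next.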
Auto-Keras, developed by DATA Lab, is an open-source software library for automated machine learning. As the name suggests, it is built on top of Keras, a deep learning framework. In other words, Auto-Keras is an implementation of AutoML for deep learning models using the Keras API.
This AutoML tool lets users automatically search for the architecture and hyper-parameters of deep learning models. Auto-Keras is easy to use because it follows the classic scikit-learn API design. It simplifies machine learning through automatic Neural Architecture Search (NAS) algorithms, which automatically adjust models and so take over work that deep learning engineers and practitioners would otherwise do by hand.
It not only gives users the ability to build powerful, well-tuned architectures but also lets them obtain the TensorFlow model required for their new application. Auto-Keras is quick to install, easy to use, and easy to modify, which makes it a great open-source project.
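The scikit-learn-style workflow described above might look roughly like this for an image classification task. The wrapper function `search_image_classifier` and its default values are assumptions made for this article; `ak.ImageClassifier`, `fit`, and `export_model` are part of the AutoKeras package, and the inputs are assumed to be NumPy arrays of images and labels.

```python
def search_image_classifier(x_train, y_train, max_trials=3, epochs=2):
    """Run a small architecture search and return the best Keras model found."""
    # Imported lazily so this file can be loaded without autokeras installed.
    import autokeras as ak

    # max_trials caps how many candidate architectures the search tries.
    clf = ak.ImageClassifier(max_trials=max_trials, overwrite=True)
    clf.fit(x_train, y_train, epochs=epochs)  # search and train, scikit-learn style
    return clf.export_model()                 # the underlying TensorFlow/Keras model
```

The small `max_trials` and `epochs` values only keep a first experiment short; a serious search would normally use larger budgets.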
MLBox is another powerful AutoML Python library. It provides highly robust feature selection and accurate hyper-parameter optimization. MLBox supports distributed data processing, formatting, and cleaning, as well as many algorithms for regression and classification.
MLBox has three sub-packages, Preprocessing, Optimization, and Prediction, each aimed at its own task. Compared to other machine learning libraries, MLBox focuses more on drift identification, entity embedding, and hyper-parameter optimization; identifying and removing drifting variables is what makes MLBox distinctive. It provides a class called Drift_thresholder that calculates the drift score of each variable when the training and test sets are given as input. MLBox has also shown good performance on Kaggle.
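To make the idea of a drift score concrete, here is a small pure-Python sketch. It is a simplified stand-in, not MLBox's implementation: it assumes numeric columns, measures how well a variable's raw value separates training rows from test rows (an AUC), and rescales that to a score between 0 and 1. The function names and the 0.6 threshold are assumptions for illustration.

```python
def auc_train_vs_test(train_values, test_values):
    """Chance that a random test value exceeds a random train value (ties count half)."""
    wins = 0.0
    for t in test_values:
        for r in train_values:
            if t > r:
                wins += 1.0
            elif t == r:
                wins += 0.5
    return wins / (len(train_values) * len(test_values))


def drift_score(train_values, test_values):
    """0.0 when train and test look alike, 1.0 when they are fully separable."""
    return abs(2.0 * auc_train_vs_test(train_values, test_values) - 1.0)


def keep_stable_columns(train, test, threshold=0.6):
    """train/test map a column name to a list of numbers; drop columns that drift."""
    return [c for c in train if drift_score(train[c], test[c]) <= threshold]


# Toy data: "age" has the same distribution in both sets, "row_id" does not.
train = {"age": [1, 2, 3, 4], "row_id": [1, 2, 3, 4]}
test = {"age": [1, 2, 3, 4], "row_id": [5, 6, 7, 8]}
print(keep_stable_columns(train, test))
```

A column like `row_id`, whose values in the test set never overlap with the training set, is exactly the kind of variable that tends to hurt a model at prediction time and is worth dropping before training.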
MLBox generates a new model by combining a stack of models, aiming to perform better than the individual models.
The discussion above shows how useful AutoML frameworks can be. You can choose any of these frameworks depending on your business needs. That is all for this article. If you enjoyed it, please upvote and share.
Thanks for reading!