Understanding Mathematics for Machine Learning
Have you ever wondered how machine learning software works and what mathematics sits behind its libraries? This article explains the concepts.
Nowadays, training a system with an algorithm has become simple and straightforward thanks to the software available. In addition, the built-in libraries used to train an algorithm are flexible and easy to understand.
What is ML?
Machine learning is grounded in statistical learning theory. Loosely speaking, that theory gives insight into how much data is required to predict unseen data. To understand it better, we will first look at the difference between statistics and machine learning, because the two seem very similar.
Statistics is a mature discipline for describing the relationships between variables, but statistical models are usually built to explain those relationships rather than to maximize predictive accuracy. Machine learning models, by contrast, are developed to produce the most accurate predictions possible. Machine learning models also differ in how interpretable they are.
Machine learning (ML) is all about designing algorithms that extract the required information from data automatically. The main aim of machine learning is to train the system on a certain amount of data and then predict the results for the remaining (test) data. A machine learning model does not just fit the training data; it also generalizes, so that the patterns it recognizes carry over to new data.
Mathematics in ML
Mathematics in machine learning is not just about processing numbers. It is about understanding what is happening, why it is happening, and how we can obtain accurate results.
The four main pillars of machine learning are regression, density estimation, dimensionality reduction, and classification. These topics require a solid foundation in mathematics.
The image below shows that these pillars rest mainly on mathematics.
We need mathematics to represent data (information) as vectors, choose an appropriate algorithm, learn from the data available, and predict the results for unseen data. The learning step relies on numerical optimization.
Concepts we need:
- Linear Algebra
- Multivariable Calculus
- Statistics and Probability
- Optimization Methods
Let’s look at each concept in turn.
Linear algebra is the mainstay of data science and machine learning. Algebra is an approach that constructs a set of symbols and a set of rules for manipulating them. Likewise, linear algebra is the study of vectors and matrices and of the rules for manipulating them. We can add vectors together and multiply a vector by a scalar to produce a new vector. Common examples of vectors include geometric vectors, polynomials, signals, and elements of R^n (tuples of n real numbers).
Linear algebra concentrates on what these objects have in common. It solves many problems using systems of linear equations, that is, by representing data as simultaneous equations.
We humans can solve two or three simultaneous equations by hand, but we cannot solve systems containing hundreds or thousands of variables and find the patterns among those variables. This is where linear algebra comes into play: it lets machine learning software solve such systems automatically. Each equation represents a single piece of information from the dataset.
The image below shows the linear algebra way of solving equations. Using matrix operations, we can solve the equation below and find the variables. This is the main reason for using linear algebra in data science.
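As a rough illustration of solving simultaneous equations with matrix operations, here is a minimal sketch in plain Python. It uses Gaussian elimination with partial pivoting, which is one standard technique and not necessarily what any particular library does internally. The function name `solve_linear_system` and the two-equation system are made up for this example.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Build the augmented matrix [A | b] as floats so the inputs are not modified.
    M = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting: bring up the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every row below the pivot row.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution, from the last equation up to the first.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x


# Hypothetical system: 2x + y = 5 and x + 3y = 10
A = [[2, 1], [1, 3]]
b = [5, 10]
solution = solve_linear_system(A, b)
print(solution)
```

In practice you would normally call a library routine such as NumPy's `numpy.linalg.solve` instead of writing the elimination yourself. The sketch only shows that the same mechanical steps work for any number of equations.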
- Dataset Representation: Linear algebra plays an important role in data representation in machine learning. In general, we take a set of data and represent it as a matrix. Furthermore, when we prepare the data for training and testing, we separate it into a feature matrix X and a target vector y.
- Images and Photographs: We work with images frequently in machine learning, especially in computer vision applications. Each image is a table of pixel values with a width and a height, so an image is another example of a matrix from linear algebra.
Image operations such as scaling, cropping, and shearing all use operations from linear algebra.
The figure below shows the output of this snippet, that is, the image representation.
- Linear Regression: Linear regression is an important method from statistics for describing the relationship between variables. We use it regularly in machine learning to predict numerical values. There are different ways to solve a linear regression problem. The most common is least squares optimization, which is typically solved with matrix factorization methods such as LU decomposition or singular value decomposition (SVD). Either way, it relies on linear algebra.
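To make the matrix X, the vector y, and least squares concrete, here is a small sketch in plain Python. As a simplification, it solves the normal equations (X^T X) w = X^T y directly for a single feature plus an intercept, rather than using the LU or SVD factorizations mentioned above. The data and the helper names (`matmul`, `matvec`, `solve_2x2`) are hypothetical.

```python
# Hypothetical dataset: each row of X is one sample [1, feature].
# The leading 1 lets the model learn an intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]  # generated from y = 1 + 2 * feature


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def solve_2x2(A, b):
    """Solve a 2x2 linear system with Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    w0 = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    w1 = (A[0][0] * b[1] - b[0] * A[1][0]) / det
    return [w0, w1]


# Normal equations: (X^T X) w = X^T y
Xt = transpose(X)
weights = solve_2x2(matmul(Xt, X), matvec(Xt, y))
intercept, slope = weights
print(intercept, slope)
```

Solving the normal equations directly is fine for a toy problem like this one. Factorization-based solvers are generally preferred for larger or ill-conditioned problems because they are more stable numerically.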
Calculus is the branch of mathematics that studies rates of change. It approaches a problem by looking at it in small pieces. Many machine learning algorithms optimize a function with respect to the parameters that determine how well the model predicts. Finding the best-fitting parameters is called optimization.
Calculus is divided into two branches:
- Differential Calculus: Divides an object into small pieces to find how it changes.
- Integral Calculus: Joins (integrates) the small pieces to find how much there is.
Gradient Descent: The gradient measures how the output changes when we change the input. Gradient descent uses it to move the parameters, one small step at a time, in the direction that reduces the cost.
In machine learning, our main aim is to minimize the cost, which measures the model's error. In short, we want the lowest possible error, and we improve accuracy by training the model iteratively.
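The iterative idea can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the cost is a made-up one-parameter function, cost(w) = (w - 3)^2, and the learning rate and number of steps are arbitrary choices, not recommendations.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the cost."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


# Hypothetical cost with its minimum at w = 3.
def cost(w):
    return (w - 3.0) ** 2


# Derivative of the cost with respect to w.
def cost_gradient(w):
    return 2.0 * (w - 3.0)


w_min = gradient_descent(cost_gradient, start=0.0)
print(w_min, cost(w_min))
```

Each step shrinks the distance to the minimum by a constant factor here, so `w_min` ends up very close to 3. Real models have many parameters, but the update rule is the same, applied to every parameter at once.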
Cost Function: A cost function is an equation that measures the performance of an algorithm or model in machine learning by quantifying the error in its predictions.
The following are the cost functions of Linear Regression and Neural Networks.
We can calculate the partial derivatives of the above cost functions with respect to each parameter, which tells us which one affects the model most. You can collect these partial derivatives into a vector using linear algebra. This vector is called the gradient. For a function with several outputs, the corresponding matrix of partial derivatives is called the Jacobian.
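As an illustration, one common way to write the linear regression cost and its partial derivatives is shown below. The notation is an assumption and is not taken from this article: m is the number of training examples, θ the parameter vector, h_θ(x) the model's prediction, and x_j the j-th feature.

```latex
% Mean squared error cost for linear regression (one common convention)
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,
\qquad h_\theta(x) = \theta^\top x

% Partial derivative with respect to one parameter \theta_j
\frac{\partial J}{\partial \theta_j}
  = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

% Stacking the partial derivatives gives a vector, often called the gradient
\nabla_\theta J = \left( \frac{\partial J}{\partial \theta_0}, \dots,
  \frac{\partial J}{\partial \theta_n} \right)^\top
```

The factor 1/2 is a convention that cancels when differentiating. Other texts use 1/m instead, which changes the scale of the cost but not the location of its minimum.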
Applications of Calculus in Machine Learning:
- Curve-Fitting: Curve fitting is a kind of optimization that finds the set of parameters that minimizes a cost function, so that the resulting curve best fits the data.
The figure below depicts the curve fitting method. The dotted line tries to pass as close as possible to all the data points.
- Dimension Reduction: Dimensionality reduction means reducing the number of input features, because a large number of features can lead to poor performance.
Probability, loosely speaking, is the study of uncertainty. We use probability to measure the chance that an event will occur, while statistics involves the analysis of events that have already occurred. We often measure the uncertainty in a machine learning algorithm and in its predictions. To quantify uncertainty, we need a random variable.
The most popular theorem in probability is Bayes' theorem.
Bayes' theorem defines the relationship between the conditional probabilities of different events (an area where human intuition often fails).
In data science these probabilities are named as follows:
P(A) is called the prior; think of it as our initial assumption.
P(B) is the evidence.
P(B|A) is the likelihood and P(A|B) is the posterior probability.
For example:
P(A) – The probability that the patient has lung cancer.
P(B) – The evidence (CT scan report): the probability that the patient is a chain smoker.
P(B|A) – The probability that a person who has lung cancer is a smoker.
Bayes' theorem then gives us P(A|B), the probability that a person has lung cancer given that the person is a smoker.
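In symbols, Bayes' theorem reads P(A|B) = P(B|A) * P(A) / P(B). The short Python sketch below plugs numbers into that formula for the lung cancer example. All three input probabilities are invented purely for illustration and are not real medical statistics.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Hypothetical numbers, for illustration only.
p_cancer = 0.01               # P(A): prior probability of lung cancer
p_smoker = 0.20               # P(B): probability of being a smoker
p_smoker_given_cancer = 0.80  # P(B|A): likelihood

p_cancer_given_smoker = posterior(p_cancer, p_smoker_given_cancer, p_smoker)
print(round(p_cancer_given_smoker, 4))
```

With these made-up inputs the posterior is about 0.04, four times the prior of 0.01. The evidence raises the probability, but it remains far from certain.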
We use Bayes' theorem in the Naïve Bayes classifier and the Bayes optimal classifier.
- Naive Bayes: The Naïve Bayes algorithm is built on Bayes' theorem. It assumes that all the input features are independent of one another given the class, and that each feature makes an equal contribution to the outcome.
Using Bayes' theorem, Naïve Bayes first estimates how likely each feature value is within each class, and then computes the probability of each class given the observed features.
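The following is a minimal sketch of a Naïve Bayes classifier for categorical features, written in plain Python. The function names, the toy weather data, and the use of add-one (Laplace) smoothing are all assumptions made for this example. Library implementations differ in their details.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Count classes and, per class, how often each feature value occurs."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values_seen = defaultdict(set)         # feature index -> distinct values
    for features, label in zip(samples, labels):
        for i, value in enumerate(features):
            feature_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, feature_counts, values_seen


def predict(model, features):
    """Return the class with the highest (log) posterior score."""
    class_counts, feature_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # Start from the log prior, then add one log likelihood per feature.
        # Summing the logs is where the independence assumption is used.
        score = math.log(count / total)
        for i, value in enumerate(features):
            # Add-one (Laplace) smoothing avoids zero probabilities.
            numerator = feature_counts[(i, label)][value] + 1
            denominator = count + len(values_seen[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (outlook, temperature) -> play outside?
samples = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
           ("rainy", "cool"), ("sunny", "hot"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes", "no", "yes"]

model = train_naive_bayes(samples, labels)
print(predict(model, ("sunny", "hot")))
print(predict(model, ("rainy", "cool")))
```

Working with logarithms avoids multiplying many small probabilities together, which would otherwise underflow for datasets with many features.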
Mathematics is the heart of machine learning: it is not visible, but it plays a crucial role. I hope you now have some clarity about the maths behind those libraries.
Thanks for reading!