Comprehensive Understanding of Convolutional Neural Networks
Many of us use Google Photos to store our pictures. One of its best features is that it recognizes faces and groups photos of the same person together. Have you ever wondered what the mechanism behind this face recognition is? The answer is Convolutional Neural Networks (CNNs). Today, CNNs are among the most popular and trusted methods for image analysis.
Let’s dive in and learn about CNNs in this article, but first, let’s look at what deep learning is.
Deep learning is part of the broader family of machine learning methods. It tries to mimic the way the human brain works by using Artificial Neural Networks (ANNs). Several types of deep learning architecture have applications in computer vision, face recognition, speech recognition, and more:
- Deep neural networks
- Deep belief neural networks
- Recurrent neural networks
- Convolutional neural networks (CNN)
What is CNN?
A convolutional neural network, also known as a ConvNet, is a type of neural network used in deep learning. It contains one or more convolutional layers and is widely used in image processing, image segmentation, classification, and other tasks involving autocorrelated data. A CNN is a form of artificial neural network that can detect patterns and make sense of them. This pattern detection is what makes CNNs so useful for image analysis.
A CNN consists of an input layer and an output layer, along with multiple hidden layers. The hidden layers include a series of convolutional layers that convolve their input with a filter using multiplication (or another dot product). The network takes an image as input and assigns importance to the various features/aspects of the image so that it can tell them apart. A CNN uses convolution instead of general matrix multiplication in at least one of its layers.
The input to a convolutional layer is organized into input channels (for example, the red, green, and blue channels of a color image), and its output into output channels, one per filter.
A CNN has three main types of layers:
- Convolutional layers
- Pooling layers
- Fully connected layers
Convolutional layers are the foundation of a CNN. So what do these layers do?
Like any other layer, a convolutional layer receives input, transforms it in a particular way, and passes the output to the next layer. This transformation is a convolution operation. Strictly speaking, what most CNN implementations compute is a cross-correlation, because the filter is not flipped before it is applied.
These are the layers that detect patterns in the data. For each convolutional layer, we specify the number of filters it should have, and these filters are what detect the patterns.
What exactly are these patterns?
A lot is going on in an image: edges, textures, objects, colors, shapes, and so on. One type of pattern a filter can detect is an edge, in which case the filter is called an edge detector. Some filters detect corners, and others detect shapes (circles, squares…).
These simple geometric filters sit at the start of the network. The deeper the network goes, the more sophisticated the filters become. In deeper layers, filters can detect particular objects such as eyes, ears, fruits, or birds.
In general, a filter in a CNN is a small matrix. We decide how many rows and columns it has and initialize its values with random numbers, which are then adjusted during training. These filters are known as convolution kernels.
For example, consider a 3×3 filter, a 6×6 input, and a stride of 1. We take the dot product of the kernel with each 3×3 block of the input: multiply the two element by element, sum the products into a single number, and place that number in the output. With no padding, sliding the filter over every position produces a 4×4 output. The result of applying a filter is known as a feature map, and this process of transformation is convolution.
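The sliding dot product described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `convolve2d`, the 6×6 test image, and the hand-picked vertical edge kernel are assumptions made for the example, and a real CNN would learn its kernel values and rely on an optimized library rather than nested loops.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = (len(image) - k_rows) // stride + 1
    out_cols = (len(image[0]) - k_cols) // stride + 1
    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            top, left = i * stride, j * stride
            # Multiply the kernel element-wise with the block under it and sum.
            total = sum(
                image[top + r][left + c] * kernel[r][c]
                for r in range(k_rows)
                for c in range(k_cols)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 6x6 image: bright left half (10), dark right half (0).
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]

# Hand-picked vertical edge detector (a trained CNN would learn these values).
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)  # each of the 4 rows is [0, 30, 30, 0]
```

The large values in the middle columns of the 4×4 feature map mark where the image changes from bright to dark, which is exactly what an edge detector is meant to respond to.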
Examples: the filtering techniques used in classical image processing work in the same sliding-window way. Edge detection, for instance, is often performed as a pre-processing step, much like what the starting layers of a CNN do.
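A median filter with a 5×5 kernel is one such filter: it slides a 5×5 window over the image and replaces each pixel with the median of the values in that window, which helps remove noise. Unlike a convolution, it takes a median rather than a weighted sum, but the sliding-window idea is the same.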
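A median filter with a 5×5 window can be sketched in plain Python as follows. This is an illustrative sketch, not any particular library's implementation: the function name `median_filter`, the 7×7 test image, and the way border pixels are handled are all assumptions made for the example.

```python
from statistics import median


def median_filter(image, size=5):
    """Replace each pixel with the median of its size x size neighborhood.

    Border pixels use only the neighbors that fall inside the image,
    which is one of several possible border conventions.
    """
    rows, cols = len(image), len(image[0])
    half = size // 2
    output = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            # Collect every pixel of the window centered on (i, j).
            window = [
                image[r][c]
                for r in range(max(0, i - half), min(rows, i + half + 1))
                for c in range(max(0, j - half), min(cols, j + half + 1))
            ]
            out_row.append(median(window))
        output.append(out_row)
    return output


# Hypothetical 7x7 gray image with a single noisy pixel in the middle.
noisy = [[50] * 7 for _ in range(7)]
noisy[3][3] = 255

smoothed = median_filter(noisy, size=5)
print(smoothed[3][3])  # 50: the outlier is replaced by the median of its window
```

Because the median ignores a single extreme value, the bright outlier disappears while the rest of the image stays unchanged.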
The pooling layer is another main component of a CNN. Its main aim is to reduce the spatial size of the feature maps (a form of dimensionality reduction), which speeds up classification. The parameters required here are the filter size, the stride, and the type of pooling, such as max pooling. The output of the convolutional layer is the input to the pooling layer.
Filter Size: The size of the pooling filter (window).
Stride: How many pixels the filter moves across the image at each step.
Max pooling: A transformation that reduces the size of an image by keeping only the maximum value in each block.
Let’s assume the filter size is a×a and the dimensions of the image are n×n. We find the maximum value among the pixels of each a×a block in the image, moving the filter by the stride each time.
For example, consider a stride of 2, a 2×2 filter, and a 4×4 matrix as input. Max pooling identifies the maximum value in each 2×2 block, moving two pixels each time, and puts those maximum values together into a 2×2 output.
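The 2×2, stride 2 case can be sketched in plain Python as below. The function name `max_pool2d` and the values in the 4×4 matrix are made up for illustration, so treat this as a sketch of the idea rather than a reference implementation.

```python
def max_pool2d(matrix, size=2, stride=2):
    """Keep the maximum of each size x size block, moving `stride` pixels."""
    out_rows = (len(matrix) - size) // stride + 1
    out_cols = (len(matrix[0]) - size) // stride + 1
    pooled = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            top, left = i * stride, j * stride
            # Gather the block under the filter and keep its largest value.
            block = [
                matrix[top + r][left + c]
                for r in range(size)
                for c in range(size)
            ]
            row.append(max(block))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 input; the values are made up for illustration.
matrix = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 5, 8],
]

pooled = max_pool2d(matrix, size=2, stride=2)
print(pooled)  # [[6, 4], [7, 9]]
```

The 4×4 input shrinks to 2×2, so the next layer has a quarter as many values to process while the strongest response in each region is preserved.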
Fully Connected Layers
Fully connected layers are the last few layers of a CNN. They connect every neuron in one layer to every neuron in the next layer. The output of the final pooling or convolutional layer is flattened and then given as input to a fully connected layer. The FC layers then combine these features and classify the input into one of the classes.
The figure below depicts how a CNN takes an image as input, classifies it, and recognizes it as a zebra.
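The flatten-then-classify step can be sketched in plain Python like this. Everything specific here is an assumption for illustration: the 2×2 pooled values, the hand-picked weights and biases, the two class labels, and the use of softmax to turn scores into probabilities (a common choice, though the text above does not name one). In a trained network the weights would be learned from data.

```python
import math


def flatten(feature_map):
    """Turn a 2D feature map into a 1D list, row by row."""
    return [value for row in feature_map for value in row]


def fully_connected(inputs, weights, biases):
    """Each output neuron is a weighted sum of every input plus a bias."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 2x2 pooled output and hand-picked weights for two classes.
pooled = [[6, 4], [7, 9]]
weights = [
    [0.1, 0.0, 0.2, 0.1],  # class 0 (hypothetical label: "zebra")
    [0.0, 0.1, 0.0, 0.1],  # class 1 (hypothetical label: "not zebra")
]
biases = [0.0, 0.0]

features = flatten(pooled)                          # 4 input values
scores = fully_connected(features, weights, biases)  # one score per class
probabilities = softmax(scores)
predicted_class = probabilities.index(max(probabilities))
print(predicted_class)  # 0
```

Every input value contributes to every class score, which is what "fully connected" means; the class with the highest probability is taken as the prediction.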
Real-Life applications of CNN
- Face Recognition, Speech and Handwriting Recognition
- Analyzing Documents
- Understanding climate change
Convolutional neural networks are an important part of data science and play a prominent role in our daily lives.
Thanks for reading! Always keep an eye on datamahadev.com because we deliver everything with clarity from basics.