5 Ultimate Architectures of CNN

Would you agree that good architecture in our daily life creates a stronger community? It should, because architecture exists to improve our lives and make them simpler. Similarly, every neural network needs a well-designed architecture to make it easy to understand and applicable to large datasets.

The prerequisite for this article is a basic understanding of Convolutional Neural Networks (CNNs).

Importance of Architecture in CNN

The architecture of a CNN is one of the most important factors determining its performance and accuracy. The arrangement of layers in the network and the filters used in each layer strongly affect how well the model performs. Convolutional neural networks let machines interpret images in a way loosely similar to human vision, and they remain a go-to choice for recognizing objects.

Let’s look at five CNN architectures in chronological order:

  • LeNet
  • AlexNet
  • VGGNet
  • GoogLeNet
  • ResNet


LeNet

LeNet is one of the earliest CNN architectures, and it is small and simple to understand. It is a 7-layer convolutional network by LeCun, designed to recognize digits. Its major applications are digit recognition and handwriting recognition; banks used it to read the handwritten digits on checks.

The seven layers in LeNet are three convolutional layers, two average pooling layers, one fully-connected layer, and finally one softmax output layer.

The input to the first layer is a grayscale image (32 x 32), which passes through the first convolutional layer with a 5 x 5 filter. The output of the first layer passes through an average pooling layer with filter size 2 x 2. The image then goes through a second convolutional layer followed by a second pooling layer with the same filter sizes. The fifth layer is a convolutional layer whose 5 x 5 filters cover its whole 5 x 5 input, so each of its 120 feature maps is 1 x 1, and the sixth layer is a fully-connected layer with 84 units. The final layer is the softmax layer, which identifies the digit. LeNet uses tanh as its activation function.
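The layer walk above can be checked with a little arithmetic. Here is a minimal sketch, assuming no padding and the kernel sizes and strides from the summary table in this article; the function and variable names are made up for illustration.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution or pooling window slides over the input."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# (name, kernel size, stride) for the layers that change the spatial size.
# The names are illustrative; the numbers come from the summary table.
LENET_SPATIAL_LAYERS = [
    ("conv1", 5, 1),
    ("pool1", 2, 2),
    ("conv2", 5, 1),
    ("pool2", 2, 2),
    ("conv3", 5, 1),
]


def trace_lenet_sizes(input_size=32):
    """Return the feature-map size after each spatial layer."""
    sizes = []
    size = input_size
    for name, kernel, stride in LENET_SPATIAL_LAYERS:
        size = conv_output_size(size, kernel, stride)
        sizes.append((name, size))
    return sizes


for name, size in trace_lenet_sizes():
    print(f"{name}: {size} x {size}")
```

Starting from a 32 x 32 image, the sizes shrink to 28, 14, 10, 5 and finally 1, which is why the third convolution behaves like a fully-connected layer.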

LeNet struggles with higher-resolution images because it has so few layers. To overcome this, we use other architectures.

The table below summarizes LeNet.

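S.No | Layer           | Feature maps | Output size | Kernel size | Stride
1.   | Convolution     | 6            | 28 x 28     | 5 x 5       | 1
2.   | Avg pooling     | 6            | 14 x 14     | 2 x 2       | 2
3.   | Convolution     | 16           | 10 x 10     | 5 x 5       | 1
4.   | Avg pooling     | 16           | 5 x 5       | 2 x 2       | 2
5.   | Convolution     | 120          | 1 x 1       | 5 x 5       | 1
6.   | Fully connected | -            | 84          | -           | -
7.   | Fully connected | -            | 10          | -           | -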
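To get a feel for how small LeNet is, we can count its trainable parameters from the table. This is a rough sketch under two simplifying assumptions that are not part of the table: every filter sees all feature maps of the previous layer, and the pooling layers have no trainable parameters. The original LeNet design differs slightly on both points, so treat the total as an estimate.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output feature map."""
    return (kernel_size * kernel_size * in_channels + 1) * out_channels


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return (in_units + 1) * out_units


def lenet_param_counts():
    """Per-layer parameter counts for the simplified LeNet described above."""
    return {
        "conv1": conv_params(1, 6, 5),      # grayscale input, 6 feature maps
        "conv2": conv_params(6, 16, 5),
        "conv3": conv_params(16, 120, 5),
        "fc1": dense_params(120, 84),
        "fc2": dense_params(84, 10),        # 10 digit classes
    }


counts = lenet_param_counts()
for layer, count in counts.items():
    print(f"{layer}: {count}")
print("total:", sum(counts.values()))
```

Under these assumptions the total comes to 61,706 parameters, with most of them sitting in the third convolution.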


AlexNet

LeNet performs well with a limited amount of data, but real-world data is never that simple and involves numerous variables. AlexNet is the first large-scale CNN architecture to do well in ImageNet classification. Its architecture is quite similar to LeNet but deeper and larger, and it stacks convolutional layers directly on top of each other. It has sixty million parameters and 650k neurons. This capacity lets it reach much higher accuracy, and it has been applied widely in computer vision.

This architecture has eight layers, of which five are convolutional layers and the remaining three are fully-connected layers.

AlexNet takes RGB images as input. Training images are first rescaled to 256 x 256, and 224 x 224 crops of them are fed to the network, so all images in the training dataset end up the same size. Each convolutional layer consists of multiple kernels of the same size and extracts some prominent features.

The most important features of AlexNet are:

  • ReLU Nonlinearity: Instead of the tanh function, AlexNet uses ReLU (Rectified Linear Units) as its activation function. The main advantage of ReLU is that it reduces training time: in the AlexNet paper, a four-layer CNN with ReLU reached a 25 percent training error on CIFAR-10 about six times faster than the same network with tanh.
  • Overlapping Pooling: Letting neighbouring pooling windows overlap reduced the top-1 error by about 0.4 percent and the top-5 error by about 0.3 percent. It also makes the network slightly harder to overfit.
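Both features are easy to try out on toy numbers. The sketch below is only an illustration: it compares the gradient of ReLU and tanh for a large input, and runs a 1-D max pooling with and without overlap. The window of 3 with stride 2 is the overlapping setting reported in the AlexNet paper, not a value stated in this article, and the input row is made up.

```python
import math


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def tanh_grad(x):
    """Gradient of tanh, which shrinks towards 0 as |x| grows."""
    return 1.0 - math.tanh(x) ** 2


def max_pool_1d(values, window, stride):
    """Slide a window over the list and keep the maximum of each window."""
    last_start = len(values) - window
    return [max(values[i:i + window]) for i in range(0, last_start + 1, stride)]


# For a large activation, tanh saturates while ReLU keeps a gradient of 1.
print("relu gradient at 5:", relu_grad(5.0))
print("tanh gradient at 5:", tanh_grad(5.0))

row = [1, 5, 2, 3, 8, 0, 4]  # made-up feature values
print("non-overlapping (window 2, stride 2):", max_pool_1d(row, 2, 2))
print("overlapping (window 3, stride 2):   ", max_pool_1d(row, 3, 2))
```

Because the stride is smaller than the window in the second call, each value can contribute to two neighbouring outputs, which is all that "overlapping" means here.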


VGGNet

VGGNet shows that the depth of a neural network is crucial to its performance. Its best-known variant, VGG-16, has sixteen weight layers: thirteen convolutional layers and three fully-connected layers. In VGGNet all the convolutional layers use the same filter size of 3 x 3 with a stride of one, and all the pooling layers use the same filter size of 2 x 2. The main drawback of VGGNet is that it is quite expensive to evaluate and has a lot of parameters, about 138 million. In addition, it consumes a lot of memory. For all these reasons, VGGNet trains very slowly.
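To see where all those parameters come from, here is a rough count for the 16-layer variant. The channel widths, the 224 x 224 RGB input, the two 4096-unit fully-connected layers and the 1000 output classes are assumptions taken from the commonly used VGG-16 configuration; none of them are stated in this article.

```python
# Assumed VGG-16 layout: numbers are output channels of 3 x 3 convolutions,
# "M" marks a 2 x 2 max pooling that halves the spatial size.
VGG16_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_param_count(input_size=224, input_channels=3, num_classes=1000):
    """Count weights and biases, assuming padded 3 x 3 convolutions."""
    total = 0
    channels = input_channels
    size = input_size
    for item in VGG16_CONFIG:
        if item == "M":
            size //= 2  # pooling has no parameters, it only shrinks the map
        else:
            total += (3 * 3 * channels + 1) * item
            channels = item
    units = channels * size * size  # flattened input to the first dense layer
    for out_units in (4096, 4096, num_classes):
        total += (units + 1) * out_units
        units = out_units
    return total


print("conv layers:", sum(1 for item in VGG16_CONFIG if item != "M"))
print("parameters:", vgg16_param_count())
```

With these assumptions the count is 138,357,544, and well over 100 million of those sit in the first fully-connected layer alone.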



GoogLeNet

GoogLeNet, also known as Inception, is quite different from the other architectures. Its main aim is to reduce the number of parameters by using an inception module. The architecture has only a few million parameters, roughly a twelfth of AlexNet's sixty million. It uses techniques such as 1 x 1 convolutions and average pooling in place of fully-connected layers, which allows it to be much deeper.
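Replacing fully-connected layers with average pooling can be sketched in a few lines. The example below averages each feature map down to a single number and then compares the size of the classifier that would follow. The 7 x 7 x 1024 feature volume and the 1000 classes are assumed values for illustration only.

```python
def global_average_pool(feature_maps):
    """Reduce each 2-D feature map (a list of rows) to its mean value."""
    pooled = []
    for fmap in feature_maps:
        values = [v for row in fmap for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return (in_units + 1) * out_units


# Two tiny made-up feature maps, each 2 x 2.
maps = [[[1, 2], [3, 4]],
        [[0, 0], [0, 8]]]
print(global_average_pool(maps))

# Assumed final feature volume: 1024 maps of size 7 x 7, 1000 classes.
flatten_then_dense = dense_params(7 * 7 * 1024, 1000)
pool_then_dense = dense_params(1024, 1000)
print("flatten + dense:", flatten_then_dense)
print("average pool + dense:", pool_then_dense)
```

Pooling first leaves the classifier with about one fiftieth of the parameters it would need on the flattened volume.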

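We use 1 x 1 convolutions in GoogLeNet to reduce the number of channels before the more expensive larger convolutions, which cuts the number of parameters. The parameters saved this way make it affordable to increase the depth of the network.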
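A quick calculation shows how much a 1 x 1 convolution can save. The channel counts below (192 input channels, 32 output channels, a 16-channel bottleneck in front of a 5 x 5 convolution) are assumed numbers chosen for illustration, and biases are ignored to keep the arithmetic simple.

```python
def conv_weights(in_channels, out_channels, kernel_size):
    """Number of weights in a convolution, ignoring biases."""
    return kernel_size * kernel_size * in_channels * out_channels


# Option 1: apply the 5 x 5 convolution directly to all 192 channels.
direct = conv_weights(192, 32, 5)

# Option 2: squeeze to 16 channels with a 1 x 1 convolution first.
reduced = conv_weights(192, 16, 1) + conv_weights(16, 32, 5)

print("direct 5 x 5:", direct)
print("1 x 1 then 5 x 5:", reduced)
print(f"reduction: {direct / reduced:.1f}x")
```

In this example the bottleneck version needs 15,872 weights instead of 153,600 while producing an output of the same shape.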


ResNet

As the number of layers increases, we run into the problem of exploding or vanishing gradients: the gradients become either close to zero or too large. As a result, the error rate goes up. To overcome this problem, researchers introduced a new architecture in 2015, the Residual Network (ResNet). It can stack more than a hundred convolutional layers. ResNet uses the skip connection technique.

Skip connection: A skip connection adds the input of a block of layers directly to that block's output, so the layers in between only have to learn a residual correction to the input. The main advantage is that the signal and its gradient can bypass the layers that would otherwise cause exploding or vanishing gradients, reusing the results from earlier layers.
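The effect of a skip connection on gradients can be illustrated with a toy model. In the sketch below every "layer" is just a multiplication by one small weight, which is a deliberate oversimplification and not how a real ResNet block looks; the weight of 0.01 and the depth of 50 are made-up values.

```python
def residual_block(x, transform):
    """Output of a block with a skip connection: F(x) + x, element by element."""
    return [xi + fi for xi, fi in zip(x, transform(x))]


def plain_stack_gradient(weight, depth):
    """Gradient of the output w.r.t. the input when each layer is h -> w * h."""
    grad = 1.0
    for _ in range(depth):
        grad *= weight
    return grad


def residual_stack_gradient(weight, depth):
    """Same stack with skip connections: each layer is h -> h + w * h."""
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + weight
    return grad


# If the inner layers output zeros, the block simply passes its input through.
print(residual_block([1.0, 2.0], lambda x: [0.0 for _ in x]))

print("plain stack:   ", plain_stack_gradient(0.01, 50))
print("residual stack:", residual_stack_gradient(0.01, 50))
```

In the plain stack the gradient collapses to practically zero, while the identity path in the residual stack keeps it at a usable size.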


These few architectures are used frequently in computer vision.

Thanks for reading!

