As we discussed in the previous post, developing a deep learning model requires three components: data, architecture and a loss function. In that post we covered data and how to process it before feeding it to the architecture. If you haven't read it yet, kindly click here. In this post we will discuss the basics of architecture and the terms used when designing one.
What is Architecture in Deep Learning (AI)?
The architecture is the part of a deep learning model that is responsible for learning from data. It contains the weights, which are learned from your past data; in other words, it contains all the learnable components. Deep learning promises that we can teach a model almost anything with a basic, simple architecture, but such an architecture needs a lot of data, a lot of training time and a lot of GPU usage to reach high accuracy. In the real world we often cannot collect enough data to train the model efficiently, and even when we can, training a basic architecture properly takes a long time on GPUs. Since GPUs are expensive, this directly increases our cost. To reduce cost and let our model learn from less data, we need to design good architectures: a good architecture learns from less data, in less time, and still reaches high accuracy.
An architecture is basically a stack of layers through which data is passed one after another. Each layer transforms the data, and during training the layers learn weights (numbers) from that data. The sequence and type of layers affect the architecture's efficiency and learning power. The terms used in deep learning to describe an architecture are layers, weights, activation functions and activations. Weights and activations are both numbers, while layers and activation functions are mathematical functions through which data is passed. We will explain these terms using a simple MNIST example with just two layers, both of them linear. Don't worry if you don't know what a linear layer is yet; we will explain it later in this post. For now, just assume that it contains weights and a bias, both of which are learnable.
As you can see in the figure above, we have designed an architecture for the MNIST dataset with two linear layers and one activation function (ReLU) between them.
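To make the figure concrete, here is a minimal pure-Python sketch of the same two-linear-layer architecture with ReLU in between. It assumes flattened 28 x 28 MNIST images (784 inputs) and 10 output classes. The hidden size of 30 is an assumption picked only for illustration, and the names init_linear, linear, relu and forward are our own, not from any library. Real projects would use a framework such as PyTorch, but writing the loops by hand shows exactly where the weights and activations live.

```python
import random

# 784 = 28 x 28 MNIST pixels flattened; 10 = digit classes.
# The hidden size of 30 is an assumption for illustration only.
N_IN, N_HIDDEN, N_OUT = 784, 30, 10

def init_linear(n_in, n_out, rng):
    """Create a linear layer: random weights (n_out x n_in) and zero biases."""
    weights = [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias

def linear(x, weights, bias):
    """Linear layer: y = Wx + b, computed one output unit at a time."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    """ReLU activation function: replaces negative values with 0."""
    return [max(0.0, v) for v in x]

def forward(image, params):
    (w1, b1), (w2, b2) = params
    act1 = linear(image, w1, b1)   # activations produced by the first linear layer
    act2 = relu(act1)              # activations after the ReLU activation function
    return linear(act2, w2, b2)    # 10 output activations, one per digit

rng = random.Random(0)
params = [init_linear(N_IN, N_HIDDEN, rng), init_linear(N_HIDDEN, N_OUT, rng)]
fake_image = [rng.random() for _ in range(N_IN)]  # stand-in for one flattened MNIST image
scores = forward(fake_image, params)
print(len(scores))
```

Here the weights are the numbers stored in params (random at the start), while the activations are the lists computed inside forward for each input image.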
LAYERS:- A layer is a function that contains learnable weights. The architecture above uses linear functions of the form y=ax+b, where 'a' is the weight and 'b' is the bias; both are learnable.
WEIGHTS:- A weight is just a number that is learned from data. The two linear layers in the example above contain weights. They start out as random numbers, and during training they are adjusted step by step so as to reduce the loss.
ACTIVATIONS:- An activation is also a number, but it is computed rather than learned: it is produced by passing the input through a layer. The outputs of the linear layers and of the activation function are called activations.
ACTIVATION FUNCTION:- An activation function is a mathematical function applied to the outputs of a linear layer (its activations) to transform them, usually in a non-linear way. Without it, two stacked linear layers would be equivalent to a single linear layer. Common activation functions are ReLU, Leaky ReLU, tanh, softmax and sigmoid.
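Each activation function listed above is a short formula. The sketch below writes them in plain Python so you can compare how they transform the same inputs. The function names are our own, and the Leaky ReLU slope of 0.01 is an assumed default rather than a fixed rule.

```python
import math

def relu(x):
    return max(0.0, x)                      # negatives become 0

def leaky_relu(x, slope=0.01):
    # slope for negative inputs; 0.01 is an assumed, commonly used value
    return x if x > 0 else slope * x

def sigmoid(x):
    # squashes any number into (0, 1); split into two cases to avoid overflow
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def tanh(x):
    return math.tanh(x)                     # squashes any number into (-1, 1)

def softmax(xs):
    # turns a list of scores into probabilities that sum to 1;
    # subtracting the max keeps exp() from overflowing
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

for name, fn in [("relu", relu), ("leaky_relu", leaky_relu),
                 ("sigmoid", sigmoid), ("tanh", tanh)]:
    print(name, [round(fn(v), 3) for v in (-2.0, 0.0, 2.0)])
print("softmax", [round(p, 3) for p in softmax([-2.0, 0.0, 2.0])])
```

Note that softmax works on a whole list (for example the 10 MNIST output scores), while the others are applied to each number separately.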
Different types of layers used in designing architectures include linear layers, convolutional layers (Convolutional Neural Networks), recurrent layers (Recurrent Neural Networks), LSTM (Long Short-Term Memory), GRU (Gated Recurrent Unit), batch normalization layers, dropout layers and many more. Fully explaining each one would take a separate blog post, so here we will briefly describe the purpose of some of them.
CONVOLUTIONAL NEURAL NETWORKS (CNN):- These layers involve several terms, such as filters, strides and pooling, each of which deserves its own blog post. Their main purpose is to detect spatial patterns in the data (for example edges or textures in an image) wherever they appear. They are mainly used in vision applications.
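To give a feel for filters, strides and pooling, here is a minimal pure-Python sketch of one filter sliding over a tiny image, followed by 2x2 max pooling. The image and filter values are hand-picked for illustration (in a real CNN the filter values are learned weights), and conv2d and max_pool2d are our own helper names. Like most deep learning libraries, it actually computes cross-correlation, here with no padding.

```python
def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding), moving `stride` cells per step."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r0, c0 = i * stride, j * stride
            # weighted sum of the image patch under the filter
            row.append(sum(kernel[a][b] * image[r0 + a][c0 + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

# A 5x5 image with a vertical edge: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# A hand-picked vertical-edge filter; in a CNN these numbers would be learned.
kernel = [[-1, 1],
          [-1, 1]]

fmap = conv2d(image, kernel)      # strong response only where the edge is
print(fmap)
print(max_pool2d(fmap))           # smaller map that keeps the strongest responses
print(conv2d(image, kernel, stride=2))
```

The filter responds only in the column where dark pixels meet bright ones, and it would do so at any position in the image, which is why convolutions suit vision tasks. With stride 2 the filter skips positions, and in this tiny example it steps right over the edge.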
RECURRENT NEURAL NETWORKS (RNN):- These layers process a sequence one step at a time and carry a memory (hidden state) of what they have seen up to the current time step.
Their main applications are sequence problems such as audio, NLP, time series and many more.
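The "memory" of an RNN is a hidden state passed from one time step to the next. This minimal sketch uses a single-unit vanilla RNN with hand-picked, hypothetical weights (w_x, w_h and b are our own names) to show how an input seen at the first step still affects later steps.

```python
import math

def rnn_forward(sequence, w_x, w_h, b):
    """Run a single-unit vanilla RNN over a sequence of numbers.

    At each step: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b),
    so the new state depends on the current input AND the previous state.
    """
    h = 0.0                  # initial hidden state: nothing remembered yet
    states = []
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Hypothetical weights; in practice they are learned from data.
states = rnn_forward([1.0, 0.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
print([round(s, 3) for s in states])
```

Because the input is 1 only at the first step, the later non-zero hidden states come entirely from the carried memory, which fades step by step. That fading is the limitation LSTMs were designed to address.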
LSTM (Long Short-Term Memory):- This is a type of recurrent layer designed to remember context over longer sequences, for example in language problems. Its main applications include speech recognition and translation.
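An LSTM adds gates that decide what to keep, what to write and what to output. Below is a minimal single-unit LSTM cell in pure Python, following the standard forget/input/output-gate formulation. The gate parameters are hypothetical values chosen only to show the effect of the forget gate, and lstm_step and run are our own helper names.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell with scalar input and state.

    `params` maps each gate name to a hypothetical (w_x, w_h, b) triple.
    """
    def gate(name, act):
        w_x, w_h, b = params[name]
        return act(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # fraction of the old cell state to keep
    i = gate("input", sigmoid)         # fraction of the new candidate to write
    g = gate("candidate", math.tanh)   # candidate new content
    o = gate("output", sigmoid)        # fraction of the cell state to expose
    c = f * c_prev + i * g             # updated long-term memory (cell state)
    h = o * math.tanh(c)               # new hidden state (the layer's output)
    return h, c

def run(sequence, params):
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h, c

shared = {"input": (1.0, 0.0, 0.0), "candidate": (1.0, 0.0, 0.0),
          "output": (0.0, 0.0, 1.0)}
keep = {**shared, "forget": (0.0, 0.0, 3.0)}     # forget gate near 1: remember
forget = {**shared, "forget": (0.0, 0.0, -3.0)}  # forget gate near 0: forget

seq = [1.0, 0.0, 0.0, 0.0]   # one "event" followed by silence
print(run(seq, keep))
print(run(seq, forget))
```

With a high forget-gate bias, the cell state stays close to its value after the first input. With a low one, the cell state is almost wiped out after a few steps. In a trained LSTM these gate weights are learned, so the layer itself decides what to remember.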
Linear Layers:- This layer is the simple linear function y=ax+b, where 'a' and 'b' represent the weights and biases. Its output is a continuous number, so it is commonly used for tasks such as regression on tabular data. For example, if you want to predict sales from tabular data, you would use linear layers.
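To connect this with how weights are learned, the sketch below fits a single linear unit y = ax + b to a hypothetical sales-style dataset using plain gradient descent on the mean squared error. The data are synthetic, generated from y = 3x + 2. The learning rate and number of steps are our own choices, not values from this post.

```python
# Hypothetical synthetic data: e.g. advertising spend (x) vs. sales (y),
# generated from y = 3x + 2 so we know what the layer should learn.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x + 2.0 for x in xs]

a, b = 0.0, 0.0   # weight and bias, starting from arbitrary values
lr = 0.02         # learning rate (an assumption, chosen by hand)
n = len(xs)

for step in range(2000):
    # loss L = mean((a*x + b - y)^2); these are its gradients w.r.t. a and b
    grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
    # move each learnable number a small step against its gradient
    a -= lr * grad_a
    b -= lr * grad_b

print(round(a, 3), round(b, 3))
```

After training, a and b end up very close to 3 and 2. The weight and bias started at zero and were adjusted step by step to reduce the loss, which is the "weights learned through data" idea described earlier.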
Popular architectures that became famous for their efficiency and accuracy include ResNet, Inception and VGG.
CONCLUSION: In this post we explored the architecture component of deep learning. We looked at layers, weights, activations and activation functions, and we discussed the purpose of well-known layers used in programming deep learning models. In upcoming posts we will discuss different layers and architectures in more depth.