Neural networks, the basis of deep learning, go back to early models such as the perceptron of the late 1950s. These early single-layer models could only solve linear problems and could not be applied to non-linear ones; in 1969 Minsky and Papert showed that a single-layer perceptron cannot even represent XOR. This bottleneck pushed the field into a "winter" phase in which research and development largely stalled. The winter ended once multi-layer networks with non-linear activation functions could be trained effectively, and thanks to activation functions deep learning is now used for all kinds of non-linear problems, such as classification. Geoffrey Hinton, often called the father of deep learning, is said to have remarked in an interview that deep learning would never have reached its present progress if activation functions had not been introduced.
Analogy with Biological Neurons
The human brain is made up of small units called neurons. It contains on the order of 100 billion of them, and each neuron takes input signals, processes them, and converts them into a non-linear output. Activation functions in deep learning, or artificial neural networks (ANNs), are inspired by this behaviour. A node in an ANN takes an input and emits an output only when required, just as only some of the brain's neurons fire at any moment. Say we have 10 thousand nodes: only the nodes that are needed right now emit an output. Without an activation function, every node simply passes on a weighted sum of its inputs, which keeps the whole system linear; with an activation function, only the nodes whose input is significant emit a value.
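To make the point about linearity concrete, here is a minimal sketch in plain Python. The 2x2 weight matrices `W1` and `W2` and all helper names are hypothetical, chosen only for illustration. It shows that two stacked layers without an activation function behave exactly like one linear layer, while putting a ReLU between them breaks that equivalence.

```python
# Hypothetical 2x2 weights, chosen only for illustration.
W1 = [[1.0, 2.0], [3.0, 4.0]]
W2 = [[0.5, -1.0], [2.0, 1.0]]

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def relu_vec(v):
    """Apply ReLU to every element of a vector."""
    return [max(0.0, vi) for vi in v]

def two_linear_layers(x):
    """Two layers with no activation function in between."""
    return matvec(W2, matvec(W1, x))

def collapsed_layer(x):
    """A single layer whose weight matrix is the product W2 * W1."""
    return matvec(matmul(W2, W1), x)

def two_layers_with_relu(x):
    """The same two layers, with a ReLU applied after the first one."""
    return matvec(W2, relu_vec(matvec(W1, x)))

x = [1.0, -1.0]
print(two_linear_layers(x))     # same result as collapsed_layer(x)
print(collapsed_layer(x))
print(two_layers_with_relu(x))  # differs, so no single matrix can replace it
```

However many purely linear layers are stacked, they can always be collapsed into one matrix, so depth adds nothing until a non-linear activation is placed between the layers.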
An activation function is a mathematical function applied to the input of a node to produce its output; ReLU, for example, truncates negative values to zero. Mathematically these functions are really simple. Functions used in deep learning (ANNs) have to be differentiable, at least almost everywhere, so that gradients can be computed during training. Commonly used activation functions include ReLU, Leaky ReLU, ELU, sigmoid and more. If you want to look at the formulas and graphs of these activation functions, you can visit here.
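As a rough illustration of how simple these functions are, the sketch below writes out three of the functions named above for a single number, using only the standard library. The default values `slope=0.01` and `alpha=1.0` are common choices but are assumptions here, not something this post prescribes.

```python
import math

def leaky_relu(x, slope=0.01):
    """Like ReLU, but negative inputs are scaled by a small slope instead of zeroed."""
    return x if x > 0 else slope * x

def elu(x, alpha=1.0):
    """Identity for positive inputs; smooth exponential curve towards -alpha otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def sigmoid(x):
    """Squash any real number into the range (0, 1).

    Written in two branches so math.exp never overflows for large |x|.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

for value in (-2.0, 0.0, 2.0):
    print(value, leaky_relu(value), elu(value), sigmoid(value))
```

Each function is differentiable everywhere except possibly at zero, which is why they all work with gradient-based training.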
We will take ReLU, the Rectified Linear Unit, as an example. The name may give the feeling that it is some complex function, which it is not. It is just the identity (a linear function) for positive values, and it outputs zero for all negative values.
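In formula form, ReLU is f(x) = max(0, x). A minimal sketch in plain Python, with a handful of made-up input values, could look like this:

```python
def relu(x):
    """Rectified Linear Unit: pass positive values through, clamp the rest to zero."""
    return max(0.0, x)

# Made-up sample inputs, chosen only for illustration.
inputs = [-3.0, -0.5, 0.0, 0.5, 3.0]
outputs = [relu(x) for x in inputs]
print(outputs)  # [0.0, 0.0, 0.0, 0.5, 3.0]
```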
As the ReLU definition above shows, only positive numbers are passed forward, while negative numbers are stopped (replaced by zero). This is the same as a brain neuron, which passes on only the signals that are required.
Gradient Flow
The gradient is the value calculated by differentiating the loss function with respect to the network's weights in an ANN or deep learning model. A good activation function lets the gradient flow through the network smoothly and helps mitigate the exploding and vanishing gradient problems. SELU (Scaled Exponential Linear Unit) helps with these gradient issues and has been proposed as an alternative to batch normalization.
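The effect of the activation function on gradient flow can be sketched with a deliberately simplified model. The code below ignores the weights entirely and just multiplies the activation's local derivative once per layer, as the chain rule would. The helper `chained_gradient` and the depth of 10 are hypothetical choices for illustration, not a real training setup.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid; its maximum value is 0.25, reached at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def chained_gradient(grad_fn, pre_activation, depth):
    """Product of `depth` identical local derivatives (weights are ignored)."""
    g = 1.0
    for _ in range(depth):
        g *= grad_fn(pre_activation)
    return g

# Even in the sigmoid's best case (x = 0) the gradient shrinks by 4x per layer.
print(chained_gradient(sigmoid_grad, 0.0, 10))
# For an active ReLU unit the local derivative is 1, so nothing is lost.
print(chained_gradient(relu_grad, 1.0, 10))
```

In a real network the weights also enter the product, so this is only a partial picture, but it shows why saturating functions such as the sigmoid tend to suffer from vanishing gradients in deep stacks.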
In this post we have discussed activation functions in deep learning in detail.