Rectified Linear Unit – The topic for today’s article is rectified linear unit. That is ReLU activation function. So in the last articless, we saw what was sigmoid function then we saw what is tanh function? And before that, we saw what was the step function? And we just excluded step function from being the activation function using neural networks because it has certain limitations.
That is it was not fitting into the criteria of different properties of being activation function that is it was what not differentiable as well as it was not showing any non-linearity property, but when we talked about the sigmoid function as well as it and its function those were site sign two of the properties that were they were nonlinear as well as they were differentiable.
So we also know that the graphs of these functions so say We have this sigmoid function the graph of it looked something like this. So it was ranging from 1 to 0 and you have a breaking point somewhere at 0.5 and for tanh. The graph was something like this so 0 and you have minus 1 to positive 1 And then what we did is we took the derivative of this. So the derivative of this looks something like this you had a function and we obtained similar kind of graphs for the differential for this particular activation functions now we then address the problem that these two activation functions were suffering that is the problem of Vanishing gradient.
And we also know that in the literature there are no known solutions to mitigate this Vanishing reading problem. So in most of the recent times, we mainly use this relu function, which will not fall into the problem of Vanishing gradient, so it is very simple. So let’s say we are putting the activation function on some outputs easy and we have this R which stands for relu. So what it says is that this function value is 1 if the value for Z is greater than or equal to 0 and it is 0 for Z less than 0 or this.
In fact, it’s say It is it so the range of relu function is a maximum value between z & 0. So if you want to visualize this particular graph looks something like this. So this part that is on the RHS side we can see that this has been realized as a linear regression problem. So this is a linear part. So let’s say the unit comes into the picture and then this portion that is on the left-hand side. You can see that this is constant. So it contains the 0 so where your read for this particular is from 0 to Infinity.
So 0 is bounded and infinite is unbounded. So you have this relu function something like this. So this is f of Z and F of Z is equal to 0 and this is equal to z and you have your zo so this is centred at 0 and you have a function something like this.
Now you can see that there is no particular problem of that is Vanishing gradient that is occurring in this particular function. Also, it satisfies both properties that is non-linearity as well as it is differentiable. But this particular relu has got one major disadvantage. So that is the problem of dying neuron Yes. You heard it, right that is dying neurons means you can consider brain dead. So how does it looks like so say this is our unit and here we have a linear part your computation and say We are Computing the R(z).
This is the real work this now what happens is that when you take the derivative of this whatever portion on is there on this linear side right becomes differentiable, but when you just take the differentiative of this part that is on the negative axis, so the – axis on your number line, this is your positive axis. So for positive axis the differentiative when you do the gradient descent, it does not cause any problem so So if you want to see the graph of this if you want to differentiate this so say we are taking F Prime of Z.
So this graph becomes something like this so that steady so that also the case for a step function, but in case of this negative values this attains 0, so this is where for- differentiation for negative d by dX This value is equal to 0 so suddenly, you’re F dash X for greater than or equal to 0 it was one but as soon as it is less than 0 that is negative.
It is falling down. So essentially if you just consider visualize this if you have a large network of such neurons and say this is your final output and you are connected by different pulling layers. Now you are giving some input to this and say each of the portions has some relu unit.
So even want to consider your signage or either the sigmoid function by because they are having the problem Vanishing gradient with that notion when you apply this rectified linear units and this has particular dying your own problem because if you take the derivative of this then that run will particularly collapse now if you have consider such a large architecture of such network if you build with great difficulty and if one particular note just collapsed.
What happens is that all the weights associated with it will also collapse and thereby that gets propagated along with all the network layers and this will fall prey to the problem that is called dying neurons. So what is the solution to this? So the solution to this is what we use a variant of this called leaky relu now as from the name suggests you are making some leak in this particular function. So where that leak comes into the picture. So this part remains the Same. So this is same as it is, but on this portion that is on the negative side.
You have a very small I got this so they’re here. The slope is 0 here the slope is not equal to 0 so here if you have F of Z, it will be multiplied with some a of Z and here F of Z is equal to Z that is for greater than equal to 0 for R less than 0 you have a into Z. So this is again a hyperparameter that you decide. So usually most of the cases it is kept as 0.001. So when you are Update the weight. That is W new is equal to W old minus eta into dou J by dou W. You apply this so here.
The neuron was just approaching to zero and it will die eventually but here what happens is that since you have put a leak over this so this is where the leak happens. So when you just apply this value, so it’s not equal to 0 it is approximately equal to 0. So it will make some movement in the slope. So, you know, that was our function loss function. So it will just move a bit down but it will not converge or it will not just fall here. So by this approach of leaky relu you can solve this problem.
So what is the main advantage of having rectified linear unit is nothing but the computational saving So if you have your sigmoid or tanh then at some point of time instinct, they will fall prey to Vanishing gradient. But this will essentially save you from having those but if you are using this leaky version of this relu, then you can just come out of this. So this particular real function its derivative that is the first-order derivative.
Both are monotonic functions monotonic means This either increasing or it is either non decreasing in one particular direction so you can see that it is increasing in one particular direction. So both of its derivative and its original function both the monotonic in nature now just to give you a real-life example of this particular system a look same you have this water tap. So you have this system. So you have water that is Flowing out of this now in case of requirement what you do is you just turn this step and you get the water so that is your output.
But what in case of your normal relu happens is that you are just storing this water on this hump on the backside. So say this is your pipe and its water is just getting clogged over here. So essentially What is happening is that you are not giving out the water but eventually what happens is that this is about to die. So you can assume like this is just increasing expanding in size and growing. So what leaky relu give you does is it will dig a hole into this and it will start leaking some water so that it saves from this particular portion.
So you get the output from this and as well, but here this dying your own problem is also eliminated in this so that is the major goal of Why we use the Leaky relu, so well, that was all regarding the relu and leaky relu function in deep learning as activation functions.
reference – Rectified Linear Unit
Web enthusiast. Thinker. Evil coffeeaholic. Food specialist. Reader. Twitter fanatic. Music maven. AI and Machine Learning!