# Activation Functions: Step Function

In the last article, we looked at the backpropagation algorithm, the delta rule, the chain rule, and a few related ideas. In today's article, let's look at activation functions: the nonlinear part of a neural unit, and why we need it, mainly in deep learning. We already know roughly what a neural network looks like, so let us consider one such structure.

Each unit is made up of two sections: a linear part and a nonlinear part. For the nonlinear part we commonly employ something like a sigmoid function. So why do we need this activation function at all? Couldn't we keep only the linear part? Well, it doesn't work that way. Just as the human brain has a right half and a left half, a neural unit has two halves as well. A unit without its nonlinear part is like a human being without a brain.

So the nonlinear part is indispensable. Whenever you have a unit, you have both a linear part and a nonlinear part, and the nonlinear part should satisfy two properties. First, it should be nonlinear. Second, it should be differentiable.
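To see why the linear part alone is not enough, here is a minimal sketch in Python. The weights and biases are made-up numbers, and the units are scalar for brevity. It shows that two linear units stacked without an activation in between behave exactly like one linear unit.

```python
def linear(w, b, x):
    """One linear unit in the scalar case: w * x + b."""
    return w * x + b

def two_linear_layers(x):
    # Layer 1 feeds layer 2 directly, with no activation in between.
    h = linear(2.0, 1.0, x)
    return linear(-3.0, 0.5, h)

def single_equivalent_layer(x):
    # Algebraically: -3 * (2x + 1) + 0.5 = -6x - 2.5
    return linear(-6.0, -2.5, x)

for x in (-2.0, 0.0, 1.5, 10.0):
    print(x, two_linear_layers(x), single_equivalent_layer(x))
```

Both columns print the same values for every input, so adding more purely linear layers adds no expressive power. The nonlinear part is what stops the layers from collapsing in this way.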

Since we are talking about the nonlinear part, why do we call it nonlinear, and why does it have to satisfy non-linearity? Before sigmoid functions, there was the MCP neuron, that is, the McCulloch-Pitts neuron. It used a step function, which jumps from 0 to 1 at a threshold. The step function has one important limitation: it is not differentiable at that jump. So whenever you have a neuron, it should have a nonlinear part, and that part should satisfy both of these properties.

The first is non-linearity, and the second is differentiability. Why do we say that an activation function deployed here should be differentiable? Say you have a large network, like a convolutional neural network, with n different layers. During training you need to go back to each weight and update it.

How would you do that? As we saw with the backpropagation algorithm, the update for each weight is computed from derivatives passed back through the layers. That is why a nonlinear function used as an activation should also be differentiable.
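To make the dependence on the derivative explicit, here is the chain rule for a single unit. The notation is assumed for illustration and does not come from the original article: $z = \sum_i w_i x_i$ is the linear part, $a = f(z)$ is the output of the activation function, $L$ is the loss, and $\eta$ is the learning rate.

$$
\frac{\partial L}{\partial w_i} = \frac{\partial L}{\partial a}\, f'(z)\, x_i,
\qquad
w_i \leftarrow w_i - \eta\, \frac{\partial L}{\partial w_i}
$$

Every weight update carries the factor $f'(z)$. If that derivative is undefined, the update is undefined, and if it is zero, the update vanishes.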

A nonlinear part that is not differentiable is of little use when training with backpropagation. In the neural network literature, activation functions fall into two broad categories: piecewise linear functions and smooth functions. Among the piecewise linear functions we primarily have the step function, then ReLU (rectified linear unit), and then a variant of ReLU called leaky ReLU.

There is also parametric ReLU. Among the smooth functions, we have the sigmoid, which we know from logistic regression, and the hyperbolic tangent, that is, the tanh function. Then there is swish, which multiplies the input by its own sigmoid, x · sigmoid(x), and is a non-monotonic function. It has been reported to match or outperform ReLU on a number of deep networks, though not in every setting. Now let's take a closer look at the very first function on this list.
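The functions named above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than a reference implementation. The default `alpha=0.01` for leaky ReLU and the threshold of 0 for the step function are assumptions, and swish is taken in its commonly cited form, x · sigmoid(x).

```python
import math

def step(z, threshold=0.0):
    """Fires 1 above the threshold, 0 otherwise."""
    return 1.0 if z > threshold else 0.0

def relu(z):
    """Rectified linear unit: passes positive inputs, zeroes the rest."""
    return max(0.0, z)

def leaky_relu(z, alpha=0.01):
    """Like ReLU, but keeps a small slope alpha for negative inputs.
    Parametric ReLU has the same form, with alpha learned during training."""
    return z if z > 0 else alpha * z

def sigmoid(z):
    """Logistic function, squashing z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Hyperbolic tangent, squashing z into (-1, 1)."""
    return math.tanh(z)

def swish(z):
    """Input multiplied by its own sigmoid."""
    return z * sigmoid(z)

for z in (-2.0, 0.0, 2.0):
    print(z, step(z), relu(z), leaky_relu(z), sigmoid(z), tanh(z), swish(z))
```

Printing a few values side by side makes the split visible: the step and ReLU-style functions are built from straight pieces, while sigmoid, tanh, and swish change smoothly.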

That is the step function, and the question is why it is usually suggested that you do not use it. Picture the step function on a pair of axes, with the input on the horizontal axis and the output on the vertical axis. The output takes only two values, 0 and 1, so it is more or less like enabling a gate: it decides whether a unit operates or not. One means the unit fires its output, and zero means it does not fire. The unit fires when the linear computation exceeds a particular threshold.

Say the linear computation is w1·x1 + w2·x2. If it is greater than some threshold, you trigger a 1; if it is less than the threshold, the unit does not fire any output. So the step function is not doing much: it is just opening and closing a gate placed at the output of the linear part, and it passes on nothing about how far the input was from the threshold. Note that the step function is not a linear function, but it is flat on both sides of the jump, so locally it behaves like a constant.
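The firing rule described here can be written as a very small Python function. The weights and the threshold below are assumed values, chosen only so that the unit behaves like a logical AND gate.

```python
def step_neuron(x1, x2, w1, w2, threshold):
    """Fire 1 if the weighted sum exceeds the threshold, else 0."""
    z = w1 * x1 + w2 * x2  # linear part
    return 1 if z > threshold else 0  # step activation

# Assumed example values: w1 = w2 = 1 and threshold = 1.5 give an AND gate.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, step_neuron(x1, x2, w1=1.0, w2=1.0, threshold=1.5))
```

Only the input (1, 1) produces a weighted sum above 1.5, so only that case fires. Notice that the output is the same whether the sum is just over the threshold or far above it, which is the gate-like behavior described above.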

So the step function fails our second criterion: it is nonlinear, but it is not differentiable in any useful way. That is why you cannot deploy it in gradient-based algorithms for training your neural network. Say this is our function and we want to take its derivative, f′(z). If you visualize it, the first derivative lies flat along the horizontal axis.

Say the axis runs from −z to +z. The derivative is 0 at every point except the threshold, where the function has a break and the derivative is not defined. So the function is not smooth: there is a sudden shift between 0 and 1. With a zero derivative, backpropagation has no signal with which to update the weights, so a network built on step functions cannot learn the non-linearity that is present in real-life data.
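The effect of a flat derivative on learning can be checked numerically. The sketch below is illustrative only: the learning rate, the starting weight, and the `upstream` value standing in for the gradient arriving from later layers are all assumed numbers, and the derivative is estimated with a central difference rather than computed analytically.

```python
import math

def step(z, threshold=0.0):
    """Step activation: 1 above the threshold, 0 otherwise."""
    return 1.0 if z > threshold else 0.0

def sigmoid(z):
    """Logistic function, used here as a smooth comparison."""
    return 1.0 / (1.0 + math.exp(-z))

def numerical_derivative(f, z, h=1e-5):
    """Central-difference estimate of f'(z)."""
    return (f(z + h) - f(z - h)) / (2 * h)

def updated_weight(w, x, z, activation, upstream=1.0, lr=0.1):
    """One gradient-descent step for a single weight.

    upstream is a hypothetical stand-in for the gradient flowing back
    from the loss; the local gradient is upstream * f'(z) * x.
    """
    grad = upstream * numerical_derivative(activation, z) * x
    return w - lr * grad

# Away from the threshold the step derivative is exactly zero.
for z in (-2.0, 0.5, 3.0):
    print(z, numerical_derivative(step, z), numerical_derivative(sigmoid, z))

# With the step function the weight does not move; with sigmoid it does.
print(updated_weight(0.8, x=1.0, z=0.5, activation=step))
print(updated_weight(0.8, x=1.0, z=0.5, activation=sigmoid))
```

With the step function the weight stays at 0.8 no matter how many updates you run, because the gradient is zero. With the sigmoid the weight changes on every step. Evaluating the estimate exactly at the threshold gives a very large number instead, which reflects the jump where the derivative is not defined.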

Most, if not all, real-world examples are nonlinear in nature. If you put in an activation function that cannot learn this non-linearity, what is the use of having it in your neural network? That is the main reason why we don't go with step functions and instead use the other variants you saw above: sigmoid, hyperbolic tangent, swish, and ReLU and its variants.

With those activation functions you can train your network in a much better way. Well, that was all regarding the first activation function in deep learning, the step function.