# Structure of a Neuron

The topic for today's article is the structure of a neural unit, the small neuron that we use in our neural networks. In the last article, we saw a gentle introduction to what deep learning is and a few of the concepts that we'll be using throughout this series. Now let's look at what a single neuron looks like. Before we start, let me remind you that we are not actually mimicking or imitating the real neurons in the human brain, because there are still limits on how closely that can be done today.

We will be considering one particular neuron unit. Rather than calling it a neuron, we'll call it a unit. We don't call it a neuron because we are not modelling any biological activity; we just have certain mathematical computations, and we'll see how the different results are computed. Basically, some input comes in from one direction and some output goes out in the other.

That holds for any process: you provide some input, there is some black box or white box, and you get the desired output. This particular neuron is internally divided into two portions. On the left-hand side, you have a linear function, which is nothing but a summation. After it you have a nonlinear function, NLF. Say, for example, that we are using a step function here; we'll come to why we cannot use a step function later.

Say we have two inputs, $x_1$ and $x_2$. With these inputs we provide weights. These weights are essentially coefficients, and they represent the strength of the connection: each weight sits on the link through which an input enters this neuron, and it says how much that input counts. Inside the linear function, the output you get is $w_1 x_1 + w_2 x_2$. This is nothing but your usual $y = mx + c$ from linear regression, so far without the intercept.
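As a minimal sketch of the linear part described above, the weighted sum for two inputs could be written as follows. The function name `linear_part` and the numeric values are illustrative assumptions, not something taken from the article.

```python
def linear_part(inputs, weights):
    """Return the weighted sum w1*x1 + w2*x2 + ... (no bias term yet)."""
    return sum(w * x for w, x in zip(weights, inputs))


# Illustrative values only (assumed for this example).
x = [2.0, 3.0]   # inputs x1, x2
w = [0.5, -1.0]  # weights w1, w2

z = linear_part(x, w)  # 0.5 * 2.0 + (-1.0) * 3.0
print(z)
```

The same function works unchanged for more than two inputs, because it simply pairs each weight with its input.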

We are not talking about the bias yet; we'll come to that intercept a bit later. So in the first part you have the linear function. It takes the sum of the products of each weight with its input, and then passes the result on to the nonlinear function. In the human brain we also talk about a left brain and a right brain, one part for the conceptual and one for the logical.

In the real world, if you have only conceptual ideas and some logical function needs to be implemented, that becomes difficult for a human being too. The same is the case with the neuron: it needs both the linear function and the nonlinear function. If it has only the left half, the linear function, the neuron will collapse, because stacking purely linear units only ever gives you another linear function. The nonlinear function keeps the neuron from collapsing. This nonlinear function, NLF, is also sometimes called the activation function.

Whatever comes out of the left-hand side gets activated according to some predefined function that you choose. There are different kinds of activation functions: sigmoid, tanh, ReLU, leaky ReLU, and many more. After this computation, you get the output. Now let's talk a bit about the choice of this function.
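To make the list of activation functions concrete, here is a small sketch of the four named above using their standard textbook definitions. The `slope=0.01` default for leaky ReLU is a common choice assumed here, not a value from the article.

```python
import math


def sigmoid(z):
    """Squash z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Squash z into the open interval (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Pass positive values through, clip negatives to zero."""
    return max(0.0, z)


def leaky_relu(z, slope=0.01):
    """Like ReLU, but negatives keep a small slope (assumed 0.01)."""
    return z if z > 0 else slope * z


# Compare the four functions on a few sample inputs.
for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z), leaky_relu(z))
```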

The function here is the step function. From gradient descent we know the cost function $J$, which is a function of the weights and biases. In three dimensions it looks like a bowl-shaped surface: on the axes you have the error, the bias $b$, and the weight $w$.

What you do is calculate the gradient and move from one point to the next, step by step, until you converge at the bottom of the bowl, where the error is at its minimum (ideally zero). This is our major goal whenever we do gradient descent, and the same is implemented in neural networks. Now our main problem: if we use a step function and observe it closely, it has a break, with sharp corners at the two places where it jumps.

If we take the derivative of this, it is zero on the flat parts and undefined at the break, so gradient descent has no useful slope to follow and our neural network gets stuck, much like falling into a local minimum. The solution is that our function should be smooth. For that, in all of these default neural networks, we use a sigmoid function. The sigmoid is not new to us: in logistic regression we have the logistic function, which is nothing but the sigmoid, $\sigma(z) = \frac{1}{1 + e^{-z}}$.
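A rough way to see the difference between the two functions is to compare their slopes. The sketch below uses the standard identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ for the sigmoid and a finite-difference estimate for the step function. The helper `numerical_slope` and the sample points are assumptions made for illustration.

```python
import math


def step(z):
    """Step function: 0 below zero, 1 from zero upwards."""
    return 1.0 if z >= 0 else 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Closed-form slope of the sigmoid: s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)


def numerical_slope(f, z, h=1e-4):
    """Central finite-difference estimate of the slope of f at z."""
    return (f(z + h) - f(z - h)) / (2.0 * h)


# Away from the jump, the step function is flat (slope 0),
# while the sigmoid always has a small positive slope.
for z in (-2.0, -0.5, 0.5, 2.0):
    print(z, numerical_slope(step, z), sigmoid_derivative(z))
```

With a slope of zero almost everywhere, a gradient-based update has nothing to work with, which is one way to read the warning about the step function.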

That is the output function for this unit. So the activation function used in most cases is the one from logistic regression, that is, the sigmoid. We compute the sigmoid of the linear output, which we write as $z$, and it transforms $z$ in this way. Think about what the curve looked like originally. The problem is that if you try to fit a straight line, it extends all the way to negative infinity and positive infinity. But all of this is used to represent probabilities, and a probability cannot run off to negative or positive infinity.

Instead, it should be confined to the interval between 0 and 1, so it should be a fuzzy kind of value. So what we do is squeeze the line, and we set a threshold, some barrier. Say the threshold is 0.5: every value below 0.5 we classify as 0, and every value above it we classify as 1. The curve approaches 0 at negative infinity and approaches 1 at positive infinity. This is the sigmoid function that we basically use.
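The squeezing and thresholding described above can be sketched like this. The function name `classify` is hypothetical, and treating a value of exactly 0.5 as class 1 is an assumption, since the text only covers values below and above the threshold.

```python
import math


def sigmoid(z):
    """Squeeze any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(z, threshold=0.5):
    """Return 1 if sigmoid(z) reaches the threshold, else 0.

    Assumption: a value exactly at the threshold counts as class 1.
    """
    return 1 if sigmoid(z) >= threshold else 0


# Large negative z gives a value near 0, large positive z near 1.
for z in (-10.0, -1.0, 1.0, 10.0):
    print(z, sigmoid(z), classify(z))
```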

If you want to take the derivative, this smooth curve comes in handy, whereas a step function would fall prey to local minima. So far we have considered only two inputs, but that's not the case in the real world. You have the linear function and the nonlinear function, and you have $x_1$, $x_2$, $x_3$ and so on: you can have any number of inputs, and accordingly you have weights up to $w_n$. Providing several inputs does not change the processing of your neuron. It has no impact.

So regardless of whether it has two inputs or n inputs, your neuron performs the same way: a linear function followed by a nonlinear function, whether that is a sigmoid or anything else. Now, we are using the sigmoid as our activation function, and we know the inputs and the weights, which are the connection strengths attached to each of these inputs.

We can therefore write $z$ in this form: $z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n$. What the sigmoid function does at the end is take this consolidation of all the weighted inputs and transform it, $\sigma(z) = \frac{1}{1 + e^{-z}}$. So regardless of how many of these computations you do, your final output is one single value: either 0 or 1 after thresholding, or a probability anywhere in between.
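Putting the two halves together, a single unit with any number of inputs might look like the sketch below. The function name `neuron` and the sample numbers are assumptions for illustration, and the bias is still left out because it has not been introduced at this point.

```python
import math


def neuron(inputs, weights):
    """Weighted sum of the inputs followed by a sigmoid (no bias yet)."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    z = sum(w * x for w, x in zip(weights, inputs))  # linear part
    return 1.0 / (1.0 + math.exp(-z))                # nonlinear part


# The same code handles two inputs or five inputs.
print(neuron([1.0, 2.0], [0.3, -0.1]))
print(neuron([1.0, 2.0, 3.0, 4.0, 5.0], [0.1, 0.1, -0.2, 0.05, 0.0]))
```

However many inputs go in, a single number between 0 and 1 comes out.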

Now let us take another scenario. Say you have some inputs $x_1$, $x_2$, $x_3$ and $x_4$. Since we now know what a neuron does, let's try to build a neural network. Say we have three units in the middle and one output. What do you see here? The first part is nothing but your input layer, which I'll write as IL. At the end you have the output layer, which I'll write as OL; it gives some output, say $\hat{y}$. Then look at the middle portion.

This is your hidden layer. When you construct a neural network, you can have any number of hidden layers. Hidden layers are the layers exclusive of your input and output layers. So set aside the input layer at the start and the output layer at the end; whatever is left in between constitutes your hidden layers. There can be any number of them.

The number of hidden layers is essentially a hyperparameter. It depends on your application how you tune your hidden layers, how many connections you have, and how complicated a network you build. Now look at the individual units in the hidden layer. Each of these units gets input from every one of the inputs, and every one of those connections has a weight. The weights coming from the first input are $w_{11}$, $w_{12}$ and $w_{13}$.

Similarly, from the second input you have $w_{21}$, $w_{22}$ and $w_{23}$, then $w_{31}$, $w_{32}$ and $w_{33}$, and finally $w_{41}$, $w_{42}$ and $w_{43}$. So just for the connections between the input layer and the hidden layer you have 4 × 3, that is, 12 weights, with each input multiplied by its weight.

In each hidden unit you then apply your sigmoid. Say the output of the first unit is $\sigma(z_1)$, and the others are $\sigma(z_2)$ and $\sigma(z_3)$. The connections into the output unit have weights too, say $w$, $w'$ and $w''$. So the input to your output unit looks like $w\,\sigma(z_1) + w'\,\sigma(z_2) + w''\,\sigma(z_3)$. The result of this goes to your sigmoid function, that is, $\hat{y} = \frac{1}{1 + e^{-(w\,\sigma(z_1) + w'\,\sigma(z_2) + w''\,\sigma(z_3))}}$.
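The forward pass through this network of four inputs, three hidden units and one output can be sketched as follows. The function name `forward` is hypothetical, every weight value is made up for illustration, and biases are omitted because they have not been added to the picture yet.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, output_weights):
    """Forward pass: inputs -> hidden units (sigmoid) -> one output (sigmoid).

    hidden_weights[i][j] is the weight from input i to hidden unit j.
    output_weights[j] is the weight from hidden unit j to the output.
    """
    hidden = []
    for j in range(len(output_weights)):
        z_j = sum(x[i] * hidden_weights[i][j] for i in range(len(x)))
        hidden.append(sigmoid(z_j))
    z_out = sum(w * a for w, a in zip(output_weights, hidden))
    return sigmoid(z_out)


# Made-up example values: 4 inputs, 4 x 3 = 12 hidden weights, 3 output weights.
x = [1.0, 0.5, -1.0, 2.0]
hidden_weights = [
    [0.1, -0.2, 0.3],    # w11, w12, w13
    [0.4, 0.0, -0.1],    # w21, w22, w23
    [-0.3, 0.2, 0.1],    # w31, w32, w33
    [0.05, 0.1, -0.2],   # w41, w42, w43
]
output_weights = [0.7, -0.5, 0.2]  # w, w', w''

y_hat = forward(x, hidden_weights, output_weights)
print(y_hat)
```

Each row of `hidden_weights` holds the three weights leaving one input, which mirrors the $w_{11}$, $w_{12}$, $w_{13}$ naming used in the text.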

That squeezes the value of all these calculations into a single number. Now you can imagine how many computations there are if you stack several layers, for instance in convolutional neural networks, where each input represents some pixel or some pattern. Since you have only one output neuron here, this network is bound to binary classification. Why binary classification? Because we are deploying a sigmoid function, whose value $\sigma(z)$ ranges between 0 and 1 and gets read as one of two classes. That is the interval in which it will place the value.

It's like asking whether I should go outside today or not. That is again a binary classification: yes or no. So wherever you need a yes/no, true/false, zero/one kind of answer, and you want to predict or model it with a neural network, you go with binary classification using the sigmoid function. If instead you have some image classification, say whether a fruit is an apple, a banana, an orange or grapes, that is essentially a multi-class problem.

For a multi-class problem you cannot deploy this particular neural structure, since you have only one output neuron here. There you can have several output neurons, one representing each of the different classes. That is the multi-class problem. We have not yet talked about the bias: there is a bias for each and every neuron. Since we have four neurons (three hidden and one output), you will have bias 1, bias 2, bias 3 and bias 4. So with the twelve input-to-hidden weights you have an additional four, that is, 16 quantities; counting the three hidden-to-output weights $w$, $w'$ and $w''$ as well, there are 19 quantities in total that you will be learning.
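The counting can be checked with a few lines of code. This sketch assumes a fully connected network with one bias per non-input neuron; the function name `count_parameters` is hypothetical. For the network here it gives 12 input-to-hidden weights plus 3 hidden-to-output weights, and 4 biases. The figure of 16 corresponds to the 12 input-to-hidden weights plus the 4 biases, and including the 3 output weights brings the total to 19.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the number of units per layer, input layer first.
    """
    # One weight for every connection between consecutive layers.
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    # One bias for every neuron outside the input layer.
    biases = sum(layer_sizes[1:])
    return weights, biases


weights, biases = count_parameters([4, 3, 1])
print(weights, biases, weights + biases)
```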

The major goal comes in once you get the output: if the classification is wrong or the prediction is incorrect, you backtrack through the network using an algorithm called backpropagation, which will be covered down the line. There you learn all these weights and modify them so that you get the correct output. Well, that was all regarding the structure of a neuron in deep learning.