*Generalization of Neural Networks* – The topic for today’s article is how we can generalize a particular neural network. In the last video, we saw how to represent the weights as a matrix so that computation becomes much easier for our computing engine, especially TensorFlow or PyTorch. In today’s article, we’ll cover what the bias is and where it is represented, which we did not cover in the last article.

We have four inputs, then three neurons at the first hidden layer, then two neurons, and an output neuron. The inputs are connected to every neuron of the first hidden layer, those neurons connect to the next layer in the same way, and finally we reach the output, which is some estimate $\hat{y}$. So we know this first part becomes our input layer.

That is IL, and this is our hidden layer, the first hidden layer. This is our second hidden layer, layer 2, and then we have a final output layer. This network is used for binary classification, which is why we have only one output neuron; for multi-class classification we can have several neurons at the output, one for each class. Let me write one particular equation. Each neuron has two parts, and the first performs a simple linear computation.

Then we have a sigmoid function, a nonlinear function that transforms the result of the linear part. In the real world, not everything is linear in nature; if it were, everything would be much easier. Some relationships are not linear, and that is mainly why we use this nonlinear function. It also keeps the network from collapsing: without it, a stack of layers would reduce to a single linear computation. It can be understood with an analogy: the human brain has both a left-brain and a right-brain side.
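To see why the nonlinear function matters, here is a minimal sketch of the "collapse" idea. The two weight matrices and the input are arbitrary values chosen for illustration, not taken from the article. With no activation between them, two layers give exactly the same result as one combined layer.

```python
# Hypothetical two-layer network with NO activation function.
# All numbers are illustrative assumptions.
W1 = [[1.0, 2.0],
      [0.0, -1.0]]   # first layer, 2x2
W2 = [[3.0, 1.0]]    # second layer, 1x2


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


x = [0.5, -2.0]

two_layers = matvec(W2, matvec(W1, x))  # pass through layer 1, then layer 2
W_single = matmul(W2, W1)               # one equivalent 1x2 matrix
one_layer = matvec(W_single, x)         # same result from a single layer

print(two_layers, one_layer)
```

Placing a sigmoid between the two layers breaks this equivalence, which is what lets a deeper network represent nonlinear relationships.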

The same logic applies here. Let us write one output, say $z_1^{(1)}$, for the first neuron at layer 1. This is whatever comes out of the summation over the inputs $x_1$, $x_2$, $x_3$ and $x_4$. Along with the inputs you have the weights as well, so $z_1^{(1)} = w_{11} x_1 + w_{12} x_2 + w_{13} x_3 + w_{14} x_4$. In each weight, the first index is the neuron that receives the signal and the second is the input it receives it from. If you remember it this way, your equation will not go wrong.

Next comes our function for transforming, the sigmoid. You apply it and get an output, represented as $a_1^{(1)}$, from this particular unit. Now, remember that in linear regression we try to fit a line, $y = mx + c$. The slope in this case is represented by the weights, which represent the strength of that particular neural connection. This weight, like the strength of a correlation, is often pictured in the range $-1$ to $1$, although trained weights are not restricted to that range. Your weight can be negative and it can be positive.
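The two parts of a neuron described so far (the weighted sum, then the sigmoid) can be sketched as below. The input values and the weights are hypothetical, picked only so the arithmetic is easy to follow; the bias has not been introduced at this point, so it is left out.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs x1..x4 and weights w11..w14 for the first neuron.
x = [1.0, 0.5, -1.0, 2.0]
w1 = [0.2, -0.4, 0.1, 0.3]

# Linear part: z = w11*x1 + w12*x2 + w13*x3 + w14*x4
z1 = sum(w * xi for w, xi in zip(w1, x))

# Nonlinear part: the activation a = sigmoid(z)
a1 = sigmoid(z1)

print(z1, a1)
```

Note that one of the weights is negative, which simply means that input pushes the neuron's output down rather than up.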

So it represents how strong your neural connectivity is. What we still need is a constant, an intercept. Our error function, when we differentiate it, will depend on the weights as well as the bias, but so far we do not find any bias in our equation. The bias does not depend on the input. So what do we mean by that?

Say this is our neuron, with its linear function and nonlinear function. We have $x_1$ and $x_2$ and we get some output, and the bias sits at the linear part of the neuron. For each neuron, there is one and only one bias. So when you have weights, say $w_1$ and $w_2$, what goes inside this neuron for computation is $w_1 x_1 + w_2 x_2 + b$: the linear combination of the inputs and the weights, plus the bias. All of these, transformed, give you a final estimate, $\hat{y}$. In our network we can count 1, 2, 3, 4, 5 and 6 neurons, so accordingly we have six different biases.
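A small sketch may help with the statement that the bias does not depend on the input. The function name `neuron` and all numeric values here are assumptions for illustration. When every input is zero, the weights contribute nothing and the output is determined by the bias alone.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b):
    """One neuron: weighted sum of the inputs, plus a single bias, then sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


w = [0.5, -0.3]   # hypothetical weights w1, w2
b = 0.8           # hypothetical bias (one per neuron)

# With all-zero inputs the weighted sum vanishes, so only the bias remains.
out_zero_input = neuron([0.0, 0.0], w, b)    # equals sigmoid(b)
out_no_bias = neuron([0.0, 0.0], w, 0.0)     # equals sigmoid(0) = 0.5

print(out_zero_input, out_no_bias)
```

The same bias is added for every input the neuron sees, which is why it acts like the intercept $c$ in $y = mx + c$.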

Say the biases are represented as $b_1^{(1)}$, $b_2^{(1)}$ and $b_3^{(1)}$ at the first layer; similarly you have $b_1^{(2)}$ and $b_2^{(2)}$ at the second layer, and $b_1^{(3)}$ at the third layer. So now, in our equation for this particular unit, you add a bias constant.

That is this one, $b_1^{(1)}$. Similarly, when you do the computation for the second unit you add $b_2^{(1)}$, then $b_3^{(1)}$ for the third unit of the first layer, and so on. Now recall our matrix representation: we represented the output from the first layer, from all of these units, as a column vector.

That column vector was the output from the first layer. If we represent the biases $b_1$, $b_2$ and $b_3$ in the same way, our bias is also a column vector, $[b_1, b_2, b_3]^T$. So now everything, our weights, our biases and all the inputs, is in matrix representation, which makes calculation easy. These become our tensor quantities, and matrix multiplication is what is done across the neural net.
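As a rough illustration of the matrix view, the first hidden layer can be written as a $3 \times 4$ weight matrix, a length-4 input vector and a length-3 bias vector. Plain Python lists stand in for the tensors here, and every number is a made-up example value; in TensorFlow or PyTorch the same step would be a single matrix operation.

```python
# Hypothetical parameters for the first hidden layer (3 neurons, 4 inputs).
W1 = [[0.1, 0.2, 0.3, 0.4],
      [0.0, -0.1, 0.2, 0.5],
      [0.3, 0.3, -0.2, 0.1]]
b1 = [0.1, -0.2, 0.05]       # one bias per neuron, as a column vector
x = [1.0, 2.0, 3.0, 4.0]     # input column vector


def affine(W, x, b):
    """Compute W x + b: one row of W and one entry of b per neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


z = affine(W1, x, b1)  # pre-activation column vector for layer 1

print(z)
```

Each entry of `z` is the weighted sum for one neuron plus that neuron's own bias, so the result is again a column vector with one entry per neuron.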

Now we need to generalize this particular equation. Say the output at any layer $l$, before the activation, is represented as capital $Z^{(l)}$. It is given by the weights at that particular layer and the inputs, and the inputs come from the previous layer. The input to the second hidden layer comes out of the first one, so the inputs it gets are $a_1^{(1)}$, $a_2^{(1)}$ and $a_3^{(1)}$.

So we can generalize the input as $A^{(l-1)}$, and then you add the bias at that particular layer: $Z^{(l)} = W^{(l)} A^{(l-1)} + b^{(l)}$. This is the generalized equation for representing any neural network, and it has the same form as $y = mx + c$. Each neuron computes with its own weights, training adjusts every weight at each layer, and the bias adjusts that computation. This is just like the case of curve fitting in linear regression.
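The generalized equation can be sketched as a loop over layers, where the activation of one layer becomes the input of the next. This is a minimal sketch, assuming a sigmoid at every layer and the 4-3-2-1 shape discussed in this article; the helper names `forward` and `zeros` are hypothetical, and all parameters are set to zero only to make the result easy to check by hand.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, layers):
    """Apply Z = W A_prev + b, then A = sigmoid(Z), for each (W, b) pair."""
    a = x  # the input plays the role of A at layer 0
    for W, b in layers:
        z = [sum(w * ai for w, ai in zip(row, a)) + bi for row, bi in zip(W, b)]
        a = [sigmoid(zi) for zi in z]
    return a


def zeros(rows, cols):
    """Build a rows x cols matrix of zeros."""
    return [[0.0] * cols for _ in range(rows)]


# Weight shapes 3x4, 2x3 and 1x2, each paired with its bias vector.
zero_layers = [
    (zeros(3, 4), [0.0] * 3),
    (zeros(2, 3), [0.0] * 2),
    (zeros(1, 2), [0.0]),
]

y_hat = forward([1.0, 2.0, 3.0, 4.0], zero_layers)
print(y_hat)
```

With all weights and biases at zero, every $Z$ is zero, so the single output is `sigmoid(0)`; training would then adjust the weights and biases away from these values.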

The bias essentially represents where the fit has to sit. Imagine a structure like a wireframe with some units: initially you build the structure, and then you try to adjust each and every neuron. The shift of each neuron up or down is done with the help of this bias. Geometrically, you can see how the bias shifts the fit either up or down.

Before considering the bias, suppose we build the weight matrices. Take the weights on the connections between the input layer and the first hidden layer, the weight matrix $W$. What is its dimension? It is a $3 \times 4$ matrix: 3 units and 4 inputs. For the next layer it is $2 \times 3$, and for the third, the connectivity into the output, it is $1 \times 2$.

So earlier we had $12 + 6 + 2$ weights, a total of 20. Now, adding the biases at each layer, you add 3 for the first hidden layer, 2 for the second and 1 for the output, which gives $20 + 6 = 26$. By considering the biases, you will be learning not only the weights but also the biases, so you need an idea of how to adjust the weights and how the neurons should be arranged.
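The parameter count above can be checked with a few lines. The list `layer_sizes` is simply the 4-3-2-1 network from this article written out; the variable names are illustrative.

```python
# Layer sizes: 4 inputs, 3 and 2 hidden neurons, 1 output neuron.
layer_sizes = [4, 3, 2, 1]

# Each weight matrix has (neurons in this layer) x (neurons in previous layer) entries.
num_weights = sum(n_out * n_in
                  for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# One bias per neuron in every layer except the input layer.
num_biases = sum(layer_sizes[1:])

total_params = num_weights + num_biases
print(num_weights, num_biases, total_params)  # 20 6 26
```

Changing `layer_sizes` to a deeper or wider network shows how quickly this number grows.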

That becomes the question of generalization. If you have a large number of layers, and thereby a large number of neurons, this number shoots up. It will not be 26; in practice it will be a much higher number. You need to learn all of those parameters to generalize your neural network.

Generalization of any neural network is affected by three factors, which we will quickly summarize. The very first factor is the training set size. We build the neural network with the help of our training set, as is normal when building other models as well.

We represent the training set size by capital $N$. The second factor is the architecture of your neural net. This is a major factor when you need to generalize your model. Say you initially built a very large neural network, and after a certain number of epochs (an epoch is one complete pass of the training set through the network) you find that you need to remove some neurons that are not adding value to your computation.

Then you need to re-generalize your model, and that is where the architecture of the neural network becomes crucial. You spent a huge amount of time building the network, and then, going over it again, you find that these nodes are not essential or that the connections between these two are not required. It takes a considerable amount of computation time to remove them, so it becomes difficult.

The third factor is the problem complexity, the complexity of the problem at hand. This third factor is not given that much attention. Why? Because you are building the network precisely to solve a large problem, so you cannot blame the problem for being complicated and refuse to build it.

A neural network is built mainly to make such computations easier. So in effect there are only the first two factors, and you can study generalization by keeping either of the two constant. By keeping the training set size fixed, you can generalize the architecture of your neural network. So suppose $N$ is fixed.

That is, the training set size for building your neural network is fixed. Then the complexity, in Big O notation, is given by a quantity in $W$ and $\epsilon$. $W$ represents the number of weights, or free parameters. The free parameters are all 26 parameters in this case; in general there can be any number, like 100, 150, 200 or 250. And $\epsilon$ represents the permissible error.

Say you allow 0.01% error: only that much error is tolerated in this particular setup. If you want to remove a particular connection or neuron, as I said before, $W$ changes, so this quantity changes and your complexity reduces. That is how the generalization of a neural network is significantly affected. Well, that was all regarding the generalization of a neural network in deep learning.
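As a rough numeric illustration of the relation between free parameters and permissible error: the article does not spell out the exact quantity, so this sketch assumes it takes the commonly quoted rule-of-thumb form $N = O(W/\epsilon)$, with constants ignored. The function name `training_size_order` is hypothetical, and $\epsilon$ is passed as a fraction rather than a percentage.

```python
def training_size_order(num_params, epsilon):
    """Order-of-magnitude estimate W / epsilon (ASSUMED form, constants ignored)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must be a fraction between 0 and 1")
    return num_params / epsilon


# The 26-parameter network from this article, with a 10% permissible error.
n_full = training_size_order(26, 0.1)

# Dropping the six biases (or pruning six connections) leaves 20 parameters.
n_pruned = training_size_order(20, 0.1)

print(n_full, n_pruned)
```

Under this assumption, removing parameters lowers the estimate, while demanding a smaller permissible error raises it.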

**Useful links:**

reference – *Generalization of Neural Nets*
