# Vectorization of Neural Nets

The topic for today's article is vectorization in neural networks. In the last article, we saw what weights and biases are, and we tried to represent them in matrix form so that our computation becomes easier. The basic idea is that we backpropagate through each layer contained in a neural network, go to each individual unit or neuron, and update the weights. This update is done with the help of gradient descent. I have covered gradient descent in a separate article, so you can check that article.

We start from this equation:

$$z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}$$

Here the output $z^{[l]}$ of a particular layer $l$ is given by the weights $W^{[l]}$ of that layer applied to $a^{[l-1]}$, the output of the previous layer $l-1$, plus the bias $b^{[l]}$ of that layer. This equation has the form $y = mx + c$. In linear regression, we had a cost $J$ that depended on the slope and the intercept, and the case here is similar.
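As a minimal sketch of the layer equation, here is how it could look with NumPy. The layer sizes (4 values coming from the previous layer, 3 units in this layer), the numbers, and the function name `layer_output` are all assumptions made up for illustration.

```python
import numpy as np

def layer_output(W, a_prev, b):
    """Compute W @ a_prev + b for every unit of one layer at once."""
    return W @ a_prev + b

# Hypothetical layer: 4 values from the previous layer, 3 units in this layer.
W = np.array([[0.1, 0.2, 0.3, 0.4],
              [0.0, 1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0, 1.0]])      # one row of weights per unit, shape (3, 4)
a_prev = np.array([1.0, 2.0, 3.0, 4.0])   # output of the previous layer, shape (4,)
b = np.array([0.5, -1.0, 0.0])            # one bias per unit, shape (3,)

z = layer_output(W, a_prev, b)
print(z)
```

Each row of `W` plays the role of the slope $m$ for one unit and each entry of `b` plays the role of the intercept $c$, so a single matrix product replaces a loop over the units.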

We have this cost. We take $J$ as the cost function, and it depends on the weights as well as the biases. Our major aim when we train a neural network is to update the weights and biases, which were initially assigned randomly. So we are essentially concerned with the weights and biases. We can't do anything about the inputs; we just process them in particular steps.

The inputs only give us the resulting output, so we have no hold on them. Essentially, what we want to do is learn the best parameters. By the best parameters I mean the weights and the biases for which our cost is reduced. The cost of this neural network can be represented as

$$J(W, b) = \frac{1}{m} \sum_{i=1}^{m} L\big(\hat{y}^{(i)}, y^{(i)}\big)$$

where $m$ is the number of training examples and $L$ is the loss function of $\hat{y}$ (y cap, the prediction) and $y$. So this is the cost function, represented in terms of the loss function.

If you remember linear regression, you had a similar setup there:

$$\sum_{i=1}^{n} \big(y_i - \hat{y}_i\big)^2$$

that is, the sum over $i = 1$ to $n$ of ($y$ minus $y$ predicted) squared. In linear regression you had that kind of equation, and now here you have this kind of equation. The problem is that if you want to differentiate it, or perform any such activity for gradient descent, you need a representation in convex form. How would you represent that? You can write the loss as a log loss equation. Let's see how it is expanded.

It is given as

$$L(\hat{y}, y) = -\big(y \log \hat{y} + (1 - y) \log(1 - \hat{y})\big)$$

with a minus sign in front, so that a better prediction gives a lower loss. This is a convex equation, just like the one we saw in linear regression. It is also differentiable, which gradient descent needs in order to update the weights. I will give a quick glance at what gradient descent is. Our cost $J$ depends on two factors, the weight and the bias, so we'll take just one parameter and look at $J$ with respect to the weight. If you plot $J$ versus $W$, you get a bowl-shaped curve with a single lowest point.
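The cost built from the log loss can be sketched in a few lines of NumPy. The labels and predictions below are made-up numbers, and the function name `log_loss_cost` and the small clipping constant `eps` (which only guards against taking the log of 0) are assumptions of this sketch.

```python
import numpy as np

def log_loss_cost(y, y_hat, eps=1e-12):
    """Average log loss over all m training examples (the cost J)."""
    y_hat = np.clip(y_hat, eps, 1 - eps)  # keep log() away from 0
    losses = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return losses.mean()                  # 1/m times the sum of the losses

# Hypothetical labels and two sets of predictions.
y = np.array([1.0, 0.0, 1.0, 0.0])
good_predictions = np.array([0.9, 0.1, 0.8, 0.2])
bad_predictions = np.array([0.1, 0.9, 0.2, 0.8])

cost_good = log_loss_cost(y, good_predictions)
cost_bad = log_loss_cost(y, bad_predictions)
print(cost_good, cost_bad)
```

Predictions close to the true labels give a small cost and predictions far from them give a large one, which is the behaviour we want from the quantity that gradient descent minimizes.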

Our minimum is at the bottom of this curve, the point where the error is equal to 0. Our major goal is to reach this minimum error. We start at one particular point on the curve and move through individual points towards the bottom, but we cannot take a full unit step at each iteration. Suppose, for example, that we have reached a point close to the minimum.

If we take a one-unit step from there, we will just miss the minimum, and we don't want that to happen. Essentially, as we progress towards the bottom we take small baby steps. We start with a longer step, and at each iteration we reduce that length so that we converge to the minimum. How does the algorithm know this? It is controlled by a parameter called the learning rate, alpha ($\alpha$), which sets how much the algorithm learns in each iteration.

You can think of it like small kids learning to walk: in the first few months they take small baby steps, and then they gradually learn to walk. The case here is similar. Initially you start with a longer step, and gradually, as you move down and the slope reduces, the steps get shorter. Now consider a bridge and look at its slope.
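To see the effect of the step size, here is a small sketch on a made-up one-parameter cost, $J(w) = w^2$, whose minimum is at $w = 0$. The cost, the starting point, and the two learning rates are assumptions chosen only for illustration.

```python
def gradient_descent_1d(w0, alpha, steps):
    """Minimise J(w) = w**2 and return every w visited along the way."""
    w = w0
    path = [w]
    for _ in range(steps):
        grad = 2 * w          # dJ/dw for J(w) = w**2
        w = w - alpha * grad  # one gradient descent step
        path.append(w)
    return path

small_steps = gradient_descent_1d(w0=3.0, alpha=0.1, steps=50)
large_steps = gradient_descent_1d(w0=3.0, alpha=1.0, steps=50)

print(small_steps[-1])   # very close to the minimum at 0
print(large_steps[:4])   # jumps from one side of the valley to the other
```

With the small learning rate each step is shorter than the previous one, because the slope itself shrinks near the bottom. With the large learning rate every step overshoots the minimum, so the value of `w` keeps bouncing across the valley and never settles.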

If the bridge is flat, its slope is equal to 0, so any car moves easily over it. But if the bridge is steep, the car takes smaller accelerations, applies the brakes again and again, and then reaches the bottom. It is the same case here, where the curve has more steepness, and that is how you adjust the weights. So how do you change the weights, or how do you learn the weights? That is given by the following update equation.

You take the partial derivative $\frac{\partial J}{\partial W}$, where $\partial$ (dho) represents the partial derivative, and the weight is updated as

$$W = W - \alpha \frac{\partial J}{\partial W}$$

$J$ has the parameters $W$ and $b$, so it can be differentiated with respect to each of them. You need this to adjust the weights. At one point on the curve your slope is positive, and at another point your slope is negative. What do you need to do in both cases? You need to go down the valley.

If alpha is positive and the slope is also positive, then subtracting $\alpha \frac{\partial J}{\partial W}$ reduces the weight, so you are essentially moving in the downward direction. If the slope is negative, the same rule increases the weight, which again moves you down the valley. Gradually you converge to one particular point. The case is similar for your biases as well:

$$b_i = b_i - \alpha \frac{\partial J}{\partial b_i}$$

This holds for each of the biases and each of the weights of your neural network, with inputs $x_1, x_2, x_3, x_4$, from the previous setup.
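The two update rules can be tried out on a single sigmoid unit trained with the log loss. This is only a sketch: the toy data, the choice of one unit instead of a full network, and the names `sigmoid` and `cost_and_gradients` are assumptions. The gradient expressions used are the standard ones for a sigmoid output with log loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_gradients(w, b, X, y):
    """Log loss cost J and its partial derivatives dJ/dw and dJ/db."""
    m = X.shape[0]
    y_hat = sigmoid(X @ w + b)
    cost = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    dw = X.T @ (y_hat - y) / m   # one partial derivative per weight
    db = np.mean(y_hat - y)      # partial derivative for the bias
    return cost, dw, db

# Hypothetical data: 4 training examples with 2 features each.
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w = np.zeros(2)
b = 0.0
alpha = 0.1
costs = []
for _ in range(200):
    cost, dw, db = cost_and_gradients(w, b, X, y)
    costs.append(cost)
    w = w - alpha * dw   # W = W - alpha * dJ/dW
    b = b - alpha * db   # b = b - alpha * dJ/db

print(costs[0], costs[-1])
```

The cost after the last iteration is lower than at the start, which is the "going down the valley" behaviour described above.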

In that setup we saw these connections when we built the neural network. Accordingly, you need to apply this update to each of the weights, and this update is done with the help of gradient descent. When we move forward through the network we have forward propagation, and for coming back to each layer, which is a somewhat more difficult activity, we do backpropagation. Now I have two graphs: one is $J$ with respect to $W$, and the other is $J$ with respect to $b$. The two graphs look similar.

Each of them is a two-dimensional graph. Now if I want to club them together, I have three axes: weight, bias, and $J$. In three-dimensional space I get a paraboloid, which looks like a bowl-shaped surface. Looking at it along each dimension, you see the same parabolic curve as before.
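The bowl can be computed numerically by evaluating the cost on a grid of weight and bias values. As a simplifying assumption, this sketch uses a squared-error cost for a model with one weight and one bias, on made-up data generated with $w = 2$ and $b = 1$, because that gives an exact paraboloid.

```python
import numpy as np

# Hypothetical data generated from y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Grid of candidate weights and biases.
w_values = np.linspace(0.0, 4.0, 41)
b_values = np.linspace(-1.0, 3.0, 41)
W_grid, B_grid = np.meshgrid(w_values, b_values, indexing="ij")

# Predictions for every (w, b) pair at once, shape (41, 41, 4).
predictions = W_grid[..., None] * x + B_grid[..., None]

# Cost surface J(w, b): mean squared error for each grid point, shape (41, 41).
J = np.mean((predictions - y) ** 2, axis=-1)

# The bottom of the bowl.
i, j = np.unravel_index(np.argmin(J), J.shape)
print(w_values[i], b_values[j])
```

The lowest point of the surface sits at the weight and bias that generated the data, and the cost rises in every direction away from it.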

The minimum is at the bottom of the bowl, and that is the point you try to reach. Essentially, what we do is vectorize everything. Whether you consider the weight equation or the update for the bias, everything is treated as a vector. For example, the weights are represented as $w_1$ up to $w_n$, which in terms of dimensions is a $1 \times n$ matrix. The update is given in the same form:

$$W_{\text{new}} = W_{\text{old}} - \alpha \frac{\partial J}{\partial W}$$

where $W_{\text{new}}$ and $W_{\text{old}}$ both have dimension $1 \times n$. You multiply by $\alpha$, the learning rate, which you keep as 0.1 or any other value chosen by the data scientist.

Then you also have the gradient term, $\frac{\partial J}{\partial W} = \left[\frac{\partial J}{\partial w_1}, \frac{\partial J}{\partial w_2}, \dots, \frac{\partial J}{\partial w_n}\right]$, and this is vectorized as well. So every term in this equation has the same $1 \times n$ form. Because every dimension matches, the update can be carried out for all the weights in one operation. That is how we do the vectorization of the neural network, and it makes our calculation easier. Well, that was all regarding the vectorization of the neural network.
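A short sketch can compare the update written one weight at a time with the vectorized form. The weight values, the gradient values, and the function names `update_loop` and `update_vectorized` are made up for illustration; both arrays are kept as $1 \times n$ matrices to match the text.

```python
import numpy as np

def update_loop(w_old, grad, alpha):
    """Update each weight separately, one index at a time."""
    w_new = np.empty_like(w_old)
    for i in range(w_old.shape[1]):
        w_new[0, i] = w_old[0, i] - alpha * grad[0, i]
    return w_new

def update_vectorized(w_old, grad, alpha):
    """Update all weights in a single array operation."""
    return w_old - alpha * grad

# Hypothetical 1 x n weights and gradient (n = 4), learning rate 0.1.
w_old = np.array([[0.5, -1.0, 2.0, 0.0]])
grad = np.array([[1.0, -2.0, 0.5, 4.0]])
alpha = 0.1

w_loop = update_loop(w_old, grad, alpha)
w_vec = update_vectorized(w_old, grad, alpha)
print(w_vec)
```

Both functions give the same new weights. The vectorized version works because `w_old` and `grad` have the same shape, so the subtraction is applied to every position at once.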