 ## So, what is an ADALINE network?

ADALINE stands for Adaptive Linear Neuron; it is also called Adaptive Linear Element, and you can see why it is called so. The ADALINE network was developed by Professor Bernard Widrow and his doctoral student Ted Hoff at Stanford University in 1960. It is based on the McCulloch-Pitts neuron and consists of weights, a bias, and a summing junction. So let's look at the structure of ADALINE.

A single-layer ADALINE consists of inputs and a summing junction, where the weighted sum of the inputs is taken. Here $x_0$ is the bias input and is always equal to 1; $x_1$ and $x_2$ are the inputs, and $w_0$, $w_1$, $w_2$ are the corresponding weights. Now, to calculate the error, we take the output of the summing junction and subtract it from the desired output given by a supervisor, or, as we may call it, a teacher.
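As a small sketch (the variable names and numeric values here are mine, not from the lecture), the summing junction and the error computation look like this:

```python
# Minimal sketch of an ADALINE summing junction (illustrative values).
def net_input(x, w):
    """Weighted sum of inputs; x[0] is the bias input, fixed at 1."""
    return sum(wi * xi for wi, xi in zip(w, x))

x = [1.0, 0.5, -0.5]     # [x0 (bias), x1, x2]
w = [0.1, 0.4, 0.2]      # [w0, w1, w2]
net = net_input(x, w)    # 0.1 + 0.2 - 0.1 = 0.2
d = 1.0                  # desired output from the teacher
e = d - net              # error = desired output - summing-junction output
```

The error `e` is exactly the quantity the error generator produces and feeds back to adjust the weights.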

This is the error generator, and here we calculate the error. Now, this error will be used to update the weights $w_0$, $w_1$, and $w_2$; therefore the error at the output is fed back to the network so that the weights are updated. Note that in practical circuits we often have a hard limiter at the output to prevent saturation.

Usually this hard limiter is called the activation function. The most common activation function used for ADALINE is the signum function. One thing you should keep in mind is that, even though the output of the neuron is taken after the activation function, for the calculation of the error we take the output from the summing junction.
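A minimal sketch of that point (the numeric values are illustrative): the neuron's reported output goes through the signum hard limiter, but the training error is formed from the raw sum, not the limited output.

```python
def signum(v):
    """Hard-limiting activation: +1 for a non-negative net input, -1 otherwise."""
    return 1.0 if v >= 0 else -1.0

net = 0.2                # output of the summing junction (illustrative value)
y = signum(net)          # neuron output after the hard limiter
d = 1.0                  # desired output from the teacher
e_adaline = d - net      # ADALINE forms its error from the raw sum
e_activated = d - y      # NOT what ADALINE uses for weight updates
```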

This is an important point, and it distinguishes the ADALINE network from the standard perceptron. Now, I have said that the error from the error generator is used to update the weights. But how do we do that? For that, we have the LMS algorithm, also called the Least Mean Square algorithm. So let's see what the LMS algorithm is. From the delta rule, we know that

$$\Delta w = -\eta \frac{\partial E(w)}{\partial w}$$

where $E(w)$ is the cost function.

It is also called the energy function. So let's first define an energy function

$$E(w) = \frac{1}{2} e^2$$

where $w$ stands for the weight vector and $e$ is given by

$$e = d - w^T x$$

where $x$ is the input vector, which may consist of many input elements $x_1, x_2, \dots, x_n$. Similarly, $w$ is the weight vector consisting of $w_1, w_2, \dots, w_n$. (If the bias is included, both start from index 0: $x_0, x_1, \dots, x_n$ and $w_0, w_1, \dots, w_n$.)

Now, to find $\Delta w$, we need to differentiate this quantity. We can calculate $\partial E / \partial w$ as

$$\frac{\partial E}{\partial w} = e \frac{\partial e}{\partial w} = e \frac{\partial}{\partial w}\left(d - w^T x\right) = -e\,x$$

because the derivative of $w^T x$ with respect to $w$ is $x$. We have to use the vector notation because $x$ is a vector. Note that the quantity we calculated here is just a gradient estimate. Why? Because we are taking the instantaneous value of $E$ and calculating the gradient using that single sample.

We know that this is not the exact gradient. To calculate the actual gradient, we would need the whole error vector, consisting of $e_1, e_2, \dots, e_n$ over all the samples. Therefore, in order to show that this is an approximation of the gradient, let me write it as

$$\hat{\nabla} E(w) = -e\,x$$

We use this cap (^) to denote that this is a gradient approximation.

Now, the weight update rule is given by

$$w(n+1) = w(n) - \eta \nabla E(w)$$

and from here we know the approximation of $\partial E / \partial w$, which is $-e\,x$. Substituting this approximation, we get

$$\hat{w}(n+1) = \hat{w}(n) + \eta\, e\, x$$

So, that is the LMS algorithm.
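The update rule above can be sketched as a small training loop. This is my own illustration, not from the lecture: the learning rate, epoch count, and the bipolar AND training set are all assumed choices.

```python
# Sketch of LMS training for a single ADALINE (hyperparameters and data
# are illustrative assumptions, not from the lecture).
def lms_train(samples, eta=0.1, epochs=50):
    """samples: list of (x, d) pairs, with x[0] = 1 as the bias input."""
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for x, d in samples:
            net = sum(wi * xi for wi, xi in zip(w, x))
            e = d - net                                  # error from the raw sum
            w = [wi + eta * e * xi for wi, xi in zip(w, x)]  # w <- w + eta*e*x
    return w

# Bipolar AND: output +1 only when both inputs are +1.
data = [([1, -1, -1], -1), ([1, -1, 1], -1), ([1, 1, -1], -1), ([1, 1, 1], 1)]
w = lms_train(data)
# Classify by taking the signum of the net input after training.
predictions = [1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
               for x, _ in data]
```

Note that the weight update itself never touches the signum function; the hard limiter is only applied when reading out the final classification.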

One might easily confuse it with the gradient descent algorithm, but keep in mind that this is not equal to gradient descent, because we are estimating the gradient, and since we are estimating it, even the computed weights are estimates. That is the reason we use the cap (^) on $\hat{w}(n+1)$ and $\hat{w}(n)$: to indicate that they are estimates. Also, in the previous video about gradient descent, we saw that gradient descent gives a definite trajectory.

However, since here we are approximating the weights, the trajectory may not be as good as that of gradient descent; there will be some randomness associated with it. If you draw a contour plot with axes $w_1$ and $w_2$ and the contour lines of the cost function, then, unlike gradient descent, where there was a definite smooth trajectory, the LMS algorithm's trajectory can be very sluggish and zig-zagging.
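This zig-zag behaviour can be demonstrated numerically. The setup below is my own toy example (not from the lecture): on noiseless linear data $d = 1.0\,x_1 - 0.5\,x_2$, sample-by-sample LMS still reaches the minimum, but the total distance its weight vector travels exceeds the straight-line distance to the optimum, because each single-sample step points toward that sample's own solution rather than the overall minimum.

```python
import math
import random

# Toy data: d = 1.0*x1 - 0.5*x2 with no noise (illustrative assumption).
random.seed(42)
samples = [((x1, x2), 1.0 * x1 - 0.5 * x2)
           for x1, x2 in ((random.uniform(-1, 1), random.uniform(-1, 1))
                          for _ in range(200))]

eta = 0.1
w = [0.0, 0.0]
path_length = 0.0           # total distance travelled by the weight vector
for _ in range(10):         # 10 passes over the data
    for (x1, x2), d in samples:
        e = d - (w[0] * x1 + w[1] * x2)
        step = (eta * e * x1, eta * e * x2)
        w = [w[0] + step[0], w[1] + step[1]]
        path_length += math.hypot(*step)

# Straight-line distance from the start (0, 0) to the optimum (1.0, -0.5).
straight_line = math.hypot(1.0, 0.5)
```

The weights end up very close to the true values, but `path_length` is noticeably larger than `straight_line`: the jagged path is the randomness the contour-plot picture describes.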

So we can say that even though the LMS approach is in the spirit of gradient descent, it is not exactly gradient descent. For this reason, it is also called stochastic gradient descent. You can see that the LMS algorithm is a practical implementation of the gradient descent algorithm. So that is all about the LMS algorithm; let's move on to the next topic, i.e., ADALINE vs. the perceptron. The only difference between ADALINE and the perceptron is in how the weights are updated. In the case of ADALINE networks, in the learning phase the weights are adjusted according to the weighted sum of the inputs.

That is, the sum $\sum_i w_i x_i$ is calculated and the error is calculated on the basis of

$$e = d - \sum_i w_i x_i$$

So this is the ADALINE network. However, in the case of the standard perceptron, the net $\sum_i w_i x_i$ is passed through an activation function, and the activation function's output is used for adjusting the weights. Here the error is calculated on the basis of $d - y$, where $y = \Phi\!\left(\sum_i w_i x_i\right)$ and $\Phi$ is the activation function.
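The two update rules can be placed side by side in a short sketch (my own illustration; the sample and starting weights are assumed values):

```python
def signum(v):
    return 1.0 if v >= 0 else -1.0

def adaline_update(w, x, d, eta=0.1):
    """ADALINE: error from the raw weighted sum, before the activation."""
    e = d - sum(wi * xi for wi, xi in zip(w, x))
    return [wi + eta * e * xi for wi, xi in zip(w, x)]

def perceptron_update(w, x, d, eta=0.1):
    """Perceptron: error from the activated output phi(net)."""
    e = d - signum(sum(wi * xi for wi, xi in zip(w, x)))
    return [wi + eta * e * xi for wi, xi in zip(w, x)]

# Same sample, same starting weights. The sign of the net (0.3) is already
# correct, so the perceptron's error is zero and it leaves the weights
# alone, while ADALINE still nudges the sum toward d = 1.0 exactly.
w0 = [0.0, 0.3, 0.0]
x, d = [1.0, 1.0, 1.0], 1.0
wa = adaline_update(w0, x, d)     # e = 0.7 -> weights change
wp = perceptron_update(w0, x, d)  # e = 0   -> weights unchanged
```

This is precisely the distinction made above: ADALINE keeps refining even on correctly classified samples, because its error is measured before the hard limiter.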

So, I hope everyone is clear about the difference between ADALINE and the perceptron. Now we will move on to MADALINE. MADALINE stands for Many ADALINE, where the M stands for Many: going by the standard definition, it is a multi-layer network of ADALINEs.

So if you look at the structure of MADALINE, it is something like this: there is an input layer of input neurons, a hidden layer consisting of ADALINE units, and an output layer that is also made of ADALINE units. So that's all for this lecture. If you have any doubts, please ask in the comment section. If you like the video, please subscribe to the channel. Thanks for watching! Have a nice day 😀