The Chain Rule – The topic of today’s article is the chain rule. You have probably come across this rule before, most likely in a high school mathematics class. In deep learning it is one of the most important concepts, because backpropagation depends on it. Backpropagation is the algorithm that works backward through the layers of a network, down to each individual unit or neuron, in order to adjust the weights.
That backward step uses the chain rule from calculus, so this article is a quick revision of what the rule says. As a simple example, suppose you are given an exponential function, e raised to the sine of x squared: y = e^(sin(x^2)). You are asked to take the derivative of y with respect to x.
So how would you do this? First, you need to know the derivatives of the basic functions that appear in the expression. The same basic derivatives come up again when you work with activation functions. Then, to differentiate y with respect to x, you check where x appears in the expression. Here it sits inside the innermost function.
The chain rule works in the form: derivative of the outer function multiplied by the derivative of the inner function. It applies to nested functions, which are called composite functions, that is, a function inside a function. So how do you take the derivative here? Treat the expression as e^t, where t = sin(x^2). The derivative of e^t with respect to t is e^t itself, so you first write e^(sin(x^2)). Then you multiply by the derivative of the inner part, d/dx of sin(x^2).
You keep going until you reach the innermost piece, x^2, differentiating one unit at a time. So far you have e^(sin(x^2)) multiplied by the derivative of sin(x^2). The derivative of the sine with respect to its argument is cos(x^2), but you should not stop there: the argument x^2 still has to be differentiated with respect to x. It is expressed purely in terms of x, and its derivative is 2x. So the final derivative is dy/dx = e^(sin(x^2)) · cos(x^2) · 2x. Notice that each piece is an individual unit, and the units are simply multiplied together.
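Written out step by step, the same calculation looks like this. The intermediate names t and u are introduced here only for illustration.

```latex
\begin{aligned}
y &= e^{t}, \quad t = \sin(u), \quad u = x^{2} \\
\frac{dy}{dx} &= \frac{dy}{dt}\cdot\frac{dt}{du}\cdot\frac{du}{dx} \\
&= e^{\sin(x^{2})}\cdot\cos(x^{2})\cdot 2x
\end{aligned}
```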
So what you are seeing is a chain of separate derivatives multiplied together, and this is what you use for composite functions. To make the chain explicit, give each stage its own name. Let y = x^2 (note that y now names the inner stage, not the whole function as before). Then let z = sin(y), and finally let w = e^z. Now suppose we are asked for the derivative of w with respect to x. Here we write it in partial-derivative notation.
For this to work, w must ultimately be expressed in terms of x. Somewhere inside w there has to be a term that depends on x; only then can you apply the chain rule. Otherwise there is no path back from w to x. When you take the derivative, the first factor is ∂w/∂z, the derivative of w with respect to z.
Then you have ∂z/∂y, and finally ∂y/∂x, so that ∂w/∂x = (∂w/∂z) · (∂z/∂y) · (∂y/∂x). If you look at this product, ∂z and ∂y each appear once in a numerator and once in a denominator, so you can picture them cancelling out. Mathematically these are not fractions and cannot really be cancelled, but for simplicity it is a convenient way to remember the rule.
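The product of local derivatives can be checked numerically. The following is a minimal Python sketch, not a definitive implementation: it uses the stages y = x^2, z = sin(y), w = e^z from the text, while the function names and the finite-difference check are assumptions added for illustration.

```python
import math

def forward(x):
    """Evaluate the three stages y = x^2, z = sin(y), w = e^z."""
    y = x ** 2
    z = math.sin(y)
    w = math.exp(z)
    return y, z, w

def chain_rule_derivative(x):
    """Multiply the local derivatives dw/dz, dz/dy and dy/dx."""
    y, z, _ = forward(x)
    dw_dz = math.exp(z)  # derivative of e^z with respect to z
    dz_dy = math.cos(y)  # derivative of sin(y) with respect to y
    dy_dx = 2 * x        # derivative of x^2 with respect to x
    return dw_dz * dz_dy * dy_dx

def numerical_derivative(x, h=1e-6):
    """Central finite-difference estimate of dw/dx, used only as a check."""
    return (forward(x + h)[2] - forward(x - h)[2]) / (2 * h)

if __name__ == "__main__":
    x = 0.7  # arbitrary test point
    print("chain rule:       ", chain_rule_derivative(x))
    print("finite difference:", numerical_derivative(x))
```

The two printed values should agree to several decimal places, which is a quick way to confirm that no factor in the chain was dropped.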
This is how the chain rule lets you write out the chain of derivatives for any nested expression. In the general form, the composite function is written y = f(g(x)), and its derivative is y′ = f′(g(x)) · g′(x). That is the rule for composite functions.
This is not limited to exponential functions. The same rule applies to a power such as (3x + 1)^7, to a logarithmic function such as log(5x), or to a square root of some inner expression. You are probably already familiar with expressions of this kind, because the loss functions you encounter are built in the same way.
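Applying the rule to these examples gives the results below. This is a sketch under two assumptions: log is taken to be the natural logarithm, and since no specific square-root example is given, g(x) stands in for any differentiable inner function.

```latex
\begin{aligned}
\frac{d}{dx}(3x+1)^{7} &= 7(3x+1)^{6}\cdot 3 = 21(3x+1)^{6} \\
\frac{d}{dx}\log(5x) &= \frac{1}{5x}\cdot 5 = \frac{1}{x} \\
\frac{d}{dx}\sqrt{g(x)} &= \frac{1}{2\sqrt{g(x)}}\cdot g'(x)
\end{aligned}
```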
Given a true label y and a prediction ŷ (y hat), you may have an expression such as y log(ŷ) + (1 − y) log(1 − ŷ), whose negative is the binary cross-entropy loss, and you can differentiate it in the same way. Similarly, x^2 + y^2 is the squared L2 norm. Losses like these are convex functions of their inputs. This is where the chain rule shows its value in neural networks: you take the derivative of the loss and backtrack through the network to update each individual unit. That was all regarding the chain rule in deep learning for backpropagation.
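To see the rule at work on such a loss, here is a minimal Python sketch. It is an illustration under stated assumptions rather than a full backpropagation implementation: a single neuron with a sigmoid activation, ŷ = sigmoid(w·x + b), and the binary cross-entropy loss. The single-neuron setup, the function names, and the numbers are all hypothetical.

```python
import math

def sigmoid(s):
    """Logistic activation (assumed here; the text does not name one)."""
    return 1.0 / (1.0 + math.exp(-s))

def loss(w, b, x, y):
    """Binary cross-entropy for one example, with y_hat = sigmoid(w*x + b)."""
    y_hat = sigmoid(w * x + b)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def grad_w(w, b, x, y):
    """dL/dw as a product of three local derivatives (the chain rule)."""
    s = w * x + b
    y_hat = sigmoid(s)
    dL_dyhat = -(y / y_hat - (1 - y) / (1 - y_hat))  # loss w.r.t. prediction
    dyhat_ds = y_hat * (1 - y_hat)                   # sigmoid w.r.t. its input
    ds_dw = x                                        # weighted sum w.r.t. weight
    return dL_dyhat * dyhat_ds * ds_dw

if __name__ == "__main__":
    w, b, x, y = 0.5, -0.2, 1.5, 1.0  # hypothetical values
    g = grad_w(w, b, x, y)
    learning_rate = 0.1               # hypothetical step size
    w_new = w - learning_rate * g     # one gradient-descent update of w
    print("gradient:", g)
    print("loss before:", loss(w, b, x, y))
    print("loss after: ", loss(w_new, b, x, y))
```

For this particular pairing of sigmoid and cross-entropy, the three factors simplify to (ŷ − y)·x, which is one reason the combination is popular.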