*Softmax Activation Function* – Today I want to look at the role of the softmax function in deep neural networks. For that, let's take one example of a data set. It is 2020, the COVID-19 cases are ongoing, and we also have the census, which is taken every 10 years.

What the government does is assign employees from the health department and from other groups to different regions, states, and countries to take the census in each village or city: how many people there are and how many areas. This gives them certain insights and lays an analytical basis for further procedures.

These government officials go to each household and take the survey. For our example, they need two things: the phone number and the postal code of the region they visited. For simplicity, they just write these down by hand, so a phone number might start with 49 and a postal code with 39.

Different employees and different government groups each have their own way of writing. So this can be framed as a digit recognition problem. Say we have a census data set, an Excel file containing all of these entries, and we work only on the numerical attributes. In that sense it is similar to the MNIST data set.

MNIST poses the same kind of digit recognition task. We exclude special characters, and we do not take the operator or country code into consideration. Every remaining character is a digit in the range 0-9. So the data set has fields such as an ID, the phone number, and the postal code.

This is the basis for our digit classification: deciding which digit a particular handwritten character is. Say a government employee was in a hurry, so instead of writing a clean 0 he wrote one whose loop does not close. Then you cannot tell whether it is a 0 or a 6, and that is the difficulty digit recognition has to solve. So we take this input data, a set of tuples, and feed it into a deep neural network.

The network has many hidden layers between the input and the output, all connected in some fashion. Since the digits run from 0 to 9, the output layer has 10 different classes: 0, 1, 2, 3, 4, 5, and so on up to 9. Whenever the network encounters a record (a tuple, in relational database terms), it classifies it, and each output unit fires some value, say 57.93 for one class and 623.45 for another. These are the raw values.
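To make the "10 classes in the output layer" idea concrete, here is a minimal sketch of a final layer that turns a hidden vector into 10 raw values. Everything in it is an assumption for illustration: the function name `output_layer`, the 4-unit hidden vector, and the random, untrained weights are hypothetical and not taken from any real model.

```python
import random

NUM_CLASSES = 10  # one output per digit, 0 through 9


def output_layer(hidden, weights, biases):
    """Return one raw value per class: dot(hidden, weights[k]) + biases[k]."""
    return [
        sum(h * w for h, w in zip(hidden, row)) + b
        for row, b in zip(weights, biases)
    ]


# Hypothetical inputs: a 4-unit hidden activation and random, untrained weights.
rng = random.Random(0)
hidden = [0.5, -1.2, 3.0, 0.7]
weights = [[rng.uniform(-1, 1) for _ in hidden] for _ in range(NUM_CLASSES)]
biases = [0.0] * NUM_CLASSES

raw_scores = output_layer(hidden, weights, biases)
print(len(raw_scores))  # 10
```

The point is only the shape: whatever the hidden layers do, the last step produces exactly one raw value per digit class.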

The raw values are unnormalized scores, one per digit class, and a larger score means the network favors that digit more strongly. The data set itself may not be in the thousands or ten thousands of records but in the 10 million or 50 million range, if it covers the entire world. So we have the census data set, and we have the raw values produced by the convolutional or deep neural network.

We cannot base an analysis or a significance test directly on these raw values, because they are not probabilities and cannot be used as such in further calculations. So, given input records X1, X2, X3, and so on, we proceed as follows.

We assign each record a class label y from the set Y = {0, 1, 2, ..., 9}. We then feed the label into a technique called one-hot encoding. Take the digit 0: you represent it as a vector with ten entries, one per class. Initially all ten entries are 0.

For the digit 0 you set the first entry to 1, which marks the digit as 0. For the digit 1 you set the second entry instead, for 2 the third, and so on for all 10 classes up to 9, where the last entry becomes 1. This looks like a bit representation, but you are not flipping bits. It is a programming technique called one-hot encoding, and you will see it often in Python code.
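The encoding described above can be sketched in a few lines of plain Python. The helper name `one_hot` and its `num_classes` parameter are my own choices for illustration; libraries offer their own versions of this.

```python
def one_hot(digit, num_classes=10):
    """Return a list of num_classes zeros with a single 1 at index `digit`."""
    if not 0 <= digit < num_classes:
        raise ValueError(f"digit must be in 0..{num_classes - 1}, got {digit}")
    vec = [0] * num_classes  # initially all entries are 0
    vec[digit] = 1           # enable only the entry for this class
    return vec


print(one_hot(0))  # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(one_hot(2))  # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```

Each vector has exactly one entry set to 1, so its entries already sum to 1, just like a probability distribution that is certain about one class.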

When you have many kinds of categorical data and need indexed access to them, one-hot encoding is the usual choice for processing, especially in neural networks. We input this to our system, it performs the necessary computations, and we still get raw values out. Now we want to transform these raw values into a format where each value lies in the interval from zero to one.

Values in that interval can serve as the basis for probability calculations. For that, we pass each raw value through the transformation $e^{r_i}$, where $r_i$ is the raw value for class $i$ and $i$ runs over the classes, whose number depends on the application. Note that the exponent is $r_i$ and not $-r_i$, so that a larger raw value leads to a larger probability. The raw values themselves, such as 57.93 or 623.45, can be very large, depending on how the network recognizes the character.

They can take any value. Exponentials of numbers that large are enormous, so in practice the largest raw value is subtracted from every $r_i$ first; each exponential then lies between 0 and 1, and the final result is unchanged. If you then divide each $e^{r_i}$ by the summation of $e^{r_j}$ over all classes $j$, the resulting values add up to 1, which is the basis for any probability calculation. That is what we mainly do here.
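Here is a minimal sketch of that transformation using only the standard library. It uses the standard softmax definition, $e^{r_i}$ divided by the sum of $e^{r_j}$ over all classes, and subtracts the largest raw value before exponentiating, which is a common numerical safeguard. The function name `softmax` and the sample scores are assumptions for illustration.

```python
import math


def softmax(raw_scores):
    """Map raw values to probabilities: exp(r_i) / sum_j exp(r_j)."""
    shift = max(raw_scores)  # subtracting the maximum does not change the result
    exps = [math.exp(r - shift) for r in raw_scores]  # each value is in (0, 1]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.0])
print([round(p, 3) for p in probs])  # [0.665, 0.245, 0.09]

# Without the shift, math.exp(1000.0) would raise OverflowError.
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]
```

The largest raw value gets the largest probability, and the outputs always sum to 1 (up to floating-point rounding).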

The softmax function calculates the relative probability of each class from the raw values. For an individual value, say $r_1$, it divides $e^{r_1}$ by the summation of $e^{r_j}$ over all classes. So at the output, instead of raw values, you have probability values in the range 0 to 1, which can be used for statistical analysis and other downstream work. The major reasons we use the softmax function are that its output can be compared directly with the one-hot encoded labels in our neural network calculations, and that it converts the raw score of each digit into a probabilistic value so that we can do the computations efficiently. Well, that was all regarding the softmax function in deep neural networks.
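As a small follow-up, here is a sketch of how the probability output might be read against a one-hot label. The probabilities below are hypothetical, chosen to mimic a sloppy 0 that could also be a 6, and `predicted_digit` is an illustrative helper name.

```python
def predicted_digit(probs):
    """Return the index of the class with the highest probability."""
    return max(range(len(probs)), key=lambda k: probs[k])


# Hypothetical softmax output for a badly written 0 that also resembles a 6.
probs = [0.55, 0.01, 0.01, 0.01, 0.01, 0.01, 0.37, 0.01, 0.01, 0.01]
label = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # one-hot label for the digit 0

guess = predicted_digit(probs)
print(guess)              # 0
print(label[guess] == 1)  # True
```

Because both vectors have ten entries that sum to 1, the prediction and the label can be compared position by position.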

**Useful links:**

reference – *Softmax Activation Function: Deep Learning*
