# Softmax, Cross Entropy: Image Classification with PyTorch

Hello everyone! Welcome to part two of the image classification with PyTorch series. In this article, we continue our project by explaining softmax and cross-entropy, two concepts that are central to model training. In the model training process, we first need to construct a model instance using the model class we defined before.

Then we need to define a loss function. In this example, we are going to use cross-entropy as the loss function. This may sound intimidating at first, but don’t worry about it. I’ll walk through how to calculate the cross-entropy loss step by step. After getting the linear weighted sum scores from the network, we need to feed them into the softmax function.
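As a minimal sketch of that step, here is softmax written in plain Python so the arithmetic is visible. The function name `softmax` and the four example scores are assumptions for illustration; they are not taken from the model in this series.

```python
import math

def softmax(scores):
    """Map raw scores to probabilities that are positive and sum to 1."""
    # Subtracting the largest score avoids overflow in exp() and leaves the result unchanged.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for four classes (illustrative values only).
scores = [1.0, 2.0, 5.0, 0.5]
probs = softmax(scores)

for s, p in zip(scores, probs):
    print(f"score {s:5.2f} -> probability {p:.4f}")
print("sum of probabilities:", sum(probs))
```

The class with the largest score receives the largest probability, and every probability is strictly greater than zero.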

The softmax function normalizes the input scores so that each one lies between 0 and 1 and all of them sum to 1, which lets us interpret them as probabilities. In this way we can directly compare them to the one-hot encoded vector that corresponds to the label, using cross-entropy. Mathematically, cross-entropy is H(L, S) = -Σ_i L_i · log(S_i), where L is the one-hot label vector and S is the vector of softmax probabilities. Notice that the log function can’t take 0 as an input. However, there are a lot of zeros in a one-hot encoding, so what can we do?
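The comparison between a probability vector and a one-hot label can be sketched in a few lines of plain Python. The helper name `cross_entropy` and the example values are assumptions chosen for illustration. Note that the log is applied to the probabilities, not to the one-hot entries.

```python
import math

def cross_entropy(one_hot, probs):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    # Each term is L_i * log(S_i); the log is taken of the probability only.
    return -sum(l * math.log(s) for l, s in zip(one_hot, probs))

# Hypothetical softmax output for four classes, and a one-hot label
# marking the third class (index 2 in Python) as the correct one.
probs = [0.10, 0.05, 0.80, 0.05]
one_hot = [0, 0, 1, 0]

loss = cross_entropy(one_hot, probs)
print(f"cross-entropy loss: {loss:.4f}")
```

In PyTorch, `torch.nn.CrossEntropyLoss` applies log-softmax internally, so it is typically given the raw scores and integer class indices rather than probabilities and one-hot vectors.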

The answer is to apply the log only to the output of the softmax function, never to the one-hot labels, because mathematically the softmax function never produces exactly zero. Now, let’s see why we want to optimize this cross-entropy loss. Take a one-hot encoded vector that has 1 when i equals 3 and 0 for all other classes. When the cross-entropy function is expanded, only one term is left, the one where i equals 3; every other term disappears because L_i equals 0. So we only need to focus on one term: the model’s probability output when i equals 3.
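Written out, the expansion looks as follows. The name H for the loss is an assumption for illustration; L is the one-hot label vector and S is the softmax output.

$$
H(L, S) = -\sum_i L_i \log(S_i) = -\Big(1 \cdot \log(S_3) + \sum_{i \neq 3} 0 \cdot \log(S_i)\Big) = -\log(S_3)
$$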

Now, if we want to minimize the cross-entropy loss, we need to make the -log(S) term as small as possible. Since log(S) is negative for S below 1 and rises to 0 as S approaches 1, S needs to be very close to 1 for -log(S) to be small. In this way, by optimizing the cross-entropy loss, we force the model to adjust its parameters so that it assigns a high probability to the correct class.
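A quick numeric check makes this concrete. The probabilities below are hypothetical values for the correct class, chosen only to show the trend.

```python
import math

# Hypothetical probabilities the model assigns to the correct class.
candidates = [0.01, 0.1, 0.5, 0.9, 0.99]

# The loss for a single example reduces to -log(S) for the correct class.
losses = [-math.log(s) for s in candidates]

for s, loss in zip(candidates, losses):
    print(f"S = {s:4.2f} -> -log(S) = {loss:.4f}")
```

The loss shrinks toward zero as S approaches 1 and grows quickly as S approaches 0, so confident wrong predictions are penalized heavily.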

So, that’s the reason why we use cross-entropy as the loss function here. In the next article, we are going to talk about how to use backpropagation to optimize the model parameters with this loss function.