Image classification and Visualization

In the previous article, we studied CNNs and learnt how to build CNN models with transfer learning. In this article, we will build image classification models from scratch, use a bunch of strategies that are employed in practice while building image classification models, and visualize what the CNN is learning by looking at the activations after each layer.

So, we will follow a basic machine learning workflow: we will examine and understand the data, build an input pipeline to bring the data to training, build the model, train it, test it, and then improve the performance of the model and repeat the process.

We will get some practical experience and develop intuition for building input pipelines for images using the ImageDataGenerator class. We will also learn how to identify overfitting and prevent it, and study key concepts like data augmentation and dropout along the way. Let us install TensorFlow 2.0, import ImageDataGenerator and the Dense, Conv2D, Flatten, Dropout and MaxPooling2D layers from keras.layers, and also import Sequential for building the model.
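A minimal sketch of these imports, assuming TensorFlow 2.x:

```python
import os
import tensorflow as tf
import matplotlib.pyplot as plt

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, Dropout, MaxPooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator
```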

We use matplotlib.pyplot for plotting the performance of the model. Let us load the data. We use the dogs-vs-cats dataset from the Kaggle competition. The dataset has the following structure: there is a top-level directory called cats_and_dogs_filtered, which contains the training and validation datasets.
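One way to fetch and extract the dataset is sketched below. The download URL is the mirror used by the official TensorFlow tutorials; treat it as an assumption if your copy comes from Kaggle directly.

```python
# Download and extract the filtered cats-vs-dogs dataset.
# URL is an assumption based on the TensorFlow tutorial mirror of the Kaggle data.
_URL = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'
path_to_zip = tf.keras.utils.get_file('cats_and_dogs.zip', origin=_URL, extract=True)
PATH = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')
```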

Within the training directory, there are two subdirectories, cats and dogs; the validation directory follows the same structure, with its own cats and dogs subdirectories. Within the cats directory we have images of cats stored in JPEG format, and each file is named cat.<id>.jpeg (dog images follow the dog.<id>.jpeg convention). The first 2000 examples are used for training and the remaining examples for validation.

After extracting the archive, we assign variables with the proper file paths for the training and validation sets, then construct paths to the cats and dogs directories within each. Let us look at how many cat and dog images there are in the training and validation directories. We use os.listdir to list the contents of each directory and take the length of the listing to count the cats and dogs in training and validation, as sketched below.
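In code, this looks roughly as follows (the directory-variable names are illustrative):

```python
train_dir = os.path.join(PATH, 'train')
validation_dir = os.path.join(PATH, 'validation')

train_cats_dir = os.path.join(train_dir, 'cats')
train_dogs_dir = os.path.join(train_dir, 'dogs')
validation_cats_dir = os.path.join(validation_dir, 'cats')
validation_dogs_dir = os.path.join(validation_dir, 'dogs')

# Count the images in each directory with os.listdir.
num_cats_tr = len(os.listdir(train_cats_dir))
num_dogs_tr = len(os.listdir(train_dogs_dir))
num_cats_val = len(os.listdir(validation_cats_dir))
num_dogs_val = len(os.listdir(validation_dogs_dir))

total_train = num_cats_tr + num_dogs_tr
total_val = num_cats_val + num_dogs_val
print('total training images:', total_train)
print('total validation images:', total_val)
```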

You can see that we are using 2000 images for training and 1000 images in total for validation: 1000 cat images are used for training and 500 for validation, and the same proportions are used for the dog class. Next, let us set up some variables such as the batch size, the number of epochs, and the height and width of the image.
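The hyperparameters might be set up like this; the batch size of 128 is an assumption, since the article only fixes 15 epochs and 150-by-150 images:

```python
batch_size = 128   # assumed value; not stated in the article
epochs = 15
IMG_HEIGHT = 150
IMG_WIDTH = 150
```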

Let us prepare the data for training. We perform the following steps: first we read the images from disk, then we decode their contents into proper RGB grids of pixels, then we convert them into floating-point tensors, and finally we rescale the tensors from values between 0 and 255 to values between 0 and 1. All of these tasks are done by the ImageDataGenerator class provided by tf.keras. It can read images from disk and preprocess them into proper tensors, and it also sets up generators that convert these images into batches of tensors, which is very helpful during training.

So, we set up an ImageDataGenerator each for the training and validation sets. After defining these generators, we use the flow_from_directory method to load images from disk and apply rescaling and resizing to the required dimensions. Here the target image size is 150 by 150, and we shuffle the training data but not the validation data. We also specify the batch size and the directories where the data is stored.
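A sketch of the generators and the flow_from_directory calls:

```python
train_image_generator = ImageDataGenerator(rescale=1./255)
validation_image_generator = ImageDataGenerator(rescale=1./255)

train_data_gen = train_image_generator.flow_from_directory(
    batch_size=batch_size,
    directory=train_dir,
    shuffle=True,                          # shuffle only the training data
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    class_mode='binary')

val_data_gen = validation_image_generator.flow_from_directory(
    batch_size=batch_size,
    directory=validation_dir,
    shuffle=False,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    class_mode='binary')
```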

Let us visualize the training images by extracting a batch of images from the image generator and plotting five of them with matplotlib, as in the sketch below. These are some of the images of cats and dogs.
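One possible plotting helper (plotImages is an illustrative name):

```python
# Grab one batch of (images, labels) from the training generator.
sample_training_images, _ = next(train_data_gen)

def plotImages(images_arr):
    # Plot the given images in a single row.
    fig, axes = plt.subplots(1, 5, figsize=(20, 20))
    for img, ax in zip(images_arr, axes):
        ax.imshow(img)
        ax.axis('off')
    plt.tight_layout()
    plt.show()

plotImages(sample_training_images[:5])
```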

Now let us create a CNN model for classifying cats and dogs. We use three convolution blocks, each with a max pooling layer, and on top of them a fully connected layer with 512 units and ReLU activation. The model outputs class probabilities for binary classification via a sigmoid activation in the output layer. Looking at the structure of the model, we have a Conv2D layer followed by max pooling, then two more such blocks, with 16, 32 and 64 filters respectively. We use binary cross entropy as the loss, Adam as the optimizer, and accuracy as the metric to track.
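Putting the described architecture together; padding='same' is an assumption, chosen because it keeps the first feature map at 150 by 150 as discussed later in the activation visualization:

```python
model = Sequential([
    Conv2D(16, 3, padding='same', activation='relu',
           input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
    MaxPooling2D(),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(1, activation='sigmoid')   # class probability for binary classification
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()
```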

It will be an interesting exercise for you to understand the output shapes of the model and work out the parameter counts as you see them in the summary, so I would encourage you to pause for a couple of minutes and derive these numbers yourself, as we have done in previous sessions. You can see that the model has more than 10 million parameters. Let us train the model for 15 epochs. After 15 epochs we get an accuracy close to 94 percent on the training set and 73 percent on the validation set.
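Training might look like the following; note that on the earliest TF 2.0 releases you may need fit_generator instead of fit for generator inputs:

```python
history = model.fit(
    train_data_gen,
    steps_per_epoch=total_train // batch_size,
    epochs=epochs,
    validation_data=val_data_gen,
    validation_steps=total_val // batch_size)
```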

Let us visualize the training and validation accuracy across epochs. You can see that as we train for more epochs, the training accuracy keeps rising whereas the validation accuracy plateaus after some time. We see a similar trend in the training and validation loss: the training loss decreases steadily as we train for more epochs, but the validation loss initially declines and then grows steadily after about 5 epochs.
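The curves can be plotted from the History object returned by fit:

```python
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
```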

This points to the fact that the model is overfitting, so we have to explore strategies to improve the model's performance by reducing the overfitting. We will use data augmentation and dropout as two strategies for fighting the overfitting problem. In data augmentation, we take the existing images and perform certain transformations on them to obtain more data.

What kinds of transformations can we apply? We can rotate the image, translate it, or change its scale to create more and more examples from each image; the model is less likely to overfit with more data. So, let us try to augment the data and visualize the result.

We can apply random horizontal flip augmentation to the dataset by constructing an ImageDataGenerator with horizontal_flip=True. Let us take one example and see how the data gets augmented with horizontal flipping: after flipping, we get multiple examples generated from a single image.
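A sketch of the flip augmentation, drawing the first image of the batch repeatedly so that each draw applies a fresh random flip:

```python
image_gen = ImageDataGenerator(rescale=1./255, horizontal_flip=True)
train_data_gen = image_gen.flow_from_directory(
    batch_size=batch_size, directory=train_dir,
    shuffle=True, target_size=(IMG_HEIGHT, IMG_WIDTH))

# Re-generate the first image five times; each access re-applies the random flip.
augmented_images = [train_data_gen[0][0][0] for i in range(5)]
plotImages(augmented_images)
```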

We can also augment by rotation. Let us apply up to 45 degrees of rotation randomly to the training examples. You can see that the same picture of the cat is rotated to different orientations at different angles; by applying this data augmentation strategy, we created more pictures from a single picture.

In addition, we can apply zoom augmentation by specifying a zoom range. Let us see what zoom augmentation looks like, and then put all of these augmentation strategies together: we tried horizontal flipping, rotation and zooming, and we will use all of them jointly to get more images from the set of training examples.
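The rotation and zoom variants differ only in the generator arguments; the zoom magnitude below is an assumption, as the article does not state the exact range:

```python
# Random rotations of up to 45 degrees.
image_gen = ImageDataGenerator(rescale=1./255, rotation_range=45)

# Random zoom of up to 50% (assumed magnitude).
image_gen = ImageDataGenerator(rescale=1./255, zoom_range=0.5)
```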

So, we use ImageDataGenerator and apply rescaling, rotation of up to 45 degrees, width shift, height shift, horizontal flip and zoom augmentation to the training images. We apply all these augmentation techniques to the training set and obtain more data by augmenting the original images. The result is a set of augmented pictures of a dog: the original picture, plus the pictures obtained by applying the various data augmentation strategies.
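All augmentations combined; the shift and zoom magnitudes are assumptions:

```python
image_gen_train = ImageDataGenerator(
    rescale=1./255,
    rotation_range=45,
    width_shift_range=0.15,   # assumed magnitude
    height_shift_range=0.15,  # assumed magnitude
    horizontal_flip=True,
    zoom_range=0.5)           # assumed magnitude

train_data_gen = image_gen_train.flow_from_directory(
    batch_size=batch_size,
    directory=train_dir,
    shuffle=True,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    class_mode='binary')
```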

We apply data augmentation only to the training examples; we do not augment the validation examples beyond rescaling them. We also convert the validation examples into batches using an ImageDataGenerator. Another technique to reduce overfitting is dropout.
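```python
# Only rescaling on the validation set; no augmentation.
image_gen_val = ImageDataGenerator(rescale=1./255)
val_data_gen = image_gen_val.flow_from_directory(
    batch_size=batch_size,
    directory=validation_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    class_mode='binary')
```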

We add dropout to the model after augmenting the data through the various augmentation schemes: a dropout rate of 0.3 after a couple of the max pooling layers, and a dropout of 0.1 after the fully connected layer. The effect of dropout is that, at each training step, a random 30 percent of the corresponding max pooling activations and 10 percent of the fully connected activations are set to 0.
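A sketch of the model with dropout, placing the Dropout layers so that the layer ordering matches the eight-layer sequence described below (the exact placement of the second 0.3 dropout is inferred from that description):

```python
model_new = Sequential([
    Conv2D(16, 3, padding='same', activation='relu',
           input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
    MaxPooling2D(),
    Dropout(0.3),                 # 30% of activations zeroed at random
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Dropout(0.3),
    Flatten(),
    Dense(512, activation='relu'),
    Dropout(0.1),                 # 10% after the fully connected layer
    Dense(1, activation='sigmoid')
])
```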

We compile the model and train it, and we can see that at the end of 15 epochs we have a training accuracy of 64 percent and a validation accuracy of 63 percent. If we plot the losses, you can see that the training and validation losses now trend together, so there is less overfitting than in the earlier model. Let us visualize what this particular CNN is learning.

We will take the output of the model after every layer. There are eight such layers: a convolution followed by pooling, then a dropout, then convolution, pooling, convolution, pooling and another dropout. We get the output after every one of these layers. This is different from the models we have seen so far; it is the first time we are encountering a multi-output model.

Until now, our models produced exactly one output at the final layer; here we gather an output after every layer. Let us look at the predictions coming from this activation model and plot those activations. The first image is the input, and the first-level activation is essentially a 150 by 150 feature map with 16 channels.
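A sketch of the multi-output activation model, following the approach popularized in Chollet's Deep Learning with Python:

```python
from tensorflow.keras.models import Model

# Take the outputs of the first eight layers (the conv/pool/dropout blocks).
layer_outputs = [layer.output for layer in model_new.layers[:8]]
activation_model = Model(inputs=model_new.input, outputs=layer_outputs)

# Run one image through the model; activations is a list of 8 feature maps.
sample_img = val_data_gen[0][0][0:1]      # shape (1, 150, 150, 3)
activations = activation_model.predict(sample_img)
first_layer_activation = activations[0]
print(first_layer_activation.shape)       # (1, 150, 150, 16) with 'same' padding
```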

Let us look at the third channel: you can see it is roughly detecting the edges in the cat picture. The 15th channel seems to be identifying the overall shape of the image. At this point, let us plot a complete visualization of all the activations in the network. We will extract and plot every channel in each of our 8 activation maps and stack the results in one big image tensor, with the channels side by side, as in the sketch below.
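A sketch of the channel plots and the full activation grid, adapted from the visualization recipe in Deep Learning with Python:

```python
import numpy as np

# Single channels of the first activation map:
plt.matshow(activations[0][0, :, :, 3], cmap='viridis')   # channel 3: edge-like response
plt.matshow(activations[0][0, :, :, 15], cmap='viridis')  # channel 15: overall shape

# Full grid: every channel of every activation map, 16 channels per row.
layer_names = [layer.name for layer in model_new.layers[:8]]
images_per_row = 16

for layer_name, layer_activation in zip(layer_names, activations):
    n_features = layer_activation.shape[-1]   # channels in this feature map
    size = layer_activation.shape[1]          # map shape is (1, size, size, n_features)
    n_cols = n_features // images_per_row
    display_grid = np.zeros((size * n_cols, images_per_row * size))
    for col in range(n_cols):
        for row in range(images_per_row):
            channel_image = layer_activation[0, :, :,
                                             col * images_per_row + row].copy()
            # Normalize each channel for display.
            channel_image -= channel_image.mean()
            channel_image /= (channel_image.std() + 1e-5)
            channel_image = np.clip(channel_image * 64 + 128, 0, 255).astype('uint8')
            display_grid[col * size:(col + 1) * size,
                         row * size:(row + 1) * size] = channel_image
    scale = 1. / size
    plt.figure(figsize=(scale * display_grid.shape[1],
                        scale * display_grid.shape[0]))
    plt.title(layer_name)
    plt.grid(False)
    plt.imshow(display_grid, aspect='auto', cmap='viridis')
plt.show()
```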

You can see that the first convolution layer seems to be detecting edges, and as we go deeper in the network we detect different kinds of features from the images. Notice that there are a lot of empty spots after the second convolution and max pooling operations.

There are a few remarkable things to note here. The first layer acts as a collection of various edge detectors; at that stage, the activations still retain almost all of the information present in the initial picture. You can see that a lot of features from the input are being preserved.

As we go higher up, the activations become increasingly abstract and less visually interpretable. They start encoding higher-level concepts such as "cat ear" and "cat eye". Higher-level representations carry increasingly less information about the visual content of the image, and increasingly more information related to the class of the image.

The sparsity of the activations increases with the depth of the layer. In the first layer, almost all filters are activated by the input image, but as we go deeper into the network there are more and more blank filters. A blank filter means that the pattern encoded by that filter is not found in the input image. So, this is a very nice way of visualizing how a convolutional neural network learns patterns in an image.

We have thus shown a very important, universal characteristic of the representations learned by deep neural networks: the features extracted by a layer get increasingly abstract with the depth of the layer. The activations of higher layers carry less and less information about the specific input being seen, and more and more information about the target class.

A deep neural network effectively acts as an information distillation pipeline, with raw data going in and getting repeatedly transformed so that irrelevant information is filtered out and useful information is magnified and refined. Hope you had fun learning these concepts. See you in the next article.
