Text Classification – So far in this series we have used the TensorFlow API to build models for image classification and regression. In this article, we will demonstrate how to use the TensorFlow API to build a model for text data. In this module, we will classify movie reviews as positive or negative using the IMDB (Internet Movie Database) review data set.
We have 50,000 movie reviews in total: 25,000 reviews are used for training and the remaining 25,000 are used for testing. In this example, we will use transfer learning with TensorFlow Hub and Keras. Let us install all the required libraries and TensorFlow version 2.0. We use NumPy for data manipulation and TensorFlow Hub for pre-trained models.
One of the beauties of neural networks is that a model trained on one particular task can be reused for some other task. For example, a model trained on one set of images can be used to classify some other images. This is called transfer learning: a model that was previously trained on some data set, called a pre-trained model, is used as a black box inside some other model.
We can see that TensorFlow 2.0 is now installed. Now, we will download the IMDB dataset and split the training reviews into training and validation sets in a 60 to 40 ratio: 60 percent for training and 40 percent for validation. We use the load method of TensorFlow Datasets (TFDS) to download the IMDB reviews from the internet. Let us print the first 10 examples to see what the data looks like.
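To make the 60 to 40 split concrete, here is a minimal sketch in plain Python. It does not use TFDS at all; the helper name split_indices is hypothetical, and it assumes the split is taken contiguously over the 25,000 training reviews.

```python
def split_indices(num_examples, train_percent):
    """Return (train, validation) index ranges for a contiguous split.

    Hypothetical helper for illustration only; it is not part of TFDS.
    """
    cut = num_examples * train_percent // 100  # integer arithmetic avoids float rounding
    return range(0, cut), range(cut, num_examples)


# Assumption: the 60/40 split is applied to the 25,000 training reviews.
train_idx, val_idx = split_indices(25000, 60)
print(len(train_idx), len(val_idx))  # prints: 15000 10000
```

Under that assumption, 15,000 reviews are used for training and 10,000 for validation, while the 25,000 test reviews stay untouched until the final evaluation.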
We take the first 10 examples with the batch function. Each review is on a single line, so you can see that there are 10 reviews here, and the corresponding labels are in the label batch. Most of the first 10 reviews are positive, except for a couple that are negative.
The label batch is also a 1D tensor, with shape (10,). Now that we have explored the data, the next task is to build a neural network model. There are three main decisions to make when building a neural network model: how to represent the text, how many layers to use in the model, and how many hidden units to use in each layer.
In this example, the input data consists of sentences and the output label is binary, either 0 or 1: 0 means the review is negative and 1 means the review is positive. One way to represent text is to convert the sentences into embedding vectors. This is where we can use a pre-trained text embedding as the first layer.
This has multiple advantages. We do not have to worry about text preprocessing, we benefit from transfer learning, and the embedding has a fixed size, so it is simple to process. We will therefore use a pre-trained text embedding model from TensorFlow Hub. Let us look at what TensorFlow Hub is. TensorFlow Hub has a number of reusable neural network models that can be used as black box models in other applications.
We will use a text embedding model based on Google News, which embeds a given sentence in a 20-dimensional vector. Other embedding models are also available on TensorFlow Hub, but for this particular exercise we will use the Google News based text embedding model.
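The idea of a fixed-size sentence embedding can be sketched with a toy example in plain Python. This is not the Google News model: the word vectors below are pseudo-random stand-ins for learned weights, the function names are hypothetical, and averaging the word vectors is an assumption made only for illustration. The point is that reviews of any length come out as vectors of the same size.

```python
import random

EMBEDDING_DIM = 20  # matches the 20-dimensional embedding described in the text


def toy_word_vector(word, dim=EMBEDDING_DIM):
    """Deterministic pseudo-random vector for a word (stand-in for learned weights)."""
    rng = random.Random(word)  # seeding with the word makes the vector repeatable
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def embed_sentence(sentence, dim=EMBEDDING_DIM):
    """Split on whitespace, look up each word, and average the word vectors."""
    words = sentence.lower().split()
    if not words:
        return [0.0] * dim
    vectors = [toy_word_vector(word, dim) for word in words]
    return [sum(column) / len(words) for column in zip(*vectors)]


reviews = [
    "A wonderful film",
    "Terrible plot and worse acting, I want my two hours back",
    "Fine",
]
batch = [embed_sentence(review) for review in reviews]
print(len(batch), [len(vector) for vector in batch])  # prints: 3 [20, 20, 20]
```

Whatever the length of the review, the downstream layers always receive 20 numbers per example, which is what makes the embedding convenient as a first layer.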
Let us first create a Keras layer that uses the TensorFlow Hub model to embed the sentences. We define it using hub.KerasLayer and specify the embedding model by its URL. We also specify the input shape, the data type (which is string), and whether the model is trainable or not.
In this case we want to retrain the embedding model along with the rest of the network, which is why we set trainable to True. Next, we take the first 3 examples and see what happens when we pass them through the hub layer. As we pass these 3 examples through the hub layer, we get a 2D tensor that has 3 examples, each represented by a 20-dimensional vector.
Each entry in a vector is a real number, either positive or negative, stored as a 32-bit floating point number. Let us build the full model and see how to use this hub layer inside it. By now, you are quite familiar with sequential models in Keras. We add the hub layer as the first layer to convert each sentence into the desired embedding.
We then take the output of this embedding and give it to the second layer, which is a hidden layer with 16 units that uses ReLU as its activation function. Finally, we have an output layer, which is a dense layer with a single unit, because we have a binary classification problem here, and it uses sigmoid as its activation. Let us quickly look at the architecture of the model.
The text embedding model takes a review as input, and for every review we get 20 numbers from it. We then send these 20 numbers to a hidden layer with 16 units that uses ReLU as its activation function. Its output goes to a single-unit output layer with sigmoid activation, which gives us y, a value between 0 and 1 that we interpret as negative (0) or positive (1).
So, we take the text and pass it to an embedding, which gives us 20 numbers, and these 20 numbers go into the 16 hidden units. The first part, from the text to the 20 numbers, is defined by the hub layer. The next part is a dense layer, followed by another dense layer, which is the output layer. I hope this makes the architecture clear to you.
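The flow of numbers through the two dense layers can be sketched in plain Python. This is only an illustration of the shapes and activations: the weights and the input embedding are random placeholders rather than trained values, and the helper names are hypothetical. It is not the Keras implementation.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer; weights[j] holds the weights of output unit j."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Placeholder for the 20 numbers that the embedding layer would produce for one review.
embedding = [rng.uniform(-1.0, 1.0) for _ in range(20)]

# Random, untrained weights: 16 hidden units with 20 inputs each, 1 output unit with 16 inputs.
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(20)] for _ in range(16)]
b_hidden = [0.0] * 16
w_output = [[rng.uniform(-0.5, 0.5) for _ in range(16)]]
b_output = [0.0]

hidden = dense(embedding, w_hidden, b_hidden, relu)            # 20 numbers -> 16 numbers
probability = dense(hidden, w_output, b_output, sigmoid)[0]    # 16 numbers -> 1 number

print(len(hidden))   # prints: 16
print(probability)   # a value strictly between 0 and 1
```

The sigmoid output is a number between 0 and 1. A common convention, assumed here, is to read values above 0.5 as a positive review and the rest as negative.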
Now that we have built a classifier, let us run this and see the summary of the model. We can see that there is a Keras layer, the embedding layer, which outputs 20 numbers. That output goes to the dense layer, which outputs 16 numbers, and then we have an output layer, which outputs a single number: the prediction. The Keras layer has about 400k parameters, the hidden layer has 336 parameters, and the output layer has 17 parameters.
The output layer has 17 parameters because there are 16 inputs and 1 bias term. In the same manner, the 336 comes from 16 × 20 weights plus 16 bias terms, one for each unit in the hidden layer. The Keras layer accounts for the remaining 400k parameters.
Each of the 20 embedding dimensions in the Keras layer has 20,001 parameters: one parameter for each word in the vocabulary and one additional parameter for out-of-vocabulary words. That gives 20 × 20,001 = 400,020 parameters in the Keras layer. In total, we have 400,373 parameters to train.
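The parameter counts above can be checked with a few lines of arithmetic. The variable names are chosen for this sketch only, and the numbers are taken from the text.

```python
embedding_dim = 20        # size of each sentence embedding
embedding_rows = 20001    # per the text: one row per vocabulary word plus one for out-of-vocabulary words
hidden_units = 16
output_units = 1

embedding_params = embedding_rows * embedding_dim                # 20,001 x 20
hidden_params = embedding_dim * hidden_units + hidden_units      # weights + one bias per hidden unit
output_params = hidden_units * output_units + output_units       # weights + one bias
total_params = embedding_params + hidden_params + output_params

print(embedding_params, hidden_params, output_params, total_params)
# prints: 400020 336 17 400373
```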
Now that the model is defined, let us compile it. We use Adam as the optimizer, we use binary cross-entropy loss because we have a binary classification problem to solve here, and we track accuracy as a metric. We batch the training data into mini-batches of 512 samples and run the training loop for 20 epochs. We validate on the validation set and store the output of fit in the history variable.
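To show what the loss and the metric measure, here is a minimal plain-Python sketch of binary cross-entropy and accuracy. It is not the Keras implementation: the clipping constant, the 0.5 threshold, and the example values are assumptions made for illustration. The last lines also estimate the number of mini-batches per epoch, assuming 15,000 training reviews (60 percent of 25,000).

```python
import math


def binary_cross_entropy(labels, probabilities, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def accuracy(labels, probabilities, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    predictions = [1 if p > threshold else 0 for p in probabilities]
    correct = sum(1 for y, y_hat in zip(labels, predictions) if y == y_hat)
    return correct / len(labels)


# Made-up labels and predicted probabilities, for illustration only.
labels = [1, 0, 1, 0]
probabilities = [0.9, 0.2, 0.4, 0.1]

loss = binary_cross_entropy(labels, probabilities)
acc = accuracy(labels, probabilities)
print(acc)  # prints: 0.75 (the third example is misclassified)

# Assumption: 15,000 training reviews, batched 512 at a time.
steps_per_epoch = math.ceil(15000 / 512)
print(steps_per_epoch)  # prints: 30
```

The loss is small when confident predictions are correct and grows quickly when a confident prediction is wrong, which is why it is a natural choice for a single sigmoid output.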
Since we stored the output of the training loop in the history variable, we can use it later to plot learning curves or other statistics about the training loop. You can see that the training loss goes down after every epoch and the accuracy goes up. Also keep an eye on the validation accuracy and see what is happening to it.
The validation accuracy also seems to go up with each epoch, and the training and validation accuracies are quite close. After 20 epochs, the training accuracy has crossed 93 percent and the validation accuracy has crossed 87 percent. Let us evaluate the model performance on the unseen data set.
We store the result of the evaluation in the results variable. On the test data we get an accuracy above 86 percent, which is very close to 87 percent. So, this fairly naive approach achieves an accuracy of about 87 percent. In this module, we built a text classifier using the TensorFlow API.
We used transfer learning based on models from TensorFlow Hub. Hope you enjoyed learning these concepts. Hope to see you in the next module.