Inputs of Neural Networks – Hi. In the last few articles, we looked at a very simplified version of backpropagation. We looked at the basic properties of neural networks, and we are now going to look at what kind of inputs neural networks need and what kind of outputs they can give us. As you will see, they are very flexible algorithms, and this is going to help us in a variety of NLP tasks.
So as a summary of what we’ll be talking about in this video: neural networks, and deep learning models in general, take vectors or sequences of vectors as their input. These vectors contain numbers, of course. Neural networks and deep learning models are language models in that they have some knowledge about the relationships between words, and therefore some knowledge about human language, much like our n-gram language models, which knew a little bit about which word is likely to come next after a given word.
Neural networks can replicate these properties, and as a matter of fact, they can do this much better than n-grams. Neural networks produce outputs that can be classifications, for example telling you whether a spectrogram is the sound ee or the sound ih. They can also give you word vectors, embeddings for example, or they can tell you, oh, the word you want is this one. This makes them very flexible, because they can give us the word that comes next in a sentence.
They can give us the word that we are looking for in Wikipedia, or they can give us the word in a second language: we give the word in English, and we get the word in some other language, for example. They’re very flexible algorithms, so let’s look very quickly at what kind of input they need. We can provide it in several ways. One of them is to do encoding. In week 5, we looked at a kind of encoding called ordinal encoding, where we gave each word a number.
For example, hello might be word number zero, world might be word number one, I’m might be word number two, and then a sentence like Hello world would be represented by a vector with two elements: the element 0 and the element 1. So you can see that if our vocabulary had 10,000 words, and hello was word 5,000 and world was word 6,000, then the vector would be 5000 6000. Even with a large vocabulary, it is just exchanging each word for a number; this is what we did in week 5 with ordinal encoding.
There are other types of encoding. For example, one-hot encoding is slightly different. Let’s say you have a vocabulary of five words, only five words in your language: hello, world, I’m, using, Python. If you have a vocabulary of five words, then you could represent any word with a five-element vector. And you could dedicate the zeroth element of that vector to the word hello. So if you have a one in the zeroth element, this means that the word is hello, and if you have a zero in the zeroth element, it means that it is anything else.
The second element of the vector, the element at index one, you could dedicate to the word world. If we have a 1 there, and zero in all the other positions, it means that you’re talking about the word world, and not the other words. Similarly, for the last one, if you’re talking about the word Python, we could dedicate the fifth position to the word Python, so that you have a one there and zero in all other positions. This is called one-hot encoding.
The same sentence, hello world, would be represented by a sequence of vectors. The first one of these would be 1 0 0 0 0 for hello, and then a second one, 0 1 0 0 0, for world. Why a one in the second position? Because this is the position we have dedicated to world.
This is one-hot encoding. And again, if you have a vocabulary of 20,000 words, you would just have a vector of 20,000 fields, and you would have a one in the position for that word, and a zero in every other position. A variant of these is called logit encoding. This is a form you will actually bump into later in the class.
This is very similar to one-hot encoding, but the numbers are not zeros and ones; they can be, for example, 0.8 and 0.5. If you only have hello and world, the larger value, 0.8, marks the word we’re trying to represent, and the lower values mark words that could have been possible, but are not really the ones we’re looking for. So in this example, the word hello could be represented by the vector 0.8 0.5, because 0.8 is the maximum value in it.
A vector of raw scores like this is a particular type of vector called a logit, and it can be passed through a softmax function. A softmax makes it so that all the internal elements add up to 1, and this makes it similar to probabilities, because in probability, everything needs to add up to 1. So what the softmax example is telling you is that there is, for example, about a 57% probability that the word you’re looking for is hello, and about a 43% probability that the word you’re looking for is world.
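Here is a small sketch of the softmax applied to the logits 0.8 and 0.5 from the example:

```python
import math

def softmax(logits):
    """Turn raw scores (logits) into positive values that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits for [hello, world] from the example.
probs = softmax([0.8, 0.5])
print(probs)  # hello gets about 0.57, world about 0.43
```

The exponentials keep every value positive, and dividing by the total forces the outputs to sum to 1, which is what lets us read them as probabilities.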
So this would tell you that hello is probably the word you’re looking for. This is one way that you can provide input to a neural network: by having some vector that represents words, either by switching them with a number or by dedicating a position of the vector to the word. And you can represent that position with a 1, a 0, or a probability. This would be encoding. You can also provide embeddings, and we’ve already studied these.
Word2vec is a kind of word embedding. You have a word, and then you represent the word by some two hundred features about its neighbors. Is the word king a neighbor of the word kingdom, of the word man, of the word royalty, and so forth? Is the word woman a neighbor of the word kingdom, man, and so forth? By representing these neighboring words in vectors, we get the word2vec vectors, and many neural networks can use these embeddings as inputs because they’re a very rich way to represent each word.
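As a toy illustration of why embeddings are a rich input: words used in similar contexts end up with similar vectors, which we can check with cosine similarity. The 3-dimensional vectors below are made up for illustration; real word2vec vectors typically have a couple of hundred dimensions learned from neighboring words.

```python
import math

# Hypothetical, hand-made embeddings (real ones are learned).
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Related words have more similar vectors than unrelated ones.
print(cosine(embeddings["king"], embeddings["queen"]))
print(cosine(embeddings["king"], embeddings["apple"]))
```

A neural network would simply take one of these vectors (or a sequence of them) as its input layer.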
We could also take spectrographic data, as we saw in one of our previous videos. If we have a sound wave, we can extract a lot of features from it: for example, the frequencies with the highest energy, the volume or intensity, or the pitch of your voice. You could use these energy values as input to your neural network to, for example, classify a sound. Let’s say you have the formants, that is, the frequencies and their energies, for a certain sound, and the neural network will give you what sound it is as the output, for example, ah.
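To make "frequencies with the highest energy" concrete, here is a small sketch that synthesizes a 440 Hz sine wave and finds its strongest frequency with a naive discrete Fourier transform. This is just one possible way to extract such a feature; real systems use faster FFTs and richer features.

```python
import math

# Synthesize a short 440 Hz sine wave, sampled at 8000 Hz.
sample_rate = 8000
n = 256
signal = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(n)]

def dft_magnitudes(x):
    """Magnitude of a naive DFT, for the positive frequencies only."""
    n = len(x)
    mags = []
    for k in range(n // 2):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    return mags

mags = dft_magnitudes(signal)
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
peak_hz = peak_bin * sample_rate / n  # close to 440, within one bin
```

The vector of magnitudes, or just a few summary values like `peak_hz`, could then be the numeric input a classifier uses to decide which sound it is hearing.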
This is one way that you can provide data to a neural network. Another way, for example, is to use video data. If you have sign language, maybe what you have is the pixels of a picture. If you have pixels, you could represent each point by its numerical value in some color scale. For example, you could reduce it to grayscale, and this would give you a matrix of values, where higher values like 255 are associated with the color white and lower values like 0 are associated with the color black.
So as you can see, this matrix is a numerical representation of the grayscale picture. And if you have a color picture, it would just be three matrices: one for the red channel, one for the green, and one for the blue, the RGB channels. Yes, a color image would be the same, but with three different matrices.
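A minimal sketch of these image representations, using tiny made-up 2x3 "pictures":

```python
# Grayscale: one matrix, 0 = black, 255 = white.
gray = [
    [  0, 128, 255],
    [255, 128,   0],
]

# Color: three matrices of the same shape, one per RGB channel.
red   = [[255, 0, 0], [0, 0, 0]]
green = [[0, 255, 0], [0, 0, 0]]
blue  = [[0, 0, 255], [0, 0, 0]]
color = [red, green, blue]

# Flattened into a single vector of numbers, the grayscale image
# could be fed directly to a neural network's input layer.
flat = [value for row in gray for value in row]
print(flat)  # [0, 128, 255, 255, 128, 0]
```

Real images work the same way, just with many more rows and columns per matrix.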
Another way that you could provide video input would be by using a depth scanner, or a three-dimensional scanner: for example, a Microsoft Kinect, which comes with game consoles. These types of scanners can detect features on your face, for example your eyebrows, your eyes, and your mouth, and then they will give you very rich information about which parts of your body are moving and the depth at which they are moving. We could use this kind of input to a neural network to, for example, translate from ASL into English.
This is one way that we could provide language to a neural network. So, so far we have covered the inputs. In the next video, we’re going to talk about some possible outputs for a neural network.