# Outputs of Neural Networks

In the last articles, we looked at some of the inputs that we can provide to a neural network. For example, we can use encodings such as one-hot encoding, where you represent each word with a field in a vector.

If you have three words, then you have a vector with three fields. You could also use embeddings, where you have, for example, a vector of length 200 whose fields are properties of the words that surround the original word; for a word like king, these are the weights for possible neighbors of that word. Beyond encodings and embeddings, you could have audio data, for example data extracted from a spectrogram of a recording of a human voice. You could have video data, for sign language for example, where you can detect the depth and motion of hands but also of parts of people’s faces.

And now let’s take a moment to consider what kind of outputs we can get from a neural network. Most algorithms are going to give you a vector as an output, or a sequence of vectors as an output. For example, the output could be logits passed through a softmax: if you have a vocabulary of 10,000 words, each of them will have a probability, and there will be one word with a very high probability, the word buy for example, and then you know that the network is giving you the word buy.
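To make the softmax step concrete, here is a minimal sketch in plain Python. The five-word vocabulary and the logit values are made up for illustration (a real model would score all 10,000 words), and `softmax` is written by hand rather than taken from any library.

```python
import math

def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical tiny vocabulary standing in for the 10,000 words in the text.
vocab = ["sell", "buy", "wash", "drive", "paint"]
# Made-up logits, as if produced by the last layer of a network.
logits = [1.2, 6.5, 0.3, 2.1, -0.7]

probs = softmax(logits)
best = max(range(len(vocab)), key=lambda i: probs[i])
predicted_word = vocab[best]
print(predicted_word, round(probs[best], 3))
```

The network itself only produces the vector of numbers; picking the position with the highest probability and looking it up in the vocabulary is what turns that vector into a word.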

This also happens with visual networks. For example, if the task is to recognize the 10 handwritten digits, there’s an output vector of length 10. If the highest value is in the zeroth position, it’s the number zero. If the highest value is in the fifth position, counting from zero, then it’s the number five, and so forth. So neural networks are always going to give you some form of vector, representing what could be a word, for example, or a sequence of vectors for the words in a sentence.
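The same reading of the output works for a sequence of vectors. The sketch below assumes two made-up output vectors of length 10, one per handwritten digit image, and decodes each by the position of its highest value; the helper names `argmax` and `decode_sequence` are hypothetical.

```python
def argmax(vector):
    """Return the position of the highest value in the vector."""
    return max(range(len(vector)), key=lambda i: vector[i])

def decode_sequence(vectors, labels):
    """Map each output vector to the label at its highest-scoring position."""
    return [labels[argmax(v)] for v in vectors]

digit_labels = [str(d) for d in range(10)]

# Made-up network outputs for two images: positions run from 0 to 9.
outputs = [
    [0.91, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],  # peak at 0
    [0.02, 0.01, 0.03, 0.02, 0.05, 0.80, 0.02, 0.02, 0.02, 0.01],  # peak at 5
]

decoded = decode_sequence(outputs, digit_labels)
print(decoded)
```

Swapping `digit_labels` for a vocabulary list gives the word case: one vector per position in the sentence, one word per vector.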

But doing that gives you enormous flexibility to tackle all sorts of really interesting tasks. For example, we could do neural text generation, where we provide some words to the program, like “The spaceship entered orbit around the planet”, and then the neural network tries to guess the next word, and the next word, and the next word. This is an example of what the site Talk to Transformer produced: “Once in orbit, the ship jumped to hyperspace, leaving the planet. It then began traveling in an elliptical pattern moving up towards the spherica...”

That’s very good English, and I urge you to go to the website and give it a try. This model is called GPT-2. It’s a kind of transformer; we’ll look at transformers in the next articles. It has a training set of 40 gigabytes of text from 8 million web pages, and the whole neural network has about 1.5 billion parameters, that is, weights and similar values that can be set.

So it’s not small or easy to run, as you can see. But it generates very good English. Again, what it does is this: you give it a word like spaceship, and the network gives you the output entered; you give it a word like elliptical, and the network gives you the output pattern. It generates the next word, and in doing this, it is a language model.
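The generate-the-next-word loop can be sketched with a toy stand-in for the network. Everything in `NEXT_WORD_PROBS` is made up, and this toy only looks at the last word, whereas a model like GPT-2 conditions on all the preceding text; the point is only the loop that feeds each output back in as the next input.

```python
# Toy "language model": made-up probabilities for the word that follows each word.
NEXT_WORD_PROBS = {
    "the": {"spaceship": 0.6, "planet": 0.4},
    "spaceship": {"entered": 0.7, "jumped": 0.3},
    "entered": {"orbit": 0.9, "the": 0.1},
    "orbit": {"around": 0.8, "<end>": 0.2},
    "around": {"the": 1.0},
    "elliptical": {"pattern": 0.9, "orbit": 0.1},
}

def next_word(word):
    """Return the most probable next word, or "<end>" if the word is unknown."""
    candidates = NEXT_WORD_PROBS.get(word)
    if not candidates:
        return "<end>"
    return max(candidates, key=candidates.get)

def generate(prompt, max_new_words=3):
    """Greedy generation: repeatedly append the most probable next word."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_new_words):
        word = next_word(words[-1])
        if word == "<end>":
            break
        words.append(word)
    return " ".join(words)

print(generate("The spaceship"))
```

Always taking the single most probable word is the simplest choice; sampling from the probabilities instead would give more varied text.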

Networks like these can perform many other tasks, for example, filling in missing words. If you have a sentence like “I want to ___ the car because it is cheap”, and you provide that to a type of model called BERT, BERT will correctly predict that the masked word was buy.

As a matter of fact, you will find this code in your canvas: the code to get a BERT to guess a missing word in the input. This model has 24 layers, hidden layers with 1024 neurons, 16 of something called attention heads, and 340 million parameters, so it’s very large as well. What it does is look at a sentence, find the gap, and try to see what could have gone there in the middle.
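This is not the course code and it is not BERT; it is a toy sketch of the fill-in-the-gap task under made-up assumptions. The `ASSOCIATIONS` table is invented, and the scoring simply adds up how strongly each context word suggests each candidate. What it does share with the real thing is that it uses words on both sides of the gap: the clue cheap comes after the mask.

```python
MASK = "___"

# Made-up association strengths between context words and candidate fillers.
ASSOCIATIONS = {
    "want": {"buy": 0.5, "drive": 0.5, "wash": 0.2},
    "car": {"buy": 1.0, "wash": 1.0, "drive": 1.0},
    "cheap": {"buy": 2.0, "sell": 0.5},
}

def fill_mask(sentence):
    """Guess the masked word from the words on both sides of the gap."""
    words = [w.strip(".,!?") for w in sentence.lower().split()]
    if MASK not in words:
        raise ValueError("sentence has no masked word")
    position = words.index(MASK)
    context = words[:position] + words[position + 1:]  # left AND right context
    scores = {}
    for word in context:
        for candidate, weight in ASSOCIATIONS.get(word, {}).items():
            scores[candidate] = scores.get(candidate, 0.0) + weight
    if not scores:
        raise ValueError("no candidate matches this context")
    best = max(scores, key=scores.get)
    return best, scores

guess, scores = fill_mask("I want to ___ the car because it is cheap")
print(guess)
```

A left-to-right language model would have to guess before seeing cheap; a masked model gets to look at the whole sentence.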

Let’s look at this example. A network called a sequence-to-sequence model encodes a sentence and passes something to another neural network, which gives you I am a student. The first network, the encoder, gets je and outputs some vector. It then takes the next word together with that vector, and so on, until the whole sentence has been condensed into one vector. The decoder then starts sending you I am a student, one word at a time. Both the input and the output are sequences of words, and what the encoder passes to the decoder is a vector.
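The data flow of an encoder and a decoder can be sketched as follows. This is a toy under loud assumptions: the two-dimensional word vectors are invented, the French sentence je suis étudiant is assumed as the input, and the decoder is a lookup of the nearest known sentence rather than a trained network. Only the shape of the process is meant to carry over: words in one at a time, one vector handed across, words out one at a time.

```python
# Hypothetical 2-dimensional vectors for a tiny French vocabulary.
FRENCH_VECTORS = {
    "je": [1.0, 0.0],
    "suis": [0.0, 1.0],
    "étudiant": [1.0, 1.0],
    "fatigué": [-1.0, 1.0],
}

def encode(words):
    """Fold the sentence into one vector, one word at a time."""
    state = [0.0, 0.0]
    for word in words:
        vector = FRENCH_VECTORS[word]
        # The new state depends on the current word AND the previous state.
        state = [0.5 * s + v for s, v in zip(state, vector)]
    return state

# Stand-in for a trained decoder: sentences it "knows" how to produce.
KNOWN_SENTENCES = {
    ("je", "suis", "étudiant"): ["I", "am", "a", "student"],
    ("je", "suis", "fatigué"): ["I", "am", "tired"],
}

def decode(context):
    """Yield output words one at a time, given only the context vector."""
    def distance(source):
        return sum((a - b) ** 2 for a, b in zip(encode(source), context))
    closest = min(KNOWN_SENTENCES, key=distance)
    for word in KNOWN_SENTENCES[closest]:
        yield word

context_vector = encode(["je", "suis", "étudiant"])
translation = list(decode(context_vector))
print(context_vector, translation)
```

Note that `decode` never sees the French words, only `context_vector`, which is the point the paragraph above makes about the encoder passing a vector to the decoder.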

In doing this, a neural network can translate from one language to another, as you can see here. Neural networks can also take images or sound and classify them. For example, they can see a part of a spectrogram and tell you that the sound e was present there. If the network does this with more and more sounds, it’ll slowly give you a dictated form of the words in the sound wave. With images, for example, you get an image, you transform it into a matrix, and then you get the transcription from ASL fingerspelling to English.

One very cool thing that neural networks can do, that deep learning can do, is take multi-modal inputs and outputs. It can convert an image into a natural language description of the image, such as a herd of zebras, or it can take a natural language description and try to generate a picture that matches that description.

Neural networks can also do things like provide answer windows. This is something that BERTs can do. For example, there’s a training set called the Stanford Question Answering Dataset (SQuAD). The dataset has questions like “What causes precipitation to fall?”, contexts where you can find the answer, like “In meteorology, precipitation is a product of the condensation of water vapor that falls under gravity”, and then the answer, gravity.

What this network learns to do is find two numbers: the index in the string where the answer starts, and the index where the answer ends. In this example, those two numbers mark off the word gravity.
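Here is a minimal sketch of turning those two numbers into an answer. It assumes the network has already produced one start score and one end score per token; the scores below are made up, and the context is split into words on spaces, which is a simplification of how a real model tokenizes text. The helper `best_span` is hypothetical.

```python
def best_span(start_scores, end_scores, max_len=10):
    """Pick the (start, end) pair with the highest combined score, start <= end."""
    best, best_score = None, float("-inf")
    for start, start_score in enumerate(start_scores):
        last = min(start + max_len, len(end_scores))
        for end in range(start, last):
            score = start_score + end_scores[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best

context = ("In meteorology, precipitation is a product of the condensation "
           "of water vapor that falls under gravity")
tokens = context.split()

# Made-up scores: low everywhere except at the last token.
start_scores = [0.1] * len(tokens)
end_scores = [0.1] * len(tokens)
start_scores[-1] = 5.0
end_scores[-1] = 4.0

start, end = best_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])
print(start, end, answer)
```

Searching over pairs, rather than taking the best start and the best end separately, rules out an answer that ends before it begins.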

BERTs can be adapted to many tasks, and this is one of them. As you can see, this is just a sampling of what they can do. These structures are very flexible and can generate all sorts of outputs. Neural networks, and deep learning models in general, take vectors or sequences of vectors as inputs, and they produce vectors or sequences of vectors as outputs.

But we can use these outputs as the next word in a sentence, the foreign language version of a word in our language, the point in a paragraph where you have the answer to a question, the written version of some sound, or, for example, the written description of a picture.

So there are many ways that these structures can help us, because they have some internal knowledge about language. They are language models, but as we’ll see in the final video of the week, they are very opaque language models, and it’s really difficult to understand what exactly the computer knows about language.

In the next few articles, we will look at deep learning and how we can apply it in natural language processing.