# Finite State Transducers

Finite State Transducers – Let’s talk about finite state transducers. They’re very similar to finite state machines except they take an input and then they produce an output for every transition between the states. We can use this to transform one string into a different string. So indeed transducers are very similar to finite state machines, they have states, so for example, this transducer here has the initial state 0 it has the end states 4 and 5, and it has the intermediate states 1 2 and 3.

This transducer has inputs, so it has the input symbol c in the transition from 0 to 1 the input symbol a in the transition from 1 to 2 the sym – the input symbol t in the transition from 2 to 3, and in the transition from 3 to 5 it has the input symbol s so if it was just a normal finite state machine it would read this – the string cats as it goes from 0 to 5.

A finite state transducer also has outputs. So for example in 0 to 1 in the transition between 0 and 1 you get the input c and then you produce the output c. It might look a little bit dumb but I promise you it’s worthwhile. This will transform the input cat into cat. Nothing new but it’ll do something really interesting for the plural: it will transform the literal word cats into a morphological description of the word.

It’ll transform cats into cat plural so the input is just the word in English and the output is a linguistic description of the word. It’s the root cat or the stem cat plus a plural. So again when you go to the initial state 0 from ze- and then make the transition from 0 to 1 the transducer takes the input c and it produces the output c so your output string has one character c in the transition from 1 to 2 you take the input a and produce the output a so your string so far has ca.

In the transition from 2 to 3 you take the input t and produce the output t. Now you have the string cat, and then what happens if you go from 3 to 4 you take an epsilon, so you take nothing and you produce nothing and then you exit through end state four.

This would be the first row where you take cat as the input and just produce the root cat as the output. How about the other word? You go from the initial state 0 in the transitions from 0 to 1, 0 1 to 2, 2 to 3 and you get cat and then when you transition from state 3 to state 5 you get the plural s you get the character s as the input and then the output is gonna be just a dash a p and an l to signify the plural. So now the output string has cat-pl which is a morphological description of the word, it’s cat plus the plural morpheme.

So what this transducer just did was transform a word in English into a description of a word in English. We could do this, so we can take any word in English and then produce a description for – the linguistic description. For example we could expand the finite state transducer that we have here so that it can take not only the input cat and cats, but also dog and dogs, city and cities. Now my suggestion for this would be to take a piece of paper and draw it first. Obviously we’re gonna get into the programming of it but before we program anything as with any software design you need to think of what you’re going to do.

My suggestion would be to grab a piece of paper, a pen or a pencil and draw the finite state transducer that will make all the transformations from this input into this output. Pause the video now. And welcome back. So this finite state transducer would look something like this, as you go from 0 to 1, 1 to 2, 2 to 3, 3 to 4, we have cat. 0 to 1, to 2 to 3 to 5, we have cats. These are the ones we have from before. We can add another sequence from 0 to 1 for the c where you take an input c and then you produce an output c you can go from 1 to 6 so the transition from 1 to 6 takes the input i and then produces the output i for the output string 6 to 7 takes a t, 7 to 8 takes a y so the path 0 1 6 7 8 produces the word city.

Let’s go back to number 7. By the time we’re in 7 we have an output string cit. From 7 to 9 you take as an input i and then produce an epsilon output. You give nothing for the output from 9 to 10, you take the e as the input and do nothing for the output. So you have you still have cit, cit cit as the output string and you’re just waiting to see if you’re actually gonna get the word cities by the end. In the transition from 10 to 11 you get the input s and by the time you have that input you know oh I actually did see the word cities now your output is going to be what you had cit plus a y-pl so your output is city-pl.

Remember the input was cities the output is city plural. The path for dog and dogs is easier. From zero to 12 you get the input d and the output d from 12 to 13 o, o from 13 to 14 g, g, from 14 to 15 you get epsilon to epsilon so the output is dog, just a root dog, and then from 14 to 16 you get the s so that you have the input dogs and you transform this into the output dog-pl dog plural. This would be a finite state transducer that can describe the morphology of three English words and their corresponding plurals.

Let’s talk for a minute about weighted finite state transducers. So let’s say you have the finite state transducer that we see here just from 0 to 1 and 1 to 2 you have a ca as the input and ca as the output, and then you can produce two strings cat at the end state 3 and cap at the end state 4. This weighted finite state – I’m sorry this finite state transducer will accept the strings cat and cap but how would you handle a corpus that has – has 3 cat and one cap, so it’s evident that – there’s two words in there however they’re not happening in the proportions.

You have three times as many cat as you have cap. This is what a finite – I’m sorry a weighted finite state transducer helps you do. It helps you take those probabilities or those weights into account. So for example in this corpus of four words, the transition from zero to one is always gonna be input c output c with 100% of the cases doing that. From one to two, a to a, a hundred percent of the cases are gonna do that, and then from two to three 75% of the cases are gonna go into the t and twenty five percent of the cases are gonna go into the p.

This weighted finite state transducer correctly describes a corpus with four words where three of them are cat and one of them is cap. We’re gonna see this kind of structure many times because it’s used in many applications including speech recognition. So for example here the transition from 0 to 2 to 3 to 4 would recognize the sounds in the word data so the transition from 0 to 1 you get the input like the the sound input d and produce the output string data, from 1 to 2 you hear the a, and produce an epsilon output because you don’t need anything more in your string from 2 to 3 you hear the t I’m given epsilon output because you already have the word data, from 8 to 4 you hear the sound ah and then get an epsilon in the output, so the string that you got as the input was data in the sounds and the one that you got at the as the output is data as four characters, and then you go out through end state 4.

Notice that you also do that from 0 to 5 to 6 you – in from 0 to 5 you get the input d as the sound and then you produce the output dew, waiting hoping that the next sound is u, when you get that get epsilon output because you already have the string, and exit successfully having recognized the word dew. So we are gonna see these structures pop up every now and then in the class.

In summary finite state transducers are a kind of out automaton. They are related to finite state machines, they have states, they have transitions, but the difference is that they take a symbol as an input and then they produce an output as they go in their transitions from state to state. A weighted finite state transducers – transducer includes a weight for each transition. This weight is going to tell you how frequent those transitions are and these wei – weighted finite state reducers are very common in many applications including speech recognition.