Human Language as a System of Rules

Hi and welcome to week 2 of accelerated computational linguistics. This is gonna be one of the more theoretical weeks of our course. After week 2, we’re gonna work mostly on natural language processing and a little bit of machine learning, but this week we’re gonna study the foundations of the whole edifice.
We’re gonna study how to model human language through rules, like the rules that we usually write when programming. Can it be done? Is it possible? We’re gonna study a series of abstractions called automata, and we’re gonna study something called the Chomsky hierarchy, which is a description of formal grammars and how they can be used to describe human languages and computational languages as well.
So in general we’re gonna try to figure out if it is possible to model all of the sequences of words and sounds in human language using rules in a computer. In order to do that, let’s start by thinking about human language as a system of rules. Human languages are systematic. Indeed they are made up of rules and of recurring patterns, things that we see again and again. So if you’re in a class trying to learn a language, your teacher’s probably going to show you how the sentences of the language are built.
So if you’re learning English, for example, you will learn that first you have the subject, the person who does the action: Jane. And then you have the verb: Jane eats. Subject, verb. Next the teacher will show you that there are sentences in English like Jane eats pizza: the subject, Jane; the verb, eats; and then the direct object, pizza. A direct object is something that the action is done to. So what are we eating? The pizza. The pizza is the direct object.
So in English, our sentences should be subject verb, or subject verb direct object. We cannot scramble them. For example, eats Jane is not a good sentence of English: it’s verb noun. Something like Pizza Jane eats is not a good sentence of English either: this would be the direct object, the subject, the verb. Other languages can say this but English really cannot. So the pattern subject verb direct object is apparent and will recur time and again. And indeed we could try to use that pattern and all of the similar patterns and turn them into rules, so that ultimately you would have a massive description of how English works.
Let’s look at that sentence more closely. N here means noun. A noun is an object, a thing, a place, a name. And V means verb. A verb is an action like to eat, to dream, to walk. So we have the sentence Jane eats. In this sentence, Jane is the subject.
Jane is a noun and the verb is to eat so maybe English sentences are Noun Verb. And this could help us generate many other English sentences. Indeed we could make a semi-formal description of an English sentence as one noun and one verb. Maybe we can expand on that description. We also have sentences like Jane eats pizza.
In Jane eats pizza, we have a noun as the subject, one verb, and a noun for the direct object, pizza. It’s the thing that we’re eating. So look at some of the regularities that we have here. The sentence always begins with one noun, so maybe the first part of an English sentence is one noun. These sentences always have one verb, so maybe we need one noun and one verb. And then the sentences vary on whether they have a direct object. Sometimes there are zero nouns, as in Jane eats, and sometimes there is one noun, as in Jane eats pizza.
So maybe the formalization of these sentences is one noun one verb zero or one nouns. Let’s try to see if we can have more sentences. We have Jane eats, Jane eats pizza. How about Jane eats ice cream? Here the subject remains one noun, the verb is one verb, and then the direct object can be two nouns: ice cream. So now we have that our English sentence can be one noun for the subject, one verb, and then zero, one, or two nouns. It could be three or more actually.
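The two candidate descriptions so far can be written down as regular expressions over tags. What follows is only a minimal sketch: it assumes each word has already been tagged by hand with one letter (N for noun, V for verb), and the names `ONE_OPTIONAL_OBJECT`, `UP_TO_TWO_OBJECT_NOUNS`, and `matches` are made up for this illustration.

```python
import re

# Assumption: one letter per word, N = noun, V = verb, assigned by hand.
ONE_OPTIONAL_OBJECT = re.compile(r"NVN?")         # one noun, one verb, zero or one nouns
UP_TO_TWO_OBJECT_NOUNS = re.compile(r"NVN{0,2}")  # one noun, one verb, zero to two nouns

# Hand-tagged example sentences from the lecture.
examples = {
    "Jane eats": "NV",
    "Jane eats pizza": "NVN",
    "Jane eats ice cream": "NVNN",
    "eats Jane": "VN",
}

def matches(pattern, tags):
    """True if the whole tag string fits the pattern."""
    return pattern.fullmatch(tags) is not None

for sentence, tags in examples.items():
    print(sentence, tags,
          matches(ONE_OPTIONAL_OBJECT, tags),
          matches(UP_TO_TWO_OBJECT_NOUNS, tags))
```

The first pattern accepts Jane eats and Jane eats pizza but rejects Jane eats ice cream, which is why the description has to grow to allow two nouns in the object position.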
It could be something like Jane eats chocolate ice cream. So it could be zero, one, two, or three nouns. It could be Jane eats Vermont chocolate ice cream: four nouns. So in general an English sentence maybe is one noun, one verb, and zero or more nouns for the position of direct object. Let’s bring another sentence into the equation: Jane Smith eats pizza margherita. Look at how this changes our subject. Now we have two nouns, Jane Smith. Maybe the subject is one or more nouns. It could be Jane Smith Watson, for example: one, two, three nouns. But it always needs to be at least one for every sentence that we have seen.
There’s always at least one noun in the subject, there’s always one verb, eats, and there can be zero, one, two, or more nouns for the direct object. So we could formalize the description of these sentences as one or more nouns, one verb, zero or more nouns. This is a regular expression. This is the kind of structure that we studied last week. An English sentence would be one or more nouns, one verb, and zero or more nouns. So look at how a regular expression is describing the syntax of English sentences, and we could use this to generate hundreds or thousands of new English sentences. Indeed we managed to turn one aspect of human language into a formal rule: at least one noun, one verb, zero or more nouns. And this rule can describe things like Jane Smith eats pizza margherita.
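The rule "one or more nouns, one verb, zero or more nouns" can be tried out directly. This is a sketch under stated assumptions, not a real tagger: the lexicon `LEXICON` is a hypothetical hand-made dictionary covering only the words used in this lecture, and `tag` and `is_sentence` are names invented for the example.

```python
import re

# Hypothetical toy lexicon: each word is tagged by hand as N (noun) or V (verb).
LEXICON = {
    "jane": "N", "smith": "N", "pizza": "N", "margherita": "N",
    "ice": "N", "cream": "N", "chocolate": "N", "vermont": "N",
    "eats": "V",
}

# One or more nouns, one verb, zero or more nouns.
SENTENCE = re.compile(r"N+VN*")

def tag(sentence):
    """Turn a sentence into a tag string, e.g. 'Jane eats pizza' becomes 'NVN'.

    Raises KeyError for any word that is not in the toy lexicon.
    """
    return "".join(LEXICON[word.lower()] for word in sentence.split())

def is_sentence(sentence):
    """True if the sentence's whole tag string fits N+ V N*."""
    return SENTENCE.fullmatch(tag(sentence)) is not None

for s in ["Jane eats", "Jane Smith eats pizza margherita", "eats Jane"]:
    print(s, tag(s), is_sentence(s))
```

One thing to keep in mind with this sketch: because it only looks at tags, Pizza Jane eats comes out as NNV and is accepted as if Pizza Jane were a two-noun subject. The pattern checks the order of categories, not which noun is playing which role.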
So maybe with enough time, with enough effort, we could find one big regular expression to describe every sentence of English. We could find a regular expression to describe every sentence of Spanish, and so forth and in general we could then model English as a sequence of symbols, as a deterministic sequence of symbols that tells you, first I need a noun, then I need something like a verb, then I need something like a noun again.
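The same pattern can be run in the other direction, to produce sentences rather than check them. The sketch below is illustrative only: the word lists `NOUNS` and `VERBS` and the function `generate` are assumptions made for this example, and the number of nouns is capped so that the output stays finite.

```python
from itertools import product

# Hypothetical tiny vocabulary, chosen only for illustration.
NOUNS = ["Jane", "pizza"]
VERBS = ["eats"]

def generate(max_subject=2, max_object=1):
    """Yield every word sequence shaped like N+ V N*, within the given bounds."""
    for n_subj in range(1, max_subject + 1):      # at least one subject noun
        for n_obj in range(0, max_object + 1):    # zero or more object nouns
            slots = [NOUNS] * n_subj + [VERBS] + [NOUNS] * n_obj
            for words in product(*slots):
                yield " ".join(words)

sentences = list(generate())
print(len(sentences))
for s in sentences:
    print(s)
```

With two nouns and one verb this already yields 18 strings. Some are fine, like Jane eats pizza, and some are odd, like pizza eats Jane or pizza pizza eats, since the pattern only fixes the order of the categories and says nothing about which words make sense together.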
Indeed we can describe languages with rules, rules that order the way that sounds interact and the way that words appear in a sentence. So maybe all of human language could be modeled like this, as a sequence of symbols and the order in which they appear. This is the question that we’re going to be studying this week: is it possible to model human language like this? And the tools that we’re going to be using to study this kind of ordering are a generalization of regular expressions, and a kind of abstraction called an automaton (plural: automata). Thank you.