History of Computational Linguistics – Welcome back. We’re going to talk about the history of computational linguistics, a field of research that has been active for at least 80 years. Fun fact: the term artificial intelligence was coined here at Dartmouth College! There was a meeting in 1956 where a group of researchers were trying to figure out what it would mean for a computer to think, and how you could get a computer to demonstrate intelligence.
And they believed that “every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.” Notice the optimism in this phrase. This kind of optimism, the belief that computers could model any problem and could do so fairly easily, was a hallmark of this first stage of research in artificial intelligence in the 1950s and 60s.
So here we have a timeline of research in computational linguistics and natural language processing, and the timeline has two components. On the upper part you will see approaches that believe that statistical extrapolation, that is, trying to find statistical patterns or frequency patterns in a text, is the way to learn a language. This is the way that a neural network would work, for example, or any kind of statistical machine learning: trying to find which words go with which and how often they do that.
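To make "which words go with which and how often" concrete, here is a minimal sketch that counts pairs of adjacent words (bigrams). The toy corpus and the function name `bigram_counts` are invented for illustration; real systems work with far larger corpora and more careful tokenization.

```python
from collections import Counter

def bigram_counts(text):
    """Count how often each pair of adjacent words occurs in the text."""
    words = text.lower().split()
    # zip(words, words[1:]) pairs every word with the word that follows it.
    return Counter(zip(words, words[1:]))

# Toy corpus, made up for this example.
corpus = "the dog barks and the dog runs and the cat sleeps"
counts = bigram_counts(corpus)

# The most frequent pairs are the "patterns" a statistical approach would pick up.
print(counts.most_common(2))
```

On this corpus, "the dog" occurs twice and "the cat" once, so a purely frequency-based model would consider "dog" the more likely word after "the".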
On the bottom of the timeline, we have approaches that believe that you learn languages through rules: that formal grammars, rules, and in general the manipulation of symbols are how you should learn a language and how you should describe one as well. So let’s start with the top of the chart. In the early 20th century, people conducted research on neurons, both on human neurons and on computer structures like switches, which we turn on and off. An idea called connectionism emerged, which proposed that learning lies in the connections between biological neurons or between computer switches.
So these kinds of connections between things were where learning would happen. And indeed this inspired the first neural networks, such as the perceptron in the late 1950s. Throughout the late 50s and 60s, there was research on neural networks, but unfortunately these neural networks were fairly limited. They did not have hidden layers and they had very few input neurons, so they could not succeed in the way that their creators had foreseen.
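To see why a network without hidden layers is limited, here is a minimal sketch of a single-layer perceptron with two inputs. The XOR example is a classic illustration that is not part of the lecture itself, and the function names and the number of epochs are assumptions made for this sketch.

```python
def train_perceptron(samples, epochs=20):
    """Train a two-input perceptron with the classic error-correction rule."""
    w = [0, 0]  # one weight per input
    b = 0       # bias term
    for _ in range(epochs):
        for (x1, x2), target in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - pred  # -1, 0, or +1
            w[0] += err * x1
            w[1] += err * x2
            b += err
    return w, b

def accuracy(samples, w, b):
    """Fraction of samples the trained perceptron classifies correctly."""
    hits = 0
    for (x1, x2), target in samples:
        pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
        hits += int(pred == target)
    return hits / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_acc = accuracy(AND, *train_perceptron(AND))
xor_acc = accuracy(XOR, *train_perceptron(XOR))
print("AND accuracy:", and_acc)
print("XOR accuracy:", xor_acc)
```

The perceptron learns AND perfectly because a single straight line separates the two classes. No such line exists for XOR, so the same model can never classify all four cases correctly, however long it trains.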
This led to pessimism about artificial intelligence and to a period called the first artificial intelligence winter in the late sixties and seventies. Despite these problems, progress was being made on accumulating data. For example, the first million-word corpus for English was created in the early 60s, and people were working on applications like optical character recognition and authorship attribution, which is trying to figure out who wrote a document based on how often they use certain words. At the bottom of the chart, regarding formal grammars and rules, the early 1950s (1954, for example) saw the first attempts at machine translation. These programs used rules that would find a word in Russian, for example, and exchange it for a word in English.
They would do so using a file that was essentially a dictionary, telling you that this word in Russian corresponds to this word in English. They were successful with small examples. One of the early examples was translating a couple dozen sentences about coal in scientific papers from Russian to English, and it worked, so there was optimism about what these rule systems could do.
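The dictionary-lookup idea can be sketched in a few lines. This is only an illustration of word-for-word substitution, not a reconstruction of the 1954 programs; the three-entry lexicon `RU_EN` and the function name `translate` are made up for this example.

```python
# Hypothetical miniature Russian-to-English lexicon, invented for this sketch.
RU_EN = {
    "уголь": "coal",
    "и": "and",
    "нефть": "oil",
}

def translate(sentence, lexicon=RU_EN):
    """Replace each word with its dictionary entry; unknown words pass through."""
    return " ".join(lexicon.get(word, word) for word in sentence.lower().split())

print(translate("уголь и нефть"))
print(translate("уголь и газ"))  # "газ" is not in the lexicon
```

The second call shows the obvious weakness: any word missing from the dictionary is left untranslated, and nothing here handles word order, inflection, or ambiguity.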
This was happening at the same time as research was being conducted on how to formalize natural languages. Noam Chomsky, during the 50s and early 60s, invented theories of grammar that were based on rules and transformations. And so knowledge of natural language could be systematized as these rules, which people thought could be put into a computer. As a matter of fact, the 1960s is when the program Eliza was created, which is a simple psychologist chatbot. You put in a greeting or a question, and then the computer, using pattern matching with regular expressions, switches the question around and returns it to you. So it appears to be an interaction, and it does so, again, because it has rules modeling language.
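Here is a minimal sketch of that pattern-matching idea, written in the spirit of Eliza rather than copied from the original program. The rules, the pronoun swaps, and the canned replies are all invented for illustration.

```python
import re

# Pronoun swaps used to turn the user's words back toward them (illustrative).
SWAPS = {"i": "you", "my": "your", "am": "are", "me": "you"}

# Each rule pairs a regular expression with a reply template (illustrative).
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\b(?:hello|hi)\b", re.IGNORECASE),
     "Hello. What would you like to talk about?"),
]

def reflect(fragment):
    """Swap first-person words for second-person ones, word by word."""
    return " ".join(SWAPS.get(word, word) for word in fragment.lower().split())

def respond(utterance):
    """Return the reply of the first rule whose pattern matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return "Please tell me more."

print(respond("I feel sad about my job."))
print(respond("Hello"))
```

The program has no understanding of sadness or jobs. It only recognizes the shape "I feel X" and hands X back inside a question, which is why the exchange merely appears to be an interaction.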
So again, the late 1960s and the 1970s were the first artificial intelligence winter, and during this era people were developing rule-based approaches to parse human language and trying to see if this parsing could be put into computer programs. As usually happens in human endeavors, people disagreed on how this could be done, and many theories of syntax splintered from there. If you’re interested, this was actually called the linguistics wars.
There were so many theories that you couldn’t go to a computer scientist and tell them “please implement the theory of syntax,” because there were so many of them. This led to increased interest in statistical approaches, in statistical extrapolation. People were doing things like statistical translation, which was trying to find correspondences in words between two languages, so how often would the words “you are” be translated as “estás” in Spanish, and these efforts were fairly successful. So successful that there’s a quote attributed to Fred Jelinek. No one is sure exactly what he said, but the most usual citation is “Every time we fire a linguist, the performance of our system goes up.”
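The "how often is this translated as that" question can be sketched as a relative-frequency estimate. The aligned phrase pairs below are invented for illustration, and this sketch assumes the pairs are already aligned; real statistical translation systems had to learn those alignments from large parallel corpora.

```python
from collections import Counter, defaultdict

# Hypothetical aligned (English, Spanish) phrase pairs, invented for this sketch.
pairs = [
    ("you are", "estás"),
    ("you are", "estás"),
    ("you are", "eres"),
    ("you are", "estás"),
    ("thank you", "gracias"),
]

def translation_probs(pairs):
    """Estimate P(spanish | english) as a relative frequency over the pairs."""
    counts = defaultdict(Counter)
    for en, es in pairs:
        counts[en][es] += 1
    return {
        en: {es: n / sum(options.values()) for es, n in options.items()}
        for en, options in counts.items()
    }

probs = translation_probs(pairs)
print(probs["you are"])
```

With these counts, "you are" maps to "estás" three times out of four, so a system that always picks the most frequent option would choose "estás" without knowing any grammar.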
People had a lot of faith that simply by using numerical systems, they could find enough patterns for the computer to model human language. Unfortunately, hardware in the 1980s was not up to this task.
The volume of data you would need to analyze was too much for the computers at the time, and artificial intelligence went into a second winter in the late 80s and early 90s. On the other side of the fence, people were still trying to work with formalizations of language. People were building automata, for example finite state machines and finite state transducers, to model aspects of language, but it ultimately turned out, as we’ll see next week, that these models would be incredibly large and almost unmanageable.
People were also building a kind of system called an expert system. They believed, for example, that you could extract the knowledge of a thousand doctors through questions and answers, and then put all those questions, answers, and knowledge into a kind of decision tree, so that another doctor could query the decision tree and get all those decisions. But it turns out that extracting that knowledge from humans is extremely work intensive and extremely expensive, and research in that kind of paradigm has not kept up with statistically based machine learning.
In the 1990s, two things happened. First, hardware improved enough that you could now reliably run neural networks and other learning algorithms on your home computer. Second, enough corpora had been accumulated that you could find patterns in large collections of texts. This has led to the era where we are right now. In the 2000s and 2010s, people invented new architectures for neural networks: architectures with more hidden layers, with more input neurons, and with more interrelationships between these neurons.
We call these kinds of networks deep learning in general. So these new neural network architectures, combined with more computing power, have allowed for a furthering of methods where you’re trying to find numerical correlations in data. Something else has changed as well: people are now used to using human languages with computers. In the 20th century, most of the communication with computers was done through keyboards and through programming commands. Nowadays language-mediated interactions are very common.
All of you have cell phones where you can say “hey Google” or “hey Siri” and the cell phone will understand what you’re saying. In the near future, the field of computational linguistics and natural language processing is trying to work out how to better understand sentences: not only to model human languages, such as which words come after which, but also to extract knowledge from them, including embodied knowledge, so that the computer is aware not only of the linguistic context but also of what a sentence would mean in the real world. This should lead to better software applications.
On the formalization side, to be honest, most research nowadays is in numerically based machine learning. However, progress has been made in trying to get simpler formal grammars, and we’ll see if these two fields can merge in the future.
So in summary, there have been several artificial intelligence winters, when research turned away from connectionist or numerical methods and more towards rule-based approaches. But the moment we’re in right now is the boom of deep learning methods, and people have applied many of them to natural language processing, as we will see in the following weeks.