
Natural Language Interface

July 29, 2013

It's a scene played out in many a comic book, novel, television show, and movie. The protagonist, say Captain Kirk, is talking to his computer and expecting answers.
Kirk: "Computer, how many furlongs are in a light year?"

Computer: "Working... There are 46,996,813,387,000 furlongs in a light year."

Kirk: "Well, Spock, I don't think he traveled by horse."

Spock: "Agreed, Captain."
There's also the memorable scene in the movie Star Trek IV: The Voyage Home, in which Scotty attempts to interact with a computer by talking into its mouse. And by 2001, we should have had talking computers like the HAL 9000.

Everyone seems to think that computers should accept verbal questions and give good answers; that is, computers should have a natural language user interface (NL). Some applications are now coming close to that ideal. Siri would be one example.

One reason why we're getting close to the Star Trek ideal is the phenomenal computing power that's now available at such a low cost in many consumer devices. Another reason might be the decision by NL scientists to divorce themselves from the artificial intelligence (AI) field. Many computer scientists decided to stop identifying their work as artificial intelligence when there was a backlash against AI's being over-sold to funding agencies in the last decades of the twentieth century.

NL was long in coming, since the problem goes far beyond parsing speech into a text file for further processing. The question, "List all bloggers on web sites with physics degrees," might be a problem to an NL system, since there are probably no web sites with physics degrees.

Computer scientists at MIT's Computer Science and Artificial Intelligence Laboratory have tackled a natural language interface for the specific task of forming regular expressions. Regular expressions are a compact pattern-matching notation, supported by many programming languages and word processors, that aids text search and replacement. This research was presented in June at the annual conference of the North American Chapter of the Association for Computational Linguistics.[1-2]

I used a regular expression in OpenOffice to remove extra linefeeds from the manuscript of one of my novels. Unless you use these expressions regularly (pun intended), it takes a while to craft the exact expression that solves your problem. As can be seen in the example in the following figure, you essentially need to be a computer scientist to craft even a simple regular expression.

An example of a regular expression

Even simple regular expressions are not that simple.

(MIT Graphic by Christine Daniloff.)[1]
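
The linefeed cleanup mentioned above gives a feel for how terse these expressions are. The OpenOffice expression actually used isn't given here, so the following Python sketch is only an assumption about what "extra linefeeds" means: any run of three or more newlines is collapsed to a single blank line. The function name and the sample text are made up for illustration.

```python
import re


def collapse_blank_lines(text: str) -> str:
    """Collapse any run of three or more newlines into exactly two.

    Two newlines in a row leave one blank line between paragraphs,
    so ordinary paragraph breaks are preserved.
    """
    return re.sub(r"\n{3,}", "\n\n", text)


# Hypothetical manuscript fragment with stray blank lines.
sample = "Chapter One\n\n\n\nIt was a dark night.\n\n\nThe end.\n"
cleaned = collapse_blank_lines(sample)
print(cleaned)
```

The whole job is done by the eight characters `\n{3,}`, which is the appeal of regular expressions and also the reason they are hard to read back a month later.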

It's interesting to note that even computer scientists have a hard time crafting regular expressions. When Nate Kushman, an MIT graduate student, presented the paper on a natural language interface for regular expressions to a roomful of computer scientists, he asked them to write a regular expression for a simple search. After displaying the proper expression, he polled the audience to see how many had written the right expression. Just a few had.[1]

A natural language interface for regular expressions would allow both absent-minded computer scientists and non-programmers alike to do efficient search and replacement tasks in their word processors and spreadsheets. Kushman and Regina Barzilay, an associate professor of computer science and electrical engineering, used examples that they harvested from the Internet to train a computer system to generate regular expressions from natural language queries.[1]

First verse of Jabberwocky in English and Swedish

Something to make humans chortle and natural language interfaces choke. The first verse of "Jabberwocky" in English and Swedish. Jabberwocky is a nonsense poem written by the mathematician Lewis Carroll. It's contained in his 1871 novel, Through the Looking-Glass, and What Alice Found There. (Swedish translation (Tjatterskott) by Harry Lundin.)

There are some examples where a forced syntax for a language query yields excellent results. One of these is Structured Query Language (SQL), which is used for many database applications. Unfortunately, there isn't a good mapping between natural language and regular expressions. As an example, a regular expression for a search for three-letter words starting with an "X," as shown in the figure, doesn't have any part that means "three letters."[1]
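
To make the mismatch concrete, here is a minimal Python sketch of such a search. The pattern is an assumption written for this illustration, not a transcription of the figure, and the sample sentence is invented. Notice that the "three" of the request appears only indirectly, as a literal "X" followed by a count of two.

```python
import re

# Assumed pattern: a word boundary, a literal "X", exactly two more
# letters, and a closing word boundary. Nothing in it says "three".
THREE_LETTER_X = re.compile(r"\bX[A-Za-z]{2}\b")

# Invented sample text for illustration.
text = "Xia sent the XML file to Xavier by X-ray fax."
matches = THREE_LETTER_X.findall(text)
print(matches)  # ['Xia', 'XML']
```

"Xavier" is rejected because a third letter follows where the closing word boundary is required, and "X-ray" is rejected because the character after the "X" is not a letter.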

Kushman and Barzilay found that it's possible to write regular expressions that map to natural language and are equivalent to the usual expressions. These expressions are not very succinct, nor are they intuitive. However, when they're found, they can be matched to their succinct counterparts by comparing graph representations of the expressions.[1] An example of the regular expression for finding three-letter words starting with an 'X' is shown below, with each natural language fragment next to its regular expression fragment.[2] Compare with the regular expression in the first figure.
three letter         [A-Za-z]{3}
starting with 'X'    X.*
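
The two fragments above only work as a search when both must match the same word. Python's `re` module has no intersection operator, so the sketch below emulates one by requiring a full match against each fragment, and then checks the result against an assumed succinct form, `X[A-Za-z]{2}`. The check is a brute-force comparison over short strings from a small test alphabet; it is a stand-in chosen for brevity, not the graph-based comparison used by the researchers. All names here are made up for the illustration.

```python
import re
from itertools import product

LETTERS_3 = re.compile(r"[A-Za-z]{3}")   # "three letter"
STARTS_X = re.compile(r"X.*")            # "starting with 'X'"
SUCCINCT = re.compile(r"X[A-Za-z]{2}")   # assumed succinct equivalent


def matches_compositional(word: str) -> bool:
    """True only if the whole word matches both fragments."""
    return bool(LETTERS_3.fullmatch(word)) and bool(STARTS_X.fullmatch(word))


def matches_succinct(word: str) -> bool:
    """True if the whole word matches the assumed succinct pattern."""
    return bool(SUCCINCT.fullmatch(word))


# Every string of length 0 to 4 over a tiny alphabet that mixes
# letters, a digit, and a space.
ALPHABET = "Xxa1 "
candidates = [
    "".join(chars)
    for length in range(5)
    for chars in product(ALPHABET, repeat=length)
]
disagreements = [
    w for w in candidates
    if matches_compositional(w) != matches_succinct(w)
]
print(len(candidates), len(disagreements))  # 781 0
```

Agreement on a bounded sample is evidence, not proof; establishing equivalence for all strings is what the graph comparison is for.
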
In other work on natural language processing at MIT, Barzilay, Tao Lei, Fan Long and Martin Rinard have developed a system that generates input parsers, programs that separate the data from other information in a computer file.[3] A text file, for example, might have information about text formatting along with the actual text.[1]

Their system interprets the natural language specification of the file format, something that a programmer needs to do when creating a program to read and write such files. The MIT team had a good development resource, about 180 file format examples used in the Association for Computing Machinery's International Collegiate Programming Contest. The MIT natural language interface succeeded in about 80 percent of the specifications. In the failed cases, changing just a word or two of the specification usually gave a working input parser.[1]
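
For readers who haven't written one, the following Python sketch shows the kind of input parser a programmer would write by hand from a contest-style specification. Both the specification text and the parser are hypothetical, invented for this illustration; they are not taken from the MIT work or from any actual contest problem.

```python
from typing import List, Tuple

# Hypothetical specification, in the style of a programming contest.
SPEC = (
    "The first line contains an integer N. "
    "Each of the next N lines contains two integers A and B."
)


def parse_input(text: str) -> List[Tuple[int, int]]:
    """Hand-written parser for the hypothetical SPEC above."""
    lines = text.strip().splitlines()
    n = int(lines[0])  # "The first line contains an integer N."
    if len(lines) - 1 < n:
        raise ValueError(f"expected {n} data lines, found {len(lines) - 1}")
    pairs = []
    for line in lines[1:1 + n]:  # "Each of the next N lines..."
        a, b = (int(token) for token in line.split())
        pairs.append((a, b))
    return pairs


sample = "3\n1 2\n3 4\n5 6\n"
print(parse_input(sample))  # [(1, 2), (3, 4), (5, 6)]
```

Each sentence of the specification turns into a line or two of code, and that sentence-to-code correspondence is what a natural language interface has to discover on its own.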

The natural language interface was efficient, taking about ten minutes of calculation on an ordinary laptop to produce the parsers for all these specifications.[1] Luke Zettlemoyer, an assistant professor of computer science and engineering at the University of Washington, who was not a part of the natural language interface team, said that "the techniques they have developed should definitely generalize to other related programming tasks."[1]


  1. Larry Hardesty, "Writing programs using ordinary language," MIT Press Release, July 11, 2013.
  2. Nate Kushman and Regina Barzilay, "Using Semantic Unification to Generate Regular Expressions from Natural Language," Preprint Available at MIT Web Site (PDF File).
  3. Tao Lei, Fan Long, Regina Barzilay and Martin Rinard, "From Natural Language Specifications to Program Input Parsers," Preprint Available at MIT Web Site (PDF File).
