Tikalon Blog is now in archive mode.
Natural Language Interface
July 29, 2013
It's a scene played out in many a comic book, novel, television show, and movie. The protagonist, say Captain Kirk, is talking to his computer and expecting answers.
Kirk: "Computer, how many furlongs in a light year?"
Computer: "Working... There are 46,996,813,387,000 furlongs in a light year."
Kirk: "Well, Spock, I don't think he traveled by horse."
Spock: "Agreed, Captain."
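The computer's answer can be checked with a few lines of arithmetic. The sketch below is only an illustration; it assumes the exact speed of light (299,792,458 m/s) and a furlong of 660 international feet (201.168 m), and the function name is made up for this example. The quoted figure seems to correspond to a 365-day year, while the Julian year of 365.25 days used in the astronomical definition of the light year gives a slightly larger number.

```python
# Speed of light in metres per second (exact by definition).
SPEED_OF_LIGHT = 299_792_458
# One furlong is 660 feet; one international foot is 0.3048 m.
FURLONG_METRES = 660 * 0.3048
SECONDS_PER_DAY = 86_400


def furlongs_per_light_year(days_per_year):
    """Distance light travels in one year, expressed in furlongs."""
    metres = SPEED_OF_LIGHT * days_per_year * SECONDS_PER_DAY
    return metres / FURLONG_METRES


# Assumption: a 365-day year, which appears to match the dialogue.
common_year = furlongs_per_light_year(365)
# Julian year of 365.25 days.
julian_year = furlongs_per_light_year(365.25)

print(f"365-day year:    {common_year:,.0f} furlongs")
print(f"365.25-day year: {julian_year:,.0f} furlongs")
```

Either way, Kirk's conclusion about the horse stands.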
There's the memorable scene in the movie, Star Trek IV: The Voyage Home, in which Scotty attempts to interact with a computer by talking into a mouse. Also, by 2001, we should have had talking computers like the HAL 9000.
Everyone seems to think that computers should accept verbal questions and give good answers; that is, computers should have a natural language user interface (NL). Some applications are now coming close to that ideal. Siri would be one example.
One reason why we're getting close to the Star Trek ideal is the phenomenal computing power that's now available at such a low cost in many consumer devices. Another reason might be the decision by NL scientists to divorce themselves from the artificial intelligence (AI) field. Many computer scientists decided to stop identifying their work as artificial intelligence when there was a backlash against AI's being over-sold to funding agencies in the last decades of the twentieth century.
NL was long in coming, since the problem goes far beyond parsing speech into a text file for further processing. The question, "List all bloggers on web sites with physics degrees," might be a problem for an NL system: a literal reading attaches "with physics degrees" to the web sites rather than to the bloggers, and there are probably no web sites with physics degrees.
Computer scientists at MIT's Computer Science and Artificial Intelligence Laboratory have tackled a natural language interface for the specific task of forming regular expressions. Regular expressions are a compact pattern-matching notation built into many programming languages and word processors to aid text search and replacement. This research was presented in June at the annual conference of the North American Chapter of the Association for Computational Linguistics.[1-2]
I used a regular expression in OpenOffice to remove extra linefeeds from the manuscript of one of my novels. Unless you use these expressions regularly (pun intended), it takes a while to craft the exact expression that solves your problem. As can be seen in the example in the following figure, you essentially need to be a computer scientist to craft even a simple regular expression.
Even simple regular expressions are not that simple. (MIT Graphic by Christine Daniloff.)[1]
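As a rough illustration of the linefeed cleanup mentioned above, here is a minimal sketch in Python rather than OpenOffice. It assumes that "extra linefeeds" means runs of three or more consecutive newlines that should shrink to a single blank line between paragraphs; the function name and the sample text are invented for the example, and the expression actually used on the manuscript may have been different.

```python
import re


def collapse_extra_linefeeds(text):
    """Collapse runs of 3+ newlines into one blank line (two newlines)."""
    # \n{3,} matches three or more consecutive newline characters.
    return re.sub(r"\n{3,}", "\n\n", text)


manuscript = "Chapter One\n\n\n\nIt was a dark night.\n\nThe end.\n"
cleaned = collapse_extra_linefeeds(manuscript)
print(repr(cleaned))
```

Even in this tiny case, the quantifier `{3,}` has to be chosen with care; `{2,}` would leave the text unchanged only by accident of the replacement string, and `+` would touch every single line break.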
It's interesting to note that even computer scientists have a hard time crafting regular expressions. When Nate Kushman, an MIT graduate student, presented the paper on a natural language interface for regular expressions to a roomful of computer scientists, he asked them to write a regular expression for a simple search. After displaying the proper expression, he polled the audience to see how many had written the right expression. Just a few had.[1]
A natural language interface for regular expressions would allow absent-minded computer scientists and non-programmers alike to do efficient search and replacement tasks in their word processors and spreadsheets. Kushman and Regina Barzilay, an associate professor of computer science and electrical engineering, used examples that they harvested from the Internet to train a computer system to generate regular expressions from natural language queries.[1]
Something to make humans chortle and natural language interfaces choke. The first verse of "Jabberwocky" in English and Swedish. Jabberwocky is a nonsense poem written by the mathematician Lewis Carroll. It appears in his 1871 novel, Through the Looking-Glass, and What Alice Found There. (Swedish translation (Tjatterskott) by Harry Lundin.)
There are some examples where a forced syntax for a language query yields excellent results. One of these is structured query language (SQL), which is used for many database applications. Unfortunately, there isn't a good mapping between natural language and regular expressions. As an example, a regular expression for a search for three-letter words starting with an "X," as shown in the figure, doesn't have any part that means "three letters."[1]
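To make the mismatch concrete, here is a minimal sketch using Python's `re` module. The pattern is one common way to write "three-letter words starting with X" and is an assumption for this example, not a transcription of the figure; the sample sentence is invented.

```python
import re

# Assumed pattern: a word boundary, a capital X, exactly two more
# letters, and another word boundary.
THREE_LETTER_X = re.compile(r"\bX[A-Za-z]{2}\b")

text = "Xyz and Xen are matched; Xylophone, Xy and axe are not."
found = THREE_LETTER_X.findall(text)
print(found)  # ['Xyz', 'Xen']
```

Notice that the number three never appears in the pattern. It is implied by the literal `X` plus the `{2}` that follows, which is exactly the kind of indirect encoding that makes a word-by-word translation from English difficult.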
Kushman and Barzilay found that it's possible to write regular expressions that map to natural language and are equivalent to the usual expressions. These expressions are not very succinct, nor are they intuitive. However, when they're found, they can be identified with their succinct counterpart using a graph.[1] An example of the regular expression for finding three-letter words starting with an 'X' is shown below.[2] Compare with the regular expression in the first figure.
Full expression: ([A-Za-z]{3})&(\b[A-Za-z]+\b)&(X.*)

Natural language fragment | Regular expression component
three letter              | [A-Za-z]{3}
word                      | \b[A-Za-z]+\b
starting with 'X'         | X.*
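The `&` in the expression above denotes intersection: a string must be matched in full by every component. Python's `re` module has no such operator, so the sketch below emulates it by requiring a full match against each component in turn, then compares the result with an assumed succinct form, `\bX[A-Za-z]{2}\b`. The comparison is a brute-force check over all short strings from a small, arbitrarily chosen alphabet. It is only an illustration and is not the graph-based method the researchers describe.

```python
import re
from itertools import product

# Components of the intersection form quoted above.
COMPONENTS = [r"[A-Za-z]{3}", r"\b[A-Za-z]+\b", r"X.*"]
# Assumed succinct counterpart (not taken from the figure).
SUCCINCT = r"\bX[A-Za-z]{2}\b"


def in_intersection(s):
    """True if every component matches the whole string."""
    return all(re.fullmatch(p, s) is not None for p in COMPONENTS)


def in_succinct(s):
    """True if the succinct pattern matches the whole string."""
    return re.fullmatch(SUCCINCT, s) is not None


def disagreements(alphabet="Xab1 ", max_len=4):
    """List strings up to max_len on which the two forms differ."""
    differing = []
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            s = "".join(chars)
            if in_intersection(s) != in_succinct(s):
                differing.append(s)
    return differing


print(in_intersection("Xab"), in_succinct("Xab"))
print(disagreements())
```

An empty list from `disagreements()` shows only that the two forms agree on the strings tested; a real equivalence proof needs something like the automaton comparison that the graph approach provides.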
In other work on natural language processing at MIT, Barzilay, Tao Lei, Fan Long and Martin Rinard have developed a system that generates programs called input parsers, which sort the data from other information in a computer file.[3] A text file, for example, might have information about text formatting along with the actual text.[1]
Their system interprets the natural language specification of the file format, something that a programmer needs to do when creating a program to read and write such files. The MIT team had a good development resource, about 180 file format examples used in the Association for Computing Machinery's International Collegiate Programming Contest. The MIT natural language interface succeeded in about 80 percent of the specifications. In the failed cases, changing just a word or two of the specification usually gave a working input parser.[1]
The natural language interface was efficient, taking about ten minutes of calculation on an ordinary laptop to produce the parsers for all these specifications.[1]
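For readers who haven't met programming-contest input formats, the sketch below pairs a made-up specification with the sort of hand-written parser a programmer would normally produce from it. Both the specification text and the function `parse_input` are hypothetical and written for this illustration; they are not taken from the MIT work or from any actual contest problem.

```python
# Hypothetical specification, written for this sketch:
SPEC = (
    "The first line contains an integer N. "
    "Each of the next N lines contains a name followed by an integer score."
)


def parse_input(text):
    """Hand-written parser for the format described in SPEC."""
    lines = text.strip().splitlines()
    count = int(lines[0])  # "The first line contains an integer N."
    records = []
    for line in lines[1:1 + count]:  # "Each of the next N lines..."
        name, score = line.split()
        records.append((name, int(score)))
    return records


sample = "2\nkirk 7\nspock 9\n"
print(parse_input(sample))  # [('kirk', 7), ('spock', 9)]
```

Each line of the parser mirrors a phrase in the specification, and that correspondence is what a system would have to recover from the English text.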
Luke Zettlemoyer, an assistant professor of computer science and engineering at the University of Washington, who was not a part of the natural language interface team, said that "the techniques they have developed should definitely generalize to other related programming tasks."[1]
References:
1. Larry Hardesty, "Writing programs using ordinary language," MIT Press Release, July 11, 2013.
2. Nate Kushman and Regina Barzilay, "Using Semantic Unification to Generate Regular Expressions from Natural Language," Preprint Available at MIT Web Site (PDF File).
3. Tao Lei, Fan Long, Regina Barzilay and Martin Rinard, "From Natural Language Specifications to Program Input Parsers," Preprint Available at MIT Web Site (PDF File).