Scientists Give Computers New Tools To Understand Speech

June 18, 1997

In the futuristic world of "Star Trek," computers listen to human speech, then follow orders or answer questions with near-perfect precision. But present-day computers are not so skillful. Current machines can respond to brief, clearly uttered instructions such as "open the door," but asking them to understand casual conversation is asking for trouble. Even state-of-the-art systems stumble over words that sound alike or have more than one meaning. To a computer, the phrases "recognize speech" and "wreck a nice beach" sound the same. When people slur their words or speak with a regional accent, computerized comprehension becomes even tougher.

In the face of such obstacles, researchers at The Johns Hopkins University are developing new tools to help computers understand speech. To support this work, they recently received a $750,000 National Science Foundation grant. The Hopkins team was one of 15 nationwide picked to participate in a $10-million NSF project aimed at developing more natural interaction between computers and humans. When the Hopkins software is perfected, it could allow a blind person to dictate a letter to a computer with far greater accuracy than existing systems. It could also provide a powerful new way to search through hours of recorded speeches and news reports that have not been transcribed. "The grand goal," says Eric Brill, a researcher at Hopkins' Center for Language and Speech Processing, "would be to have a computer understand any kind of human speech."

Significant advances are needed. Current speech recognition software often trips over garbled or sound-alike words. When that happens, it looks at the previous word or two for help, says Brill, an assistant professor in the Department of Computer Science. With this method, the machine can decide that the words it heard after "walking" were more likely to be "the dog" than "the dock." Still, this existing software will incorrectly transcribe about 40 percent of the words it hears in a casual conversation. "It's not enough," Brill says. "We need to be much more sophisticated in predicting what was just said, based on what the computer has already heard. It will be a very long time before we're down to a 1 percent error rate. But the system becomes more and more useful as the error rate goes down."

To cut down on mistakes, Hopkins researchers are teaching computers to examine the structure of a sentence, not just a couple of neighboring words. Just as grammar school children are taught to do, a computer could be programmed to break a sentence into its subject, verb, object and modifiers. With this knowledge, it could make better guesses about troublesome words. "If the main verb of the sentence is 'drive,' then 'spaghetti' is an unlikely object," explains Brill. "But 'car' is a likely object, or maybe 'golf ball.' We are trying to get the computer to ask questions like that."
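For readers curious how the "previous word or two" method might work in practice, here is a minimal sketch in Python. It is an illustration only, not the Hopkins software: the toy sentences, the function names and the add-one smoothing are all assumptions made for this example. The program counts how often each word follows a given pair of words, then uses those counts to choose between sound-alike candidates.

```python
from collections import Counter

# Hypothetical toy training text; a real recognizer would learn
# from a vastly larger collection of sentences.
TOY_CORPUS = [
    "she was walking the dog",
    "he is walking the dog in the park",
    "they were walking the dog",
    "the boat left the dock",
    "we sat on the dock",
]


def train_trigrams(sentences):
    """Count each word together with the two words that precede it."""
    trigram_counts = Counter()
    context_counts = Counter()
    vocabulary = set()
    for sentence in sentences:
        # Pad the start so the first real word also has a two-word context.
        words = ["<s>", "<s>"] + sentence.split()
        vocabulary.update(words)
        for i in range(2, len(words)):
            context = (words[i - 2], words[i - 1])
            trigram_counts[context + (words[i],)] += 1
            context_counts[context] += 1
    return trigram_counts, context_counts, len(vocabulary)


def score(candidate, context, model):
    """Estimate how likely `candidate` is after `context` (add-one smoothing)."""
    trigram_counts, context_counts, vocab_size = model
    seen_together = trigram_counts[context + (candidate,)]
    return (seen_together + 1) / (context_counts[context] + vocab_size)


def pick_word(candidates, context, model):
    """Return the sound-alike candidate that best fits the two previous words."""
    return max(candidates, key=lambda word: score(word, context, model))


MODEL = train_trigrams(TOY_CORPUS)

if __name__ == "__main__":
    print(pick_word(["dog", "dock"], ("walking", "the"), MODEL))
    print(pick_word(["dog", "dock"], ("left", "the"), MODEL))
```

Because the sketch looks only at the two preceding words, it has no notion of subject, verb or object, which is the limitation the sentence-structure approach is meant to address.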

To further aid comprehension, the Hopkins team wants the computer to figure out the subject of a conversation, such as science, music, politics or food. If words such as "home run," "catcher" and "foul ball" turn up, the computer should sense that "sports" is the topic and that the baffling word is probably "pitcher," not "picture." "It's very important in speech recognition to know what's under discussion," says David Yarowsky, another CLSP researcher who is also an assistant professor of computer science. "So the computer will have these 'topic detectors' running in the background. Are we talking about education or politics? Are we talking about a school principal or a legal principle? Both words sound alike, but the computer will give them different weights depending on what topic it thinks you're talking about."
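The "topic detector" idea can also be sketched in a few lines of Python. Everything here is hypothetical and chosen for illustration: the keyword lists, the per-topic weights and the function names are assumptions, not details of the Hopkins system. The sketch estimates how likely each topic is from the words already heard, then blends the per-topic weights to score each sound-alike candidate.

```python
# Hypothetical keyword lists for two topics.
TOPIC_KEYWORDS = {
    "sports": {"home", "run", "catcher", "foul", "ball", "inning"},
    "art": {"gallery", "frame", "painting", "museum", "portrait"},
}

# Hypothetical weights: how plausible each sound-alike word is within a topic.
CANDIDATE_WEIGHTS = {
    "sports": {"pitcher": 0.9, "picture": 0.1},
    "art": {"pitcher": 0.2, "picture": 0.8},
}


def topic_probabilities(heard_words):
    """Turn keyword hits into a rough probability for each topic."""
    # Start every topic at 1 so that no topic is ever ruled out completely.
    hits = {
        topic: 1 + sum(word in keywords for word in heard_words)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    total = sum(hits.values())
    return {topic: count / total for topic, count in hits.items()}


def weigh_candidates(candidates, heard_words):
    """Score each candidate by blending its weights across the likely topics."""
    probabilities = topic_probabilities(heard_words)
    return {
        candidate: sum(
            probabilities[topic] * CANDIDATE_WEIGHTS[topic].get(candidate, 0.0)
            for topic in probabilities
        )
        for candidate in candidates
    }


def resolve(candidates, heard_words):
    """Pick the candidate with the highest blended score."""
    scores = weigh_candidates(candidates, heard_words)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    heard = "the catcher dropped a foul ball after the home run".split()
    print(resolve(["pitcher", "picture"], heard))
```

Blending the weights, rather than committing to a single topic, means a conversation that mixes subjects still yields a sensible score for each candidate.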

Together, the added attention to linguistics and world knowledge should help computers recognize human speech with far greater accuracy, the Hopkins researchers say. Within five years, Yarowsky predicts, a computer may serve as an audio search engine. It could "listen" to hours of radio and television news reports and locate virtually every speech or interview in which Secretary of State Madeleine Albright has discussed human rights issues involving China, for example. "For this purpose, you don't have to have a perfect speech recognizer," Yarowsky says. "You don't have to get every word right to recognize that Mrs. Albright is speaking about human rights in China. This technology has the potential to revolutionize the way we retrieve things that were never in text form to begin with, but were only recorded as speech."

The sort of speech recognition seen on "Star Trek" may be many years away, but the Hopkins researchers are moving in that direction. "Human-computer interaction is the major goal here," says Yarowsky. "We want to make it easier for people to interact with machines."

Johns Hopkins University
