A new corpus of 'slips of the ear' in English

February 17, 2017

Listening in quiet conditions is actually quite rare. Most of the time some kind of noise is present, whether traffic, machinery, or simply other conversations. As native speakers with rich experience of the language and of the contexts in which speech occurs, we have a great capacity to reconstruct the parts of a message obscured by noise. Errors still occur at times, however. A group involving Dr García Lecumberri and Ikerbasque Research Professor Martin Cooke, along with Dr Jon Barker and Dr Ricard Marxer of the University of Sheffield (UK), has identified 3207 "consistent" confusions. The confusions are described as consistent because, in every case, a significant number of listeners agree on what they heard. Confusions of this type are extremely valuable for building models of speech perception, since any model that makes the same error is very likely to be applying the same processes as human listeners.
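As a rough illustration of what "consistent" could mean in practice, the sketch below flags a confusion when enough listeners report the same wrong word for a given stimulus. The function name `consistent_confusion` and the thresholds `min_listeners` and `min_agreement` are hypothetical, chosen only for this example; they are not the criterion used in the study.

```python
from collections import Counter


def consistent_confusion(target, responses, min_listeners=5, min_agreement=0.5):
    """Return (word, count) for the most common misperception of `target`
    if enough listeners agree on it, otherwise None.

    The two thresholds are illustrative assumptions, not the study's values.
    """
    # Normalise responses and drop empty ones.
    cleaned = [r.strip().lower() for r in responses if r.strip()]
    # Keep only responses that differ from the word actually spoken.
    errors = [r for r in cleaned if r != target.strip().lower()]
    if not errors:
        return None
    word, count = Counter(errors).most_common(1)[0]
    # Require both an absolute number of listeners and a share of all responses.
    if count >= min_listeners and count / len(cleaned) >= min_agreement:
        return word, count
    return None


# Ten hypothetical listeners hear the word "wooden" in noise.
heard = ["wood"] * 6 + ["wooden"] * 3 + ["would"]
print(consistent_confusion("wooden", heard))
```

Requiring both an absolute count and a proportion is one possible design choice: it keeps a single stray response from counting as a confusion while still scaling with the size of the listener cohort.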

The study involved more than 300,000 individual stimulus presentations to 212 listeners in a range of noise conditions. The resulting corpus is the only one of its kind for the English language and is available online. For each confusion, the corpus contains the waveforms of both the speech and the noise, a record of what a cohort of listeners heard, and phonemic transcriptions. Distinct types of confusion recur in the corpus. In the simplest cases the noise clearly masks some parts of the word, forcing listeners to suggest a word that best fits the audible fragments (e.g., "wooden" -> "wood"; "pánico" -> "pan") or to substitute one sound for another ("ten" -> "pen"; "valla" -> "falla"). In other cases listeners appear to incorporate elements from the noise itself ("purse" -> "permitted"; "ciervo" -> "invierno"). Finally, the researchers find odd cases where there is little or no relation between the word produced and the confusion ("modern" -> "suggest"; "guardan" -> "pozo"). In these cases the speech and noise signals interact in complex, and therefore interesting, ways.

Dr García Lecumberri argues that "these studies help to reveal the mechanisms underlying speech perception, and the better we understand these processes, the more we can help at a technical and clinical level those listeners who suffer hearing and speech comprehension problems". The group has also elicited a similar corpus for the Spanish language, which can be accessed from the same web page. "There are similarities and differences between Spanish and English confusions: Spanish is a highly inflected language, leading to more confusions in word-final position; English has a larger number of monosyllabic words and a richer set of word-final consonants, leading to more substitution-type errors in this position," she adds. Both languages nonetheless show a similar pattern of confusion types in noise, with some sounds surviving better than others.

Additional information

Dr María Luisa García Lecumberri is Senior Lecturer in English Phonetics in the Faculty of Letters at the University of the Basque Country (Vitoria) and a member of the Language and Speech research group, to which Ikerbasque Research Professor Dr Martin Cooke also belongs. Dr Jon Barker is Reader in Computer Science in the Speech and Hearing research group at the University of Sheffield, where Dr Ricard Marxer works as a research fellow. Corpus collection was funded by the EU Framework 7 Marie Curie project PEOPLE-2011-290000 "Inspire: Investigating Speech Processing in Realistic Environments".


Ricard Marxer, Jon Barker, Martin Cooke, and Maria Luisa García Lecumberri (December 2016). A corpus of noise-induced word misperceptions for English. The Journal of the Acoustical Society of America, Volume 140, Issue 5. DOI: 10.1121/1.4967185.

University of the Basque Country
