
Synthetic speech generated from brain recordings

April 24, 2019

A state-of-the-art brain-machine interface created by UC San Francisco neuroscientists can generate natural-sounding synthetic speech by using brain activity to control a virtual vocal tract -- an anatomically detailed computer simulation including the lips, jaw, tongue, and larynx. The study was conducted in research participants with intact speech, but the technology could one day restore the voices of people who have lost the ability to speak due to paralysis and other forms of neurological damage.

Stroke, traumatic brain injury, and neurodegenerative diseases such as Parkinson's disease, multiple sclerosis, and amyotrophic lateral sclerosis (ALS, or Lou Gehrig's disease) often result in an irreversible loss of the ability to speak. Some people with severe speech disabilities learn to spell out their thoughts letter-by-letter using assistive devices that track very small eye or facial muscle movements. However, producing text or synthesized speech with such devices is laborious, error-prone, and painfully slow, typically permitting a maximum of 10 words per minute, compared to the 100-150 words per minute of natural speech.

The new system being developed in the laboratory of Edward Chang, MD -- described April 24, 2019 in Nature -- demonstrates that it is possible to create a synthesized version of a person's voice that can be controlled by the activity of their brain's speech centers. In the future, this approach could not only restore fluent communication to individuals with severe speech disability, the authors say, but could also reproduce some of the musicality of the human voice that conveys the speaker's emotions and personality.

"For the first time, this study demonstrates that we can generate entire spoken sentences based on an individual's brain activity," said Chang, a professor of neurological surgery and member of the UCSF Weill Institute for Neuroscience. "This is an exhilarating proof of principle that with technology that is already within reach, we should be able to build a device that is clinically viable in patients with speech loss."

Virtual Vocal Tract Improves Naturalistic Speech Synthesis

The research was led by Gopala Anumanchipalli, PhD, a speech scientist, and Josh Chartier, a bioengineering graduate student in the Chang lab. It builds on a recent study in which the pair described for the first time how the human brain's speech centers choreograph the movements of the lips, jaw, tongue, and other vocal tract components to produce fluent speech.

From that work, Anumanchipalli and Chartier realized that previous attempts to directly decode speech from brain activity might have met with limited success because these brain regions do not directly represent the acoustic properties of speech sounds, but rather the instructions needed to coordinate the movements of the mouth and throat during speech.

"The relationship between the movements of the vocal tract and the speech sounds that are produced is a complicated one," Anumanchipalli said. "We reasoned that if these speech centers in the brain are encoding movements rather than sounds, we should try to do the same in decoding those signals."

In their new study, Anumanchipalli and Chartier asked five volunteers being treated at the UCSF Epilepsy Center -- patients with intact speech who had electrodes temporarily implanted in their brains to map the source of their seizures in preparation for neurosurgery -- to read several hundred sentences aloud while the researchers recorded activity from a brain region known to be involved in language production.

Based on the audio recordings of participants' voices, the researchers used linguistic principles to reverse engineer the vocal tract movements needed to produce those sounds: pressing the lips together here, tightening vocal cords there, shifting the tip of the tongue to the roof of the mouth, then relaxing it, and so on.
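The study's own inversion pipeline is not reproduced here, but the core idea -- mapping recorded audio back to the articulator movements that could have produced it -- can be sketched as a regression problem. The snippet below is a minimal, hypothetical illustration assuming Python with librosa and scikit-learn; the waveform and the articulatory targets are random placeholders standing in for real recordings and reference kinematics, and the feature sizes are invented.

```python
# Hedged sketch of acoustic-to-articulatory inversion: learn a mapping from
# acoustic features of a sentence back to articulator positions.
# The audio and the articulatory targets below are placeholders.
import numpy as np
import librosa
from sklearn.neural_network import MLPRegressor

sr = 22050
wav = np.random.randn(sr * 2).astype(np.float32)          # 2 s of placeholder "audio"
mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=25).T    # (frames, 25) acoustic features
articulators = np.random.randn(mfcc.shape[0], 33)          # placeholder kinematic targets

# Regression from acoustics to articulator trajectories (illustrative sizes only).
inverter = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=200)
inverter.fit(mfcc, articulators)
estimated_movements = inverter.predict(mfcc)               # (frames, 33)
```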

This detailed mapping of sound to anatomy allowed the scientists to create a realistic virtual vocal tract for each participant that could be controlled by their brain activity. This comprised two "neural network" machine learning algorithms: a decoder that transforms brain activity patterns produced during speech into movements of the virtual vocal tract, and a synthesizer that converts these vocal tract movements into a synthetic approximation of the participant's voice.
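As a rough illustration of that two-stage design (not the authors' actual code, which is not reproduced in this article), the sketch below uses PyTorch with invented feature dimensions: one recurrent network decodes neural recordings into articulatory trajectories, and a second turns those trajectories into acoustic features that a separate vocoder would render as audio.

```python
# Minimal sketch of the two-stage decoder/synthesizer pipeline described above.
# All dimensions (electrode count, articulatory features, acoustic features)
# are assumptions for illustration, not the study's parameters.
import torch
import torch.nn as nn

class ArticulatoryDecoder(nn.Module):
    """Stage 1: brain activity -> virtual vocal tract movements."""
    def __init__(self, n_electrodes=256, n_articulatory=33, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(n_electrodes, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_articulatory)

    def forward(self, neural):            # (batch, time, n_electrodes)
        h, _ = self.rnn(neural)
        return self.out(h)                # (batch, time, n_articulatory)

class SpeechSynthesizer(nn.Module):
    """Stage 2: vocal tract movements -> acoustic features (e.g., a mel spectrogram)."""
    def __init__(self, n_articulatory=33, n_acoustic=80, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(n_articulatory, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_acoustic)

    def forward(self, movements):         # (batch, time, n_articulatory)
        h, _ = self.rnn(movements)
        return self.out(h)                # (batch, time, n_acoustic)

# Chaining the two stages: neural activity in, acoustic features out,
# which a separate vocoder would turn into an audio waveform.
decoder, synthesizer = ArticulatoryDecoder(), SpeechSynthesizer()
neural = torch.randn(1, 500, 256)         # a few seconds of made-up neural features
acoustics = synthesizer(decoder(neural))
```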

The synthetic speech produced by these algorithms was significantly better than synthetic speech directly decoded from participants' brain activity without the inclusion of simulations of the speakers' vocal tracts, the researchers found. The algorithms produced sentences that were understandable to hundreds of human listeners in crowdsourced transcription tests conducted on the Amazon Mechanical Turk platform.

As is the case with natural speech, the transcribers were more successful when they were given shorter lists of words to choose from, as would be the case with caregivers who are primed to the kinds of phrases or requests patients might utter. The transcribers accurately identified 69 percent of synthesized words from lists of 25 alternatives and transcribed 43 percent of sentences with perfect accuracy. With a more challenging 50 words to choose from, transcribers' overall accuracy dropped to 47 percent, though they were still able to understand 21 percent of synthesized sentences perfectly.
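For a concrete sense of how such a closed-vocabulary listening test is scored, here is a toy example (assumed for illustration, not taken from the study): each listener picks every word from a fixed list of alternatives, and accuracy is simply the fraction of words that match the reference sentence.

```python
# Toy scoring function for a closed-vocabulary transcription test.
def word_accuracy(reference, transcribed):
    """Fraction of reference words the listener identified correctly."""
    matches = sum(r == t for r, t in zip(reference, transcribed))
    return matches / len(reference)

reference   = ["the", "tongue", "shapes", "each", "vowel"]
transcribed = ["the", "tongue", "shades", "each", "vowel"]   # one word misheard
print(word_accuracy(reference, transcribed))                  # 0.8
```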

"We still have a ways to go to perfectly mimic spoken language," Chartier acknowledged. "We're quite good at synthesizing slower speech sounds like 'sh' and 'z' as well as maintaining the rhythms and intonations of speech and the speaker's gender and identity, but some of the more abrupt sounds like 'b's and 'p's get a bit fuzzy. Still, the levels of accuracy we produced here would be an amazing improvement in real-time communication compared to what's currently available."

Artificial Intelligence, Linguistics, and Neuroscience Fueled Advance

The researchers are currently experimenting with higher-density electrode arrays and more advanced machine learning algorithms that they hope will improve the synthesized speech even further. The next major test for the technology is to determine whether someone who can't speak could learn to use the system without being able to train it on their own voice, and whether the system could generalize to anything they wish to say.

Preliminary results from one of the team's research participants suggest that the researchers' anatomically based system can decode and synthesize novel sentences from participants' brain activity nearly as well as the sentences the algorithm was trained on. Even when the researchers provided the algorithm with brain activity data recorded while one participant merely mouthed sentences without sound, the system was still able to produce intelligible synthetic versions of the mimed sentences in the speaker's voice.

The researchers also found that the neural code for vocal movements partially overlapped across participants, and that one research subject's vocal tract simulation could be adapted to respond to the neural instructions recorded from another participant's brain. Together, these findings suggest that individuals with speech loss due to neurological impairment may be able to learn to control a speech prosthesis modeled on the voice of someone with intact speech.

"People who can't move their arms and legs have learned to control robotic limbs with their brains," Chartier said. "We are hopeful that one day people with speech disabilities will be able to learn to speak again using this brain-controlled artificial vocal tract."

Added Anumanchipalli, "I'm proud that we've been able to bring together expertise from neuroscience, linguistics, and machine learning as part of this major milestone towards helping neurologically disabled patients."
-end-
Authors: Anumanchipalli and Chartier are co-first authors of the new study. Chang, a Bowes Biomedical Investigator at UCSF, professor in the Department of Neurological Surgery and member of the UCSF Weill Institute for Neurosciences, is the senior and corresponding author.

Funding: This research was primarily funded by the National Institutes of Health (grants DP2 OD008627 and U01 NS098971-01). Chang is a New York Stem Cell Foundation Robertson Investigator. This research was also supported by the New York Stem Cell Foundation, the Howard Hughes Medical Institute, the McKnight Foundation, the Shurl and Kay Curci Foundation, and the William K. Bowes Foundation.

Disclosures: The authors declare no competing interests.

About UCSF: UC San Francisco (UCSF) is a leading university dedicated to promoting health worldwide through advanced biomedical research, graduate-level education in the life sciences and health professions, and excellence in patient care. It includes top-ranked graduate schools of dentistry, medicine, nursing and pharmacy; a graduate division with nationally renowned programs in basic, biomedical, translational and population sciences; and a preeminent biomedical research enterprise. It also includes UCSF Health, which comprises three top-ranked hospitals - UCSF Medical Center and UCSF Benioff Children's Hospitals in San Francisco and Oakland - as well as Langley Porter Psychiatric Hospital and Clinics, UCSF Benioff Children's Physicians and the UCSF Faculty Practice. UCSF Health has affiliations with hospitals and health organizations throughout the Bay Area. Please visit http://www.ucsf.edu/news.

Follow UCSF
ucsf.edu | Facebook.com/ucsf | Twitter.com/ucsf | YouTube.com/ucsf


Related Brain Activity Articles:

Brain activity intensity drives need for sleep
The intensity of brain activity during the day, notwithstanding how long we've been awake, appears to increase our need for sleep, according to a new UCL study in zebrafish, published in Neuron.
Do babies like yawning? Evidence from brain activity
Contagious yawning is observed in many mammals, but there is no such report in human babies.
Understanding brain activity when you name what you see
Using complex statistical methods and fast measurement techniques, researchers found how the brain network comes up with the right word and enables us to say it.
Your brain activity can be used to measure how well you understand a concept
As students learn a new concept, measuring how well they grasp it has often depended on traditional paper and pencil tests.
Altered brain activity in antisocial teenagers
Teenage girls with problematic social behavior display reduced brain activity and weaker connectivity between the brain regions implicated in emotion regulation.
Gender impacts brain activity in alcoholics
Compared to alcoholic women, alcoholic men have more diminished brain activity in areas responsible for emotional processing (limbic regions including the amygdala and hippocampus), as well as memory and social processing (cortical regions including the superior frontal and supramarginal regions) among other functions.
Light, physical activity reduces brain aging
Incremental physical activity, even at light intensity, is associated with larger brain volume and healthy brain aging.
Measuring brain activity in milliseconds possible through new research
Researchers from King's College London, Harvard and INSERM-Paris have discovered a new way to measure brain function in milliseconds using magnetic resonance elastography (MRE).
Autism: Brain activity as a biomarker
Researchers from Jülich, Switzerland, France, the Netherlands, and the UK have discovered specific activity patterns in the brains of people with autism.
New MRI sensor can image activity deep within the brain
MIT researchers have developed an MRI-based calcium sensor that allows them to peer deep into the brain.
More Brain Activity News and Brain Activity Current Events

Trending Science News

Current Coronavirus (COVID-19) News

Top Science Podcasts

We have hand picked the top science podcasts of 2020.
Now Playing: TED Radio Hour

Uncharted
There's so much we've yet to explore–from outer space to the deep ocean to our own brains. This hour, Manoush goes on a journey through those uncharted places, led by TED Science Curator David Biello.
Now Playing: Science for the People

#556 The Power of Friendship
It's 2020 and times are tough. Maybe some of us are learning about social distancing the hard way. Maybe we just are all a little anxious. No matter what, we could probably use a friend. But what is a friend, exactly? And why do we need them so much? This week host Bethany Brookshire speaks with Lydia Denworth, author of the new book "Friendship: The Evolution, Biology, and Extraordinary Power of Life's Fundamental Bond". This episode is hosted by Bethany Brookshire, science writer from Science News.
Now Playing: Radiolab

Dispatch 2: Every Day is Ignaz Semmelweis Day
It began with a tweet: "EVERY DAY IS IGNAZ SEMMELWEIS DAY." Carl Zimmer – tweet author, acclaimed science writer and friend of the show – tells the story of a mysterious, deadly illness that struck 19th century Vienna, and the ill-fated hero who uncovered its cure ... and gave us our best weapon (so far) against the current global pandemic. This episode was reported and produced with help from Bethel Habte and Latif Nasser. Support Radiolab today at Radiolab.org/donate.