Brain signals aid in restoring speech.

Even in the digital age, when articles can be edited and videos re-shot, we pause to choose our words and match them to our thoughts and ideas. Thousands of people, however, are reduced to painstaking means of communication by injuries suffered in accidents or combat, by strokes, or by neurodegenerative disorders such as amyotrophic lateral sclerosis (ALS) that destroy the ability to speak.

Researchers have previously developed virtual speech aids that work by decoding the brain signals responsible for recognising letters and words, the verbal representations of speech. But those approaches lack the speed and fluidity of natural speaking.

Now scientists have created a virtual vocal tract – complete with lips, jaw and tongue – that can generate natural-sounding synthetic speech from brain signals. The resulting virtual prosthetic voice decodes the brain’s vocal intentions and translates them into mostly understandable speech, with no need to move a muscle, not even those of the mouth. The brain-machine interface could one day restore the voices of people who have lost the ability to speak through paralysis and other forms of neurological damage.

Stroke, traumatic brain injury, and neurodegenerative diseases such as Parkinson’s disease, multiple sclerosis, and amyotrophic lateral sclerosis (ALS, or Lou Gehrig’s disease) often result in an irreversible loss of the ability to speak.


Some people with severe speech disabilities learn to spell out their thoughts letter-by-letter using assistive devices that track very small eye or facial muscle movements. However, producing text or synthesised speech with such devices is laborious, error-prone, and painfully slow, typically permitting a maximum of 10 words per minute, compared to the 100-150 words per minute of natural speech.

The system demonstrates that it is possible to create a synthesised version of a person’s voice that can be controlled by the activity of their brain’s speech centres. In the future, researchers said, this approach could not only restore fluent communication to individuals with severe speech disability but could also reproduce some of the musicality of the human voice that conveys the speaker’s emotions and personality. The study shows, for the first time, that entire spoken sentences can be generated from an individual’s brain activity.

“The relationship between the movements of the vocal tract and the speech sounds that are produced is a complicated one,” said Gopala Anumanchipalli, a speech scientist who led the study. “We reasoned that if these speech centers in the brain are encoding movements rather than sounds, we should try to do the same in decoding those signals,” Anumanchipalli said.

Researchers asked five volunteers with intact speech, who had electrodes temporarily implanted in their brains to map the source of their seizures in preparation for neurosurgery to treat epilepsy, to read several hundred sentences aloud while the researchers recorded activity from a brain region known to be involved in language production.

Based on the audio recordings of participants’ voices, the researchers used linguistic principles to reverse engineer the vocal tract movements needed to produce those sounds: pressing the lips together here, tightening vocal cords there, shifting the tip of the tongue to the roof of the mouth, then relaxing it, and so on.
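The article does not say how this sound-to-anatomy mapping was computed. As a rough illustration of the idea only, the sketch below fits a simple frame-level regression from acoustic features to articulator positions using stand-in data; the feature sizes and the use of scikit-learn ridge regression are assumptions for the example, not the authors’ method.

```python
# Illustrative sketch only: maps per-frame acoustic features to estimated
# articulator positions (lips, jaw, tongue), trained on synthetic stand-in data.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

n_frames = 5000          # hypothetical number of short analysis frames
n_acoustic = 25          # e.g. cepstral coefficients describing each audio frame
n_articulators = 12      # e.g. x/y positions of lip, jaw and tongue points

# Stand-in data: real work would pair recorded audio features with
# reverse-engineered vocal tract trajectories for the same frames.
acoustic_features = rng.normal(size=(n_frames, n_acoustic))
articulator_traces = rng.normal(size=(n_frames, n_articulators))

# Fit a frame-by-frame acoustic-to-articulatory inversion model.
inversion_model = Ridge(alpha=1.0)
inversion_model.fit(acoustic_features, articulator_traces)

# Given new audio features, estimate how the lips, jaw and tongue moved.
estimated_movements = inversion_model.predict(acoustic_features[:10])
print(estimated_movements.shape)  # (10, 12): one articulator vector per frame
```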

This detailed mapping of sound to anatomy allowed the scientists to create a realistic virtual vocal tract for each participant that could be controlled by their brain activity. The system comprised two “neural network” machine learning algorithms: a decoder that transforms brain activity patterns produced during speech into movements of the virtual vocal tract, and a synthesiser that converts these vocal tract movements into a synthetic approximation of the participant’s voice.
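The article does not describe the networks in any detail. The sketch below assumes PyTorch, bidirectional LSTM layers and made-up dimensions purely to show how the two stages chain together: brain recordings go into the decoder, its output drives the synthesiser, and the synthesiser’s output would feed a vocoder to produce audible speech.

```python
# Minimal two-stage sketch, not the authors' implementation.
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Stage 1: brain activity -> virtual vocal tract movements."""
    def __init__(self, n_electrodes=256, n_articulators=12, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(n_electrodes, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_articulators)

    def forward(self, neural):             # neural: (batch, time, electrodes)
        h, _ = self.rnn(neural)
        return self.out(h)                 # (batch, time, articulators)

class Synthesiser(nn.Module):
    """Stage 2: vocal tract movements -> acoustic features for a vocoder."""
    def __init__(self, n_articulators=12, n_acoustic=32, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(n_articulators, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_acoustic)

    def forward(self, movements):          # movements: (batch, time, articulators)
        h, _ = self.rnn(movements)
        return self.out(h)                 # (batch, time, acoustic features)

# Chaining the stages: brain recording in, synthetic-speech features out.
neural_recording = torch.randn(1, 200, 256)   # one sentence, 200 time steps
movements = Decoder()(neural_recording)
acoustic_features = Synthesiser()(movements)
print(acoustic_features.shape)                # torch.Size([1, 200, 32])
```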

The synthetic speech produced by these algorithms was significantly better than synthetic speech decoded directly from participants’ brain activity without a simulation of the speaker’s vocal tract in between, the researchers found.

The algorithms produced sentences that were understandable to hundreds of human listeners in crowdsourced transcription tests.
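The article does not name the metric behind these tests. One common way to score a crowdsourced transcription test is word error rate: the fraction of words a listener writes down incorrectly when transcribing what they heard. A minimal, purely illustrative implementation:

```python
# Illustrative only: word error rate between a reference sentence and a
# listener's transcription, via word-level edit distance.
def word_error_rate(reference: str, transcription: str) -> float:
    ref, hyp = reference.lower().split(), transcription.lower().split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Hypothetical example: one listener's transcription of a synthesised sentence.
print(word_error_rate("the ship was torn apart on the rocks",
                      "the ship was torn apart on rocks"))  # 0.125
```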

Experts said the new work represented a “proof of principle,” a preview of what may be possible after further experimentation and refinement. The system was tested on people who speak normally; it has not been tested on people whose neurological conditions or injuries, such as common strokes, could make the decoding difficult or impossible.


Many people with epilepsy do poorly on medication and opt to undergo brain surgery. Before operating, doctors must first locate the “hot spot” in each person’s brain where the seizures originate; this is done with electrodes that are placed in the brain, or on its surface, and that listen for telltale electrical storms.