Speech Recognition
Articles tagged with Speech Recognition
AI benefits from measured non-linearity
Researchers found that a carefully measured dose of nonlinearity improves model performance across a range of tasks, especially when data is limited. Nonlinear units function like flexible switches, adapting linear processing modes based on context.
Can Amazon Alexa or Google Home help detect Parkinson’s?
Researchers developed an AI-based screening tool using pangrams to detect Parkinson’s disease with nearly 86 percent accuracy. The web-based test analyzes voice recordings for subtle patterns linked to the neurodegenerative disease, identifying potential warning signs.
AI medical receptionist modernizing doctor appointments, poised to improve patient care nationwide
Cassie, a digital-human assistant developed by Texas A&M University, is transforming the way patients interact with healthcare providers. With facial recognition and emotional intelligence, Cassie offers a two-way interaction that feels like a conversation.
Do dogs understand words from AIC buttons?
A new study published in Scientific Reports reveals that audio quality severely affects dogs' ability to recognize and respond to recorded words. Dogs excelled at responding to direct human speech, but struggled with AIC buttons, which lost frequencies necessary for conveying human speech.
Protecting audio privacy at the source
Researchers created a lightweight filter that can run on small microcontrollers, identifying and removing likely speech content from audio data before it's sent off the device. This helps balance utility and privacy, enabling devices like smart speakers to prioritize user security while still offering valuable sensing capabilities.
Towards a fully automated approach for assessing English proficiency
A new study by Doshisha University demonstrates the feasibility and reliability of fully automated speaking tests for English language learners, enabling more frequent evaluation and larger-scale studies. The approach uses AI-powered speech recognition and computational metrics to assess language proficiency.
Paralyzed man moves robotic arm with his thoughts
Researchers at UC San Francisco have enabled a paralyzed man to control a robotic arm through a device that relays signals from his brain to a computer. The device, known as a brain-computer interface (BCI), worked for a record 7 months without needing to be adjusted.
It’s not just what you say – it’s also how you say it
A Northwestern University study discovered that a region of the brain processes subtle changes in voice pitch, transforming them into meaningful linguistic information that guides human understanding. The findings challenge long-held assumptions about speech perception and have implications for speech rehabilitation, AI-powered voice assist...
Terasaki Institute for Biomedical Innovation announces 2025 Paul and Hisako Terasaki Award recipients
The Terasaki Institute recognizes Dr. Cato Laurencin's groundbreaking contributions to regenerative engineering, while Dr. Jun Chen is recognized for his innovative technologies in soft bioelectronics and magnetoelastic materials.
Speech Accessibility Project data leads to recognition improvements on Microsoft Azure
The project's recordings help improve voice recognition tools by providing diverse speech patterns to train artificial intelligence models. Microsoft has seen significant improvements in recognizing non-standard English speech, with accuracy gains ranging from 18% to 60%, depending on the speaker's disability.
Synchronization in neural nets: Mathematical insight into neuron readout drives significant improvements in prediction accuracy
Researchers introduced a novel approach to enhance reservoir computing, incorporating a generalized readout that offers improved accuracy and robustness compared to conventional methods. The new method uses a nonlinear combination of reservoir variables to uncover deeper patterns in input data.
Automatic speech recognition on par with humans in noisy conditions
Researchers found that humans still outperform most ASR systems in noisy environments, but Whisper large-v3 matched human performance in all tested conditions except naturalistic pub noise. The system's ability to process acoustic properties and map them to the intended message was impressive.
Heart rate activity influences when infants speak
Researchers found that babies' first vocalizations and attempts at forming words coincide with fluctuations in their heart rate. This discovery may indicate that successful speech development depends on predictable ranges of autonomic activity during infancy.
Dogs can recognize familiar speakers
Researchers at Eötvös Loránd University found that dogs can recognize their owners based on pre-recorded speech, demonstrating an ability to discriminate between familiar voices. Dogs performed well in matching the correct owner with their voice, with performance best when hearing their main owner's voice.
Developing artificial intelligence tools for health care
Researchers developed a new benchmark for health care using reinforcement learning, which shows promise in managing chronic or psychiatric diseases. However, current methods are data-hungry and fail to perform accurately when tested on real-world data.
Graz language database improves automatic speech recognition of Austrian German
Researchers at Graz University of Technology developed a new database to improve speech recognition of Austrian German using speech data from 38 speakers. They found that traditional HMM-based systems are more robust for short sentences and dialectal language, while transformer-based models excel with longer sentences and context.
2025 SPIE-Franz Hillenkamp Postdoctoral Fellowship awarded to Morgan Fogarty
Fogarty's research aims to monitor language function and recovery in post-stroke patients using DOT. She hopes to establish the feasibility of brain-computer interfaces to restore inter-personal communication for post-stroke patients.
Online training could help older adults communicate in noisy environments
Research by UCL experts found that online training can improve speech intelligibility for both older and younger adults, with a 30% improvement in understanding sentences spoken by new voices. The study suggests that practicing listening to regularly encountered voices could enhance everyday communication.
Hear this! Transforming health care with speech-to-text technology #ASA187
The study analyzes the impact of enunciation on speech-to-text accuracy in medical situations, highlighting challenges with medical terms and noisy environments. A new audio dataset was created to improve the technology's usefulness for healthcare professionals.
NTU Singapore start-up BrookieKids launches AI-powered interactive storytelling to help young children practice their mother tongue
BrookieKids, a local education tech startup, has launched an AI-powered digital library featuring over 50 Mandarin voice-interactive stories. The platform uses speech AI to guide conversations and boost conversational skills in a fun and interactive way.
Speech Accessibility Project partners with The Matthew Foundation, Massachusetts Down Syndrome Congress
The Speech Accessibility Project is working with two new partners, The Matthew Foundation and the Massachusetts Down Syndrome Congress, to recruit adults with Down syndrome and other conditions. The project aims to provide voice command devices to improve inclusion and employment opportunities for individuals with disabilities.
What happens in the brain when a person with schizophrenia “hears voices”?
Researchers found that people with schizophrenia who experience auditory hallucinations have impaired brain processes, including a 'broken' corollary discharge and 'noisy' efference copy. This impairment may contribute to the loss of ability to distinguish reality from fantasy.
Automatic speech recognition learned to understand people with Parkinson’s disease — by listening to them
Researchers trained an automatic speech recognizer on recordings from people with dysarthria related to Parkinson's disease, achieving a 30% accuracy improvement. The study, led by Mark Hasegawa-Johnson, provides valuable data for improving voice recognition devices.
Researchers identify basic approaches for how people recognize words
A new study found that people with cochlear implants use the same three word-recognition dimensions as normal hearing people: Wait and See, Sustained Activation, and Slow Activation. These dimensions help explain how people recognize words, even with different ways of hearing.
Researchers expose vulnerability of speech emotion recognition models to adversarial attacks
Speech emotion recognition models are susceptible to adversarial attacks, which can significantly reduce their performance. The study found that black-box attacks outperformed white-box attacks and achieved impressive results despite limited access to the model's internal workings.
Developed a 21-language, fast and high-fidelity neural text-to-speech technology that works on smartphones
A novel, fast and high-quality neural text-to-speech model was developed using a Transformer encoder, a ConvNeXt decoder and MS-FC-HiFi-GAN. The model synthesizes one second of speech in just 0.1 seconds on a single CPU core, eight times faster than conventional methods.
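The synthesis speed quoted above is often expressed as a real-time factor (RTF), the processing time divided by the duration of the audio produced. The short sketch below only restates the arithmetic from the summary (0.1 seconds of compute per 1 second of speech); the function name is hypothetical and it is not taken from the researchers' code.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures from the summary above: 0.1 s of compute for 1 s of speech.
rtf = real_time_factor(0.1, 1.0)

# An RTF below 1 means synthesis runs faster than playback.
speedup_over_real_time = 1 / rtf
```

With these figures the RTF is 0.1, meaning the audio is generated about ten times faster than it takes to play back.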
Sorry, I didn’t get that: evaluating usability issues with AI-assisted smart speakers
Researchers evaluated the learnability of voice-controlled smart speakers, finding users face challenges due to lack of system feedback and response errors. Despite proficiency after repeated attempts, usability issues remained unchanged.
Building a better sarcasm detector #ASA186
Researchers developed a multimodal algorithm for improved sarcasm detection, examining multiple aspects of audio recordings for increased accuracy. The approach combines sentiment analysis using text and emotion recognition using audio for a comprehensive analysis.
Machine listening: Making speech recognition systems more inclusive
Researchers found that African American English speakers adjust their speech rate and pitch variation when using voice technology, adopting a slower and more monotone register to be better understood. This adaptation helps address disparities in speech recognition systems and aims to improve inclusivity for diverse language varieties.
iTalkBetter app significantly improves speech in stroke patients
A new study published in eClinicalMedicine found that the iTalkBetter app significantly improved patients' ability to talk after six weeks of use. The app provided digital speech therapy for over 200 commonly used words, with a 13% increase in naming items and improvements in spontaneous speech.
The brain processes speech and its echo separately
A recent study published in PLOS Biology found that the human brain can segregate direct speech from its echo, allowing for reliable recognition of echoic speech. This neural separation is essential for understanding conversations in noisy environments and is supported by magnetoencephalography recordings.
Using AI-related technologies can significantly enhance human cognition, finds new study
A new study published in Frontiers in Artificial Intelligence found that training in Interlingual Respeaking, a new practice combining human collaboration with speech recognition software, can improve language professionals' cognitive abilities. The research, conducted by the University of Surrey, showed significant enhancements in wor...
What if Alexa or Siri sounded more like you? Study says you’ll like it better
Researchers found a strong preference for extroverted virtual assistants, and that greater personality similarity between user and assistant led to higher ratings, more careful assessment of information, and resistance to persuasive attempts. The study's findings may have implications for ways to increase user resistance to misinformation.
Improving word intelligibility of bone-conducted speech using bone-conduction headphones
Researchers developed methods to emphasize higher-frequency components of bone-conducted speech signals, improving intelligibility in noisy environments. The study's findings have potential real-life applications in BC-type hearing aids and auditory augmentation.
Physiological-physical feature fusion for automatic voice spoofing detection
Researchers developed a novel voice spoofing detection method combining physiological and physical features, achieving improved performance over existing methods. The proposed method outperforms single systems in terms of EER and t-DCF scores.
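For readers unfamiliar with the EER mentioned above: the equal error rate is the operating point at which the false-acceptance rate (spoofed audio accepted as genuine) equals the false-rejection rate (genuine audio rejected). The sketch below is a minimal approximation that sweeps every observed score as a threshold; the function name and the toy scores are made up for illustration and are not data from the study.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by trying every observed score as a threshold."""
    best_gap = float("inf")
    eer = None
    for threshold in sorted(set(genuine_scores) | set(spoof_scores)):
        # A trial is accepted as genuine when its score is >= threshold.
        false_accept = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
        false_reject = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        gap = abs(false_accept - false_reject)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap = gap
            eer = (false_accept + false_reject) / 2
    return eer


# Toy detector scores (hypothetical): higher means "more likely genuine".
genuine = [0.9, 0.8, 0.7, 0.4]
spoofed = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine, spoofed))
```

A lower EER indicates a better detector; production toolkits usually interpolate between thresholds rather than picking the closest observed one.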
Self-administered mobile application to detect Alzheimer's disease using speech data
Researchers developed a mobile application to detect Alzheimer's and mild cognitive impairment from speech data. The app achieved high accuracy rates, demonstrating its potential as an early detection tool.
Machine learning model sheds light on how brains recognize communication sounds
A machine learning model helps explain how brains recognize the meaning of communication sounds, such as animal calls or spoken words. The study models sound-processing networks in social animals' brains and demonstrates their ability to distinguish between different sound categories.
Older adults perceive artificial intelligence as more human-like than younger adults do
A recent Baycrest study found that older adults are less able to distinguish between computer-generated and human speech compared to younger counterparts. This diminished ability could be related to older adults' reduced capacity to recognize emotions in speech, highlighting the need for AI-related training programs.
A brain-inspired computer model that understands speech like humans
Researchers developed a computer model based on human brain mechanisms to improve speech comprehension. The model extracts multilevel information from ongoing speech and uses non-linguistic knowledge for disambiguating word meanings. This approach is more human-like than existing language models like ChatGPT.
Music beats beeps: Researchers find redesigned medical alarms can better alert staff and improve patient experience
A new study by McMaster University researchers found that redesigned medical alarms with musical tones can improve speech recognition and reduce annoyance. The study suggests that changing the sounds of medical devices can make alarms less disruptive, allowing for better staff communication and reducing recovery times.
Voice-activated system for hands-free, safer DNA handling
Scientists have created a small, portable device that can extract and pretreat bacterial DNA using voice commands, making it easier and safer for researchers to handle potentially infectious samples. The device has shown promise in extracting DNA from Salmonella Typhimurium with an efficiency of 70% in under a minute.
Bot gives nonnative speakers the floor in videoconferencing
A new study at Cornell University introduced an automated participant that periodically interrupts the conversation to give nonnative speakers a chance to speak. The AI bot increased participation from 12% to 17% of all words spoken, with nonnative speakers feeling valued and appreciated for their perspectives.
LTI project aims to expand language technologies
A research team at Carnegie Mellon University is working to simplify data requirements for speech recognition models, aiming to reach 2,000 languages. By focusing on linguistic elements common across many languages and using a phylogenetic tree, the team hopes to eliminate the need for audio data.
Our brain is a prediction machine that is always active
Researchers at Max Planck Institute for Psycholinguistics found that our brain is a prediction machine continuously making predictions on multiple levels. They analyzed brain activity while people listened to Hemingway or Sherlock Holmes stories and text, finding the brain response was stronger when words were unexpected in context.
Machine learning improves human speech recognition
Researchers developed a machine learning model that provides good predictions for human speech recognition in noisy environments, benefiting hearing-impaired listeners. The model outperformed expectations and showed strong correlations with measured data.
MU earns $12 million in grants to boost science education, literacy
University of Missouri researchers are using a video game to teach middle schoolers about science and speech recognition software to improve literacy outcomes for second graders. The programs aim to increase student engagement and prepare students for future careers in various fields.
Seeking a way of preventing audio models for AI machine learning from being fooled
A recent study found that conventional metrics for detecting audio adversarial examples are unreliable and fail to accurately represent human perception. Researchers proposed a more robust evaluation method, but acknowledge the complexity of modeling auditory perception with mathematical metrics.
Cognitive neuroscience could pave the way for emotionally intelligent robots
A novel auditory perception model simulates human ear dynamics to capture the temporal dynamics of dimensional emotions. Neural networks then extract features that reflect these dynamics, showing better emotion recognition performance than traditional acoustic-based features.
Automated speech recognition and racial bias
Researchers found that state-of-the-art ASR systems performed worse on black speakers than on white speakers, with average word error rates of 0.35 and 0.19 respectively. The study attributes these disparities to limitations in the acoustic models' ability to capture African American Vernacular English pronunciation and prosody.
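The error rates quoted above are most naturally read as word error rates (WER): the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition; the function name and example sentences are hypothetical, and the study's own scoring pipeline (including any text normalization) is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

On this reading, a WER of 0.35 corresponds to roughly 35 word errors per 100 reference words, against about 19 per 100 for a WER of 0.19. Because insertions count as errors, WER can exceed 1.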
Grainger engineers voice localization techniques for smart speakers
Researchers from the University of Illinois have developed a system called VoLoc that uses microphone array recordings and room echoes to infer user location within a home. This technology can improve smart devices' support for available skills, such as turn-on-light or increase-temperature commands.
Sound deprivation in one ear leads to speech recognition difficulties
A new study suggests that chronic conductive hearing loss due to middle-ear infections can lead to neural deficits and difficulties in noisy environments. Researchers found patients with longstanding conductive hearing impairment had lower speech-recognition scores on the affected side, even when speech was audible.
Sound sense: Brain 'listens' for distinctive features in sounds
Researchers developed a computational model that explores how the auditory system achieves accurate speech recognition by identifying distinct categories of sounds. The model found that the brain looks for informative features, such as those characteristic of a face, to distinguish between different vocalizations.
RIT researchers use deep learning to help preserve the Seneca language
RIT researchers are building an automatic speech recognition application using deep learning to document and transcribe the traditional Seneca language. The project aims to support other rare or vanishing languages as well. The team has collected over 50 hours of recorded material and achieved promising results.
Machine-learning system tackles speech and object recognition, all at once
The MIT-developed AI model can associate specific words with specific patches of pixels in an image, enabling real-time object highlighting based on spoken descriptions. This innovation holds promise for applications such as language translation and automatic image annotation.
Researchers peer under the hoods of neural networks
Researchers used a newly developed interpretive technique to analyze neural networks trained for machine translation and speech recognition. They found that lower-level tasks, such as sound recognition or part-of-speech recognition, are prioritized before higher-level tasks like transcription or semantic interpretation.
Chip could make voice control ubiquitous in electronics
A new chip designed by MIT researchers has the potential to make voice control ubiquitous in electronics, offering significant power savings. The chip's ability to minimize memory bandwidth and compress weights associated with each node enables efficient speech recognition, making it practical for relatively simple electronic devices.
New approach may open up speech recognition to more languages
Researchers at MIT's CSAIL have developed a new system that analyzes correspondences between images and spoken descriptions to train speech-recognition systems. The system can potentially provide automatic speech recognition for less-resourced languages, leading to fully automated translation capabilities.
Speech technology enables kids to control video game
Disney researchers developed a speech technology system that copes with the overlapping speech, social side talk, and creative pronunciations of young children. The system was 85% accurate in recognizing keywords, outperforming commercial speech recognition systems.
'Data smashing' could unshackle automated discovery
Researchers at Cornell University have developed a new method called 'data smashing' that enables automated discovery without human intervention, opening doors to complex observations and expert-driven analysis.