⚡ Quick Summary
This study evaluated the performance of automatic speech recognition (ASR) in analyzing speech samples from individuals with schizophrenia-spectrum disorders, revealing word error rates (WER) between 0.31 and 0.58. The findings emphasize the need to consider not just WER but also the type, meaning, and context of ASR errors in clinical settings.
Key Details
- Sample Size: 50 speech samples from individuals with schizophrenia-spectrum disorders
- Key Metrics: Word error rates (WER) ranging from 0.31 to 0.58 (see the computation sketch after this list)
- Variability: WER varied systematically with country of birth and severity of positive symptoms
- NLP Analysis: ASR transcripts showed higher GloVe semantic similarity and fewer sentences than manual transcripts
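For readers unfamiliar with the metric, here is a minimal sketch of how WER is conventionally computed: the word-level Levenshtein distance between a reference (manual) transcript and an ASR hypothesis, normalized by the number of reference words. The example transcripts are invented for illustration and are not taken from the study.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# Invented example: one substituted word out of five -> WER = 0.2
print(wer("the patient reports hearing voices", "the patient reports fearing voices"))
```

A WER of 0.31 therefore corresponds to edit operations equal to roughly a third of the reference word count, which is why the paper argues that the type and context of those errors matter as much as the rate itself.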
Key Takeaways
- ASR performance is critical for effective mental health research and clinical applications.
- Word error rate (WER) alone does not provide a complete picture of ASR effectiveness.
- WER can vary with demographic factors such as country of birth.
- NLP metrics showed weaker correlations with symptom scores in ASR transcripts than in manual ones.
- Real-world applications of ASR include electronic health records, voice chatbots, and clinical decision support systems.
- Future research should evaluate ASR performance beyond traditional metrics.
- Understanding ASR errors by type, meaning, and context is essential for clinical safety.
Background
The integration of natural language processing (NLP) in mental health research has opened new avenues for studying large populations and developing scalable clinical tools. However, the effectiveness of these tools heavily relies on the performance of automatic speech recognition (ASR) systems, particularly when analyzing speech from clinical populations such as those with schizophrenia-spectrum disorders.
Study
This study assessed the performance of ASR on 50 speech samples from individuals diagnosed with schizophrenia-spectrum disorders. The researchers quantified word error rates (WER) and examined how these rates varied with demographic factors and symptom severity.
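The summary does not specify which statistical tests linked WER to country of birth and positive-symptom severity, so the snippet below only illustrates, on synthetic data, how such checks are commonly run: a nonparametric comparison of WER between two groups and a rank correlation with a symptom score. Every variable and value here is hypothetical, not the authors' analysis.

```python
import numpy as np
from scipy.stats import mannwhitneyu, spearmanr

# Synthetic, hypothetical per-sample data (N = 50), purely for illustration.
rng = np.random.default_rng(0)
wer_values = rng.uniform(0.31, 0.58, size=50)     # per-sample word error rates
born_abroad = np.arange(50) < 20                  # hypothetical binary grouping
positive_symptoms = rng.integers(7, 36, size=50)  # hypothetical positive-symptom scores

# Nonparametric group comparison: does WER differ between the two groups?
u_stat, p_group = mannwhitneyu(wer_values[born_abroad], wer_values[~born_abroad])
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_group:.3f}")

# Rank correlation: does WER track positive-symptom severity?
rho, p_corr = spearmanr(wer_values, positive_symptoms)
print(f"Spearman rho = {rho:.2f}, p = {p_corr:.3f}")
```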
Results
The analysis revealed word error rates (WER) ranging from 0.31 to 0.58, indicating considerable variability across samples. Furthermore, ASR transcripts exhibited higher GloVe semantic similarity and fewer sentences than manual transcripts, suggesting that ASR may not capture the full complexity of speech in clinical contexts. Notably, correlations between NLP metrics and symptom scores were weaker for ASR transcripts than for manual ones.
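The GloVe-based semantic-similarity metric is not defined in this summary, so the sketch below shows one common formulation, mean cosine similarity between consecutive sentence vectors built from averaged GloVe word embeddings, rather than the authors' exact pipeline. The function names, example sentences, and the glove.6B.300d.txt path are illustrative assumptions.

```python
import numpy as np

def load_glove(path: str) -> dict:
    """Load GloVe vectors from a whitespace-separated text file (word dim1 dim2 ...)."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors

def sentence_vector(sentence: str, glove: dict):
    """Average the GloVe vectors of in-vocabulary words; None if no word is covered."""
    words = [glove[w] for w in sentence.lower().split() if w in glove]
    return np.mean(words, axis=0) if words else None

def mean_consecutive_similarity(sentences: list, glove: dict) -> float:
    """Mean cosine similarity between consecutive sentence vectors."""
    vecs = [v for v in (sentence_vector(s, glove) for s in sentences) if v is not None]
    sims = [float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
            for a, b in zip(vecs, vecs[1:])]
    return float(np.mean(sims)) if sims else float("nan")

# Usage sketch (file path and sentences are hypothetical):
# glove = load_glove("glove.6B.300d.txt")
# print(mean_consecutive_similarity(["I went to the clinic.", "The doctor asked questions."], glove))
```

The consecutive-sentence formulation is only one option among several; how sentence boundaries are drawn feeds directly into such metrics, which is one reason the reported differences in sentence counts between ASR and manual transcripts matter.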
Impact and Implications
The findings of this study have important implications for the use of ASR in clinical settings. By highlighting the limitations of relying solely on word error rates, the research encourages a more nuanced evaluation of ASR performance. This approach is vital for ensuring the safe and effective implementation of ASR technologies in applications such as electronic health records, voice chatbots, and clinical decision support systems.
Conclusion
This study underscores the importance of moving beyond traditional metrics like word error rate (WER) when evaluating ASR performance in clinical research. By considering the type, meaning, and context of ASR errors, researchers and developers can enhance the reliability and safety of ASR applications in mental health. The future of ASR in clinical settings looks promising, and further research is essential to refine these technologies.
Your comments
What are your thoughts on the implications of ASR in mental health research? We would love to hear your insights! Share your comments below or connect with us on social media.
Moving beyond word error rate to evaluate automatic speech recognition in clinical samples: Lessons from research into schizophrenia-spectrum disorders.
Abstract
Natural language processing applications to mental health research depend on automatic speech recognition (ASR) to study large samples and develop scalable clinical tools. To ensure safe and effective implementation, it is crucial to understand performance patterns of ASR for speech from clinical populations. Therefore, this study evaluated ASR performance in N=50 speech samples from individuals with schizophrenia-spectrum disorders, identifying word error rates (WER) ranging from 0.31 to 0.58. WER showed systematic variations based on country of birth and severity of positive symptoms. In subsequent NLP analysis, ASR transcripts showed significantly higher GloVe semantic similarity and fewer sentences than manual transcripts, as well as weaker correlations between NLP metrics and symptom scores. We considered the potential impact of these differences in three real-world use cases of ASR: electronic health records, voice chatbots, and clinical decision support systems. Overall, we argue that assessing ASR performance requires looking beyond WER alone. In clinical settings, the potential impact of an ASR error is not only influenced by its rate but by its type, meaning and context. Our approach shows how to evaluate ASR in clinical research, offering guidance for future researchers and developers on key considerations for its implementation.
Authors: Just SA, Elvevåg B, Pandey S, Nenchev I, Bröcker AL, Montag C, Morgan SE
Journal: Psychiatry Res
Citation: Just SA, et al. Moving beyond word error rate to evaluate automatic speech recognition in clinical samples: Lessons from research into schizophrenia-spectrum disorders. Psychiatry Res. 2025;352:116690. doi: 10.1016/j.psychres.2025.116690